Visualization Grammar — VEGA

VEGA:

A visualization grammar, a language for:
* — creating, saving and sharing interactive visualization designs
* — describing the visual appearance and interactive behaviour of a visualization in JSON
* — reactive signals that dynamically modify a visualization in response to input
event streams

Key Semantics:

The key semantics are (a minimal example spec follows this list):
  • — width, height, padding, autosize (all for specifying the size of the
    visualization)
  • — data (an array of data definitions; each can define the name, type, stream, url, and values of the
    data)

  • — scales (configuration for how to map columns of data to pixel positions or
    colors, or to a type of representation (for ex: categorical ==> bands etc.))

  • — axes (Configuration of axes)

  • — marks (graphical primitives, which are used to encode data. They have properties
    like position, size, shape, and color. Examples are: dot, circle, rectangle (bar chart),
    star etc.)

  • — Each mark has a sub-property encode, which maps data to the graphical primitive’s visual properties
  • — encode’s sub-properties enter and exit configure those properties when
    the mark is added or removed
  • — encode’s sub-properties hover and update configure the mark’s overall interactive behaviour
  • — each of the hover, update properties can be triggered/linked to signals
    and changed accordingly
  • — A special type of mark called group is present and can contain other
    marks (for composing graphical primitives into more complex ones)

  • — signals (act as dynamic variables, or as event listeners to use JS parlance)

  • — Have a sub-property for event streams
  • — Can set dynamically evaluated values when the defined events
    fire
  • — Events can be mouse over, mouse out, click, drag etc.
  • — Event streams
  • — Have sub-properties source, type, marktype, between, consume, filter etc.
  • — Each sub-property decides which mark to change/update, based on which
    event type/user action/data change
  • — Event streams can also be written with CSS-style selectors

  • — legends

  • — Can create legends for the visualization
  • — Customizable with sub-properties type, orient, fill, opacity, shape

  • — transforms

  • — As the name implies, they transform data streams
  • — Have sub-properties like filter, stack, aggregate, bin, collect, fold,
    impute etc.
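
To make the pieces above concrete, here is a minimal bar-chart spec, sketched as a Python dict and dumped to JSON. It follows the standard Vega bar-chart example; the data values, scale names and colors are just illustrative choices, not anything from the notes above.

```python
import json

# A minimal bar-chart spec illustrating the pieces described above:
# size, data, scales, axes, and marks.
spec = {
    "$schema": "https://vega.github.io/schema/vega/v5.json",
    "width": 400,
    "height": 200,
    "padding": 5,
    "data": [
        {
            "name": "table",
            "values": [
                {"category": "A", "amount": 28},
                {"category": "B", "amount": 55},
                {"category": "C", "amount": 43},
            ],
        }
    ],
    "scales": [
        {   # categorical ==> bands, as mentioned above
            "name": "xscale",
            "type": "band",
            "domain": {"data": "table", "field": "category"},
            "range": "width",
            "padding": 0.05,
        },
        {
            "name": "yscale",
            "type": "linear",
            "domain": {"data": "table", "field": "amount"},
            "range": "height",
            "nice": True,
        },
    ],
    "axes": [
        {"orient": "bottom", "scale": "xscale"},
        {"orient": "left", "scale": "yscale"},
    ],
    "marks": [
        {
            "type": "rect",            # rectangle marks ==> bar chart
            "from": {"data": "table"},
            "encode": {
                "enter": {             # properties set when the mark is added
                    "x": {"scale": "xscale", "field": "category"},
                    "width": {"scale": "xscale", "band": 1},
                    "y": {"scale": "yscale", "field": "amount"},
                    "y2": {"scale": "yscale", "value": 0},
                },
                "update": {"fill": {"value": "steelblue"}},
                "hover": {"fill": {"value": "firebrick"}},  # reacts to hover events
            },
        }
    ],
}

print(json.dumps(spec, indent=2))
```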

Share: Harry Potter and the Methods of Rationality

There was a legendary episode in social psychology called the Robbers Cave experiment. It had been set up in the bewildered aftermath of World War II, with the intent of investigating the causes and remedies of conflicts between groups. The scientists had set up a summer camp for 22 boys from 22 different schools, selecting them to all be from stable middle-class families. The first phase of the experiment had been intended to investigate what it took to start a conflict between groups. The 22 boys had been divided into two groups of 11 –
– and this had been quite sufficient.
The hostility had started from the moment the two groups had become aware of each others’ existences in the state park, insults being hurled on the first meeting. They’d named themselves the Eagles and the Rattlers (they hadn’t needed names for themselves when they thought they were the only ones in the park) and had proceeded to develop contrasting group stereotypes, the Rattlers thinking of themselves as rough-and-tough and swearing heavily, the Eagles correspondingly deciding to think of themselves as upright-and-proper.
The other part of the experiment had been testing how to resolve group conflicts. Bringing the boys together to watch fireworks hadn’t worked at all. They’d just shouted at each other and stayed apart. What had worked was warning them that there might be vandals in the park, and the two groups needing to work together to solve a failure of the park’s water system. A common task, a common enemy.
Harry had a strong suspicion Professor Quirrell had understood this principle very well indeed when he had chosen to create three armies per year.

Kallada Travels… Unbelievably bad service.

I am traveling by Kallada Travels today to my hometown. It’s a non-A/C sleeper, and it was expected to leave Madiwala by 8.45 pm. It left around 9.00 pm… Hmm, okay, a 15-minute delay is understandable with Bangalore traffic.
However we’ve been waiting at Chandapura for an hour for two more passengers… It is currently 23:16 hrs and they are now moving after picking up those 2 passengers..

This is just fucking lousy, careless, arrogant service. All questions to the guys (apparently in charge) are deflected with “the boss or manager says so”…

Never taking these travels again..And redBus might as well block these travels.

My stock market dabblings.

Over the last couple of years, I’ve been dabbling, really just buying on impulse and on random
online stock tips and forum reads. At year-end, while filing taxes and tallying up, I realized (not
surprisingly, I might add) that I’ve lost money (thanks to the bull market, only a little).
Which is when I realized I’ve been half-assing the amount of research I should do before investing
in the stock market, and what’s worse, I’ve been falling prey to the fallacy “a little knowledge is a
dangerous thing”.
So this is an attempt to hide the crime and, in the process, build a system to avoid committing the
crime in the future.

Before I begin, some of the sources I’ve been half-assing for research (but good sources
nevertheless) are:

This is the first of a series of posts:

Most of the data I used (and will use for the series) in the following analysis was picked up from investr (thanks to r/hapuchu
for sharing the data), but it can also be picked up by crawling the webpages of companies for quarterly
and/or annual reports, and then parsing the PDFs to consume them.
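
If you do go the crawling route, a minimal sketch of the fetch-and-parse step could look like the following (the report URL is a made-up placeholder, and pypdf is just one of several PDF-parsing libraries you could use):

```python
import io

import requests
from pypdf import PdfReader  # pip install requests pypdf

# Hypothetical URL of a quarterly-report PDF; in practice you'd crawl the
# company's investor-relations page to find these links.
REPORT_URL = "https://example.com/investor-relations/Q3-results.pdf"

response = requests.get(REPORT_URL, timeout=30)
response.raise_for_status()

# Extract the raw text from every page of the report.
reader = PdfReader(io.BytesIO(response.content))
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# From here on it's plain text parsing: pull out revenue, profit, etc.
print(text[:1000])
```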

 

Some Caveats and Exceptions:

  • — I’m writing this around the end/last week of February.
  • — These are all stocks I traded in, starting in the 2nd half of 2015.
  • — I did some stock investing during 2006–2009 and made some money, but due to
    bad (nah, non-existent) portfolio/cashflow management I had to sell a bunch of them in 2009, which put my
    overall returns negative, and I stopped trading, leaving whatever was left. But I learnt the lesson not to put any amount of money I’m not ok with losing into stocks.

  • — I work in the IT industry, and have spent some spare time reading finance, but
    nowhere near dedicated or focussed. (Not sure that kind of reading is good.)

  • — Most of the energy stocks are from when I decided I’d go thematic on renewable energy
    and bought them, but I lost patience/nerve when the stocks went down and eventually sold off.

Direct-Equity[^1] Portfolio Opinions:

  • Way too many stocks
  • Way too disorganized, unfocused, and under-researched
  • Not enough focus on the long term (think companies that’ll stay around for > 100 years)
  • Balance long-term (black-swan) focus with dividend-based focus (for ex. this)

Ok here’s a list of the stocks I’ve traded:

Reasoning:

  • I’ve seen the share price of this hovering around 1000 for the
    10 years I’ve watched this stock, so it can be part of a stable portfolio.

Flaws:

  • This is absurd, as for all I know, the stock could have split 10 times in
    those 10 years (which would mean it has actually risen), or merged (which would
    mean it has fallen). I haven’t checked, but the point is that the reasoning is a fallacy.

Outcome:

  • It has fallen a little bit, but I’m still keeping it and might even buy
    more.

  • Geometric Ltd.

Outcome:

Reasoning:

  • Not much, just that I’ve liked Infy stocks in the past, and they seemed to
    be on a downtick

Flaws:

  • Well, it was just an impulse buy, and no better than gambling.

Outcome:

  • It has recovered and got back to trading better, but that’s just luck.

 

Reasoning:

  • Can’t even remember where, but it was some analyst rating I read

Flaws:

  • Belief in expert fallacy.

Outcome:

  • Loss. Panicked and sold at 200. It seems to be doing a little better, but
    even now it would be a loss for me to sell; regardless, the system of
    investment was wrong.

  • HDFC Bank

Outcome:

Reasoning:

  • Can’t even remember where, but it was a blog/reddit thread

Flaws:

  • Belief in expert fallacy.
  • Belief in crowd decision fallacy??

Outcome:

  • Up by about 20%; lucky bull ride

Reasoning:

  • Analyst recommendation

Flaws:

  • belief-in-Expert-bias

Outcome:

  • Down by about 1/7th (Caveat: Down only because they had a rights issue
    that I missed)

  • Manappuram Finance

Outcome:
* Up by about 1/8th

Outcome:

  • Gained a bit. (Can’t be bothered to track the sale date and say how much.)

  • Orient Green Power

Outcome:
* Lost a bit.

Outcome:
* Lost money

Outcome:
* Gained a little bit

Reasoning:

  • I’ve had a good experience buying it during the IPO and making money
  • I’ve a bias about the non-renewable sector’s future prospects.

Fallacy:

  • Using inductive reasoning when there’s no reason to (an IPO is different from regular trading)
  • Prior bias (ideally I should have built a prediction, and accounted for the bias I have about non-renewable energy’s future)

Outcome:
* Lost a fair amount of money

Outcome:

Reasoning:

  • Building a dividend portfolio, and saw good ratings for it on investr’s magic formula
  • Bought a scooter and decided I could buy some auto stocks

Outcome:

Reasoning:

  • I was on an automobile theme, and Maruti has a big brand in India
  • Also was thinking of future plans for a car, and Maruti was an automatic pick

Flaws:

  • It has a relatively high P/E

Outcome:

Reasoning:

  • Was on an automobile theme
  • Bought a TVS scooter

Outcome:

Reasoning:

  • I wanted to pick up some airline stocks (theme idea) and found that IndiGo has
    a high P/E, so I picked up SpiceJet (based on its investr score)

  • Dabur India

Outcome:

  • Slightly up

Moral/TODO:

  • Reduce the number of stocks and focus the money on a few
  • Build an internal system for analyzing companies before investing in the future

[^1] — I might eventually broaden the scope of these blog posts, but don’t expect it for a looong, loong time (count in decades)…

igot –> not exactly a fraudster.. (just quasi, or almost)

TL;DR: The company igot is not exactly a fraudster, but any money/BTC you transact through them is not expected to be paid out in anything close to a reasonable time period (aka mine’s been pending since July 2016; for other examples see here).

The company igot is a bitcoin exchange for people wanting to buy, sell, and trade bitcoin (BTC) with whatever currency they would like to trade in.

 

Back in 2015, I had bought some bitcoins and was trading them for a while, selling when the price rose and buying back when it fell. Eventually I had around 1.1 BTC, half of which I decided to cash in sometime around July 2016, so I sold and initiated a transaction with igot to credit the money to my account.

 

After a week of no credit transaction on my bank account I raised a support ticket hoping to get some resolution, but that was not responded to for a month, and I only got a response once I threatened to go to the banking ombudsman in India (an authority for mediating complaints against banks). Screenshots: transactions_feb_24_2017, supportrequest2, supportrequest1.

Now this support request (and an email) response promised to start processing pending transactions from September 2016 and end by November 2016.

And in October I raised the ticket again and got another (supportrequest2) vague promise that never materialized.

 

At this point, I gave up and waited for November to roll in, and when it did, I tried to contact them, but the support ticket/menu option had vanished. So I went back to the mail thread and mailed them back, with the following result: email_nov_3.

And this just promises more time, with the transaction processing beginning in a few days. Note how the previous promise said the transactions would have been done by this time (Nov. 1, 2016).

Also note how the tone of the email has changed (from apologetic-sorry-promise to we-don’t-care-if-you-want-to-go-legal).

This is when I started realizing that maybe I’ve been dealing with a dishonest, don’t-care-about-clients type of businessmen/management.

I do not know what the right action to take next is, but for now I’m stuck with this blog post. At least for anyone else googling to evaluate the company: don’t do it. They are not reliable enough people to route your money through.

 

UPDATE: Ok, I give up. They’re just defrauding people and failing to communicate with old customers altogether. Most likely because they have no intention of paying back the old customers. Seems like now they have a new website and page. For a long time, they’ve been talking about a resolution centre for old customers, but now they’ve launched a completely new website.

I just have no idea how to take them to task. Anyone with contacts in the cyber crime division in India, please contact me.

Statistical moments —

Inspired by this blog from the PayPal team. Moment is a physics concept (or at least I encountered it first in physics, but it looks like it has been generalized in math to apply to other fields).

If you followed that wikipedia math link above, you’ll know the formula for a moment is
\mu_n = \int_{-\infty}^{+\infty} (x-c)^n \, f(x) \, dx
where x — the value of the variable
f(x) — the probability density function of the variable
n — the order of the moment (aka the nth moment, we’ll get to that shortly)
c — the center, or the value around which to calculate the moment.

However, if you look at a few other pages and links, they ignore that c part.. and of course use the summation symbol.**

The reason they don’t put up ‘c’ there is they assume moment around the value 0. As we’ll see below this is well and good in some cases, but not always.
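
For reference, the discrete/sampled version those pages use, with c written back in, is just

m_n = \frac{1}{N}\sum_{i=1}^{N} (x_i - c)^n

for a sample of N points (or a probability-weighted sum \sum_i (x_i - c)^n \, p(x_i) for a discrete distribution); setting c = 0 recovers the formula they show.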

The other part, n, the order of the moment, is an interesting concept. It’s just raising the difference to the nth power. To begin with, if n is even, the negative sign caused by the differences goes away, so every term in the sum is non-negative and the sum grows monotonically as the values spread out.

I would usually argue that ‘c’ should be a measure of central tendency like the mean/median/mode, and that a sign of a fat-tailed (vs thin-tailed) distribution is that the moments differ if you choose a different c, with the different moments changing wildly.
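
A rough numerical sketch of that idea (the distributions and sample size here are arbitrary choices of mine, not from the PayPal post): compute a few moments of a thin-tailed and a fat-tailed sample, around 0 and around the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def moment(x, n, c=0.0):
    """n-th sample moment of x around the center c."""
    return np.mean((x - c) ** n)

# A thin-tailed sample (standard normal) vs a fat-tailed one (Student's t, 3 dof).
normal = rng.standard_normal(100_000)
fat    = rng.standard_t(df=3, size=100_000)

for name, x in [("normal", normal), ("fat-tailed t(3)", fat)]:
    print(name)
    for n in (2, 3, 4):
        print(f"  moment {n}: around 0 = {moment(x, n):8.3f}, "
              f"around mean = {moment(x, n, c=x.mean()):8.3f}")
```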

The statsblog I linked above mentions something different.

Higher-order terms(above the 4th) are difficult to estimate and equally difficult to describe in layman’s terms. You’re unlikely to come across any of them in elementary stats. For example, the 5th order is a measure of the relative importance of tails versus center (mode, shoulders) in causing skew. For example, a high 5th means there is a heavy tail with little mode movement and a low 5th means there is more change in the shoulders.

Hmm.. I wonder how or why? I can’t figure out how it can be an indication of fat tails (referred to by the phrase “importance of tails” in the quote above) with the formula they are using, i.e. when the formula doesn’t mention anything about ‘c’.

** — That would be the notation for discrete variables as opposed to continuous variables, but given that most real-world applications of statistics use sampling at discrete intervals, it’s understandable to use this notation instead of the integral sign.

Types of Statistical Regression Models

Inspired by this and this

Types of Regression:
Attributes of a regression model:
There are 4 key attributes:

1. No. of Independent variables
2. Type of Dependent variables
3. Shape/Curve of the Regression line
4. No. of Dependent variables

Disclaimer: This is by no means exhaustive. Just an attempt to write around the key techniques (a small code sketch comparing a few of them follows the list).

  1. Linear —

* – Shape of regression==> straight line.
* – (no. of variables is assumed 1 independent vs 1 dependent)
* – Type of output/dependent variable(Numerical)
* – Type of input/independent variable(Numerical)
* – More variables can be used by collapsing them into one (some formula like a weighted sum)
* – Or just by adding more variables to the line eqn (and modifying the convergence algorithm): Y = a + b1 * x1 + b2 * x2 + …. + bn * xn

2. Logarithmic —

* – Shape of regression ==> exponential/logarithmic curve
* – (no. of variables is assumed 1 independent vs 1 dependent)
* – Type of output/dependent variable(Numerical)
* – Type of input/independent variable(Numerical)
* – More variables can be used by collapsing them into one(some formula like weighted sum)

3. Logistic —

* – Shape ==> S-shaped logistic curve
* – (no. of variables is assumed 1 independent vs 1 dependent)
* – Type of output/dependent variable(Binary-aka 2 categories)
* – Type of input/independent variable(Numerical)
* – More categories in output variable can be used by collapsing them into one(some formula dividing the output curve)

4. Polynomial —

* – Shape ==> Some complex polynomial function of order >= 2
* – (no. of variables is assumed 1 independent vs 1 dependent)
* – Type of output/dependent variable(Numerical)
* – Type of input/independent variable(Numerical)

5. Stepwise —

* – no. of variables is assumed multiple(n)- independent vs 1 dependent
* – Multiple steps, with each of them selecting specific variables based on variance, R-square, t-tests etc..
* – Forward selection starts with no predictors and adds independent variables one at a time
* – Backward elimination starts with all input variables and eliminates them by the above-mentioned methods.

6. Ridge

* – no. of variables is assumed multiple(n)- independent vs 1 dependent
* – adds a shrinkage (lambda) parameter (to regression estimates) to solve multicollinearity between independent variables
* – also called L2 regularization.. it’s a regularization method

7. Lasso

* – Least Absolute Shrinkage and Selection Operator
* – also called L1 regularization.. it’s a regularization method.
* – It can shrink coefficients to exactly zero, which certainly helps with feature selection.
* – If a group of predictors is highly correlated, lasso picks only one of them and shrinks the others to zero.

8. ElasticNet

* – Hybrid of the Ridge and Lasso methods, uses the L1 and L2 priors as regularizers
* – It encourages a group effect in the case of highly correlated variables
* – There are no limitations on the number of selected variables
* – It can suffer from double shrinkage

9. Multi-variate —

* – no. of variables is assumed multiple(n)- independent vs 1 dependent

  10. Multi-variable —

* – no. of variables is assumed multiple(n)- independent vs multiple(n) dependent
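
To make a few of these concrete, here is a small sklearn sketch on synthetic data (the data, the alpha values, and the degree-2 expansion are arbitrary illustrative choices, not anything from the linked posts):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)

# Synthetic data: 3 numerical predictors, with a mild nonlinear term in y.
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

# 1. Linear: straight-line (hyperplane) fit, numerical output.
linear = LinearRegression().fit(X, y)

# 4. Polynomial: same linear machinery after expanding features to degree-2 terms.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# 6. Ridge (L2): shrinks coefficients toward zero to handle multicollinearity.
ridge = Ridge(alpha=1.0).fit(X, y)

# 7. Lasso (L1): can shrink some coefficients to exactly zero (feature selection).
lasso = Lasso(alpha=0.1).fit(X, y)

# 3. Logistic: binary (2-category) output instead of a numerical one.
y_bin = (y > y.mean()).astype(int)
logistic = LogisticRegression().fit(X, y_bin)

for name, model in [("linear", linear), ("ridge", ridge), ("lasso", lasso)]:
    print(f"{name:8s} coefficients: {np.round(model.coef_, 3)}")
print("polynomial training R^2:", round(poly.score(X, y), 3))
print("lasso zeroed out:", np.isclose(lasso.coef_, 0.0))
print("logistic training accuracy:", round(logistic.score(X, y_bin), 3))
```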

Update: These are hardly exhaustive. A closer read and some reflection will show that adding multiple predictors, changing the function or shape of the relationship, or changing the algorithm that finds the right (aka converged) coefficients are easy ways to get a combinatorial explosion of types.

Update 2: I made a jupyter notebook demonstrating the differences between some of these; it can be found here. Also, there’s a completely different approach here.

Update 3: Here is a link detailing how to check how well the model worked. Of course, there’s also the goodness-of-fit test I wrote about some time back.

Chi-Square — Goodness-of-Fit Test

Pre-Script: This was inspired/triggered by this post.

For a long time in the past, I took a “religiously blind”TM stance in the frequentists-vs-Bayesians debate, automatically (as evident in this post, for example). For the most part it was justified in the sense that I didn’t understand the magic tables and the comparisons and how the conclusions were made. But I was also over-zealous and assumed the Bayesian methods were better by default. After realizing it, I wrote a blog post (around the resources I found on the topic). This process convinced me that while the standard objections about frequentist statistical methods being used in blind faith by most scientists may be true, they provide plenty of power in situations where Bayesian methods would become computationally unwieldy, i.e. cases where a sampling-theory approach would still allow me to make conclusions with rigorous uncertainty estimates, where Bayesian methods would fail.

So without further ado, here’s a summary of my attempt at understanding the Chi-Square test. Okay, first cut: Wikipedia. Ah.. ok.. abort mission.. that route’s a no-go. Clearly, the Wikipedia definition:

A chi-squared test, also referred to as a \chi^2 test (or chi-square test), is any statistical hypothesis test wherein the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true. Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem. A chi-squared test can be used to attempt rejection of the null hypothesis that the data are independent.

Has too many assumptions. It’s time to go back and read what a chi-squared distribution is first, and maybe not just on Wikipedia, but also in that statistics textbook that I’ve been ignoring for some time now.

Ok, the definition of the chi-squared distribution looks straightforward, except for the “independent standard normal” part. I know what independent means, but have a vaguer idea of standard and normal variables. More down the rabbit-hole.
Ok, that wikipedia link points here. So it basically assumes the k number of variables are:

  • (a) independent of each other, and
  • (b) drawn from a population that follows the Standard Normal Distribution.

That sounds fairly rare in practice, but it can be arranged by choosing and combining variables wisely (aka feature engineering, in ML jargon). So ok, let’s go beyond that.

The distribution’s definition is simple: a sum of squares, according to wikipedia, but my textbook says each term is something like \left(\frac{X - \mu(X)}{\sigma(X)}\right)^2 .
Hmm..
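
The two versions agree; the textbook is just standardizing a general normal variable first:

Z_i = \frac{X_i - \mu(X_i)}{\sigma(X_i)} \sim \mathcal{N}(0,1) \quad \Longrightarrow \quad Q = \sum_{i=1}^{k} Z_i^2 = \sum_{i=1}^{k}\left(\frac{X_i - \mu(X_i)}{\sigma(X_i)}\right)^2 \;\sim\; \chi^2_k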

The textbook talks about Karl Pearson’s Chi-Square Test, so I’ll pick that one to delve deeper into.
According to the textbook, Karl Pearson proved* that the sum \sum \frac{(Observed - Expected)^2}{Expected} follows a Chi-Squared Distribution.

The default Null Hypothesis, or H0, in a Chi-Square test is that there is no difference between the observed and the theoretical/expected (according to your theory) values.
So clearly those magic comparison values are really just the critical value at some p-percentage significance level on the ideal Chi-square distribution, and you check whether your calculated value is less or more.
The conclusion comes from whether the calculated value is less. If it’s less, the observed differences could plausibly be down to chance, so we don’t reject the Null Hypothesis at the given significance level. Or, to write it in Bayesian terms, P(Observations | H0) == P(chance/random coincidence)**
If it’s more, well, here’s what I think it means: P(Observations | H0) != P(chance/random coincidence), and we are p%*** confident about this assertion.
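
As a small, concrete sketch of the mechanics (using scipy; the die-roll counts below are made-up numbers): test whether 120 rolls of a die are consistent with H0 = “the die is fair”.

```python
from scipy.stats import chi2, chisquare

# Observed counts from 120 rolls of a die we suspect is loaded,
# vs. the expected counts under H0 ("the die is fair": 20 per face).
observed = [15, 22, 19, 25, 14, 25]
expected = [20, 20, 20, 20, 20, 20]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square statistic = {stat:.2f}, p-value = {p_value:.3f}")

# With 6 - 1 = 5 degrees of freedom, compare the statistic against the
# critical value at the 5% significance level (the "magic comparison value").
critical = chi2.ppf(0.95, df=5)
print(f"critical value at 5% = {critical:.2f}; reject H0? {stat > critical}")
```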

P.S.1: At this point the textbook goes into the conditions under which a Chi-Squared test is meaningful; I’ll save that for later.
P.S.2: Also, that number k is called the degrees of freedom, and I really need to figure out what it means in this context. I know what it means in the fields of complexity theory and dynamical systems, but in this context I’ll have to look at the proof, or at least the math areas the proof draws upon, to find out. #TODO for some time, another post.

* — According to the book, the Chi-Squared test does not assume anything about the distribution of the Observed and Expected values and is therefore called a non-parametric or distribution-free test. I have a difficult time imagining an approach to a proof that broad, but then I’m not much of a mathematician; for now I’ll take this at face value.

** — I almost put 0.5 here before realizing that’s only for a coin-toss with a fair coin.

*** — The interpretation of what this p-value actually means seems to be a thorny issue. So I’ll reserve it for a different post.

Continuity of a function.

Most of us would have studied (likely in high school) the idea of functions being continuous.

As the wikipedia section states, we end up with 3 conditions for a function, at a point c of an interval [a,b] (written formally below):

  • The function should be defined at the point c (i.e. f(c) exists).
  • The limit of f(x) as x approaches c has to exist.
  • The value of that limit must equal f(c).
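
Written formally, those three conditions collapse into the familiar statement (note that the δ here is allowed to depend on both ε and the point c, which is exactly the dependence that uniform continuity, below, removes):

\lim_{x \to c} f(x) = f(c), \quad \text{i.e.} \quad \forall \varepsilon > 0 \ \exists \delta > 0 : |x - c| < \delta \implies |f(x) - f(c)| < \varepsilon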

 

Now this is a perfectly useful notion for most of the functions we encounter in high school. But there are functions that satisfy these three conditions and still won’t help us move forward, as I just discovered while reading up for my statistics self-education. One moment I was trying to understand the beta distribution, or for that matter what sense it makes to talk about a probability density function (I mean, I understand what probability is, but how can it have density, and that sort of thing). I lose focus for a few seconds, and find myself tumbling down a click-hole to a curiouser idea: 3 levels/types (ordered by strictness) of continuity of a function, namely

 

The last one is what we studied and what I described above.
Now let’s climb up one more step on the ladder of abstraction and see what uniform continuity is.
Ah, we have five more types of continuity there, namely

Ok, I won’t act vogonic and try to understand or explain all of those. I just put them out there to tease; feel free to click your way in.

To quote the first line from the uniform continuity wiki:

In mathematics, a function f is uniformly continuous if, roughly speaking, it is possible to guarantee that f(x) and f(y) be as close to each other as we please by requiring only that x and y are sufficiently close to each other; unlike ordinary continuity, the maximum distance between f(x) and f(y) cannot depend on x and y themselves. For instance, any isometry (distance-preserving map) between metric spaces is uniformly continuous.

So what does this mean and how does it differ from ordinary continuity? Well, they say it up there: the maximum distance between f(x) and f(y) cannot depend on x and y themselves, i.e. the distance function df(f(x), f(y)) has neither x nor y in its expression/input/right-hand side.

The more formal definition can be quoted like this:

Given metric spaces (X, d1) and (Y, d2), a function f : X → Y is called uniformly continuous if for every real number ε > 0 there exists δ > 0 such that for every x, y ∈ X with d1(x, y) < δ, we have that d2(f(x), f(y)) < ε.

Now why would this be relevant or useful, and why is it higher/stricter than ordinary continuity? Well, note that it doesn’t say anything about an interval. The notion of ordinary continuity is always defined on an interval in the input space and is clearly confined to that, i.e. it is a property that is local to the given interval in the input/domain space and may or may not apply on other, different intervals.

On the other hand, if you can say the function is uniformly continuous, you’re effectively saying the function is continuous on all intervals, with a single δ that works everywhere.
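
The standard example for this contrast (my addition, not one of the post’s links) is f(x) = 1/x on the open interval (0, 1): it is ordinarily continuous at every point of the interval, but

|f(x) - f(y)| = \frac{|x - y|}{xy}

blows up as x and y approach 0, so for a fixed ε no single δ works across the whole interval, and the function is not uniformly continuous.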

Now how do we find a more general definition (i.e. absolute continuity)? Well, look at the 3 conditions we defined at the start of this blog post. The first two can be collapsed to say the function must be differentiable over the given interval [a,b]. The third is the distance/measure concept we used in the uniform continuity definition to remove the bounds on the interval and say “everywhere”. So obviously, for the absolute continuity definition, we take the next step and say the function must be uniformly continuous and differentiable everywhere (aka uniformly differentiable).

Ok, all of this is great, except where the hell is this useful? I mean, are there functions that belong to different continuity classes, so that these definitions/properties and the theorems built on them can be used to differentiate and reason about functions? Turns out there are. I’ll start with something I glimpsed on my way down the click-hole, the Cantor distribution. It’s the exception that causes us to create a new class of continuity: it’s neither discrete nor absolutely continuous.

Its distribution therefore has no point masses or probability mass function, and no probability density function.* It therefore throws a lot of the reasoning/theorem systems for a loop.

For the other example, i.e. something ordinarily continuous but not uniformly continuous, see here. It’s a proof-by-contradiction approach.

* — Ok, I confess, the last point about point masses and the probability mass/density function still escapes me. I’ll revisit it later, perhaps this time with the help of that excellent Norvig ipython notebook on probability.