Pre-Script: This was inspired/triggered by this post.
For a long time, I took a “religiously blind”TM stance in the frequentists vs Bayesians debate (as evident in this post, for example). For the most part it was justified, in the sense that I didn’t understand the magic tables, the comparisons, or how the conclusions were made. But I was also over-zealous and assumed the Bayesian methods were better by default. After realizing this I wrote a blog post (around the resources I found on the topic). This process convinced me that while the standard objection, that most scientists use frequentist statistical methods in blind faith, may be true, those methods still provide plenty of power in many situations where Bayesian methods would become computationally unwieldy, i.e. cases where a sampling theory approach would still let me draw conclusions with rigorous uncertainty estimates where Bayesian methods would fail.
So without further ado, here’s a summary of my attempt at understanding the Chi-Square test. Okay, first cut: Wikipedia. Ah.. Ok.. abort mission.. that route’s a no-go.. Clearly the Wikipedia defn:
A chi-squared test, also referred to as a χ² test (or chi-square test), is any statistical hypothesis test wherein the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true. Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem. A chi-squared test can be used to attempt rejection of the null hypothesis that the data are independent.
Has too many assumptions. It’s time to go back and read what a Chi-squared distribution is first, and maybe not just on Wikipedia, but also in that statistics textbook that I’ve been ignoring for some time now.
Ok, the definition of the Chi-squared distribution looks straightforward, except for the independent standard normal part. I know what independent means, but I have only a vague idea of standard and normal variables. Further down the rabbit-hole.
Ok, that Wikipedia link points here. So it basically assumes the k variables are:
- a, independent of each other,
- b, drawn from a population that follows the Standard Normal Distribution (a normal distribution with mean 0 and standard deviation 1).
That sounds fairly rare in practice, but it can be created by choosing and combining variables wisely (aka feature engineering in ML jargon). So ok, let’s go beyond that.
The distribution definition is simple: the sum of the squares of those k variables, according to Wikipedia, but my textbook says it’s something like ( .
Hmm..
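To make the Wikipedia version of the definition less abstract, here is a minimal simulation sketch using only the Python standard library. It draws k independent standard normal values, squares and sums them, and compares the mean and variance of many such sums against the known theoretical values for a chi-squared distribution (mean k, variance 2k). The choice of k = 3, the seed and the number of draws are arbitrary assumptions for illustration.

```python
import random
import statistics


def chi_square_sample(k, rng):
    """One draw: the sum of squares of k independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)
k = 3                    # number of independent variables (hypothetical choice)
draws = [chi_square_sample(k, rng) for _ in range(50_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# Theory says a chi-squared distribution with k variables has mean k and variance 2k.
print(f"mean     ~ {sample_mean:.2f} (theory: {k})")
print(f"variance ~ {sample_var:.2f} (theory: {2 * k})")
```

If the simulated mean and variance land close to k and 2k, the "sum of squares of independent standard normals" description is at least behaving as advertised.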
The textbook talks about Karl Pearson’s Chi-Square Test so I’ll pick that one to delve deeper.
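According to the textbook, Karl Pearson proved* that the sum of the squared differences between the observed (O) and expected (E) values, each divided by the expected value, i.e. Σ (O − E)² / E, follows a Chi-Squared Distribution.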
The default Null Hypothesis or H0 in a Chi-Square test is that there is no difference between the observed values and the theoretical/expected (according to your theory) values.
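As a concrete sketch of what gets calculated, the usual textbook form of Pearson's statistic is the sum over categories of (observed − expected)² / expected. The example below assumes that form and applies it to made-up counts from 60 rolls of a die, with H0 being "the die is fair". The counts and the function name `pearson_chi_square` are hypothetical, chosen only for illustration.

```python
def pearson_chi_square(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up data: how often each face came up in 60 rolls of a die.
observed = [8, 9, 12, 11, 6, 14]
# Under H0 (a fair die) each face is expected 60 / 6 = 10 times.
expected = [10] * 6

statistic = pearson_chi_square(observed, expected)
print(f"chi-square statistic: {statistic:.2f}")
```

The bigger the gap between observed and expected counts, the bigger this number gets; when they match exactly it is zero.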
So clearly those magic comparison values are really just the points on the ideal Chi-square distribution that correspond to the significance level you need, and the test amounts to seeing whether your calculated value is less or more than that point.
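One way to take the magic out of the table is to rebuild a single entry by brute force. The sketch below simulates many chi-squared values (as sums of squared standard normal draws) and picks the value that only 5% of them exceed. For k = 5, printed tables list roughly 11.07 at the 5% level, so the simulated number should land near that. The function name, seed and number of draws are assumptions for illustration, and this is a rough Monte Carlo estimate, not how tables are actually computed.

```python
import random


def simulated_critical_value(k, alpha, n_draws=100_000, seed=0):
    """Estimate the value that a chi-squared variable (k summed squares)
    exceeds with probability alpha, by simulation."""
    rng = random.Random(seed)
    draws = sorted(
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_draws)
    )
    # Pick the draw sitting at the (1 - alpha) position of the sorted list.
    return draws[int((1 - alpha) * n_draws)]


critical = simulated_critical_value(k=5, alpha=0.05)
print(f"simulated 5% critical value for k=5: {critical:.2f}")
```

With SciPy installed, `scipy.stats.chi2.ppf(0.95, 5)` should give the same quantity directly, without the simulation noise.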
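The conclusion comes from whether the calculated value is less than that comparison value. If it’s less, the differences between observed and expected values are small enough to be explained by chance, so we fail to reject the Null Hypothesis at the given significance level. Or to write it in Bayesian terms: P(Observations | H0) == P(chance/random coincidence)**
If it’s more, well, here’s what I think it means: the differences are too large to put down to chance, so we reject the Null Hypothesis, and we are p%*** confident about this assertion.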
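Putting the comparison into code, here is a small sketch of the decision rule applied to two made-up sets of die rolls. It assumes the usual form of Pearson's statistic, Σ (O − E)² / E, and the standard table value of about 11.07 for the 5% level with 5 degrees of freedom (six faces minus one, which is the usual convention for this kind of test and an assumption here). The data and the function name `decide` are hypothetical.

```python
# Standard chi-square table value: 5% significance level, 5 degrees of freedom.
CRITICAL_5PCT_DF5 = 11.07


def decide(observed, expected, critical_value):
    """Return the Pearson statistic and whether H0 is rejected."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, statistic > critical_value


expected = [10] * 6                    # fair die, 60 rolls
fair_looking = [8, 9, 12, 11, 6, 14]   # made-up counts, close to expected
loaded_looking = [3, 5, 6, 8, 10, 28]  # made-up counts, one face dominates

for name, counts in [("fair-looking", fair_looking), ("loaded-looking", loaded_looking)]:
    statistic, reject = decide(counts, expected, CRITICAL_5PCT_DF5)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"{name}: statistic = {statistic:.1f} -> {verdict}")
```

Note that "fail to reject" is deliberately weaker than "H0 is true": the test only says the data are not surprising enough under H0.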
P.S.1: At this point the textbook goes into the conditions under which a Chi-Squared test is meaningful; I’ll save that for later.
P.S.2: Also, that number k is called the degrees of freedom, and I really need to figure out what it means in this context. I know what it means in the field of complexity theory and dynamical systems, but in this context I’ll have to look at the proof, or at least the areas of math the proof draws upon, to find out. #TODO for some time, in another post.
*: According to the book, the Chi-Squared test does not assume anything about the distribution of the observed and expected values and is therefore called a non-parametric or distribution-free test. I have a difficult time imagining an approach to a proof that broad, but then I’m not much of a mathematician, so for now I’ll take this at face value.
**: I almost put 0.5 here before realizing that’s only true for a toss of a fair coin.
***: The interpretation of what this p-value actually means seems to be a thorny issue, so I’ll reserve it for a different post.