This is a long standing debate/argument and like most polarized arguments, both sides have some valid and good reasons for their stand. (There goes the punchline/ TLDR). I’ll try to go a few levels deeper and try to explain the reasons why I think this is kind of a fake argument. (Disclaimer: am just a math enthusiast, and a (willing-to-speculate) novice. Do your own research, if this post helps as a starting point for that, I’d have done my job.)
- As EY writes in this post about how bayes theorem is a law that’s more general and should be observed over whatever frequentist tools we have developed?
- If you read the original post carefully, he doesn’t mention the original/underlying distribution, guesses about it or confidence interval(see calibration game)
- He points to a chapter(in the addendum) here.
- Most of the post otherwise is about using tools vs using a general theory and how the general theory is more powerful and saves a lot of time
- My first reaction to the post was but obviously there’s a reason those two cases should be treated different. They both have the same number of samples, but different ways of taking the samples. One sampling method(one who does sample till 60% success) is a biased way of gathering data .
- As a different blog and some comments point out, if we’re dealing with robots(deterministic algorithmic data-collector) that precisely take data in a rigourous deterministic algorithmic manner the bayesian priors are the same.
- However in real life, it’s going to be humans, who’ll have a lot more decisions to make about considering a data point or not. (Like for example, what stage of the patient should be when he’s considered a candidate for the experimental drug)
- The point I however am going to make or am interested in making is related to known-unknowns vs unknown-unknowns debate.
- My point being even if you have a robot collecting the data, if the underlying nature of the distribution is unknown-unknown(or for that matter depends on a unknown-unknown factor, say location, as some diseases are more widespread in some areas) mean that they can gather same results, even if they were seeing different local distributions.
- A contiguous point is that determining the right sample size is a harder problem in a lot of cases to be confident about the representativeness of the sample.
- To be fair, EY is not ignorant of this problem described above. He even refers to it a bit in his 0 and 1 are not probabilities post here. So the original post might have over-simplified for the sake of rhetoric or simply because he hadn’t read The Red queen.
- The Red queen details about a bunch of evolutionary theories eventually arguing that the constant race between parasite and host immune system is why we have sex as a reproductive mechanism and we have two genders/sexes.
- The medicine/biology example is a lot more complex system than it seems so this mistake is easier to make.
- Yes in all of the cases above, the bayesian method (which is simpler to use and understand) will work, if the factors(priors) are known before doing the analysis.
- But my point is that we don’t know all the factors(priors) and may not even be able to list all of them, let alone screen, and find the prior probability of each of them.
P.S: Here’s a funny Chuck Norris style facts about Eliezer Yudkowsky.(Which I happened upon when trying to find the original post and was not aware of before composing the post in my head.) And here’s an xkcd comic about frequentists vs bayesians.
UPDATE-1(5-6 hrs after original post conception): I realized my disclaimer doesn’t really inform the bayesian prior to judge my post. So here’s my history/whatever with statistics. I’ve had trouble understanding the logic/reasoning/proof behind standard (frequentist?) statistical tests, and was never a fan of rote just doing the steps. So am still trying to understand the logic behind those tests, but today if I were to bet I’d rather bet on results from the bayesian method than from any conventional methods**.
UPDATE-2(5-6 hrs after original post conception): A good example might be the counter example. i.e: given the same data(aka in this case frequency of a distribution, nothing else really, i.e: mean, variance, kurtosis or skewness) show that bayesian method gives different results based on how it(data) was collected and frequentist doesn’t. I’m not sure it’s possible though given the number of methods frequentist/standard methods use.
UPDATE-3 (a few weeks after original writing): Here’s another post about the difference in approaches between the two.
UPDATE-4 (A month or so after): I came across this post with mentions more than two buckets, but obviously they are not all disjoint sets(buckets).
UPDATE-5(Further a couple of months after): There’s a slightly different approach to splitting the two cultures from a different perspective here.
UPDATE-6: A discussion in my favourite community can be found here.
** — I might tweak the amount I’d bet based on the results from it .