Full Disclosure: I’m biased against Facebook due to my previous experiences having an account there, and I have a strong interest in net neutrality.
Now, on to the survey that the FB PR team cites, taken from here.
I was reading Mahesh Murthy’s post about the big ‘net neutrality vs Free Basics’ debate and came across this link, which FB seems to cite for claiming that most Indians want the “Free Facebook” plan. So I tried to dig into the survey details, and here are my criticisms.
Survey methodology
A national survey was conducted in November and December 2015 with a representative sample of 3,094 adult residents of India. The margin of error is 1.8%. The survey was conducted door-to-door throughout the country of India. The sample reflects the full diversity of the regions and states of India, and was conducted with a full representation of large cities, small cities, towns, villages, and rural areas. The sample reflects the population of India with respect to age, gender, and other demographic variables.
I’ll withhold comment on the summary part, but let’s talk about the methodology:
Representative sample of adult residents of India:
The country’s population is over 1 billion. ~3,100 people out of that is not representative, no matter what magic sampling method you use.
Margin of error is 1.8%:
Ermmm… that’s interesting, but can you please tell us how you came up with that number?
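For what it’s worth, the standard 95%-confidence margin of error for a proportion is z·√(p(1−p)/n), and plugging in their n = 3,094 with the conservative worst case p = 0.5 does reproduce roughly 1.8% — so that textbook formula is very likely where the figure comes from, though the press release never says so. A quick sketch:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion.

    p = 0.5 is the conservative worst case; z = 1.96 is the normal
    critical value for 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# The survey's reported sample size:
print(f"{margin_of_error(3094):.1%}")  # ~1.8%
```

Note that n is all this formula needs; it says nothing about whether the 3,094 people were actually sampled representatively.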
The sample reflects the full diversity of the regions and states of India, and was conducted with a full representation of large cities, small cities, towns, villages, and rural areas. The sample reflects the population of India with respect to age, gender, and other demographic variables.
Now I’m convinced you guys have read the book “How to Lie with Statistics”.* Seriously, can you provide a split of how many of those 3,094 survey participants live in a large city, small city, town, village, or rural area? And can you provide the split/pivot tables by age, gender, and the other demographic variables you used to decide it was a representative sample?
Alright, that’s it based on what they have provided. Now, more on what they’ve not provided, or what I’d like to see if I’m expected to make a policy decision based on a survey.
The null hypothesis they decided to test [based on which they should have designed the survey questionnaire].
Sample size determination: it’s not clear from the press release, but maybe this is what they refer to as the error margin. That is, did they choose a target margin of error of 1.8% to arrive at the sample size?
No mention of what, if any, statistical test they used. My suspicion is that they ran some, didn’t like the p-values and confidence intervals they got, and so decided not to report them.
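The natural test for a question like “do you support Free Basics?” would be a one-sample z-test for a proportion against the 50% mark. A minimal sketch, using a made-up support figure of 55% purely for illustration (the press release reports no such number):

```python
import math

def proportion_ztest(successes, n, p0=0.5):
    """One-sample z-test: is the true proportion different from p0?

    Returns the z statistic and a two-sided p-value from the normal
    approximation (fine for a sample this large).
    """
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value via the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical: 1,701 of 3,094 respondents (about 55%) say yes.
z, p = proportion_ztest(1701, 3094)
print(z, p)
```

This is exactly the kind of z statistic and p-value a serious survey report would publish alongside the headline percentage.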
Perhaps most important: was the actual phrasing of the questions decided beforehand, or verbally improvised? Given the language diversity, it might be acceptable to verbally improvise, but that would multiply the ideal sample size [more precisely, by the number of languages used].
By the way, each of the other categorical variables [like the ones they mention: age, gender, urban/rural/town/city, etc.] would multiply the ideal sample size.**
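To make the multiplication concrete: if the 1.8% margin is supposed to hold within each demographic cell rather than only overall, every cell needs the full ~3,094 respondents on its own. With some illustrative strata counts (my assumption, not from the press release):

```python
# Illustrative strata, assumed for the sake of the arithmetic:
# 5 settlement types x 2 genders x 4 age bands.
per_cell = 3094          # respondents needed per cell for a ~1.8% margin
cells = 5 * 2 * 4        # 40 demographic cells
print(per_cell * cells)  # 123,760 interviews -- 40x the reported sample
```

With only 3,094 interviews total, the per-cell counts must be tiny, which is exactly why the missing pivot tables matter.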
They also mention in their statistics a split by college graduates, though there’s no mention of a sample diverse by education level.
UPDATE-19-May-2016: The Guardian came out with a new article about what happened.
UPDATE-19-June-2016: As of today, I have a Facebook account that’s about 10 days old and is used purely as a blog-post distribution tool. (Same for LinkedIn; they were getting into dark patterns enough that I had to delete and recreate my profile there to clean up random community memberships.)
* — Or they’re just clueless… But given that they seem to be a research firm, they might as well be doing illegal business.
** — One way to calculate this ideal sample size would be to use a calculator like this one.
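Such calculators implement the standard formula n = z²·p(1−p)/e². A sketch of what they compute, assuming 95% confidence and the worst-case p = 0.5:

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """Sample size needed for a given margin of error at 95% confidence.

    Uses the worst-case p = 0.5 unless a prior estimate is supplied.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(required_sample_size(0.018))  # 2965 -- close to the 3,094 surveyed
```

The fact that 3,094 falls just above this figure suggests the sample size was picked to hit the headline margin, not to cover the demographic cells discussed above.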