Thursday, March 1, 2012
I think I've mentioned statistics before and why I tend to be leery of them. Maybe on one of my other blogs. Fortunately, Relax Max provided an example on his blog post. Fun with numbers! Go there first if you want to see the problem and solve it without seeing the answer, 'cause I'll be doing some math below. Bear with me.
Suppose that a barrel contains many small plastic eggs. Some eggs are painted red and some are painted blue. 40% of the eggs in the bin contain pearls, and 60% contain nothing. 30% of eggs containing pearls are painted blue, and 10% of eggs containing nothing are painted blue. What is the probability that a blue egg contains a pearl?

This is quite solvable for everything but the total number of eggs. With four variables and four equations, we can solve for the ratios of red (R), blue (B), pearl (P) and empty (E) eggs.
Now, I could show the steps, but I'm too lazy. From this, though, we can deduce that the pearl:empty ratio of blue eggs is 2:1 and of red eggs is 14:27. The ratio of pearl:empty for the total, of course, is 2:3, and the ratio of blue:red eggs is 9:41.
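Since I skipped the steps, here's a quick sketch of the arithmetic in Python (my own check, not part of the original puzzle), using exact fractions so nothing gets lost to rounding:

```python
from fractions import Fraction

# Stated proportions of the whole barrel
pearl = Fraction(40, 100)               # eggs containing pearls
empty = Fraction(60, 100)               # eggs containing nothing

blue_pearl = Fraction(30, 100) * pearl  # pearl eggs painted blue
blue_empty = Fraction(10, 100) * empty  # empty eggs painted blue
blue = blue_pearl + blue_empty          # all blue eggs
red = 1 - blue                          # everything else is red

print(blue_pearl / blue_empty)                      # pearl:empty among blue -> 2 (i.e., 2:1)
print((pearl - blue_pearl) / (empty - blue_empty))  # pearl:empty among red  -> 14/27
print(blue / red)                                   # blue:red -> 9/41
print(blue_pearl / blue)                            # the puzzle's answer    -> 2/3
```

Using Fraction instead of plain floats keeps the ratios exact, which matters in a minute when we start demanding whole eggs.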
That doesn't count the eggs BUT, unless we accept the notion that there are partial eggs involved, we can figure out a least possible number of eggs. When I did this in my head on the way home, I misdid the proportions of red and blue as 8:41 and figured we'd need a factor of 49, but I realized my mistake later and revised it to multiples of 50, including 100. If we have to keep ourselves to whole numbers (which seems logical), the per-color proportions have to come out whole, too, and multiples of 50 manage it: a barrel of 50 gives 41 red eggs splitting 14:27 and 9 blue eggs splitting 6:3.
My calculations give me a minimum number of eggs of 50; however, if any of you find a smaller number, I'm more than willing to compare notes.
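For anyone who does want to compare notes, the whole-egg question is easy to brute-force. This is just a sketch: it walks candidate barrel sizes and keeps the first one where every stated percentage yields a whole number of eggs.

```python
from fractions import Fraction as F

def counts(n):
    """All six egg counts for a barrel of n eggs, as exact fractions."""
    pearl, empty = F(40, 100) * n, F(60, 100) * n
    blue_pearl, blue_empty = F(30, 100) * pearl, F(10, 100) * empty
    red_pearl, red_empty = pearl - blue_pearl, empty - blue_empty
    return [pearl, empty, blue_pearl, blue_empty, red_pearl, red_empty]

def smallest_barrel(limit=10_000):
    """First barrel size where every count is a whole egg."""
    for n in range(1, limit + 1):
        if all(c.denominator == 1 for c in counts(n)):
            return n

print(smallest_barrel())                          # -> 50
print([int(c) for c in counts(smallest_barrel())])
```

At 50 eggs, the counts come out to 20 pearl and 30 empty, with 6 blue-pearl, 3 blue-empty, 14 red-pearl and 27 red-empty, all whole.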
So, why the math problem? Are any of you still with me? Because, with all those numbers I've tossed out like I'm knowledgeable, I've seemed to say something significant and absolute, but there are these important little things called caveats, and I have to thank soubriquet for pointing out (on RM's blog) that, without making some pretty important assumptions, you can't deduce anything. He/she pointed out a few (I'll make those green) and I'll note all the other ones I think absolutely necessary to reach any numerical conclusion:
- 100% of the plastic eggs in the bin are painted red or blue (why would you paint plastic eggs?). There are no other colors of eggs in the barrel (or, if there were, they've been repainted red or blue)
- Only a blue-painted egg is counted as blue, no matter what the original plastic color
- The eggs painted red and blue are in the bin
- We're dealing with whole eggs, not fractions of eggs that would be counted as "empty"
So what? Right? Except we, the people, are constantly bombarded with statistics, ofttimes touted as "facts". What we're generally not bombarded with are the caveats and assumptions that go into just about every scientific "fact" (limitations in detection, for example) and, to an even greater degree, into all those more nebulous "facts" many treat like absolutes.
"XX% of the YY populace thinks ZZ," for example.
Here are some of the assumptions necessary for that last statement to mean anything:
Whatever sample they used is representative of the whole. Let's face it, chances are this is an estimate using a "sample" of the YY populace, since few organizations have the wherewithal to ask an entire population (with the possible exception of, say, Polar Bear Club members or something) a question. Chances are, they chose a "representative sample" that might be representative and, just as easily, might be completely unrepresentative of the whole. Let's also note that few organizations have ever asked any sample and received 100% feedback, so it's not just the sample asked but the sample that was willing to reply. I don't know about you, but assuming that those willing to answer polls have the same mindset as those who won't bother seems suspect to me. Geography might make a difference. Social class, race, gender, sexual orientation, political affiliation: those might make differences, too. If the sample is too restrictive or weighted, you won't get a meaningful answer unless that bias is included in the YY descriptor. However, narrowing down the YY descriptor makes the answer less meaningful to everyone else as well.
The wording of the question was accurate enough to capture the opinion stated (or the thought expressed means what was asked). Seems obvious, but I've been on the receiving end of a few questions that were as misleading and answerless as the famous "So, do you still beat your wife?" question. Asking a mixed political group "Do you think Democrats in Congress/the Senate/your State government have failed to stand up to Republicans enough?" will get you a different result than asking "Are Democrats causing our political problems?" but either can be lumped as "disapprove of Democrats." Given how different the questions are, and how different the implications are depending on which camp you're in, lumping them together just shows you how misleading a statistic can be. Or let's try this: "Would you want your child to marry someone of the same gender?" vs. "Do you think people should be able to marry someone of the same gender?" Think they'd get the same answer ("approve of gay marriage")? Before you get too spun up on those intolerant parents, remember that not wanting your child to be gay doesn't mean you wouldn't accept them if they were, any more than not wanting your child to be autistic means you'd smother them in their sleep if they were. It's all in how the question is worded, which may or may not have anything to do with how the conclusion is worded. But, for the statistic to have meaning, we have to assume they're equivalent.
People questioned say what they mean. Any time you take a poll, you're also dependent on the respondent to tell you what they actually think in a way that represents what they'll do/think/did. A poll that asks something relatively benign ("Do you like Chinese food?") is more apt to get accurate answers than a poll that asks something sensitive ("Have you had sex with someone other than your spouse?"), but some people won't answer correctly anyway. Anonymity might buy back some of that, but anyone who thinks a blind poll will garner only untarnished truth when asking "Have you ever had sexual fantasies about a relative?" or "Would you ever engage in premarital sex?" is probably delusional.
There's more, of course, especially if we move from what seems like easy-peasy ground of what actually is (and we've already seen how misleading statistics can be for that) to predicting things based on trends.
Talk about voodoo.