6 Apr 2011

Significant

by Jeff

Image text: 'So, uh, we did the green study again and got no link. It was probably a--' 'RESEARCH CONFLICTED ON GREEN JELLY BEAN/ACNE LINK; MORE STUDY RECOMMENDED!'

In this comic, Megan and Cueball ask some scientists to investigate their theory that jelly beans cause acne. The scientists, of course, do not want to stop playing the addictive game Minecraft (which has been referenced in previous xkcd comics). I can attest that it is impossible to stop playing that game.

When the scientists come back and report no link between jelly beans and acne, Megan and Cueball say they have heard that only certain colors cause acne. So the scientists tear themselves away from Minecraft again and test each color separately (including mauve and turquoise; I didn't know those were real jelly bean colors). Green is the only color that shows a link. Naturally, the news blows the result out of proportion, even though, as the image text suggests, the green jelly bean result was probably a coincidence.

But, of course, the all-caps headline in the image text blows it out of proportion again.

EDIT: Commenters Marco and Aron B. explain the real joke in the comments below; check them out.
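
To see why a single "green" hit is unsurprising, here is a small simulation of the comic's setup. It is a minimal sketch with assumed numbers, none of which come from the comic: every color has the same hypothetical 20% acne rate (so no color truly matters), each color is tested on a hypothetical 200 eaters against 200 non-eaters with a pooled two-proportion z-test, and 0.05 is the cutoff. The function names are illustrative.

```python
import math
import random


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # degenerate case (no cases or all cases): no evidence either way
    z = (hits_a / n_a - hits_b / n_b) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def jelly_bean_study(rng, colors=20, n=200, acne_rate=0.2, alpha=0.05):
    """Test every color against a control group; return colors with p < alpha.

    By construction no color has any effect (eaters and controls share the
    same acne_rate), so every "significant" color is a false positive.
    """
    significant = []
    for color in range(colors):
        eaters = sum(rng.random() < acne_rate for _ in range(n))
        controls = sum(rng.random() < acne_rate for _ in range(n))
        if two_proportion_p_value(eaters, n, controls, n) < alpha:
            significant.append(color)
    return significant


def share_of_studies_with_a_hit(studies=500, seed=882):
    """Fraction of simulated 20-color studies that report at least one 'link'."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(studies) if jelly_bean_study(rng))
    return hits / studies


if __name__ == "__main__":
    print(f"{share_of_studies_with_a_hit():.2f} of no-effect studies found a 'link'")
```

With these settings the printed share should come out somewhere near two thirds, even though no color has any effect; the exact figure depends on the seed and the assumed sample sizes.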

Comments (32) Trackbacks (0)
  1. More to the point, the scientists studied 20 different colors, and only one showed a link at the 95% confidence level. The 95% confidence level MEANS that, if there were no real link, a result this strong would turn up by chance only 5% of the time. So by definition we would expect 1 out of 20 studies to show a link at this level by pure random chance.

    This is a big problem in correlation research generally, because scientists study many different possible causal factors and then report the ones that show significance. Without a plausible causal mechanism, it’s not clear what value such studies have, given that about 1 in 20 will show a correlation at p < .05 by chance alone. For more: http://www.fallacyfiles.org/multcomp.html

  2. That’s not really the joke. The joke is that each test had a 5% chance of a false positive, and 20 tests were run, so on average one of them would give a wrong result.

    • Oops, Aron B. and I posted at more or less the same time. Just to clarify: my comment was directed at the main post. Aron is correct.

      • Modest correction: you don’t simply compute 20 × 0.05 and get a probability of 1 that there will be at least one significant P value. The formula is 1 - 0.95^20, so there’s actually a 64% chance of at least one significant P value. (This assumes the tests are independent, which may not be true in this case.)

        • He didn’t say that the probability is 1. He said that on average one out of twenty will be false.

          Just like on average one in two coin flips will be tails: it doesn’t always happen, but as the number of flips grows toward infinity, the fraction of tails approaches 1 in 2.

  3. Marco and Aron B. – Thanks for hitting the nail on the head.

  4. SCIENCE!!! It’s the best lie we have….

  5. Somewhat disappointed Bonferroni wasn’t mentioned at all.

  6. Actually, I think Megan was one of the scientists. The dark-haired lady commissioning the study had a ponytail, and Megan usually wears her hair straight.

  7. Ummmmmmmm…the panel where green is associated with acne the association is p0.05. I know nothing about statistics, but could this signify a reciprocal relationship and indicate that there’s no link for green as well?

    • Hmmmmmmm…something about the punctuation marks screwed up my entry.
      All the panels show “p is greater than 0.05” except for the green one, which shows “p is less than 0.05”. I know nothing about statistics, but could this signify a reciprocal relationship and indicate that there’s no link for green as well?

      • Nope, that’s the way p is defined.
        You essentially compare the result with what would be expected by chance. Say you throw a die 600 times and get a “6” 111 times. Is the die loaded?
        One way to answer this is to take a fair, truly random die as a (virtual) comparison, with the appropriate distribution (here binomial, which a normal distribution approximates well), and then check how probable it is to get 111 or more sixes in 600 throws by chance.
        If this probability (hence, p) is lower than a certain threshold (usually 5% by convention, in some cases much lower), one can assume with great confidence that this result was not just chance.

        The problem is: this confidence is only reasonable if you do just one independent study. If enough people throw dice (or enough tests with different colours of jelly beans are done), some results with p lower than 5% (1%, 0.1%) will happen just by chance. Or: if you do enough tests, once in a while you *will* get a p lower than 5% (1%, 0.1%). If you then throw away all the other results, it looks like you’ve found something.

        That’s the problem with epidemiological studies, especially if you want to find out whether an agent has *any* negative impact on health. Say you want to find out if living near a nuclear plant causes cancer. The simplest approach is to get cancer data for all people living near the plant of interest and look for cancer rates higher than the population “average”. Now you “find” that living 5 to 15 km from Plant A causes leukemia with p smaller than 0.01. Does this prove “nuclear plants are unsafe, they are the cause of cancer in the population”? Nope, because *at the same time* you tested 0-5 km, 15-20 km radii etc., plus different sorts of cancer (lymph, bone, thyroid, brain tumors, “cancer combined”), perhaps 50 or 100 different combinations. But the headlines (or perhaps even your paper, if you have an agenda) will proclaim exactly this statement…
        Just yesterday I heard “living east of a nuclear plant means a higher probability of getting cancer.” What are the odds that north, west and south were also tested with negative results, and that nobody has any idea *why* east should be special? :D

        The same can be said for many astrological “results”: “Capricorns live longer than Leos” (also tested, with no result, for the other 12*11-1 combinations…)

        Or, in short “correlation isn’t causation”

        • Hi. Thanks for the explanation, but my poor feeble brain (a reference to Phineas Freak in The Fabulous Furry Freak Brothers!) didn’t understand it! At least not completely. I do know some stuff about statistics. I do understand the idea that correlation is not causation, the “Texas Marksman Problem” (looking for one kind of association and missing the fact that there are equally many non-associated results), and the general idea that “p”, although I didn’t know it was called p, was the statistical certainty of the results (something about 68%, 95%, 99.7% sticks in my mind as certain natural ranges or something). I was hoping my discovery meant the cartoon meant something different than it appeared to mean to everyone else! Alas, ’twas not to be!

    • Testing: “”. Let’s see if that works.

    • Now you are about to learn something about statistics (or at least, how experiments and scientists use statistics).

      When a scientist conducts an experiment like those in the comic, they compare the test outcome (the rate of acne among green jellybean eaters) with a control outcome (the rate of acne among folks who don’t eat green jellybeans). Since there is always a certain amount of randomness and error in such tests, they compute the probability p of getting an outcome at least as extreme as the observed one purely by chance, assuming that the hypothesis they are testing (green jellybeans give acne) is false. If p is small enough, they conclude the result is “significant”.

      By saying that for mauve jellybeans p is greater than 0.05 (a common cutoff for significance), they are stating that the results weren’t statistically significant. On the other hand, the results for green jellybeans (p less than 0.05) are statistically significant.

      One would expect about 1 out of 20 tests to give a p less than or equal to 0.05 if the hypothesis were false. If that didn’t happen over many such tests, one could reasonably suspect that one’s method of calculating p values was flawed.

      • Ah. I see. The statement that p is less than 0.05 is the confirmation of an association greater than the specified probability range. (I do know some vague things about statistics, but I wouldn’t use this knowledge to try to prove something!)
        I thought that xkcd was making a joke about well wrung out research causing a stir because of a typo that both the researchers and the media missed.
        Thanks!

  8. I think people are missing the < (less than) 0.05

  9. I feel like there’s another punchline in the rollover text: if they were testing with a batch of middle/high school kids, they’d see the correlation, but once another group of people was tested, naturally the results didn’t correlate.

  10. So… What’s the difference between something existing with a probability > 0.05 and something NOT existing with a probability < 0.05? I think this is a comment on the media going nuts when someone says the same thing a different way. The second test had the same results (again the misunderstanding by the media), but now the results are conflicted according to the media.

    • Aha! Got it! I did notice the “no link” / “found link” difference between the panels, but it didn’t occur to me that this was the “reciprocal” relationship I was trying to read in. Much funnier now!

  11. I once had a grad student consult me who had run 120 chi-squares on his dissertation data and was trying to interpret the eight positive associations (p<.05) therein. If only he had consulted a statistician first …

  12. Is there any possibility that the joke is a reference to green M&M’s being an aphrodisiac? And that a side effect of the surge in hormones is acne?

  13. I think that the actual joke is much less nit-picky and statistics-related than everyone is presuming.

    The researchers are saying that there is “no link between jelly beans and acne (p>.05)” for all of the colors. Then for green they decide to switch it up and say “there IS a link between jelly beans and acne (p<.05)”, which is the exact same thing as they were saying earlier! (Note that the sign after the p has changed to less than.)

    The joke is that the media takes the statement about the green jelly beans and uses it to create panic, even though it is no different than the other tests, just phrased differently.

  14. Aha. BUT they are using a double negative — they are NOT rejecting the null hypothesis instead of accepting it. Which is the same thing. See Matt’s comment above — maybe he explained it slightly better.

  15. My limited understanding of statistics is that you have to decide whether to accept or reject the null hypothesis (in this case, we can assume the null hypotheses are “[colour x] jelly beans DO NOT cause acne”).

    Again, as I understand it, the “p-value” is the probability of getting a result at least as extreme as the one from your experiment anyway, ASSUMING the null hypothesis.

    In a coin flip, if my null hypothesis is “this coin is fair”, and I flip once and get heads, the p value for that test is 0.5. If I flip twice and both come up heads, p = 0.25 (the probability of two heads on two tosses). If I toss three times and two come up heads, p is the probability of at least two heads in three tosses, which is 0.5 (4 of the 8 equally likely outcomes).

    In any event, you choose a significance level (the level at which you’re comfortable saying ‘the results are significant enough to assume a correlation’); in the case of this comic, the significance level was 0.05, or 5%. So an outcome has to be at most 5% likely under the null hypothesis for the scientists to believe it happened due to the test factor and not due to coincidence.

    So if there were a 5.1% probability that 20 or more of the 100 kids in the green bean test group would get acne, and a 4.9% probability that 21 or more would, then at least 21 kids in the green bean group would have to show acne for the scientists to be able to say “[Colour x] beans are linked to having acne”.

    Note that p > 0.05 doesn’t translate into “there’s a 5% chance we’re wrong”. What it translates into is: results like the ones we got for this experiment would happen more than 5% of the time by chance alone, if the null hypothesis were true.

    With a coin flip, if I want to test a link between flipping a coin with my eyes shut and getting heads, an experiment with a single flip can never produce a false positive: even if heads comes up, its probability is 50% no matter what my eyes are doing, not less than 5%. Keep in mind that the result would still be reported as p > 0.05, whether it was 50% or 5.1%.

    If I flip 100 coins though, and heads comes up 80 times, [without doing the math - I'm assuming] there is less than 5% probability of flipping 80 heads out of 100 flips. Therefore, there may be a link between shutting my eyes and flipping more heads. A prudent scientist would test this many times to rule out coincidence, but certainly by coincidence, I COULD flip 80 heads out of 100. It’s just unlikely. It may have nothing to do with my eyes, but because it’s so unlikely, there may be a link. Retesting would confirm or deny this.

    The reason the > flips for green is that the result for green was p < 0.05; that is why they can claim there is a link between green beans and acne. The “(p > 0.05)” in the other panels of the comic is the result of the experiment, and the statement “There is no link…” is the English-language translation of that mathematical result. In other words, the scientist is saying “There is no link… BECAUSE the results of the experiment were p > 0.05”; he isn’t saying “there is no link (and the probability we’re wrong is p > 0.05)”.

    I think the comic is just simplifying, though, and basically commenting that if you do enough tests, a false positive becomes almost inevitable.

  16. It’s common knowledge that the green ones make you horny. That means the horny kids get acne, which makes sense because of all the personal touching and sweating.
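
The arithmetic from the first comment thread (the 20 × 0.05 expectation versus the 1 - 0.95^20 chance of at least one hit), and the grad student's 120 chi-squares, can be written out directly. This is a minimal sketch that, like the comment on the formula, assumes the tests are independent; the function names are illustrative.

```python
def expected_false_positives(tests, alpha=0.05):
    """Mean number of p < alpha results when every null hypothesis is true."""
    return tests * alpha


def chance_of_any_false_positive(tests, alpha=0.05):
    """Family-wise error rate for independent tests: 1 - (1 - alpha)**tests."""
    return 1 - (1 - alpha) ** tests


if __name__ == "__main__":
    for tests in (1, 20, 120):
        print(
            f"{tests:>3} tests: expect {expected_false_positives(tests):.2f} "
            f"false positives, P(at least one) = "
            f"{chance_of_any_false_positive(tests):.3f}"
        )
```

For 20 tests this gives an expected count of one false positive but only about a 64% chance of seeing at least one, which is how both readings in that thread can be right at the same time.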

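Since Bonferroni came up: the Bonferroni correction divides the significance level by the number of tests, so with 20 colors each test must reach p < 0.05 / 20 = 0.0025. By the union bound this keeps the chance of any false positive at or below 0.05 even when the tests are not independent. A minimal sketch; the p-values in the usage lines are made up for illustration, since the comic never reports them.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the indices of p-values that survive a Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]


def family_wise_error(tests, per_test_alpha):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - per_test_alpha) ** tests


if __name__ == "__main__":
    # Hypothetical p-values for 20 colors: 19 unremarkable, one at 0.03 ("green").
    p_values = [0.4] * 19 + [0.03]
    print(bonferroni_significant(p_values))            # green no longer counts
    print(round(family_wise_error(20, 0.05 / 20), 4))  # back under 0.05
```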

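The die and coin examples in the comments are binomial tail probabilities, and the "20 versus 21 kids" reasoning amounts to finding the smallest count whose tail probability drops to the cutoff. A minimal sketch using exact fractions; it computes one-sided p-values ("this many or more"), which fits the question "is the die loaded toward six?" but is an assumption for the other examples. The 20% baseline acne rate in the last line is hypothetical.

```python
from fractions import Fraction
from math import comb


def upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p); pass p as a Fraction."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, p, alpha=Fraction(5, 100)):
    """Smallest k with P(X >= k) <= alpha, i.e. the count needed for p <= alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return None  # not even n successes out of n would be significant


if __name__ == "__main__":
    half, sixth = Fraction(1, 2), Fraction(1, 6)
    print(upper_tail(2, 2, half))               # 1/4: two heads in two tosses
    print(upper_tail(2, 3, half))               # 1/2: at least two heads in three
    print(float(upper_tail(111, 600, sixth)))   # 111+ sixes: well above 0.05
    print(float(upper_tail(80, 100, half)))     # 80+ heads: far below 0.05
    print(critical_count(1, half))              # None: one flip can never be significant
    print(critical_count(100, Fraction(1, 5)))  # acne cases needed out of 100
```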