Explain xkcd: It's 'cause you're dumb.
The panel satirises the common misunderstanding of the concept of percentage. Quoting a percentage change without mentioning the base probability that this ratio acts on is meaningless (outside of arithmetic for arithmetic's sake). Most everyday communication, however, succumbs to such incompleteness. In the aftermath of this ambiguity, people tend to conflate relative and absolute changes.
If the probability of a shark attack at the North beach is 5 per million, then the probability of a shark attack at the South beach is still only 6 per million. The absolute difference of one in a million would not normally justify choosing one beach over the other, even though a "20% greater" chance sounds significant when stated without this context.
Cueball parodies the concern by noting that by going to a beach three times instead of two, their chances of attack by dogs with handguns in their mouths (a ludicrous and unrealistic scenario, as dogs cannot buy guns and are not likely to pick one up off the ground) increase by 50%. If the chance of the dog attack is one per billion on each visit to the beach, then the chance of at least one attack grows with the number of visits, but it is still one in a billion for any specific visit. More visits do not change the overall improbability of there ever being a dog swimming with a gun in its mouth.
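The gap between the relative and the absolute change can be sketched in a few lines of Python. The function name `compare_risks` is made up for illustration, and the 5-per-million baseline is the hypothetical figure used in this explanation, not a real shark statistic.

```python
from fractions import Fraction


def compare_risks(base_risk, relative_increase):
    """Return (new_risk, absolute_change) for a relative increase in risk."""
    new_risk = base_risk * (1 + relative_increase)
    return new_risk, new_risk - base_risk


# Hypothetical baseline from the explanation: 5 attacks per million visits.
north = Fraction(5, 1_000_000)
south, difference = compare_risks(north, Fraction(20, 100))

print(f"South beach risk: {float(south * 1_000_000):g} per million")
print(f"Absolute increase: {float(difference * 1_000_000):g} per million")
```

Exact fractions are used so that the "20% more" and "one per million more" figures come out without rounding noise.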
Beret Guy misunderstands Cueball's probability, exhibiting the Gambler's fallacy by believing that since they haven't been attacked in their first two trips, the chance of attack by dogs with handguns is higher on their third outing.
This is a common misunderstanding of statistics. While the overall probability of an attack in three trips would be higher than in a single trip, it doesn't change the fact that in each individual trip, the probability is still the same; whether or not they managed to avoid being attacked in their first two trips, the results of these trips do not factor into the probability equation of the third trip.
This can also be illustrated by coin flips: if one flips a coin ten times in a row, no matter what the result of each previous flip is (even if it were nine heads in a row), the odds of getting heads on the tenth coin flip remain 50%. In other words, past experience does not impact subsequent flips.
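The coin-flip claim can be checked by brute force, assuming a fair coin so that all ten-flip sequences are equally likely. This is only a sketch, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of ten flips of a fair coin.
all_sequences = list(product("HT", repeat=10))

# Keep only the sequences that start with nine heads in a row.
nine_heads_first = [seq for seq in all_sequences if seq[:9] == ("H",) * 9]

# Among those, count how often the tenth flip is also heads.
tenth_is_heads = sum(1 for seq in nine_heads_first if seq[9] == "H")
chance_of_tenth_head = Fraction(tenth_is_heads, len(nine_heads_first))

print(chance_of_tenth_head)
```

Only two sequences begin with nine heads, and they differ in the tenth flip, which is why the conditional chance stays at one half.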
The caption clarifies Cueball's point, but without sarcasm.
Then again, the title text objects to this point (that a tiny risk increased by 50% is still tiny). If this 50% increment is done repeatedly, the risk can get arbitrarily high, while the statement says that it is still tiny. This can be compared to the Sorites paradox (the "paradox of the heap"), which involves a "heap" of sand from which grains of sand are removed individually. If one assumes that, after removing a single grain, a heap of sand is still considered a heap of sand, and that there are a limited number of grains of sand in the heap, then one is forced to accept the conclusion that it can still be considered a heap of sand even if there is only a single grain of sand (or even none at all).
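How quickly repeated 50% increases stop being tiny can be sketched as follows. The starting risk of one in a billion and the threshold of one in a hundred are assumptions chosen for illustration, as is the function name `increases_needed`.

```python
from fractions import Fraction


def increases_needed(start, threshold, factor=Fraction(3, 2)):
    """Count how many successive increases by `factor` lift `start` to `threshold` or above."""
    risk, steps = start, 0
    while risk < threshold:
        risk *= factor
        steps += 1
    return steps


# Hypothetical numbers: start at one in a billion, stop at one in a hundred.
steps = increases_needed(Fraction(1, 10**9), Fraction(1, 100))
print(steps)
```

Each single step looks harmless, yet a few dozen of them are enough to cross the threshold, which is the Sorites-style tension the title text points at.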
- [Three figures are standing around. Two have beach towels. Ponytail is looking at her cell phone. One of them is Beret Guy.]
- Ponytail: We should go to the north beach. Someone said the south beach has a 20% higher risk of shark attacks.
- Cueball: Yeah, but statistically, taking three beach trips instead of two increases our odds of getting shot by a swimming dog carrying a handgun in its mouth by 50%!
- Beret Guy: Oh no! This is our third trip!
- Reminder: A 50% increase in a tiny risk is still tiny.
I think this is to address the old chestnut of "<something> will double your risk of getting cancer!", or the like, where the risk of getting that cancer (in this example) is maybe 1 in 10,000, so doubling the risk across a population would make that a 1 in 5,000 risk to your health... which you may still consider to be an acceptable gamble if it's something nice (like cheese!) that's apparently to blame and you'd find abstinence from it gives a barely marginal benefit for a far greater loss of life enjoyment. Also, this sort of figure almost always applies towards a specific form of cancer, or whatever risk is being discussed, meaning you aren't vastly changing your life expectancy at all. In fact, the likes of opposing "red wine is good/bad for you" studies can be mutually true by this same principle (gain a little risk of one condition, lose a little risk from another). (Note: I don't know of any particular "cheese gives you cancer!" stories doing the rounds, at the moment. I bet they have done, but I only mention it because I actually quite like cheese. And I probably wouldn't give it up under the above conditions.)
It's also possible that this covers the likes of "<foo> in <country> is 10 times more dangerous than it is in <other country>" statements. Perhaps only ten incidents happened in the former, and a single instance in the latter, out of the whole of each respective country. Or a single incident occurred in both, but the second country is ten times the size, so gets 'adjusted for population' in the tables. And, besides which, that was just for one year and was just a statistical blip that will probably revert-towards-the-mean next year.
Finally, for a given risk of some incident happening on the first two trips, with no 'memory' or build-up involved, it pretty much is half-as-likely-again for the incident to have happened (some time!) in three separate trips. (Not quite, if those that lose against the odds and get caught by the incident the first or second trip never get to have a (second or) third trip... but for negligible odds like the given example, of the dog with the handgun, it's near-as-damnit so.) 220.127.116.11 11:12, 16 August 2013 (UTC)
Where did "dogs with shotguns" come from? I only saw "handgun" in the comic. Besides, I interpreted the risk as being hit by a negligent discharge from the handgun, not being deliberately attacked by the dog. Also, since probabilities are the set of real numbers between 0 and 1 inclusive, there are an uncountable number of them. "A x% increase in a tiny risk is still tiny" is an inductive statement, which means it could only be used to argue that a countable set of numbers is tiny. 18.104.22.168 12:24, 16 August 2013 (UTC)
- If induction base is uncountable, you can prove it for the whole [0; 1]. For example your induction base may be "every risk under 0.00000000000000000001% is tiny". --DiEvAl (talk) 12:38, 16 August 2013 (UTC)
- Aha you caught me. I also realized that if a number is tiny, any number smaller than it is also tiny. So if we can prove that 1 is tiny, then we can prove that all numbers between 0 and 1 (known as probabilities) are tiny. Diszy (talk) 15:46, 18 August 2013 (UTC)
I think it's worth mentioning that this comic doesn't distinguish between percentages and percentage points. --DiEvAl (talk) 12:35, 16 August 2013 (UTC)
- I think it does. It never uses percentage points, and never claims to. Mumiemonstret (talk) 12:09, 10 April 2015 (UTC)
Is it the case that doing something three times increases risk by 50% over two times inherently? I feel like this is the case, but it's early, here. Also, I'm not sure Randall is attacked by a dog, he may be using it as a diversion. I think that he's done this before. Theo (talk) 12:56, 16 August 2013 (UTC)
- (First, good point, DiEvAl, about the percentages/percentage-points. I knew I'd missed something out in my first thoughts. I actually tend to assume against percentage points, which is somewhat the opposite from what I've seen in the general public.)
- Actually, depends on how you count it. But I was using the "encounter 'n' incidents per trip", "encounter '2n' incidents per two trips", "encounter '3n' incidents per three trips" measure, where 3n==2n+50%. But that works best with a baseline of >>1 incidents per trip assumed. In reality, if the chance is a fractional 'p' for an occurrence in one instance, it's (1-p) that it didn't occur, thus (1-p)^n that it didn't occur in any of 'n' instances and 1-(1-p)^n that it did (at least once, possibly several times or even all). Not so simple, but for p tending to zero it 'does' converge on 1.5 times for across three what you'd expect for two (albeit because 0*1.5=0). Like they say, "Lies, Damn Lies...", etc. ;) 22.214.171.124 14:22, 16 August 2013 (UTC)
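The convergence described in the reply above can be written out explicitly. Here $p$ is the per-trip probability and the trips are assumed to be independent, as in the comment; the ratio compares "at least once in three trips" with "at least once in two trips":

```latex
\[
\frac{1-(1-p)^3}{1-(1-p)^2}
  = \frac{3p - 3p^2 + p^3}{2p - p^2}
  = \frac{3 - 3p + p^2}{2 - p}
  \;\longrightarrow\; \frac{3}{2}
  \quad \text{as } p \to 0 .
\]
```

Cancelling the common factor of $p$ shows that the ratio itself has a well-defined limit of 1.5, even though both probabilities go to zero.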
I don't think Randall is being attacked by a dog at all. What he's saying is that if you are going to think getting attacked by a shark is so likely, then you better be watching out for that never-gonna-happen dog scenario too. Jillysky (talk) 13:56, 16 August 2013 (UTC)
Is 0.000001% really "one in a million"?
- If 1% = 1 in 100, then
- 0.1% = 1 in a 1,000
- 0.01% = 1 in a 10,000
- 0.001% = 1 in a 100,000
- 0.0001% = 1 in a 1,000,000
- 0.00001% = 1 in a 10,000,000
- 0.000001% = 1 in a 100,000,000
Would it be more accurate to leave off the % sign?
Assuming I'm right, I think it'd be less confusing to leave it and reduce the numbers by a couple orders of magnitude.
--Clayton 126.96.36.199 14:36, 16 August 2013 (UTC)
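The table above can be double-checked mechanically. This is a small sketch in which the helper name `percent_to_one_in` is made up, and percentages are passed as strings so that the decimals are read exactly.

```python
from fractions import Fraction


def percent_to_one_in(percent):
    """Convert a percentage given as a string into N, where the chance is 1 in N."""
    return Fraction(100) / Fraction(percent)


for text in ["1", "0.0001", "0.000001", "0.0000001"]:
    print(f"{text}% = 1 in {int(percent_to_one_in(text)):,}")
```

By the same conversion, "one in a billion" written as a percentage would be 0.0000001%.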
If the chance of the dog attack is 0.000000001% (one in a billion) on each visit to the beach, then the chance of attack over two visits is 0.000000002% whereas in three visits it becomes 0.000000003%
Um, no. Following that logic, if I go to the beach a billion times then I will get shot by a dog that is packing. Rather, each visit to the beach has its own odds, like the rolling of dice? On any particular visit there's a one-in-a-billion chance. And that's true on each subsequent visit as well. Tuesday's visit to the beach isn't twice as dangerous just because I was at the beach on Monday. CFoxx (talk) 16:26, 16 August 2013 (UTC)
- For each visit that is the case. Because it's one visit, that's true. However, if (time not being a factor) one were to have a billion visits planned, the odds over all would be increased. Pretty sure that overall this means that you got the joke faster than I did. Thanks for the clarification! Theo (talk) 17:06, 16 August 2013 (UTC)
- The odds overall may increase with multiple visits. But not, at least, at the rate listed. Otherwise that billionth trip (if one survived that long as one is likely to do) would be certain death. CFoxx (talk) 17:30, 16 August 2013 (UTC)
- Correct. Technically, the odds we are worried about are the "probability of being shot one or more times by a dog". So if the probability is 1/10^9 for any given day, then the odds of not being shot are (10^9-1)/10^9 for any given day, and the odds of not being shot over three days are (10^9-1)^3/10^27, and then the odds of being shot one or more times are 1-((10^9-1)^3/10^27), which is roughly 2.999999997000000001/10^9. That is close to, but slightly less than, 3/10^9. 188.8.131.52 18:01, 16 August 2013 (UTC) Toby Ovod-Everett
- Absolutely incorrect: You always have to look at the single event. More events do not belong together, you always have the same probability at each single event. So, even 10 billion events may or may NOT result in a disaster. Math isn't easy.--Dgbrt (talk) 19:17, 16 August 2013 (UTC)
- I believe what CFoxx was saying is that if the odds of something happening on any given day are one in three, then the odds of that thing happening at least once during a four day period is NOT 4/3rds! I was pointing out that the proper way to calculate the odds for a four day period is to say that the odds of it not happening on any given day are two in three. You take that probability and raise it to the fourth power, giving the odds that it won't happen at all during a four day period of 16/81, thus the odds that it will happen during that four day period is 65/81. I then did that same calculation for the 1 in a billion chance per day and applied it to the three day period, and recognized that he was correct that the true probability of the event happening one or more times over a three day period was not three times the probability of it happening on any given day, but also noted that the difference for a 1 in a billion chance over a small period is pretty close to the simplistic (but incorrect) approach. My rough estimate for the "one in a billion per day" event happening one or more times during a billion day period is 63.21%. 184.108.40.206 21:33, 16 August 2013 (UTC) Toby Ovod-Everett
- Wow, we still have many great scientists here!--Dgbrt (talk) 21:46, 16 August 2013 (UTC)
- THANK YOU, Toby! CFoxx (talk) 18:09, 17 August 2013 (UTC)
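The figures quoted in this thread can be reproduced with exact arithmetic. The sketch below assumes independent days, and the function name `at_least_once` is illustrative only.

```python
import math
from fractions import Fraction


def at_least_once(p, n):
    """Exact chance that an event with per-trial chance p happens at least once in n independent trials."""
    return 1 - (1 - p) ** n


# Four days at one in three per day: 65/81, not 4/3.
four_days = at_least_once(Fraction(1, 3), 4)

# Three days at one in a billion per day: just under 3 in a billion.
three_days = at_least_once(Fraction(1, 10**9), 3)

# A billion days at one in a billion per day. The exact fraction would be huge,
# so this uses floating point with log1p/expm1 to keep precision.
n = 10**9
billion_days = -math.expm1(n * math.log1p(-1 / n))

print(four_days, three_days, f"{billion_days:.4%}")
```

The billion-day value is close to 1 - 1/e, which is where the roughly 63.21% estimate comes from.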
Just a thought: is the title text a reference to the Sorites paradox? --AJ 220.127.116.11 17:25, 16 August 2013 (UTC)
Rats! I made the newbie mistake of editing something before I found the discussion page. I looked for it, honest I did! I see that UTC has already brought up what I referred to as "Cueball's error" in my (pre-log-in) edit. I did find it hard to believe I'd be the first xkcd fan to notice this error. I think this is worth addressing in the explanation, though I of course won't take offense if someone wants to obliterate my edit and start over. (CLSI) -- CLSI (talk) (please sign your comments with ~~~~)
Maybe he means this: Florida man shot by his dog, police say http://usnews.nbcnews.com/_news/2013/02/26/17107343-florida-man-shot-by-his-dog-police-say?lite -- Jb (talk) (please sign your comments with ~~~~)
Saying that unfortunately Cueball is mistaken in his calculations because he said 50% instead of 49.99999992% is a bit of an exaggeration. Xhfz (talk) 20:19, 16 August 2013 (UTC)
In regards to the "flipping a coin and having it come up with heads 9 times in a row being no indication of future results" thing, I have to throw out that that is a common misunderstanding in basic logic; it's an example that people throw out all the time without really considering the real-life implications. With a truly fair coin, the situation as described is certainly true. But the odds of a fair coin coming up heads 9 times in a row is 512-to-1 against. That coin is overwhelmingly likely not a fair coin. I would say the odds of that coin flipping heads on the 10th flip is pretty damn close to unity. Hoopy Frood (talk) 17:00, 25 August 2013 (UTC)
- Chaos at explain section
Please stop adding this, it does not explain the comic, it only belongs to this discussion page:
- Note that the 50% figure is an approximation. Assuming the odds of being attacked by a dog is x, the odds of being attacked by a dog at least once in two visits is 1 - (1-x)^2. The odds of being attacked at least once in three visits is 1 - (1-x)^3. Therefore, if one visit has one in a billion probability of attack, then two visits have not 2 in a billion, but 1.999999999 in a billion. Similarly, three visits have a probability of 2.999999997 in a billion. Saying 50% instead of 49.99999992% is a reasonable approximation.
- Unfortunately, Cueball is mistaken in his calculations. This is easier to see with an event that has greater probability, such as a coin toss. Assuming the odds of getting heads in one flip is .5, the odds of getting heads at least once in two flips is .75 (i.e., 1 minus [.5 X .5], the odds of getting tails both times), and the odds of getting heads at least once in three flips is .875 (1 minus [.5 X .5 X .5], the odds of getting three tails in a row). Getting heads in three flips is not 50% more likely than getting heads in two flips. With very low probabilities (such as the probability of attack by a dog swimming with a handgun), Cueball's calculation gives an extremely close approximation of the actual probability, but one can't apply the same logic to events of just any probability.
- Cueball says *statistically* the risk of some bizarre event increases 50%. This is essentially correct as many have pointed out that 49.99999999 is not really statistically different than 50. What is likely bothering a lot of people (including myself) is that the explainxkcd description states "If the chance of the dog attack is one per billion on each visit to the beach, then the chance of attack over two visits *is* two per billion whereas in three visits it *becomes* three per billion." There are no weasel words like "approximately", "about", "around", etc. This reminds people of flatly incorrect uses of probabilities like the one you describe. But surely the probability of getting heads from a fair coin toss is not on a similar order of magnitude as the probability that a swimming dog shoots someone with a handgun. S (talk) 00:40, 17 August 2013 (UTC)
- What is likely bothering a lot of people (including myself) is that the explainxkcd description states "If the chance of the dog attack is one per billion on each visit to the beach, then the chance of attack over two visits *is* two per billion whereas in three visits it *becomes* three per billion." There are no weasel words like "approximately", "about", "around", etc. Exactly. Explanations here have been very helpful in explaining some of the more scientific aspects of things Randall includes. Noting this one makes a (albeit slight) mistake in that regard is appropriate. (And the irony of incorrectly using probabilities in explaining a comic about how people do that is amusing.) CFoxx (talk) 18:15, 17 August 2013 (UTC)
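How much the "50% more" figure depends on the per-visit probability can be sketched like this. The function name `relative_increase` and the one-in-a-billion dog probability are assumptions taken from the discussion, not measured values.

```python
from fractions import Fraction


def relative_increase(p, trips_before=2, trips_after=3):
    """Relative rise in the chance of at least one event when the trip count grows."""
    before = 1 - (1 - p) ** trips_before
    after = 1 - (1 - p) ** trips_after
    return after / before - 1


coin = relative_increase(Fraction(1, 2))      # fair coin: heads at least once
dog = relative_increase(Fraction(1, 10**9))   # hypothetical one-in-a-billion event

print(f"coin: {float(coin):.4%}")
print(f"dog:  {float(dog):.10%}")
```

For the coin the increase is exactly one sixth, while for the tiny probability it falls short of one half by less than one part in a billion.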
I had to think of http://xkcd.com/1102/ when reading the first paragraph of the explainxkcd description. (The context is different, but the dubious use of percentages is the same.) S (talk) 00:40, 17 August 2013 (UTC)
I believe Cueball's calculation is way off. The odds of a dog attack should increase by 50% when looking at two beach trips rather than one. But the odds of an attack occurring with 3 visits should only increase by about 16.67%. This can be seen by analyzing a fair dice roll or a coin toss. Unless I am missing something, even with extremely small probabilities, this will hold. Can anyone write a proof to show otherwise? 18.104.22.168 (talk) (please sign your comments with ~~~~)
- As far as I understand it: doing something twice doubles your chance of getting the desired outcome. For example, you want to roll a die and get a six. If you roll it twice, you have double the chance of getting at least one six. If you roll it three times you have triple the chance of getting a six; in other words you increase it from two chances to three chances, which is an increase of 50%. 22.214.171.124 (talk) (please sign your comments with ~~~~)
- It doubles the likely number of sixes, but does not double the chance of getting at least one six. This is because there is a small chance of getting two sixes, and while that counts as two sixes for the number of occurrences, it still only counts as one chance of getting at least one six. The easiest way to visualize this is to look at the probability that you won't get a six in any given roll of the die, which is 5/6ths. Each time you roll, the probability you won't get a six at all is multiplied by 5/6. So the probability for two rolls is 25/36ths, and thus the probability of getting one or more sixes in two rolls is 11/36ths. This is 1/36th less than 2/6ths, and 1/36th is the probability of getting two sixes. Similar (although more complicated) logic applies to rolling it three times, for which the probability of getting at least one 6 is 91/216ths (not 108/216ths, as the naive approach would imply). As others (CFoxx) have pointed out, if you roll a die 6 times, there is still a chance you won't get any sixes. If you roll it a million times, it is still possible (albeit very, very, very unlikely) that you wouldn't get any sixes! As far as the 50% and 16.67% figures given by the original poster, I believe those were calculated for events that have a 50% probability for each event. The increase in probability from 1 to 2 events where 1/x is the probability looks like (1-(1-1/x)^2)/(1/x)-1, which is (1-(1-2/x+1/x^2))*x-1 or (2/x-1/x^2)*x-1 or (2-1/x)-1 or 1-1/x. Thus for an event like a fair coin toss, the increase in probability for two tosses over one toss is 1/2. For a 6-sided die, the increase in probability is 5/6th. For a 1/billion, the increased probability for one or more occurrence for two events compared with one event is 0.999999999. Finally, the probability of the second event being the desired event is always the same. It is unchanged by the first event.
It is the probability of either (or both) of the events being desired that we are calculating here. If the first die roll is a six, the probability of the second being a six is still 1/6. If the first die roll is not a six, the probability of the second being a six is still 1/6 (assuming a fair die). But the probability of either or both being a six in the absence of any information about the two rolls is not 2/6, but rather 11/36! 126.96.36.199 17:06, 21 August 2013 (UTC) Toby Ovod-Everett
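The difference between "expected number of sixes" and "chance of at least one six" can be confirmed by listing every outcome. This sketch assumes fair six-sided dice, and the helper name `six_stats` is made up.

```python
from fractions import Fraction
from itertools import product


def six_stats(rolls):
    """Return (chance of at least one six, expected number of sixes) for fair dice."""
    outcomes = list(product(range(1, 7), repeat=rolls))
    with_a_six = sum(1 for outcome in outcomes if 6 in outcome)
    total_sixes = sum(outcome.count(6) for outcome in outcomes)
    return Fraction(with_a_six, len(outcomes)), Fraction(total_sixes, len(outcomes))


for rolls in (1, 2, 3):
    chance, expected = six_stats(rolls)
    print(rolls, chance, expected)
```

The expected count grows in exact proportion to the number of rolls, while the chance of at least one six grows more slowly because outcomes with several sixes are only counted once.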
I shared this comic with risk-assessor friends in Massachusetts and got the following responses:
"Tee-hee. If you change the beach to Chatham, however, it's just not as funny!" (Cape Cod beaches have new signs warning of great white shark attacks: http://www.bostonglobe.com/magazine/2013/08/17/chatham-bold-attempt-become-new-england-great-white-shark-capital/TtfcEZsAo6PN7lUoBKe1kO/story.html)
"Or in our line of work, we worry (in MA) if the risk of cancer is 0.00002 but not if it is 0.00001 or less, which, as the base rate of cancer is around 40%, means that we're worried about a cancer incidence rate of 0.40002 but not 0.40001. And one could almost argue that it'd be pretty hard to distinguish these two, and even that if we presented risks in this form to the general public, they might wonder why we're so concerned..."
"Makes you wonder what the risk was for that Marlin coming on board that boat in Florida - http://www.wfla.com/story/23239959/350-pound-marlin-jumps-in-boat-landing-on-crew?"
I guess it all depends on your point of view. One might argue that the "gambler's fallacy" is the primary driver of lottery income, which, according to the North American Association of State and Provincial Lotteries: "During fiscal year 2012 (which for most jurisdictions ended June 30) U.S. lottery sales totaled $78 billion ($US). Canadian sales reached $9.3 billion ($Can)." (http://www.naspl.org/index.cfm?fuseaction=content&menuid=14&pageid=1020). Is "Remember to Play all Lottery Games Responsibly" an oxymoron? -- Hoopy (talk) (please sign your comments with ~~~~)
I am troubled with this paragraph: "This also can be illustrated by coin flips: if one flips a coin 10 times in a row, no matter what the result of each previous flip is (even if it were nine heads in a row), the odds of getting heads on the next coin flip remains 50%. In other words, past experience does not impact subsequent flips."
This paragraph does not specify the use of a fair coin. If 9 flips all come up heads, then there is strong statistical evidence that the probability of getting a head in a flip is not 50% (P=1/2^9=1/512~0.2%). It is still true that "past experience does not impact subsequent flips", but in this case, our judgment about the true probability should change in light of new data. 188.8.131.52 10:27, 13 May 2014 (UTC)
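The idea of revising one's judgment about the coin can be sketched with Bayes' rule. The prior below (99% of coins fair, 1% two-headed) is purely an assumption for illustration; a different prior gives different numbers.

```python
from fractions import Fraction

# Assumed prior, for illustration only: 99% of coins are fair, 1% are two-headed.
prior_fair = Fraction(99, 100)
prior_two_headed = Fraction(1, 100)

# Likelihood of seeing nine heads in a row under each hypothesis.
likelihood_fair = Fraction(1, 2) ** 9
likelihood_two_headed = Fraction(1)

# Bayes' rule: posterior is proportional to prior times likelihood.
evidence = prior_fair * likelihood_fair + prior_two_headed * likelihood_two_headed
posterior_fair = prior_fair * likelihood_fair / evidence

# Chance of heads on the tenth flip, averaged over both hypotheses.
next_heads = posterior_fair * Fraction(1, 2) + (1 - posterior_fair) * 1

print(posterior_fair, next_heads)
```

Under this assumed prior the flips remain independent given the coin, yet the estimated chance of heads on the next flip rises well above 50% because the coin itself has become suspect.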
Just a note, (may have been mentioned) the third trip has the same odds as trip one and two, the odds do not increase with past results (not that it matters with such low odds). 184.108.40.206 (talk) (please sign your comments with ~~~~)