552: Correlation

Explain xkcd: It's 'cause you're dumb.

Date: March 6, 2009

Revision as of 19:54, 13 June 2015

Correlation
Title text: Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

Explanation

This comic focuses on the apparent difficulty people have in understanding the difference between correlation and causation. When two variables (like blood cholesterol levels and heart disease) are positively correlated, it means that as one variable increases, the other tends to increase as well, whereas a negative correlation means that as one variable increases, the other tends to decrease. The human brain is very good at seeing patterns and deducing rules, so the seemingly natural conclusion is that one is leading to the other: in this example, that high blood cholesterol causes heart disease.

That conclusion may well be true. The positive correlation is certainly not an argument against it. But it is only one type of evidence, and it is by no means proof.
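To make the definitions of positive and negative correlation concrete, here is a minimal Python sketch of the Pearson correlation coefficient, a common way to measure how closely two variables move together along a straight line. The function name `pearson_r` and the toy numbers are illustrative assumptions, not real cholesterol or heart-disease data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns a value in [-1, 1]: positive when ys tends to rise with xs,
    negative when ys tends to fall as xs rises, near 0 when there is no
    linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / spread


# Toy numbers, chosen only to show the sign of r.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 8, 9]   # tends to increase with x -> positive r
falling = [9, 7, 6, 3, 1]  # tends to decrease as x grows -> negative r

print(round(pearson_r(x, rising), 3))
print(round(pearson_r(x, falling), 3))
```

A value near +1 or -1 says only that the points lie close to a rising or falling line; nothing in the calculation says which variable, if either, is driving the other.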

The relationship among diet, blood chemistry, and heart disease is a complex one, but simpler examples abound. For example, if you tallied sunglasses sales and skin cancer incidence by region, you would probably find a high positive correlation: in locations where many people buy sunglasses, there are also many cases of skin cancer. Here it would seem silly to believe that wearing sunglasses causes skin cancer, yet this is exactly the same line of reasoning that led to the conclusion that blood cholesterol causes heart disease. Correlations can mislead us. In this example, both sunglasses sales and skin cancer are directly affected by a third factor, known as a confounding variable (specifically, a climate in which many people expose themselves to the sun).
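The sunglasses example can be simulated under stated assumptions. In this hypothetical toy model (the function names, coefficients, and noise levels are all invented for illustration), sun exposure drives both sunglasses sales and skin cancer cases, while sunglasses have no effect on cancer at all. The raw correlation between sunglasses and cancer comes out strongly positive; once the part of each variable explained by sun exposure is removed (a partial correlation computed from straight-line regression residuals), it should drop to roughly zero.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its least-squares line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate_regions(n_regions, seed=0):
    """Hypothetical toy model: sunshine drives both sunglasses sales and
    skin cancer cases, and sunglasses have no effect on cancer at all."""
    rng = random.Random(seed)
    sun, glasses, cancer = [], [], []
    for _ in range(n_regions):
        s = rng.uniform(0, 10)  # sun exposure: the confounding variable
        sun.append(s)
        glasses.append(3.0 * s + rng.gauss(0, 2))  # depends only on sun
        cancer.append(2.0 * s + rng.gauss(0, 2))   # depends only on sun
    return sun, glasses, cancer


sun, glasses, cancer = simulate_regions(5000)
raw = pearson_r(glasses, cancer)
# Partial correlation: correlate what each variable does beyond sun exposure.
partial = pearson_r(residuals(glasses, sun), residuals(cancer, sun))
print(f"sunglasses vs. skin cancer, raw:             {raw:.2f}")
print(f"sunglasses vs. skin cancer, controlling sun: {partial:.2f}")
```

Controlling for a variable this way only helps if the right confounder was measured in the first place, and a correlation that survives such an adjustment is still not proof of causation.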

In essence, a correlation between two variables does not by itself establish that one caused the other; it only says that their values tend to move together. The correlation could be due to causation, but it could also be due to other factors, or it could even arise purely by chance.
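The last possibility, a correlation produced purely by chance, is easy to underestimate when samples are small. The sketch below, built around a hypothetical helper and assuming independent, normally distributed data, repeatedly draws two completely unrelated samples and counts how often their Pearson correlation exceeds 0.5 in absolute value. With five points per sample this happens in a sizable share of trials; with a hundred points it almost never does.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def share_of_chance_correlations(n_points, trials=2000, threshold=0.5, seed=1):
    """Fraction of trials in which two independent random samples of size
    n_points have |r| > threshold, i.e. look correlated purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]  # unrelated to ys
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials


small = share_of_chance_correlations(5)
large = share_of_chance_correlations(100)
print(f"5 points per sample:   {small:.1%} of trials had |r| > 0.5")
print(f"100 points per sample: {large:.1%} of trials had |r| > 0.5")
```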

In this situation, Cueball is explaining to Megan his realization that correlation is not the same thing as causation, and he adds that his belief changed after he took a statistics class. Megan then makes the seemingly obvious leap and declares that his realization was the result of taking the course. Cueball's final response, "Well, maybe.", is the punchline: he applies his new principle even to his own case, declining to accept that the class caused the change merely because the change followed it. Cueball himself would presumably know whether the course changed his mind, but he points out that Megan, who only sees that the two went together, cannot be certain about the causation.

The title text plays on two meanings of the word "imply": to have as a logical consequence, or to insinuate. In the statement "correlation does not imply causation", correlation is pictured as a person who, without stating anything outright, gives you subtle hints about where to look for the cause. This is a metaphor for research: a correlation is a prompt to investigate further, perhaps in a wider scope or with more variables considered, until the reason for it is understood. For example, Barry Marshall and Robin Warren noticed that the presence of Helicobacter pylori was strongly correlated with duodenal ulcers. They investigated further and showed that the bacterium plays a causal role in gastritis and peptic ulcer disease, work that earned them the 2005 Nobel Prize in Physiology or Medicine.

In addition, the title text's image of waggling eyebrows and gesturing furtively while mouthing "look over there" may be a reference to the movie Ferris Bueller's Day Off, in which Cameron Frye tries to alert Ferris that Ferris's father is in the next cab over and that they are about to be caught skipping school. Either way, Randall seems to be saying that Correlation (if it were a character in a movie) is desperately trying to draw attention to Causation without openly stating this intention, and perhaps that correlation is a good place to start when looking for causation.

Transcript

[Cueball is talking to Megan.]
Cueball: I used to think correlation implied causation.
[Cueball lifts his hand while continuing to talk to Megan.]
Cueball: Then I took a statistics class. Now I don't.
[Back to the same situation as the first frame.]
Megan: Sounds like the class helped.
Cueball: Well, maybe.



Discussion

It is stated that Cueball is doubting due to his newfound scepticism, which I believe is incorrect.

By stating that "the class helped", Megan is inferring there is a causal relation between Cueball taking a statistics class and him no longer believing correlation implies causation. However, Cueball is replying "well maybe" to indicate there is only a correlation between them, showing he correctly understood the distinction. 173.245.53.104 15:29, 13 May 2014 (UTC)

Another interpretation that explains why the comic is funny is as follows. Cueball replies "well maybe" because he has learned not to infer causation from correlation. But it is also clear that the statistics class caused him to think this way. This pokes fun at a tendency to apply the principle that correlation does not imply causation even when there is direct evidence for causation. 173.245.52.127

The way I see it, it's a paradox. -- The Cat Lady (talk) 22:14, 16 August 2021 (UTC)