Editing 2560: Confounding Variables
Latest revision | Your text | ||
Line 8: | Line 8: | ||
==Explanation== | ==Explanation== | ||
− | + | {{incomplete|Created by a MISLEADING DATASET - Please change this comment when editing this page. Do NOT delete this tag too soon.}} | |
− | |||
− | |||
In statistics, a ''confounding variable'' is a third variable that's related to the independent variable, and also causally related to the dependent variable. For example, you might see a correlation between sunburn rates and ice cream consumption; the confounding variable is temperature: high temperatures cause people to go out in the sun and get burned more, and also to eat more ice cream. | In statistics, a ''confounding variable'' is a third variable that's related to the independent variable, and also causally related to the dependent variable. For example, you might see a correlation between sunburn rates and ice cream consumption; the confounding variable is temperature: high temperatures cause people to go out in the sun and get burned more, and also to eat more ice cream. | ||
− | One way to control for a confounding variable is by restricting your | + | One way to control for a confounding variable is by restricting your dataset to samples with the same value of the confounding variable. But if you do this too much, your choice of that "same value" can produce results that don't generalize. Common examples of this in medical testing are using subjects of the same sex or race -- the results may only be valid for that sex/race, not for all people. |
− | There can also often be multiple confounding variables. It may be difficult to control for all of them without narrowing down your | + | There can also often be multiple confounding variables. It may be difficult to control for all of them without narrowing down your dataset so much that it's not useful. So you have to choose which variables to control for, and this choice biases your results. |
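The restriction idea above can be sketched with simulated data. This is only an illustration under made-up assumptions: the coefficients, the noise levels, the temperature range and the 24 to 26 degree band are all hypothetical, and in this toy model temperature is the only thing linking ice cream and sunburn.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical model: temperature drives both outcomes,
# and the two outcomes never influence each other directly.
temps = [rng.uniform(10, 35) for _ in range(5000)]
ice_cream = [0.2 * t + rng.gauss(0, 1) for t in temps]
sunburn = [0.1 * t + rng.gauss(0, 1) for t in temps]

# Without any control, the shared cause shows up as a correlation.
r_all = pearson(ice_cream, sunburn)

# Control by restriction: keep only samples in a narrow temperature band.
band = [(i, s) for t, i, s in zip(temps, ice_cream, sunburn) if 24 <= t < 26]
r_band = pearson([i for i, _ in band], [s for _, s in band])

print(f"all samples:  n={len(temps)}, r={r_all:.2f}")
print(f"24-26 degrees: n={len(band)}, r={r_band:.2f}")
```

In this sketch the correlation over all samples should come out clearly positive, while inside the band it should sit near zero. The band also keeps only a small fraction of the samples, and whatever it shows is only known to hold for that temperature range.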
− | In the final panel, | + | In the final panel, Blondie suggests a sweet spot in the middle, where both the confounding variables and your controls affect the end result, thus making you "doubly wrong". A "doubly wrong" result would simultaneously display spurious correlations (too few controlled variables) and be too narrow to be useful (too many controlled variables). |
− | Finally she admits that no matter what you do the results will be misleading, so statistics are useless | + | Finally she admits that no matter what you do the results will be misleading, so statistics are useless. |
− | In the title text, the ''residual'' refers to the difference between any particular data point and the graph that's supposed to describe the overall relationship. The collection of all residuals is used to determine how well the | + | In the title text, the ''residual'' refers to the difference between any particular data point and the graph that's supposed to describe the overall relationship. The collection of all residuals is used to determine how well the curve fits the data. If you control for this by selecting only points with the same residual you'll get a perfect correlation, but the results are meaningless because you're ignoring all the data points that don't agree with your hypothesis. |
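The title-text joke can be sketched numerically as well. The data below is simulated, with a hypothetical slope and noise level. Because randomly generated points almost never share exactly the same residual, the sketch assumes a small tolerance (`tol`) around a residual of zero; that tolerance is a choice made here for illustration, not something taken from the comic.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical noisy data: a weak linear trend buried in a lot of noise.
xs = [rng.uniform(0, 10) for _ in range(20000)]
ys = [0.5 * x + rng.gauss(0, 3) for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# "Control for the residual": keep only points whose residual is (almost) zero.
tol = 0.05
kept = [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) <= tol]

r_full = pearson(xs, ys)
r_kept = pearson([x for x, _ in kept], [y for _, y in kept])

print(f"all points:  n={len(xs)}, r={r_full:.3f}")
print(f"same residual: n={len(kept)}, r={r_kept:.3f}")
```

The kept points all lie within `tol` of the fitted line, so their correlation should be very close to 1 even though the full dataset is only moderately correlated. Nearly all of the data has been thrown away to get there, which is the point of the joke.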
==Transcript== | ==Transcript== | ||
− | :[ | + | {{incomplete transcript|Do NOT delete this tag too soon.}} |
− | : | + | :[Blondie is holding a pointer and pointing at a board with the word Statistics and with some graphs] |
− | + | :Blondie: If you don't control for confounding variables, they'll mask the real effect and mislead you. | |
− | + | :[Just Blondie, still holding the pointer, with her finger in the air] | |
− | :[ | + | :But if you control for too MANY variables, your choices will shape the data and you'll mislead yourself. |
− | + | :[Blondie with the pointer to her side] | |
− | + | :Somewhere in the middle is the sweet spot where you do both, making you doubly wrong. | |
− | :[ | + | :Stats are a farce and truth is unknowable. See you next week! |
− | |||
− | |||
− | |||
{{comic discussion}} | {{comic discussion}} | ||
− | + | [[Category: Statistics]] | |
− | + | [[Category:Comics featuring Blondie]] | |
− | [[Category:Statistics]] | ||
− | [[Category: |