1725: Linear Regression

{{w|Linear regression}} is a method for modeling the relationship between multiple variables. In the simplest case, it can be used for two variables: the model determines a "{{w|least squares|best-fit}}" line through a {{w|scatter plot}} of the data, together with a {{w|coefficient of determination}}, usually denoted ''r''<sup>2</sup> or ''R''<sup>2</sup>. When only two variables are included in the regression, ''R''<sup>2</sup> is simply the square of the correlation between the two variables. ''R''<sup>2</sup> is a number between 0 and 1 that indicates how well one variable can be used to predict the value of the other. A value of 1 means perfect correlation (all points lie on the fitted line), while a value close to 0 indicates little or no linear relationship between the variables.
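The following Python sketch illustrates the two-variable case described above: it fits a least-squares line ''y'' = ''a'' + ''bx'', computes the Pearson correlation ''r'', and separately computes ''R''<sup>2</sup> as 1 minus the ratio of the residual to the total sum of squares, so the two can be compared. The function name <code>fit_line</code> and the sample data are made up for illustration and are not taken from the comic.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x.

    Returns (a, b, r, r_squared), where r is the Pearson correlation and
    r_squared is the coefficient of determination of the fitted line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx                   # slope of the best-fit line
    a = mean_y - b * mean_x         # intercept
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return a, b, r, r_squared


# Made-up example data with a clear upward trend
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
a, b, r, r2 = fit_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f}x, r = {r:.4f}, R^2 = {r2:.4f}, r^2 = {r * r:.4f}")
```

For any data set with at least two distinct ''x'' values, the printed ''R''<sup>2</sup> and ''r''<sup>2</sup> should agree up to floating-point rounding, which is the two-variable identity mentioned above.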
 
A constellation is a pattern created by linking the apparent positions of stars as seen in the sky from Earth. (Astronomers, in technical contexts, usually refer to these as {{w|Asterism_(astronomy)|asterisms}}, reserving "{{w|Constellation_(astronomy)|constellations}}" for the 88 regions into which the sky is divided, each named for the most prominent asterism it contains, although even seasoned astronomers informally use "constellation" in place of "asterism".) Different civilizations have recognized different constellations, and anyone can create their own by connecting assorted points, the way Randall connected the points in his plot to make "Rexthor."
 
In this comic, a set of data has had linear regression and some form of statistical analysis applied to it, indicating that there is low correlation between the two variables. The data points are so widely scattered that (as noted in the comic) it is easier to connect them in a constellation-like pattern than to tell whether the correlation is negative or positive (without looking at the trendline, of course). Because of this, [[Randall]] suggests we should be suspicious of any conclusions drawn from this data.
 
The comic is somewhat misleading, since the data in the graph actually has an ''R''<sup>2</sup> of 0.02, only a third of what Randall claims. An example of published research with an ''R''<sup>2</sup> of 0.06 where the association in the graph is noticeable (if not strong) can be found [http://www.i-jmr.org/2012/1/e1/ here] (figure 2 has ''r'' = 0.25, which corresponds to ''R''<sup>2</sup> ≈ 0.06). In addition, the association in the comic's graph is hard to see partly because relatively few points are plotted. In a data set with 1000 observations and ''R''<sup>2</sup> = 0.06, the association between the two variables would be much more apparent.
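One way to quantify why the same ''R''<sup>2</sup> becomes easier to detect with more observations is the standard ''t''-test for a correlation coefficient, ''t'' = ''r''√(''n'' − 2) / √(1 − ''r''<sup>2</sup>), with ''n'' − 2 degrees of freedom. This measures statistical significance rather than visual clarity, so it is only a rough proxy for the point made above. The Python sketch below also checks the arithmetic ''r'' = 0.25 → ''R''<sup>2</sup> = 0.0625. The helper name <code>correlation_t_statistic</code> is hypothetical, and the small sample size of 20 is an assumed stand-in, not a count of the points in the comic.

```python
import math


def correlation_t_statistic(r_squared, n):
    """t statistic for testing zero correlation, given R^2 and sample size n.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2), which follows a t distribution
    with n - 2 degrees of freedom if the true correlation is zero.
    """
    r = math.sqrt(r_squared)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)


# r = 0.25 from the cited figure 2
print(f"r = 0.25 -> R^2 = {0.25 ** 2:.4f}")  # prints 0.0625, i.e. roughly 0.06

# Same R^2 = 0.06 at two sample sizes (20 is an assumed "small plot" size)
for n in (20, 1000):
    t = correlation_t_statistic(0.06, n)
    print(f"n = {n:4d}: t = {t:.2f}")
```

Under these assumptions the sketch reports ''t'' ≈ 1.07 for ''n'' = 20 and ''t'' ≈ 7.98 for ''n'' = 1000. The two-sided 5% critical values are roughly 2.10 and 1.96 respectively, so the same weak correlation would not be statistically significant in the small sample but would be highly significant in the large one.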
 
The lines connecting the stars in this "constellation" create a crude illustration of a person with an outstretched arm holding up a dog, which could be a reference to the film {{w|Life is Beautiful}}, in which a waiter carries a dog on his tray without realizing it. The name "Rexthor the Dog Bearer" spoofs the fact that numerous Greek-derived constellation names have both a proper name and an epithet (for example, "Orion, the Hunter"). The fact that "Rex" is an archetypal dog name (while also meaning {{w|Rex (title)|king}}, as in the "king of the dinosaurs", <i>Tyrannosaurus rex</i>) adds to the humor.
 
A 95% {{w|confidence interval}} is a range of values computed from a sample by a procedure that, over many repeated samples, produces ranges containing the true value (the estimated population parameter) 95% of the time. Confidence intervals are a standard way of expressing estimation error in statistics. In the right panel, the resulting estimate appears to be a drawing, so the 95% confidence interval would be a set of drawings, expected to contain the correct drawing in 95% of the samples for which it is calculated. According to the title text, the interval for this particular sample also includes a cat and a teapot, so only extremely vague statements can be made while maintaining 95% confidence.
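The "95% of the time" interpretation can be checked by simulation. The Python sketch below repeatedly draws samples from a normal distribution with a known mean, builds the interval ''mean'' ± 1.96σ/√''n'' for each sample, and counts how often that interval contains the true mean. It assumes the standard deviation σ is known (a simplification; in practice σ is usually estimated and a ''t''-based interval is used), and the function name <code>ci_coverage</code>, the parameter values and the random seed are arbitrary choices for illustration.

```python
import math
import random
import statistics


def ci_coverage(true_mean=10.0, sigma=2.0, n=25, trials=10_000, seed=1725):
    """Fraction of simulated 95% confidence intervals that contain the true mean.

    Each trial draws n normal observations with known standard deviation sigma
    and builds the interval sample_mean +/- 1.96 * sigma / sqrt(n).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


print(f"coverage: {ci_coverage():.3f}")  # should be close to 0.95
```

Any single interval either contains the true mean or does not; the 95% describes the long-run behavior of the procedure, which is why a single interval can still be wide enough to include a cat or a teapot.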
 