Difference between revisions of "1725: Linear Regression"

Explain xkcd: It's 'cause you're dumb.
(Explanation: +wikilinks, more detail)
Line 13: Line 13:
 
  {{w|Constellation}}s are patterns created by linking the apparent positions of stars. One could create fake constellations by connecting assorted points.
− In this comic, a set of data has had linear regression and some form of statistical analysis applied to it, indicating that there is a slightly significant correlation between the two. However, the data points are so widely-scattered that (as noted in the comic) it is easier to connect the data points in a constellation-like pattern than it is to determine whether the correlation is negative or positive (without looking at the trendline, of course). Because of this, [[Randall]] suggests we should be suspicious of any conclusions drawn from this data.
+ In this comic, a set of data has had linear regression and some form of statistical analysis applied to it, indicating that there is insignificant correlation between the two. However, the data points are so widely-scattered that (as noted in the comic) it is easier to connect the data points in a constellation-like pattern than it is to determine whether the correlation is negative or positive (without looking at the trendline, of course). Because of this, [[Randall]] suggests we should be suspicious of any conclusions drawn from this data.
  ==Transcript==

Revision as of 14:50, 26 August 2016

Linear Regression
Title text: The 95% confidence interval suggests Rexthor's dog could also be a cat, or possibly a teapot.

Explanation

Linear regression is a method for modeling the relationship between two sets of data, assuming that the two have a linear relationship (as opposed to, say, a quadratic relationship or no relationship whatsoever). The model determines a "best-fit" line through a scatter plot of the datasets, together with a coefficient of determination, usually denoted r² or R². This is a number between 0 and 1 that indicates how close the points are to lying on a line. A value of 1 means the points lie exactly on a line, while values close to 0 indicate little or no linear correlation.
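
As a rough illustration of how the best-fit line and R² are obtained, here is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `linear_fit` and both data sets are made up for this example; they are not the comic's data.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / syy
    return slope, intercept, r_squared


# Hypothetical data: one set lies exactly on a line, the other is scattered.
xs = [1, 2, 3, 4, 5]
exact = [2 * x + 1 for x in xs]
noisy = [5, 1, 6, 2, 4]

print("points on a line:", linear_fit(xs, exact))
print("scattered points:", linear_fit(xs, noisy))
```

The first data set should give an R² of essentially 1, while the scattered one gives a value close to 0, which is the situation the comic pokes fun at.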

Constellations are patterns created by linking the apparent positions of stars. One could create fake constellations by connecting assorted points.

In this comic, a set of data has had linear regression and some form of statistical analysis applied to it, indicating that there is no significant correlation between the two variables. However, the data points are so widely scattered that (as noted in the comic) it is easier to connect the data points in a constellation-like pattern than it is to determine whether the correlation is negative or positive (without looking at the trendline, of course). Because of this, Randall suggests we should be suspicious of any conclusions drawn from this data.

Transcript

This transcript is incomplete. Please help edit it! Thanks.

Panel 1: A scatter plot with many dots and a slightly rising line is presented. The line is red, and the bottom of the panel reads R² = 0.06 in red.

Panel 2: The same plot, but with new red lines making a constellation of a stick man holding a dog/teapot/cat/???.

The bottom of this panel reads "REXTHOR, THE DOG-BEARER". Beneath both panels is the text "I don't trust linear regressions when it's harder to guess the direction of the correlation from the scatter plot than to find new constellations in it."

The title text reads "The 95% confidence interval suggests Rexthor's dog could also be a cat, or possibly a teapot."


Discussion

It also seems likely that the teapot refers to the Utah Teapot (https://en.wikipedia.org/wiki/Utah_teapot). It was one of the first complex 3D objects defined for CGI rendering, and has seen countless uses since, notably in the Pipes screensaver and in early SIGGRAPH papers where it was rendered alongside the 5 Platonic solids as if it belonged with them. Dkfenger (talk) 17:10, 26 August 2016 (UTC)

I'm not sure I follow. How do you reach that conclusion? Given that the concept of constellations (and thus stars) is clearly shown in the comic, it seems much more likely to me that he was referring to Russell's Teapot and not to a computer rendering (if there was any reference at all). The fact that that shape could abstractly resemble a teapot may be all that there is to it. :) KieferSkunk (talk) 18:06, 26 August 2016 (UTC)

I think that the teapot is a reference to the constellation Sagittarius. This seems most likely to me as the reference is to a constellation that looks like a teapot despite ostensibly being something else. Sagittarius is a constellation that is supposed to be an archer, but many people see it as a teapot instead. (http://www.space.com/30274-constellation-sagittarius-archer-dipper-teapot.html) Harperska (talk) 19:27, 26 August 2016 (UTC)

I think it looks like an alcoholic drink with the little umbrella sticking out. Mikemk (talk) 06:25, 27 August 2016 (UTC)

Based on what is an R^2 value of 0.06 significant??? I'm removing that. Djbrasier (talk) 20:59, 26 August 2016 (UTC)

Oops, misread it! I read "insignificant" as "significant". Djbrasier (talk) 21:00, 26 August 2016 (UTC)

The teapot mention may just be a joke, not a reference. 141.101.98.114 (talk) (please sign your comments with ~~~~)

Did someone check if it really was an R-squared of 0.06? 141.101.104.67 20:56, 27 August 2016 (UTC)

Assuming the top left of the image as 0/0 and measuring in pixels, I get f(x) = -0.135x + 124.8 with R² = 0.0197, calculated with LibreOffice. The line in the image has f(x) = -0.094x + 125. If I change a single point by one or two, the R² value varies from 0.0195 to 0.0199. If I subtract 10% of the x value from the y value, R² increases to 0.0574. So I think R² = 0.06 is a little bit inaccurate, but not completely wrong. --162.158.83.228 19:01, 2 September 2016 (UTC)
I think R^2 = 6% is very inaccurate if the true R^2 = 2%. 108.162.219.56 00:07, 3 September 2016 (UTC)
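
The kind of pixel-coordinate check described above can be sketched roughly as follows. The points and the image height below are invented for illustration and are not measurements from the comic; `fit` is a hypothetical helper. The sketch shows two things: measuring y downward from the top left only flips the sign of the slope and leaves R² unchanged, while subtracting 10% of x from y shifts the slope by exactly 0.1 and generally changes R².

```python
def fit(xs, ys):
    """Least-squares slope, intercept and R^2 for y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # For a straight-line fit, R^2 equals the squared correlation coefficient.
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# Hypothetical pixel coordinates, origin at the top left (y grows downward).
HEIGHT = 250  # assumed image height in pixels
xs = [20, 55, 90, 130, 160, 200, 240, 280]
ys = [140, 60, 190, 100, 30, 170, 90, 120]

pixel = fit(xs, ys)                                       # as measured
flipped = fit(xs, [HEIGHT - y for y in ys])               # y axis pointing up
sheared = fit(xs, [y - 0.1 * x for x, y in zip(xs, ys)])  # y minus 10% of x

print("pixel   slope, R^2:", pixel[0], pixel[2])
print("flipped slope, R^2:", flipped[0], flipped[2])
print("sheared slope, R^2:", sheared[0], sheared[2])
```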

Does anybody know of any real-world examples of a similarly low R^2 given in genuine research? It would be worth mentioning their existence if we can find one. Cosmogoblin (talk) 18:03, 28 August 2016 (UTC)

In published research? I don't recall any. In submissions for review? At least twice. And of course one case where this comic could and should be used as an educational drawing - student reports, master's theses, etc. I've seen "conclusions" drawn from weaker data in those, far too many times for my mental health...--162.158.86.119 09:32, 30 August 2016 (UTC)

Rex is also Latin for king, which may be related in the context of constellations. 172.68.11.81 (talk) (please sign your comments with ~~~~)

This is irrelevant to the humor of the comic, but I fixed the paragraph on confidence intervals because it contained at least three misinterpretations (I have an MSc in statistics). The phrasing can be improved if needed. Don't worry though, even experienced statisticians get it wrong sometimes... 162.158.234.40 09:43, 10 April 2018 (UTC)