1725: Linear Regression
Title text: The 95% confidence interval suggests Rexthor's dog could also be a cat, or possibly a teapot.
Linear regression is a method for modeling the relationship between two or more variables. In the simplest case, with just two variables, the model finds a "best-fit" line through a scatter plot of the data, together with a coefficient of determination, usually denoted r² or R². When only two variables are included in the regression, R² is simply the square of the (Pearson) correlation coefficient between them. R² is a number between 0 and 1 that indicates how well one variable can be used to predict the value of the other. A value of 1 means every point lies exactly on the line (a perfect positive or negative correlation), while a value close to 0 indicates a weak linear relationship between the variables.
Asterisms are patterns created by linking the apparent positions of stars as seen in the sky from Earth. Strictly speaking, "Rexthor" is an asterism, since a constellation is the region of sky containing an asterism, although even seasoned astronomers informally use "constellation" in place of "asterism". Different civilizations have recognized different constellations (the modern IAU, for example, lists 88 "official" constellations), and anyone can create their own by connecting assorted points.
In this comic, linear regression has been applied to a set of data, and the result indicates only a weak correlation between the two variables. The data points are so widely scattered that (as noted in the comic) it is easier to connect them into a constellation-like pattern than to tell whether the correlation is positive or negative (without looking at the trendline, of course). Because of this, Randall suggests we should be suspicious of any conclusions drawn from this data.
"Rexthor, the Dog-Bearer" seems to be a spoof of Thor, the Norse god who wields a hammer. By replacing his hammer with a dog and adding "Rex" (an archetypal dog name), Randall creates a comical, dog-bearing version of Thor.
A 95% confidence interval is a range computed from sample data by a procedure that, over many repeated samples, would contain the true value of the estimated population parameter 95% of the time. Confidence intervals are a standard way of expressing the uncertainty of a statistical estimate. In the right panel the resulting "estimate" is a drawing, so its 95% confidence interval would be a set of drawings derived from the sample, constructed so that such sets would include the correct drawing 95% of the time. According to the title text, this set includes a cat and a teapot as well as a dog, so we cannot be 95% confident that Rexthor is carrying a dog at all.
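The repeated-sampling meaning of "95%" can be checked with a small simulation. This is a sketch under simplifying assumptions: the data are normally distributed with a known standard deviation, so the interval for the mean is the textbook z-interval (mean ± z·σ/√n). The function name `ci_coverage` and its default parameters are hypothetical choices for illustration, not anything taken from the comic.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(true_mean=10.0, sigma=2.0, n=30, trials=10_000, level=0.95, seed=1725):
    """Fraction of simulated confidence intervals for the mean that contain true_mean.

    Assumes normal data with a KNOWN standard deviation sigma, so each interval
    is sample_mean ± z * sigma / sqrt(n).
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = fmean(sample)
        # Count how often the interval built from this sample covers the true value
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


print(ci_coverage())  # close to 0.95, though not exactly, because of random variation
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across many samples.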
The teapot may be a reference to Russell's teapot, or possibly to the "teapot" asterism in the constellation Sagittarius.
- [Two square panels show identical sets of scattered black dots, with only the red additions being different.]
- [The left panel shows a slightly rising red line drawn through the middle of the panel, passing near a few dots but not obviously related to most of them. Red text is below the dots:]
- [The right panel shows many of the dots connected by red lines to form a stick figure of a man resembling the constellation Orion, with the hand on the reader's right raised and holding an object. Red text is below the dots:]
- Rexthor, the Dog-Bearer
- [A caption is below and spanning both panels:]
- I don't trust linear regressions when it's harder to guess the direction of the correlation from the scatter plot than to find new constellations on it.