Difference between revisions of "Main Page"
(→Getting started: hide parts of section. Please wait until we have a consent at the community portal.) 
(→New here?) 

(45 intermediate revisions by 5 users not shown)  
Line 1:  Line 1:  
−  ''Welcome to the  +  __NOTOC__{{DISPLAYTITLE:explain xkcd}} 
+  <center>  
+  <font size=5px>''Welcome to the '''explain [[xkcd]]''' wiki!''</font>  
−  +  We have collaboratively explained [[:Category:Comics|'''{{#expr:{{PAGESINCAT:Comics}}-9}}''' xkcd comics]],  
+  <!-- Note: the -9 in the calculation above is to discount subcategories (there are 8 of them as of 2013-02-27),  
+  as well as [[List of all comics]], which is obviously not a comic page. -->  
+  and only {{#expr:{{LATESTCOMIC}}-({{PAGESINCAT:Comics}}-9)}}  
+  ({{#expr: ({{LATESTCOMIC}}-({{PAGESINCAT:Comics}}-9)) / {{LATESTCOMIC}} * 100 round 0}}%)  
+  remain. '''[[Help:How to add a new comic explanation|Add yours]]''' while there's a chance!  
+  </center>  
+  == Latest comic ==  
+  <div style="border:1px solid grey; background:#eee; padding:1em;">  
+  <span style="float:right;">[[{{LATESTCOMIC}}|'''Go to this comic explanation''']]</span>  
+  <br clear="right">  
+  {{:{{LATESTCOMIC}}}}  
+  {{#ifexist:Talk:{{LATESTCOMIC}}|<h2>Discussion</h2>  
+  {{Talk:{{LATESTCOMIC}}}}  
+  }}</div>  
−  +  <small>''Is this out of date? {{Purge|Clicking here will fix that}}.''</small>  
−  
−  +  == New here? ==  
+  <div style="float:right; margin: 0 0 1em 1em">{{Special:ContributionScores/10/7/nosort,notools}}<div style="font-size:0.85em; width:25em; font-style:italic">[[Special:ContributionScores|Lots of people]] contribute to make this wiki a success. Many of the recent contributors, listed above, have just joined. You can do it too! Create your account [[Special:UserLogin/signup|here]].</div></div>  
−  +  You can read a brief introduction about this wiki at [[explain xkcd]]. Feel free to sign up for an account and contribute to the wiki! We need explanations for comics, characters, themes, memes and everything in between. If it is referenced in an [[xkcd]] web comic, it should be here.  
−  
−  If you  +  * If you're new to wikis like this, take a look at these help pages describing [[mw:Help:Navigation|how to navigate]] the wiki, and [[mw:Help:Editing pages|how to edit]] pages. 
−  +  * Discussion about various parts of the wiki is going on at [[Explain XKCD:Community portal]]. Share your 2¢!  
−  
−  +  * [[List of all comics]] contains a complete table of all xkcd comics so far and the corresponding explanations. The red links ([[like this]]) are missing explanations. Feel free to help out by creating them! '''[[Help:How to add a new comic explanation|Here's how]]'''.  
== Rules ==  == Rules ==  
−  Don't be a jerk.  +  Don't be a jerk. There are a lot of comics that don't have set in stone explanations; feel free to put multiple interpretations in the wiki page for each comic. 
−  If you want to talk about a specific comic,  +  If you want to talk about a specific comic, use its discussion page. 
−  Please only submit material directly related to  +  Please only submit material directly related to, and helping everyone better understand, xkcd... and of course ''only'' submit material that can legally be posted (and freely edited). Off-topic or other inappropriate content is subject to removal or modification at admin discretion, and users who repeatedly post such content will be blocked. 
−  If you  +  If you need assistance from an admin, feel free to leave a message on their personal discussion page. The list of admins is [[Special:ListUsers/sysop|here]]. 
−  +  [[Category:Root category]] 
Revision as of 21:16, 24 March 2013
Welcome to the explain xkcd wiki!
We have collaboratively explained 6 xkcd comics, and only 2042 (100%) remain. Add yours while there's a chance!
Latest comic
Curve-Fitting
Title text: Cauchy-Lorentz: "Something alarmingly mathematical is happening, and you should probably pause to Google my name and check what field I originally worked in."
Explanation
This explanation may be incomplete or incorrect: Please edit the explanation below and only mention here why it isn't complete. Do NOT delete this tag too soon. 
An illustration of several plots of the same data with curves fitted to the points, paired with conclusions that you might draw about the person who made them. This data, when plotted on an X/Y graph, looks somewhat random and there is a desire or need to determine some kind of pattern. With some kinds of data the pattern can be visually obvious, and perhaps a straight or diagonal line, represented by a simple mathematical formula, hits or comes very near hitting all the points. In other cases where it's not as intuitively obvious, one begins to look for more sophisticated mathematical formulas that appear to fit the data, in order to be able to extrapolate other data that wasn't in the initial sampling.
When modeling such a problem statistically, it is common to search for trends, and fitted curves can help reveal these trends. Much of the work of a data scientist or statistician is knowing which fitting method to use for the data in question. Here we see various hypothetical scientists or statisticians each applying their own interpretations, and the comic mocks each of them for their various personal biases or other assorted excuses. In general, the researcher will specify the form of an equation for the line to be drawn, and an algorithm will produce the actual line.
In practice, scientists assess the reliability of their assumptions more rigorously by reporting the standard deviation, denoted by the Greek letter sigma (σ) or the Latin letter s, which quantifies how much the data points vary around the presented best fit. If the σ value isn't good enough, an interpretation based on a specific fit wouldn't be accepted by the scientific community.
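As a rough illustration of that measure, the sketch below computes the standard deviation of the residuals (observed value minus fitted value) for a candidate line. The data points and the line y = 2x are made up for the example, and `residual_std` is a hypothetical helper name, not something from the comic.

```python
import math

def residual_std(xs, ys, predict):
    """Population standard deviation of the residuals y - predict(x)."""
    residuals = [y - predict(x) for x, y in zip(xs, ys)]
    mean = sum(residuals) / len(residuals)
    return math.sqrt(sum((r - mean) ** 2 for r in residuals) / len(residuals))

# Made-up data scattered around the (assumed) fitted line y = 2x.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]
sigma = residual_std(xs, ys, lambda x: 2 * x)
```

A small sigma relative to the spread of the data suggests the line tracks the points closely; a large one suggests the fit explains little.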
Linear
Linear regression is the most basic form of regression; it tries to find the straight line that best approximates the data. As it's the simplest, most widely taught form of regression, and differentiable functions are in general locally well approximated by a straight line, it's usually the first and simplest fit attempted.
Totally different data sets can result in the same line, so more must be known about the nature of the data to judge whether this simple line really makes sense.
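A minimal sketch of an ordinary least-squares line fit, using the standard closed-form formulas, shows how two quite different data sets can produce the same line. The helper name `fit_line` and both data sets are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [0, 1, 2, 3]
clean = [0, 1, 2, 3]   # points lying exactly on y = x
messy = [1, 0, 1, 4]   # visibly curved, yet with the same best-fit line

line_clean = fit_line(xs, clean)
line_messy = fit_line(xs, messy)
```

Both calls return a slope of 1 and an intercept of 0, even though only the first data set actually looks like a straight line.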
Quadratic
Quadratic fit (i.e. fitting a parabola through the data) uses the lowest-degree polynomial that can fit data with a curved line; if the data exhibits clearly "curved" behavior (or if the experimenter feels that its growth should be more than linear), a parabola is often the first stab at fitting the data.
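One way to fit a parabola, sketched here under the assumption of a plain least-squares criterion, is to solve the 3×3 normal equations directly. `fit_quadratic` is a hypothetical helper; it needs at least three distinct x values, otherwise the system is singular.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]               # sums of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns (c, b, a).
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c

xs = [-2, -1, 0, 1, 2]
ys = [x * x - 3 * x + 2 for x in xs]   # made-up points on a known parabola
a, b, c = fit_quadratic(xs, ys)
```

On this noise-free example the fit recovers the coefficients 1, -3 and 2 up to rounding error.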
Logarithmic
A logarithmic curve is typical of a phenomenon whose growth gets slower and slower as time passes (indeed, its derivative, i.e. its growth rate, is proportional to $1/x$, which tends to $0$ for $x \to \infty$), but still grows without bound rather than approaching a horizontal asymptote. (If it did approach a horizontal asymptote, then one of the other models subtracted from a constant would probably be better, e.g. $c - a/x$ or $c - a e^{-bx}$.) If the experimenter wants to find confirmation of this fact, they may try to fit a logarithmic curve. Note that in a model of the form $f(x) = a \log_b(x) + c$, the factor $a$ and the base $b$ are redundant, because $a \log_b(x) = (a / \ln b) \ln x$: use one or the other, but not both. The model has only two free parameters.
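The redundancy between the scale factor and the base of the logarithm can be seen in a small sketch. It fits y = a·ln(x) + c by regressing y on ln(x); the data are generated with base 2, and the fit simply absorbs the base into a. The function name `fit_log` and the data are assumptions made for the example.

```python
import math

def fit_log(xs, ys):
    """Least-squares fit of y = a * ln(x) + c; requires every x > 0."""
    us = [math.log(x) for x in xs]
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    a = (sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
         / sum((u - mean_u) ** 2 for u in us))
    return a, mean_y - a * mean_u

xs = [1, 2, 4, 8, 16]
ys = [3 * math.log2(x) + 1 for x in xs]   # generated as 3*log2(x) + 1
a, c = fit_log(xs, ys)
```

Here a comes out as 3 / ln 2 rather than 3: the same curve, written with the natural logarithm, so only two parameters are ever needed.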
Exponential
An exponential curve, on the contrary, is typical of a phenomenon whose growth gets rapidly faster and faster. A common case is a process that generates stuff that contributes to the process itself; think bacterial growth or compound interest.
The logarithmic and exponential interpretations could very easily be fudged or engineered by a researcher with an agenda (such as by taking a misleading subset or even outright lying about the regression), which the comic mocks by juxtaposing them side-by-side on the same set of data.
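For comparison with the logarithmic case, a common shortcut for fitting an exponential is to take the logarithm of y and fit a straight line. The sketch below assumes the simple form y = A·exp(k·x) with all y positive; `fit_exponential` and the data are illustrative only.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = A * exp(k * x) by a least-squares line through (x, ln y)."""
    ls = [math.log(y) for y in ys]        # requires every y > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(ls) / n
    k = (sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
         / sum((x - mean_x) ** 2 for x in xs))
    return math.exp(mean_l - k * mean_x), k

xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]  # made-up noise-free data
A, k = fit_exponential(xs, ys)
```

Note that fitting in log space weights errors differently from fitting the raw values, which is one more choice a researcher can quietly make.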
LOESS
(Note: a formula given for LOESS, such as the tricube function $w(d) = (1 - |d|^3)^3$, is just the function used for the weights, not the formula of the fitted curve itself, which is a piecewise polynomial.)
A LOESS fit doesn't use a single formula to fit all the data, but approximates data points locally using different polynomials for each "zone" (giving data points less weight as they get further from it) and patching them together. As it has many more degrees of freedom than a single polynomial, it generally "fits better" to any data set, although it is generally impossible to derive any strong, "clean" mathematical correlation from it. It is just a nice smooth line that approximates the data points well, with a good degree of rejection of outliers.
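The local-fitting idea can be sketched as a weighted straight-line fit around each evaluation point. This is a simplified version, assuming tricube weights and a bandwidth set by the k-th nearest neighbour; real LOESS implementations add refinements such as robustness iterations. The names `tricube` and `loess_point` and the `frac` parameter are hypothetical.

```python
def tricube(d):
    """Tricube weight: 1 at d = 0, falling smoothly to 0 at |d| >= 1."""
    d = abs(d)
    return (1 - d ** 3) ** 3 if d < 1 else 0.0

def loess_point(x0, xs, ys, frac=0.5):
    """Value at x0 of a locally weighted straight-line fit."""
    k = max(2, int(round(frac * len(xs))))
    # Bandwidth: distance to the k-th nearest data point (guard against 0).
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    ws = [tricube((x - x0) / h) for x in xs]
    sw = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
    mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        return mean_y                     # only one point carries weight
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    return mean_y + (sxy / sxx) * (x0 - mean_x)

xs = list(range(10))
ys = [2 * x + 1 for x in xs]              # made-up, exactly linear data
smoothed_mid = loess_point(4.5, xs, ys)
smoothed_edge = loess_point(0, xs, ys)
```

Evaluating `loess_point` on a grid of x values and joining the results gives the smooth curve; no single formula describes it.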
Linear, No Slope
Apparently, the person making this line figured out pretty early on that their data analysis was turning into a scatter plot, and wanted to escape their personal stigma of scatter plots by drawing an obviously false regression line on top of it. Alternatively, they were hoping the data would be flat, and are trying to pretend that there's no real trend to the data by drawing a horizontal trend line.
Logistic
Logistic regression is used when a variable takes binary values such as "0" and "1" or "old" and "young".
The curve provides a smooth, S-shaped transition between two flat intervals (like "0" and "1"); indeed the caption says that the experimenter just wants to find a mathematically respectable way to link two flat lines.
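The S-shaped transition between two flat levels can be written as a short function. The parameter names (`low`, `high`, `midpoint`, `steepness`) are chosen for this sketch and are not taken from the comic.

```python
import math

def logistic(x, low=0.0, high=1.0, midpoint=0.0, steepness=1.0):
    """Smooth S-shaped step from `low` (far left) to `high` (far right)."""
    return low + (high - low) / (1 + math.exp(-steepness * (x - midpoint)))

left_plateau = logistic(-50)
halfway = logistic(0)
right_plateau = logistic(50)
```

Far to the left the value is essentially `low`, far to the right essentially `high`, and exactly halfway between them at the midpoint.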
Confidence Interval
Not a type of curve fitting, but a method of depicting the predictive power of a curve.
Providing a confidence interval over the graph shows the uncertainty of the acquired data, thus acknowledging the uncertain results of the experiment, and showing the will not to "cheat" with "easy" regression curves.
Piecewise
Mapping different curves to different segments of the data. This is a legitimate strategy, but the different segments should be meaningful, such as if they were pulled from different populations.
This kind of fit would arise naturally in a study based on a regression discontinuity design. For instance, if students who score below a certain cutoff must take remedial classes, the line for outcomes of those below the cutoff would reasonably be separate from the one for outcomes above the cutoff; the distance between the end of the two lines could be considered the effect of the treatment, under certain assumptions. This kind of study design is used to investigate causal theories, where mere correlation in observational data is not enough to prove anything. Thus, the associated text would be appropriate; there is a theory, and data that might prove the theory is hard to find.
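A toy version of the regression discontinuity idea: fit one line below the cutoff, another at or above it, and compare their predictions at the cutoff. The data, the cutoff of 5 and the helper names are all invented for illustration, and a real analysis would need to justify the assumptions mentioned above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def discontinuity_gap(xs, ys, cutoff):
    """Jump between the two fitted lines, evaluated at the cutoff."""
    below = [(x, y) for x, y in zip(xs, ys) if x < cutoff]
    above = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    slope_b, icpt_b = fit_line(*zip(*below))
    slope_a, icpt_a = fit_line(*zip(*above))
    return (slope_a * cutoff + icpt_a) - (slope_b * cutoff + icpt_b)

xs = list(range(10))
ys = [x if x < 5 else x + 3 for x in xs]  # made-up outcome with a jump of 3
gap = discontinuity_gap(xs, ys, cutoff=5)
```

The returned gap is the estimated effect of the treatment in this toy setting.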
Connecting lines
Not useful whatsoever, but it looks nice! It can be caused by overfitting to the data set or not using curve-fitting tools correctly.
Ad-Hoc Filter
Drawing a bunch of different lines by hand, keeping in only the data points perceived as "good". Also not useful.
House of Cards
Not a real method, but a common consequence of misapplication of statistical methods: a curve can be generated that fits the data extremely well, but immediately becomes absurd as soon as one glances outside the training data sample range, and your analysis comes crashing down "like a house of cards". This is a type of overfitting. In other words, the model may do quite well for (approximately) interpolating between values in the sample range, but not extend at all well to extrapolating values outside that range. Note: Exact polynomial fitting, which gives the unique polynomial of degree at most n−1 through n points, often displays this kind of behaviour. Also a potential reference to the TV show, House of Cards ("WAIT NO, NO, DON'T EXTEND IT!").
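The collapse outside the sample range is easy to reproduce with exact polynomial interpolation. The sketch below uses Lagrange interpolation through six made-up, roughly linear points; the curve passes through every point but shoots far away from the trend when extended a little past the data.

```python
def lagrange(xs, ys, x):
    """Evaluate at x the unique polynomial through all points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Made-up noisy data following roughly y = x on the range 0..5.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 1.2, 1.8, 3.1, 3.9, 5.2]

inside = lagrange(xs, ys, 3)    # reproduces the data point exactly
outside = lagrange(xs, ys, 8)   # extrapolation: nowhere near y = 8
```

Inside the range the fit looks perfect, which is precisely why the failure on extension comes as a surprise.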
Cauchy-Lorentz (title text)
The Cauchy-Lorentz distribution is a continuous probability distribution which does not have an expected value or a defined variance. This means that the law of large numbers does not hold: the sample mean does not settle down as more data points are added, but stays all over the place. Hence it is very troublesome (mathematically alarming).
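A small simulation, sketched with inverse-transform sampling and an arbitrarily chosen seed, makes the point concrete: running means of Cauchy samples have no reason to converge, while the median remains a usable estimate of the centre.

```python
import math
import random
import statistics

def cauchy_sample(rng):
    """Draw one standard Cauchy value by inverse-transform sampling."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(0)   # fixed seed, chosen arbitrarily for repeatability
samples = [cauchy_sample(rng) for _ in range(10000)]

# Sample means over growing prefixes; these are not guaranteed to settle.
running_means = [sum(samples[:n]) / n for n in (10, 100, 1000, 10000)]

# The median, by contrast, is a stable estimate of the centre (0 here).
median = statistics.median(samples)
```

Printing `running_means` for different seeds shows occasional huge jumps caused by single extreme draws.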
Since so many different models can fit this data set at first glance, Randall may be making a point about how if a data set is sufficiently messy, you can read any trend you want into it, and the trend that is chosen may say more about the researcher than about the data. This is a similar sentiment to 1725: Linear Regression, which also pokes fun at dubious trend lines on scatterplots.
Transcript
 Curve-Fitting Methods
 and the messages they send
 [In a single frame twelve scatter plots with unlabeled x- and y-axes are shown. Each plot consists of the same dataset of approximately thirty points located all over the plot but slightly more distributed around the diagonal. Every plot shows in red a different fitting method which is labeled on top in gray.]
 [The first plot shows a line starting at the left bottom above the x-axis rising towards the points to the right.]
 Linear
 "Hey, I did a regression."
 [The second plot shows a curve falling slightly down and then rising up to the right.]
 Quadratic
 "I wanted a curved line, so I made one with Math."
 [In the third plot the curve starts near the left bottom and rises less and less steeply toward the right.]
 Logarithmic
 "Look, it's tapering off!"
 [The fourth plot shows a curve starting near the left bottom and rising more and more steeply towards the right.]
 Exponential
 "Look, it's growing uncontrollably!"
 [The fifth plot uses a fitting to match many points. It starts at the left bottom, increases, then decreases, then rapidly increases again, and finally reaches a plateau.]
 LOESS
 "I'm sophisticated, not like those bumbling polynomial people."
 [The sixth plot simply shows a line above but parallel to the x-axis.]
 Linear, no slope
 "I'm making a scatter plot but I don't want to."
 [The curve in plot #7 starts at a plateau above the x-axis, then increases, and finally reaches a higher plateau.]
 Logistic
 "I need to connect these two lines, but my first idea didn't have enough Math."
 [Plot #8 shows two red lines embedding most points and the area between is painted as a red shadow.]
 Confidence interval
 "Listen, science is hard. But I'm a serious person doing my best."
 [Plot #9 shows two unconnected lines, one in the lower left half, and one higher at the right. Both have smaller curved lines in light red above and below.]
 Piecewise
 "I have a theory, and this is the only data I could find."
 [The plot at the left bottom shows a line connecting all points from left to right, resulting in a curve going many times up and down.]
 Connecting lines
 "I clicked 'Smooth Lines' in Excel."
 [The next-to-last plot shows an echelon form, connecting a few real and some imaginary points.]
 Ad-Hoc filter
 "I had an idea for how to clean up the data. What do you think?"
 [The last plot shows a wave with increasing peak values. Finally the plot of the wave is continued beyond the x- and y-axis borders.]
 House of Cards
 "As you can see, this model smoothly fits the wait no no don't extend it AAAAAA!!"
Trivia
 This is the comic 2048, or 2^{11}. In addition to being the name of a popular app referenced in 1344: Digits, this is an extremely round number in binary (100,000,000,000_{2}). 1000: 1000 Comics pointed out that comic 1024 would be a round number, but there were not any comics noting 2048.
 This comic is similar to 977: Map Projections which also uses a scientific method not commonly thought about by the general public to determine specific characteristics of one's personality and approach to science.
 Regressions have been the subject of several previous comics. 1725: Linear Regression was about linear regressions on uncorrelated or poorly correlated data. 1007: Sustainable and 1204: Detail depict linear regressions on data that was actually logistic, leading to bizarre extrapolations. 605: Extrapolating shows a line extrapolating from just two data points.
New here?
You can read a brief introduction about this wiki at explain xkcd. Feel free to sign up for an account and contribute to the wiki! We need explanations for comics, characters, themes, memes and everything in between. If it is referenced in an xkcd web comic, it should be here.
 If you're new to wikis like this, take a look at these help pages describing how to navigate the wiki, and how to edit pages.
 Discussion about various parts of the wiki is going on at Explain XKCD:Community portal. Share your 2¢!
 List of all comics contains a complete table of all xkcd comics so far and the corresponding explanations. The red links (like this) are missing explanations. Feel free to help out by creating them! Here's how.
Rules
Don't be a jerk. There are a lot of comics that don't have set in stone explanations; feel free to put multiple interpretations in the wiki page for each comic.
If you want to talk about a specific comic, use its discussion page.
Please only submit material directly related to, and helping everyone better understand, xkcd... and of course only submit material that can legally be posted (and freely edited). Off-topic or other inappropriate content is subject to removal or modification at admin discretion, and users who repeatedly post such content will be blocked.
If you need assistance from an admin, feel free to leave a message on their personal discussion page. The list of admins is here.