Editing 2118: Normal Distribution

==Explanation==
 
This is another comic on [[:Category:How to annoy|How to annoy]] people, particularly targeting statisticians in this instance.
 
 
 
[[File:Standard_deviation_diagram.svg|thumb|{{w|Normal distribution}}s and the intervals of the standard deviation are a topic commonly seen in introductory statistics.  Randall's chart is similar, but his lines are perpendicular.]]
 
 
In statistics, a {{w|Probability distribution|distribution}} describes how much of a sample is expected to fall either into discrete bins or between particular values.  For example, if you wanted to represent an age distribution using bins of ten years (0-9, 10-19, etc.), you could produce a bar chart, one bar for each bin, where the height of each bar is the number of people in the sample whose age falls in that bin. To turn that bar chart into a distribution, you'd get infinitely many people (technically: a number N which tends to infinity), put them into age bins that are infinitely narrow (technically: bins whose size is O(1/sqrt(N))), and then divide each bin count by the total count so that the whole thing adds up to 1. It is common to ask how much of the distribution lies between two vertical lines; that corresponds to asking what percentage of people are expected to fall between two ages.
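The binning and normalizing steps described above can be sketched in a few lines of Python. This is only an illustration: the sample is randomly generated rather than real data, and the helper names <code>binned_proportions</code> and <code>fraction_between</code> are made up for this example.

```python
import random

def binned_proportions(ages, bin_width=10):
    """Count ages per bin, then divide by the total so the bars sum to 1."""
    counts = {}
    for age in ages:
        start = (age // bin_width) * bin_width  # bin start: 0, 10, 20, ...
        counts[start] = counts.get(start, 0) + 1
    total = len(ages)
    return {start: n / total for start, n in sorted(counts.items())}

def fraction_between(ages, low, high):
    """Share of the sample with low <= age < high (the part between two vertical lines)."""
    return sum(low <= age < high for age in ages) / len(ages)

random.seed(0)  # fixed seed so the made-up sample is reproducible
ages = [random.randint(0, 89) for _ in range(10_000)]  # hypothetical sample, not real data

proportions = binned_proportions(ages)
print(proportions)                     # one proportion per ten-year bin
print(fraction_between(ages, 20, 40))  # share of people aged 20 to 39
```

Because the bin edges line up with the two ages chosen here, the last number is just the sum of the 20-29 and 30-39 bars.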
 
 
This distribution has never been discussed before, and has no known application. Moreover, the distribution of Y is not symmetric: while 50% of Y values fall inside interval ''R'', 41% fall below ''R'' and only 9% fall above ''R''. So the single piece of information in the comic is not a good way to describe this distribution! We do use such intervals for the normal distribution because the normal distribution is symmetric, and the center of symmetry is the mean, median, and mode. (However, it would be just about as ridiculous to observe that 50% of the X values in a standard normal distribution fall between the vertical lines X=-0.2 and X=1.41.)
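The parenthetical claim about the standard normal distribution can be checked numerically. The sketch below uses <code>statistics.NormalDist</code> from the Python standard library (Python 3.8 or later); the variable names are chosen for this example only. It also computes the symmetric interval that holds the middle 50%, for comparison.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a standard normal variable X on each side of the lopsided interval [-0.2, 1.41]
below = std_normal.cdf(-0.2)
inside = std_normal.cdf(1.41) - std_normal.cdf(-0.2)
above = 1.0 - std_normal.cdf(1.41)

# Half-width of the symmetric interval around 0 that also contains 50%
half_width = std_normal.inv_cdf(0.75)

print(f"below: {below:.3f}, inside: {inside:.3f}, above: {above:.3f}")
print(f"symmetric 50% interval: +/-{half_width:.4f}")
```

The inside share comes out very close to one half, while the shares below and above the interval are far from equal. That imbalance is why an off-center interval says little about the shape of a distribution.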
 
  
The title text refers to the notion of {{w|Normal (geometry)|normals}} and {{w|tangent}}s in geometry. Given a 2D curve or 3D surface, a line which points perpendicularly outward from a point on the curve or surface (making a 90-degree angle with the curve) is said to be ''normal'' to the curve, while a line which just grazes the curve, being exactly parallel to the curve at the point of contact, is said to be ''tangent'' to the curve at that point. The joke is that this geometrical notion of ''normal'' is completely unrelated to the statistical ''normal distribution''. Randall observes that if you take a geometric normal and rotate it 90 degrees, you produce a tangent; thus, if you take the ''normal'' distribution and rotate it by 90 degrees, you must get something called the "''tangent'' distribution." Saying this to a statistician would only annoy the statistician further.
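The geometric relationship between normals and tangents can be illustrated on the bell curve itself. This is a minimal sketch, assuming the standard normal density as the curve; the function names are invented for this example and none of it comes from the comic.

```python
import math

def bell(x):
    """Standard normal density, used here purely as a curve in the plane."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def tangent_at(x):
    """Direction of the tangent line: (1, slope), where the slope of bell is -x * bell(x)."""
    return (1.0, -x * bell(x))

def rotate_90(v):
    """Rotate a 2D vector by 90 degrees counterclockwise."""
    return (-v[1], v[0])

def normal_at(x):
    """A normal direction: the tangent direction rotated by 90 degrees."""
    return rotate_90(tangent_at(x))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def cross(a, b):
    """2D cross product; zero when the two vectors are parallel."""
    return a[0] * b[1] - a[1] * b[0]

t = tangent_at(1.0)
n = normal_at(1.0)
print(dot(t, n))               # zero: the normal is perpendicular to the tangent
print(cross(rotate_90(n), t))  # zero: the rotated normal is parallel to the tangent
```

Rotating the normal by a further 90 degrees gives a vector pointing along the tangent line (in the opposite direction), which is the geometric fact the title text plays on.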
  
 
This is annoying to a statistician not only because the terms ''normal'' and ''tangent'' come from differential geometry and have no established meaning in probability theory, but also because even the word ''perpendicular'' has no established meaning there.  Of course, the x and y coordinates in the comic are perpendicular (orthogonal) coordinates, but X and Y are not "perpendicular" or "orthogonal" random variables.  The most obvious probabilistic meanings for "perpendicular" or "orthogonal" would be {{w|Independence (probability theory)|independent}}, whose symbol is even related to the geometric symbol for perpendicularity, or {{w|Uncorrelatedness (probability theory)|uncorrelated}}, which makes X and Y (after subtracting their means) orthogonal vectors in the Hilbert space of random variables that are square integrable with respect to the underlying probability measure.  X and Y are not perpendicular in either of these senses.
 
 
[[Category:Charts]]
 
 
[[Category:Statistics]]
 
[[Category:Puns]]
 
[[Category:How to annoy]]
 
