Talk:1347: t Distribution

Explain xkcd: It's 'cause you're dumb.
Revision as of 20:30, 26 March 2014 by Davidy22 (talk | contribs)

http://en.m.wikipedia.org/wiki/Student%27s_t-test


173.245.50.73 05:20, 26 March 2014 (UTC)Adam


I think this is a comment on the quality of education today - it is difficult to grade students on a distribution curve, and even more so when you take into account the distribution curve of the teacher's ability. 108.162.249.205 (talk) (please sign your comments with ~~~~)

I noticed the teacher's curve is symmetrical, and after further inspection it could be interpreted as an edge detection: high values show where an edge occurs. The two highest peaks would nicely align with the edges of the paper, the next highest peaks fit the edges of the table, and the rest could be approximation artefacts, as they're equidistant and rather insignificant compared to those four. I'm no statistics pro, but maybe that rings a bell for someone? 108.162.210.239 07:56, 26 March 2014 (UTC)

Interesting observation. It may play into an age-old legend told and re-told among students that some teachers grade papers by tossing the whole pile in the air; those sheets that land on the teacher's desk get a pass, those falling to the floor get a fail. Sometimes the story is modified so that papers falling on the teacher's book (or other object) lying on the desk get a higher mark than those simply hitting the desk. The latter version would explain the higher sheet-size-apart peaks. 108.162.210.111 08:57, 26 March 2014 (UTC)

To be more explicit, I think the sheet of paper represents some data. Cueball is not happy with the results of applying Student's t test, so ze is trying more complex tools in the hope of getting significance. -- TimMc / 173.245.52.27 11:51, 26 March 2014 (UTC)

I would upvote this comment if allowed. As an aside, there are some teachers who think a class' grades will always fall into a nice t Distribution (thus the expression "grading on a curve") and others who vehemently hate the notion. Source: my 3-year stint as a math teacher in an urban high school. Smperron (talk) 14:06, 26 March 2014 (UTC)

Man, normally these explanations clear the comic right up for me, but I've read this one thrice now and I still can't figure out what a t-distribution is, much less a joke based on one. The only definition being a Wikipedia quote written in legalese doesn't help. So a t-distribution estimates...the probability of a population's average when there's unknown information? 108.162.216.48 12:17, 26 March 2014 (UTC)

The unknown information is the population's standard deviation (by how much, on average, something is going to vary from the mean), which has to be estimated from a small sample (a class, for example). The unknown information is not "in the data". Jarod997 (talk) 12:28, 26 March 2014 (UTC)
Basically, if you have an underlying process that produces samples with a Gaussian distribution with a mean of 0 and a stddev of 1, and you pull a finite number of samples out of it and do the usual "average" operation on them (i.e. sum them and divide by the number of samples), you would expect the computed average to be close to zero. But it might not be! By chance the samples you pulled might mostly have come from the far right or left side of the distribution, and the average you got would be way off. Student's t distribution (for a certain number of samples, n) is basically "given that the underlying process is a Gaussian with mean zero, if I repeatedly take n samples from that distribution, compute the average of those samples to get an "estimated mean", and divide it by the standard error estimated from those same n samples (because in practice the true stddev is unknown), this is how I expect the result to be distributed". Naturally, this is important in questions like "I took 100 samples and got an average of 0.02 -- does this mean that it is sensible to think that the mean of the underlying distribution is actually zero?"
Of course, most of the joke is that the distribution is named "Student's", which is not strongly dependent on the nature of the statistics. Vyzen (talk) 12:42, 26 March 2014 (UTC)
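A minimal simulation sketch of the repeated-sampling idea above, in Python with only the standard library. The function names (`t_statistic`, `simulate`, `tail_fraction`), the seed and the choice n = 5 are illustrative assumptions, not anything taken from the comic. It repeatedly draws n samples from a Gaussian with mean 0 and stddev 1, then compares the sample mean scaled by the true standard error with the same mean scaled by the standard error estimated from the sample. The second quantity is the one that follows Student's t distribution with n - 1 degrees of freedom, so it should land far from zero more often.

```python
import random
import statistics


def t_statistic(sample, mu=0.0):
    """Distance of the sample mean from mu, in units of the estimated standard error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (mean - mu) / (s / n ** 0.5)


def simulate(n, trials, seed=0):
    """Draw `trials` samples of size n from N(0, 1); return their means and t statistics."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatable runs
    means, t_values = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        t_values.append(t_statistic(sample))
    return means, t_values


def tail_fraction(values, cutoff):
    """Fraction of values whose absolute value exceeds cutoff."""
    return sum(abs(v) > cutoff for v in values) / len(values)


if __name__ == "__main__":
    n = 5
    means, t_values = simulate(n, trials=20000)
    # Scaling by the true standard error (1 / sqrt(n)) gives a standard normal.
    z_values = [m * n ** 0.5 for m in means]
    print("beyond +/-2 using the true stddev:     ", tail_fraction(z_values, 2.0))
    print("beyond +/-2 using the estimated stddev:", tail_fraction(t_values, 2.0))
```

With a small n the second fraction should come out noticeably larger than the first; as n grows the two converge, which is why the t distribution mostly matters for small samples.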
Okay, it's pretty clear to me now what the Student's t distribution is. I'm still not sure about the punchline though: how does the "Teacher's" t distribution come into play? Does the uneven distribution represent any phenomenon in the academic world? Like, as suggested above, is this a joke about grading? 173.245.53.137 15:05, 26 March 2014 (UTC)
Other than the symmetry, I'd almost suggest that the distribution could be real test scores. Typically tests will have a small number of questions worth multiple points and the scores might spike around levels that represent integral numbers of questions done perfectly, with the spaces in-between filled in by part marks. The teacher may have a bias towards giving perfect or zero scores per question. Vyzen (talk) 18:53, 26 March 2014 (UTC)

The teacher's t-distribution looks like multiple spikier curves with different centres added together and it doesn't fit the table. Wwt (talk) 13:17, 26 March 2014 (UTC)


I took from it that the Student's distribution was too perfect, that real data would rarely yield those idealized results in a small sample size, and that the teacher's distribution used actual numbers, with the occasional spikes. From the title text I took the tendency of students, or anyone with pre-conceived notions, to keep redoing the test until they get the results they expect, in this case the textbook result. 173.245.55.71 13:25, 26 March 2014 (UTC)

Any thoughts on the piece of paper he's trying to pull out from beneath the Student's t-distribution? 108.162.219.66 14:10, 26 March 2014 (UTC)

I don't think he is trying to pull the paper out from beneath the t-distribution. I think he is placing the distribution on top of the paper to see if the data on the paper matches the distribution. In panel 2, he looks at the paper and decides that, no, it doesn't, so he opts to use another distribution - the Teacher's t-distribution - to see if that works. The comic may be hinting that the t-distribution in grading, etc. (since students and teachers are explicitly listed) is flawed. --Dangerkeith3000 (talk) 15:10, 26 March 2014 (UTC)

I may be over-simplifying it, but the 'Teachers' T looks like a reference to the 'double-hump programmer' idea, converted into a T-distribution. The other ideas cover the general principle, but this looks like a specific example as well. 108.162.221.48 15:47, 26 March 2014 (UTC)

I don't think the explanation really explains what a T-distribution is at all. I know it's googleable, but the point of an explanation is you shouldn't have to look it up afterwards. I don't like how lately all of the scientific/maths comics seem to be given explanations laden with technical terms that don't actually clarify anything. --Mynotoar (talk) 17:57, 26 March 2014 (UTC)


I did a quick calculation using mspaint, and it appears that the Student's t-distribution in the first panel is roughly 5780 px^2 in size; at the same time the area of the "Teacher's t-distribution" in the last panel is approximately 8125 px^2 (or 140% of the Student's distribution). Thus, using the Teacher's t-distribution as Cueball is intent on doing "is both illegal and illegitimate" (illegitimate = no scientific basis for such a distribution; illegal = this is not even a distribution per se, since its area does not match). If Cueball goes on and publishes his results based on such an approach, they will not be recognized by the international scientific community (except perhaps by Russia, Syria and North Korea). We, readers, therefore express our deep concern over Cueball's methods. Stpasha (talk) 18:27, 26 March 2014 (UTC)

I believe the joke has to do with "fitting data to a distribution": In the first panel, Cueball is trying to adjust the Student's T distribution on top of the data, which could be a play on "fitting" the data to the distribution. Statistically speaking, fitting data to a distribution is often done to figure out how likely the data were to have occurred, under the assumption that the underlying data generating process follows a particular distribution (like the Student's T). It looks like Cueball first tries to fit his data to a Student's T, and is dissatisfied with the fit. He then tries a much more complicated distribution - which, I think is jokingly called a Teacher's distribution on the premise that something to do with teachers is more complicated than something to do with students. The joke is that data often don't fit a simple distribution like the Student's T... they are nuanced and complex, and their underlying data generating process was far more complex. Amoorthy (talk) 19:50, 26 March 2014 (UTC)

By the way, this is related to and compatible with the explanation given by Dangerkeith3000 above. Amoorthy (talk) 20:26, 26 March 2014 (UTC)
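The "fitting data to a distribution" idea in the comment above can be sketched roughly as follows, in Python with only the standard library. Everything named here is an assumption made for illustration: the helper names (`t_logpdf`, `log_likelihood`, `best_loc`), the six made-up data points, the fixed 4 degrees of freedom and the fixed scale of 1. The sketch scores how likely the data are under a Student's t distribution placed at different locations, then picks the best-scoring location from a grid, which is a crude stand-in for sliding the curve over the sheet of paper.

```python
import math


def t_logpdf(x, df):
    """Log density of the standard Student's t distribution with df degrees of freedom."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi)
            - (df + 1) / 2 * math.log1p(x * x / df))


def log_likelihood(data, df, loc, scale):
    """Log-likelihood of data under a t distribution shifted by loc and stretched by scale."""
    return sum(t_logpdf((x - loc) / scale, df) - math.log(scale) for x in data)


def best_loc(data, df, scale, candidates):
    """Candidate location with the highest log-likelihood (a simple grid search)."""
    return max(candidates, key=lambda loc: log_likelihood(data, df, loc, scale))


if __name__ == "__main__":
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # hypothetical measurements, not from the comic
    candidates = [i / 10 for i in range(-30, 31)]  # grid from -3.0 to 3.0 in steps of 0.1
    loc = best_loc(data, df=4, scale=1.0, candidates=candidates)
    print("best location on the grid:", loc)
    print("log-likelihood there:     ", log_likelihood(data, 4, loc, 1.0))
    print("log-likelihood at loc=3.0:", log_likelihood(data, 4, 3.0, 1.0))
```

A poor fit shows up as a low log-likelihood at every location; that is the numerical version of Cueball looking at the paper and deciding the curve does not match. A real analysis would also fit the scale and degrees of freedom rather than fixing them.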

The title text could be referring to the tests aspiring teachers have to take in the US to get their credentials. It's sort of like a bar exam, except you may take it as many times as you wish until you pass. 199.27.128.77 (talk) (please sign your comments with ~~~~)