Talk:2118: Normal Distribution

Explain xkcd: It's 'cause you're dumb.
Revision as of 00:00, 2 March 2019 by (talk)

Is there a statistician in the house? Hawthorn (talk) 15:32, 1 March 2019 (UTC)

   I think they all got annoyed at the graph and left. Margath (talk) 15:46, 1 March 2019 (UTC)

Of course there is! 15:44, 1 March 2019 (UTC)

As an example: when you measure the heights of people in the same age bracket, you'd expect the number of people at each height to look like this graph. There will be a lot of people around the average height, fewer a foot shorter/taller, some (but very few) exceptionally tall people, and some (but very few) exceptionally short people. The x-value represents the height; the y-value essentially represents the share of the population at that height. When we mark the middle 50% of the population using vertical bars, everyone at a given height is either inside OR outside the middle. Randall uses horizontal bars here, which means some people at a given height will be counted in the middle 50%, but other people with the same height won't be. In fact, some people with the exact average height of the whole population would fall outside the middle. 16:01, 1 March 2019 (UTC)
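For comparison, here is a minimal sketch of the conventional "middle 50%" with vertical bars, assuming a perfectly normal population. The function name `middle_interval` and its parameters are made up for illustration; only `statistics.NormalDist` from the Python standard library is used.

```python
from statistics import NormalDist

def middle_interval(mass=0.5, mu=0.0, sigma=1.0):
    """Return (lo, hi) so that the central `mass` of N(mu, sigma) lies between them."""
    dist = NormalDist(mu, sigma)
    tail = (1.0 - mass) / 2.0          # probability left over in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# Standard normal: x is measured in standard deviations from the mean.
lo, hi = middle_interval()
print(f"middle 50%: {lo:.4f} to {hi:.4f} standard deviations from the mean")
```

The two vertical bars sit roughly 0.674 standard deviations either side of the mean, so every x-value is unambiguously inside or outside the interval, unlike with the horizontal bars in the comic.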

Feel free to rip me apart for referring to it as the "number of people at each height", since the y-axis is more complicated than a simple count. 16:03, 1 March 2019 (UTC)

Just to say, Randall's horizontal slice isn't entirely meaningless. It's a calculation I've had to do, where I have a series of binned samples of a population (say I knew how many fell in -10..10, how many fell in -5..5, how many fell in -2..2) and wanted to combine them with an appropriate weighting to approximate a Gaussian. I was using it for filtering, but it's logically similar. Fluppeteer (talk) 16:19, 1 March 2019 (UTC)

Also, the slice sampler for MCMC is a trick for sampling from a distribution by "turning it on its side". But I don't think the 50% figure would be meaningful in that context. 21:16, 1 March 2019 (UTC)
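A rough sketch of that slice-sampling idea for the standard normal, where the horizontal slice happens to have a closed form. This is an illustration under that assumption, not a general-purpose sampler; the function name `slice_sample_standard_normal` is invented here.

```python
import math
import random

def slice_sample_standard_normal(n, x0=0.0, seed=None):
    """Draw n (correlated) samples from N(0, 1) by alternating vertical and horizontal moves."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n):
        # Unnormalized density at the current point; the 1/sqrt(2*pi) factor cancels out.
        g = math.exp(-0.5 * x * x)
        # Vertical move: pick a height uniformly in (0, g].
        u = g * (1.0 - rng.random())
        # Horizontal move: the slice {x : exp(-x^2 / 2) >= u} is the interval [-w, w].
        w = math.sqrt(max(0.0, -2.0 * math.log(u)))
        x = rng.uniform(-w, w)
        samples.append(x)
    return samples

samples = slice_sample_standard_normal(20000, seed=0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")
```

The sample mean and variance should come out near 0 and 1. For densities without an analytic slice, real implementations typically find the interval by stepping out and shrinking instead.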

Pedant: etymologically, there *is* actually a connection between a normal (to a surface or line) and the normal distribution; the former comes from the Latin for a set square (giving you perpendicular), and it later came to mean "standard". The "tangential distribution" certainly fits the etymology of "odd/unusual" though. Fluppeteer (talk) 16:26, 1 March 2019 (UTC)

This reminds me of the difference between Riemann(-Stieltjes) and Lebesgue integration. 20:16, 1 March 2019 (UTC)

As the axes are not labeled (see comic 833) we could consider this a multivariate distribution where one parameter is uniform and the other is normal. That was my first thought when I saw this. 18:43, 1 March 2019 (UTC)

Is there any meaning to midpoint: 52.7%? Maybe that is the arbitrary center he formed the horizontal bounds around? Maybe it relates to data? Is this a reference to something? It's certainly reminiscent of how normal distributions produce statistically meaningful numbers that have weird decimals in them (like the % represented by being within so many standard deviations). 19:45, 1 March 2019 (UTC)

Maybe it's because the meaning of "50% of the chart lies between these lines" specifically becomes roughly useless for discerning error if the lines are not centered around the origin. 19:52, 1 March 2019 (UTC)
I might get it!!! The area between the lines is 52.7% of the total area: which means that 50% is technically included in what lies between them. 23:07, 1 March 2019 (UTC)

The correct way to do this is to have the topmost horizontal line at or above the top of the normal plot. Then the bottom-most line would represent the same values as vertical lines would. 23:32, 1 March 2019 (UTC)

Say I want to build a diverse team or a representative council. And it is more important that the selection is representative of several subpopulations (who should not be voted down by the majority) than that it gives an equal fair chance to anybody. I would cut away the absolute outliers and reduce the weight of the most abundant group - this gives just the area between the two lines. Sebastian -- 23:40, 1 March 2019 (UTC)

Has anybody measured or calculated (assuming a normal distribution) the areas? The upper area looks much smaller than the lower one, even though both have the same 'height' in the middle. Is the 52.7% graphically correct? I tried half of the height at 0 (.398942) and integrated, and got 52.6% for the white area and 47.4% for the gray area. On the y-axis the three visible ticks seem to be .1, .2, .3, so the gray area would be a bit broader than .2 and centered at .1. Sebastian -- 23:40, 1 March 2019 (UTC)

Got Nerd Sniped by the number "52.7%", but failed to find an analytic solution and settled for a quick and dirty numerical integration instead, which suggested that the exact number is somewhere between .5268 and .5269, so I think I'm not far from the truth. As I see it, the shaded area is vertically centered on the vertical midpoint of the curve, with a relative height chosen such that the shaded area is exactly 50% of the total area under the curve. Just as usual, only with vertical instead of horizontal binning, which of course is the twist that makes this graph puzzling, funny, and completely useless for meaningful interpretation. The label "52.7%" is not an addition to the Midpoint label; it gives the height of the vertical bin as a percentage of the height of the curve. Oh, and you are certainly right that the marginal distributions at the top and the bottom are asymmetric, as is the Gaussian when viewed sideways. 23:56, 1 March 2019 (UTC)
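One possible way to reproduce that numerical check, sketched under the reading given above: a horizontal band centered on half the peak height, with its height tuned by bisection until it encloses 50% of the area under a standard normal curve. The helper names (`area_below`, `band_area`, `solve_width`) are hypothetical, and this is not necessarily how the number in the comic was produced.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
PEAK = STD_NORMAL.pdf(0.0)  # height of the curve at x = 0, about 0.398942

def area_below(t):
    """Area under the curve that lies below the horizontal line y = t * PEAK."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    x = math.sqrt(-2.0 * math.log(t))         # the curve crosses the line at +/- x
    rectangle = 2.0 * x * t * PEAK            # under the line, between the crossings
    tails = 2.0 * (1.0 - STD_NORMAL.cdf(x))   # under the curve, outside the crossings
    return rectangle + tails

def band_area(width):
    """Area inside a band of relative height `width`, centered on half the peak."""
    return area_below(0.5 + width / 2.0) - area_below(0.5 - width / 2.0)

def solve_width(target=0.5, tol=1e-10):
    """Bisection on the band height; band_area grows monotonically with width."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if band_area(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

width = solve_width()
print(f"band height as a fraction of the peak: {width:.4f}")
```

If the reading above is right, the printed fraction should land close to the 52.7% shown in the comic.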