2755: Effect Size

Explain xkcd: It's 'cause you're dumb.
Title text: Subgroup analysis is ongoing.


This comic outlines a meta-analysis, or more aptly THE meta-analysis, as its inclusion criteria are simply all studies.

A meta-analysis, true to its name, is a statistical analysis of statistical analyses, usually those attempting to answer a single question. Meta-analyses are intended to account for possible individual error within each study, summarizing the general results of all of its studies in order to potentially draw a useful conclusion. For a meta-analysis to be possible, there must be some measured variable in common across the included studies.

Here, the meta-analysis consists of a graph of effect sizes for what is allegedly every single study ever conducted. Accordingly, even page 53,589 is only about a quarter of the way through the graph, judging by the scroll bar on the right, which is only about 1/4 of the way down. At roughly 11 studies per page, this puts the total at approximately 210,000 pages, or around 2.3 million studies. Below the graph is an estimate of the "average effect" across all of these studies. In an ordinary meta-analysis, the effect is the single relationship that all of the included studies examine; here it seems to be a conglomerate of all known effects, reported along with a (likely) 95% confidence interval. It's absurd to analyze all studies this way, as the variables that those studies measure are wildly different, and it makes no sense whatsoever to average (or otherwise analyze) them together. In addition, 2.3 million scientific studies is much too small a number; a recent estimate is that about 3 million papers are published each year, and while not all of them would have a numerical hypothesis test, many others would have several such tests.
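The page and study totals above follow from simple proportional arithmetic. A minimal sketch, assuming the scroll box sits exactly 1/4 of the way down and that every page holds the 11 studies visible in the comic (both are illustrative assumptions, and the function name is hypothetical):

```python
# Rough reconstruction of the page and study counts quoted above.
CURRENT_PAGE = 53_589      # page number shown in the comic's tab
SCROLL_FRACTION = 0.25     # assumption: scroll box is exactly 1/4 of the way down
STUDIES_PER_PAGE = 11      # assumption: every page holds as many studies as the one pictured


def estimate_totals(page, fraction, per_page):
    """Return (total pages, total studies) if `page` lies `fraction` of the way through."""
    total_pages = page / fraction
    total_studies = total_pages * per_page
    return total_pages, total_studies


pages, studies = estimate_totals(CURRENT_PAGE, SCROLL_FRACTION, STUDIES_PER_PAGE)
print(f"about {pages:,.0f} pages and {studies:,.0f} studies")
```

With exactly 1/4 this gives a little over 214,000 pages and about 2.36 million studies, the same ballpark as the rounded figures in the explanation; a slightly different reading of the scroll bar shifts both numbers.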

Statistical studies are produced by generating hypotheses and then testing those hypotheses. A meta-analysis of all studies would therefore include both studies where the original hypothesis turns out to be false, as well as studies where the original hypothesis is confirmed. Hypotheses that fail to be confirmed by studies are often discarded; however, these studies would still be included in this meta-analysis.

In the caption, Randall delivers the bad news: the meta-analysis of "all of science" has finally been performed, and as it turns out, the results are not significant. A result is statistically significant when it would be unlikely to arise from chance or sampling variation alone if there were no real effect. Here the interval shown under the plot, (-0.14, 0.52), includes zero, so the pooled effect cannot be distinguished from no effect at all. Apparently, across the entirety of human science in the study of our universe, the study has found a lack of significance, that is, no detectable relationship among all the variables measured by all the studies ever.

The joke lies in the absurdity of the claim that "all of science" can be analyzed at all. Science is not a single body of comparable results but hundreds of different fields of study, many of which have little or no overlap. Doing a meta-analysis of geology and philosophy, for example, would be patently ridiculous, so the 53,589 (or 210,000) page study is comical in its very existence, let alone its conclusion. In addition, the comic conflates two meanings of "significant": the statistical meaning, and the more everyday meaning of importance or noteworthiness.

Beyond the absurdity, one can see the whole joke as an instance of the Liar paradox: if the conclusion of the meta-analysis is that "science" is statistically unable to provide information on the truth of a statement, then the meta-analysis itself (insofar as it was carried out following the general principles of rigor and the methods of "science") is subject to its own conclusion. Hence, the conclusion of the meta-analysis might have nothing to do with the truth, and "science" might well be significant after all. But if it is, then the present meta-analysis should be considered significant as well, and one should believe its conclusion, and so on.

In the title text, Randall reports that subgroup analysis is ongoing. The joke here is that since all scientific studies are subsets of the overall meta-analysis, every field of scientific endeavor can be separately assessed by constraining the subgroup to include only studies in that field. Hence the subgroup analysis could be considered to include analyses of every individual area or question that scientists have made subject to statistical studies. Again, analyzing any subgroup would lump together studies that measured very different things and hence would still be meaningless.

XKCD has previously considered the topic of subgroup analyses around the important issue of jelly beans. Subgroup analyses may be used as data dredging or p-hacking in order to identify anything that is "significant" and thus publishable.


Inclusion criteria: All studies
[A forest plot is shown. In the tab on the top right, there is a label "Page 53,589". On the right side of the plot, there is a vertical scrollbar where the scroll box is about one quarter from the top. A horizontal axis centered on 0 is shown at the bottom and -1 and +1 on either side are labeled. In the middle of the plot, there is a dashed vertical line. On both sides of the vertical line in separate rows, there are black boxes of different sizes with horizontal bars of varying lengths on the sides of the boxes. Below the plot, slightly to the right of the vertical line, there is a black diamond wider than it is tall, labeled "0.17 (-0.14, 0.52)".]
[Caption below the panel:]
Bad news: They finally did a meta-analysis of all science, and it turns out it's not significant.



Wow, it looks like I'm first! 16:40, 27 March 2023 (UTC)

Wasn't something like this actually done?

Robert Sapolsky mentions an obscure paper that actually did something like this. They did a meta-analysis of the average reported error throughout various disciplines in order of the physical size of the objects being studied (e.g., from cells to organs to etc.), and found no correlation between them. The conclusion was that this was evidence that philosophical reductionism was flawed. Fephisto (talk) 22:45, 27 March 2023 (UTC)

Did you manage to find it? 08:49, 28 March 2023 (UTC)
Here is the talk. He talks about the paper around 1:26:00. The figure is 1:26:50. Fephisto (talk) 13:18, 29 March 2023 (UTC)
Maybe LINK, titled "Reductionism and Variability in Data: A Meta-Analysis", Sapolsky, R.; Balt, S.; Perspectives in Biology and Medicine 39(2), 1996. Tier666 (talk) 16:21, 29 March 2023 (UTC)

But does the meta-analysis include itself? Technically, it too is part of Science... Artinum 13:06, 28 March 2023 (UTC)

It's SCIENCE all the way Down! Kev (talk) 18:39, 28 March 2023 (UTC)
I know this is facetious, but to answer seriously, the meta-analysis is a breakdown of specific areas of science, and meta-analyses were not one of the categories that were analyzed. Fephisto (talk) 14:42, 15 August 2023 (UTC)

scroll box location is ~25.5% down track: scroll box is 10px high, scrollbar is 290px high, 54px above box, 226px below = center of scrollbox is 59/231 = 25.541..% = ~209,815 pages of total studies. Adjusted to 210,000 to account for rounding errors. (Plus the scroll box might not even move a pixel for a number of pages).

Wait, if the scrollbar is 290px high, then shouldn't the position be 59/290 = 20.345%? It looks a lot more like 1/5th down than 1/4th down to my eyes. --Orion205 (talk) 17:16, 29 March 2023 (UTC)
The assumption here is that the scroll bar corresponds to the page numbers. However, that is not normally the case; it's more common to have a scroll bar per page, meaning we are here 20% into page number 53589... -- Pbb (talk) 16:56, 7 April 2023 (UTC)
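The two readings of the scroll bar in this thread can be compared directly. A minimal sketch using the pixel measurements quoted above, and assuming (as the first two comments do, and as the third disputes) that the scroll bar spans the whole document; the variable and function names are hypothetical:

```python
# Pixel measurements quoted in the discussion (read off the comic image).
BOX_HEIGHT = 10       # height of the scroll box
TRACK_HEIGHT = 290    # height of the whole scroll bar
ABOVE_BOX = 54        # track above the scroll box
BELOW_BOX = 226       # track below the scroll box
PAGE_SHOWN = 53_589   # page number in the comic's tab

center = ABOVE_BOX + BOX_HEIGHT / 2          # distance from top of track to box center
below_center = BELOW_BOX + BOX_HEIGHT / 2    # distance from box center to bottom of track


def total_pages(page, fraction):
    """Total page count if `page` lies `fraction` of the way through the document."""
    return page / fraction


# First reading: center position divided by the track remaining below it (59/231).
ratio_above_to_below = center / below_center
# Second reading: center position divided by the full track height (59/290).
fraction_of_track = center / TRACK_HEIGHT

for label, frac in [("59/231", ratio_above_to_below), ("59/290", fraction_of_track)]:
    print(f"{label}: {frac:.3%} -> about {total_pages(PAGE_SHOWN, frac):,.0f} pages")
```

The first reading reproduces the figure of roughly 210,000 pages; the second, which divides by the full track height, gives a noticeably larger total.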

Did anyone notice the asterisk next to one of the graph elements? There's got to be a lot of those... Not all scientific studies (I would say very few) can be boiled down to a single numerical output.

Unless I misunderstand this, there's also an aspect of this that's due to sign - because some studies of some outcomes expect negative results, and some expect positive, mixing even results that are overall statistically significant may cause the effects to cancel out. Mattwigway (talk) 15:32, 28 March 2023 (UTC)

I think that could be squared. Tier666 (talk) 17:03, 29 March 2023 (UTC)

meta-analyses are also referenced in 1477: Meta-Analysis 16:18, 28 March 2023 (UTC)Bumpf

1477 Is Star Wars? Kev (talk) 18:39, 28 March 2023 (UTC)
sorry, I meant 1447: Meta-Analysis :) 13:04, 29 March 2023 (UTC)Bumpf
Would this meta-analysis of all science satisfy Life Goal #28 (assuming it's rejected, as it probably should be)? Barmar (talk) 15:29, 29 March 2023 (UTC) 07:01, 30 March 2023 (UTC)

SCIENCE IS HIGHLY SIGNIFICANT

If we (i) postulate that the picture of page 53,589 of the meta-analysis of all science is a representative sample, (ii) postulate that the model of the meta-analysis is simple random sampling without stratification, and (iii) postulate that the study-specific variance is independent of the single-study means, we can approximately calculate the correct confidence interval. Postulate (ii) seems a reasonable guess: if you really have data on ALL science, or want to make an assumption about ALL science based on a sample, then simple random sampling is okay, since each scientific discipline is weighted in proportion to its number of studies in the sample, and SRS guarantees an unbiased estimate.

Let's do it: The authors say that the weighted least-squares estimator of the population mean is 0.17. The picture shows 11 studies. I eyeballed the effects as (-0.125; 0.5; 0.375; 0.75; -0.375; 3.75; 0.125; 1.25; 0; 0.55; -0.2) and calculated (using Excel) the "between-study standard deviation" as 1.146 and the mean of that sub-sample as 0.6. (Remark: We can ignore the within-study variation, since the dominating source of variation is "between studies" and the within-study error is absorbed into the between-study standard deviation through error propagation.) Of course, the data analysis could be done with a mixed model for clustered data, but an analysis of the study means gives a very good approximation.

Now, first step is to calculate the confidence interval of the mean effect size based on the studies we see. We have 11 studies, 10 degrees of freedom. Assuming a t-distribution the (unweighted) 95% confidence interval of the studies in the picture is 0.6 +/- [2.228*1.146/sqrt(11)] = 0.6 +/- 0.77 = [-0.17, 1.37]

The C.I. includes zero but also includes the full meta-study mean of 0.17. So we have no evidence against our hypothesis that page 53,589, which we see on the website, is representative of the full meta-analysis, and we can go on.

The 95% confidence interval for ALL studies assuming a number of around 250,000 studies would be 0.17 +/- [1.96*1.146/sqrt(250000)] = 0.17 +/- 0.00449 = [0.16551, 0.17449].

The 99.9% confidence interval for ALL studies assuming a number of around 250,000 studies would be 0.17 +/- [3.3*1.146/sqrt(250000)] = 0.17 +/- 0.00756 = [0.16244, 0.17756].

meaning, on average SCIENCE IS HIGHLY SIGNIFICANT (p<0.001) 10:31, 30 March 2023 (UTC)

I re-viewed the graph and read the comments on the web page. They say the underlying number of papers is 2.3 million. My mistake was that I hadn't multiplied the number of pages by the number of studies per page. So the confidence interval becomes even narrower:

The 95% confidence interval for ALL studies assuming a number of around 2,100,000 studies would be 0.17 +/- [1.96*1.146/sqrt(2100000)] = 0.17 +/- 0.00155 = [0.16845, 0.17155].

The 99.9% confidence interval for ALL studies assuming a number of around 2,100,000 studies would be 0.17 +/- [3.3*1.146/sqrt(2100000)] = 0.17 +/- 0.00261 = [0.16739, 0.17261]. 10:31, 30 March 2023 (UTC)

Interestingly, the population mean is 0.17 and not 0.000. When averaging the effects of so many studies, all different in topic, investigated treatments, and strata, one would expect the global mean of all effects to be zero. But it is 0.17, a clear indication of publication bias: a positive effect has a higher probability of being published in a paper.
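The confidence-interval arithmetic in this thread can be re-run in a few lines. This is only a sketch of the commenter's calculation under the three postulates stated there: it uses the eyeballed effect sizes, the critical values quoted in the comments (2.228, 1.96 and 3.3), and hypothetical helper names.

```python
import math

# Effect sizes eyeballed from the pictured page, as listed in the comment above.
effects = [-0.125, 0.5, 0.375, 0.75, -0.375, 3.75, 0.125, 1.25, 0.0, 0.55, -0.2]

POOLED_MEAN = 0.17   # value printed under the forest plot
T_CRIT_10DF = 2.228  # two-sided 95% t critical value for 10 degrees of freedom
Z_95 = 1.96          # two-sided 95% normal critical value
Z_999 = 3.3          # value the comment uses for the 99.9% interval


def sample_mean_sd(xs):
    """Mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, math.sqrt(variance)


def half_width(critical_value, sd, n):
    """Half-width of a confidence interval for a mean of n independent studies."""
    return critical_value * sd / math.sqrt(n)


mean, sd = sample_mean_sd(effects)
print(f"page sample: mean = {mean:.3f}, sd = {sd:.3f}")
print(f"page 95% CI: {mean:.2f} +/- {half_width(T_CRIT_10DF, sd, len(effects)):.2f}")

for n_studies in (250_000, 2_100_000):
    for label, crit in (("95%", Z_95), ("99.9%", Z_999)):
        hw = half_width(crit, sd, n_studies)
        print(f"n = {n_studies:>9,}, {label:>5}: {POOLED_MEAN} +/- {hw:.5f}")
```

The intervals shrink with the square root of the number of studies, which is why even a between-study standard deviation above 1 yields a very narrow interval once millions of studies are assumed to be independent draws.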