2400: Statistics

Explain xkcd: It's 'cause you're dumb.
Title text: We reject the null hypothesis based on the 'hot damn, check out this chart' test.

Explanation

This comic is another in a series of comics related to the 2020 pandemic of the coronavirus SARS-CoV-2, which causes COVID-19, specifically regarding the COVID-19 vaccine. It is also another one of Randall's Tips, this time a statistics tip.

Graph

The main focus of the comic is a graph showing cases of COVID-19 versus time for two groups: one group was vaccinated and the other group was not. Graphs are a way to visualize data and, for real data, indicate specific values. This graph seems to be based on the Moderna vaccine's results but is somewhat fictionalized. The higher line ("placebo group") rises in a steep curve. The lower line ("vaccine group") follows the first for a bit but then levels out to a much slower rate of climb.

Officially, a scientific assessment of the effectiveness of anything requires rigorous statistical analysis. This is particularly true in medical studies, where the effects of biology can be highly complex and subject to many factors, meaning that careful review of the data is necessary to confirm that an intervention was effective. The joke of this comic is that the intervention presented here is so clearly effective that it is obvious even to a layman with little understanding of the math. A few days after the vaccine was administered, cases in the vaccinated group essentially flatline, while cases in the placebo group continue to rise at a significant rate. The data is so "good", meaning that the numbers for the treatment and control groups diverge so dramatically, that actual analysis becomes almost a formality: a glance at the chart would convince most people that the treatment is effective.

This comic was released one day after the FDA's Dec 17th briefing document for the Moderna COVID-19 vaccine was released. The document includes a chart of cases in each group over time. The chart plots the integral of the incidence data rather than the data itself ("cumulative" rather than "rate"), so any difference in disease rate towards the left side of the chart is carried into every point to its right, amplifying the visible difference between the two lines. This technique for emphasizing the data is valid: the spread between the lines only continues to increase if the effect continues happening, so the total spread at the right is proportional to the total effect the vaccine had. The chart does not show any information on other possible variables. Randall has previously described in his webcomics how very clear charts can be made to hide misleading data. The FDA's graph does not leave the numbers out, and the numbers indicate the vaccine is 91% effective at preventing the disease (with a 95% confidence interval of 85% to 95%).
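To see why a cumulative plot widens the visible gap, here is a minimal Python sketch. The daily case counts are made up for illustration (they are not the trial's data), the two groups are assumed to be the same size, and the efficacy figure is the crude "one minus the ratio of case counts" estimate rather than the analysis used in the FDA document.

```python
from itertools import accumulate

# Hypothetical daily new-case counts (NOT trial data); equal group sizes assumed.
placebo_daily = [2, 2, 3, 2, 3, 2, 3, 2, 3, 2]
vaccine_daily = [2, 2, 1, 0, 0, 1, 0, 0, 0, 0]

# Cumulative incidence: running total of cases up to and including each day.
placebo_cum = list(accumulate(placebo_daily))
vaccine_cum = list(accumulate(vaccine_daily))

# The gap between the cumulative curves is the running sum of the daily
# differences, so it keeps widening while the vaccine group has fewer new cases.
gap = [p - v for p, v in zip(placebo_cum, vaccine_cum)]

# Crude efficacy estimate from the final totals (valid only for equal group sizes).
efficacy = 1 - vaccine_cum[-1] / placebo_cum[-1]

for day, (p, v, g) in enumerate(zip(placebo_cum, vaccine_cum, gap), start=1):
    print(f"day {day:2d}: placebo={p:3d}  vaccine={v:3d}  gap={g:3d}")
print(f"estimated efficacy: {efficacy:.0%}")
```

In this toy data the daily counts differ by only a few cases, yet the gap between the cumulative totals never shrinks, which is the same visual effect as in the chart.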

The advice here could be seen as the inverse of the "science tip" in 2311: Confidence Interval, in which the data was so bad that its error bars fell outside of the graph and were not shown. There is also some association with 1725: Linear Regression, where the data is so scattered that a linear regression on it cannot be trusted.

Null hypothesis

The null hypothesis, mentioned in the title text, is the hypothesis in a statistical analysis that the effect under investigation does not occur, i.e. 'null' as in zero effect. For example, the null hypothesis for this study might be "The vaccine has no effect on whether subjects catch COVID." The null hypothesis was previously the subject of 892: Null Hypothesis. The null hypothesis is rejected when data at least as extreme as what was observed would be very improbable if the null hypothesis were true.

For a simplified example, imagine there are 10 000 people in the vaccinated group, and each has a 5% chance of catching COVID under the null hypothesis, so we expect 500 people to catch COVID. If 490 catch COVID, the null hypothesis remains plausible, but if just 10 do, the probability of a result that low or lower is (in Python; see binomial distribution) sum([math.comb(10000, i) * 0.05**i * 0.95**(10000-i) for i in range(0, 11)]) ≈ 7.9 × 10^-203. In other words, it is wildly improbable that an ineffective vaccine would have produced such excellent results. We therefore conclude that the vaccine is not ineffective, and reject the null hypothesis.
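The calculation above can be sketched as a small, self-contained Python script. This is one possible way to do it, not the method of any actual trial analysis: the helper names log_binom_pmf and binom_cdf are made up for this example, and the terms are computed in log space because multiplying the raw binomial coefficients by tiny floats overflows for counts as large as 490. Since a Python range excludes its upper bound, the script prints both the tail that includes exactly 10 cases and the one that stops at 9.

```python
import math

def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Natural log of P(X = i) for X ~ Binomial(n, p)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return log_comb + i * math.log(p) + (n - i) * math.log1p(-p)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): sum of the probabilities of 0, 1, ..., k cases."""
    return math.fsum(math.exp(log_binom_pmf(i, n, p)) for i in range(k + 1))

# Numbers from the simplified example: 10 000 people, 5% chance each under the null.
n, p = 10_000, 0.05

expected_cases = n * p                # mean number of cases under the null
p_at_most_490 = binom_cdf(490, n, p)  # a result slightly below the mean
p_at_most_10 = binom_cdf(10, n, p)    # 10 cases or fewer
p_fewer_than_10 = binom_cdf(9, n, p)  # strictly fewer than 10 cases

print(f"expected cases under the null: {expected_cases:.0f}")
print(f"P(X <= 490) = {p_at_most_490:.3f}")
print(f"P(X <= 10)  = {p_at_most_10:.2e}")
print(f"P(X <  10)  = {p_fewer_than_10:.2e}")
```

The first probability comes out at roughly one in three, which is why 490 cases would not count as evidence against the null hypothesis, while the other two are so small that the choice of boundary makes no practical difference to the conclusion.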

Most people, however, on seeing the raw results, would have concluded that the vaccine worked and that the statistics were just a formality. As the title text says, they would have "reject[ed] the null hypothesis based on the 'hot damn, check out this chart' test."

Transcript

[Shown is a graph with the x-axis labeled "time" and the y-axis labeled "COVID cases." There is a black line on the graph labeled "placebo group", which has a roughly linear slope moving toward the top right corner. There is a red line labeled "vaccine group", which follows the black line for about an eighth of the width of the graph before leveling off at a much slower increase.]
Caption beneath the graph: Statistics tip: Always try to get data that's good enough that you don't need to do statistics on it



Discussion

This is a representation of the actual graph showing the efficacy of the Pfizer/BioNTech coronavirus vaccine, based on data from Deutsche Bank AG and the FDA as published in John Authers' Bloomberg Opinion column. And yes, the results are just that clear and graphically obvious (pun unintended). RAGBRAIvet (talk) 00:51, 19 December 2020 (UTC)

I agree, but the original graph can be found in this paper: https://www.nejm.org/doi/full/10.1056/NEJMoa2034577#figures_media --162.158.203.25 09:11, 19 January 2021 (UTC)
So, the value on bottom right of the graph ... is it three days? -- Hkmaly (talk) 03:55, 19 December 2020 (UTC)
The corresponding graph in the FDA report covers about 100 days. Barmar (talk) 05:00, 19 December 2020 (UTC)

When I saw this comic I immediately thought of this bit about doublespeak in graphs. Not saying I inherently believe or disbelieve numbers/statistics about covid but an impressive graph with no numbers...Apparently it is actually that clear though. https://youtu.be/qP07oyFTRXc?t=292 DarkVex9 (talk) 01:05, 19 December 2020 (UTC)

The graph really is a scientist's dream. It's so pretty that I had to add it to the explanation, but I'm not sure my upload worked (permissions?). Someone should screen grab fig 2 from the FDA briefing and add it. Mperrotta (talk) 03:56, 19 December 2020 (UTC)

I dispute that graphs are only a way of visualizing data; this graph is actually the platonic graph talked about in a textbook about graphs which funnily I found on xkcd. tldr: a good graph makes the truth obvious. This is everything working out as it should be. 172.69.63.135 08:28, 19 December 2020 (UTC)

In the kinds of statistical analyses I have been involved with, this is what's called a "bridge of the nose" analysis. It hits you right between the eyes. Roll on science. (brad)

Interestingly, the "Statistical Analysis" section of the cited study reads, in its entirety: "No formal statistical hypothesis was tested in this study and all results were descriptive." Even they went by the "hot damn check out this chart" test. Anyhow, is that notable enough to put somewhere in the explanation? 172.69.248.144 18:12, 21 December 2020 (UTC)