1574: Trouble for Science

Title text: Careful mathematical analysis demonstrates small-scale irregularities in Gaussian distribution


The comic highlights the fact that several well-publicized scientific critiques have recently been published that raise questions about some commonly accepted scientific methods. For scientists, these critiques serve as reminders of the dangers of overconfidence in any method, hopefully leading those who have naively accepted results to remember that any scientific conclusion is by its very nature tentative and limited by methodological reliability. However, popular press reporting of these papers may give a general public of modest scientific literacy the impression that science might be in trouble, as implied by the title. Some of these methodological issues and shortcomings are well known in the scientific community, but the methods remain, for better or worse, the best toolkit science has at its disposal today. The last (fictional) headline takes this impression to an absurd extreme: it suggests that Bunsen burners in fact have a cooling effect, which is of course ridiculous, but would nevertheless drastically overturn one more fundamental scientific belief. Additionally, each headline contains irony or a double meaning for comical effect.

The titles of five scientific articles are shown:

Many commercial antibody-based immunoassays are unreliable

This sentence is true. See Kebaneilwe Lebani, Antibody Discovery for Development of a Serotyping Dengue Virus NS1 Capture Assay, 2014. In this Ph.D. thesis, 11 references are given.

Problems with the p-value as an indicator of significance

In empirical research, one is usually interested in effects, results and relationships in a population. However, for practical reasons, only smaller subsets of populations, called samples, are available to the researcher. Usually, an effect of interest is tested using a sample. The purpose of hypothesis testing is to determine whether the observed effect (or lack of effect) in a sample is a random artifact of our particular sample, or whether there is a good chance that it also exists in the population.

Generally, a null hypothesis states that there is no effect in the population while the alternative hypothesis states that there is an effect.

P-values are used in hypothesis testing. The p-value is the probability of observing an effect, result or relationship at least as strong as the one in your sample data, given that no such effect, result or relationship exists in the population. It is based on the sample data and the particular statistic (such as the sample average, t or F). A statistic is the result of a calculation based on the sample. A p-value can be calculated for each statistic of interest. Formally, the p-value is the probability of observing a test statistic at least as extreme as the one computed from the sample data, given that the null hypothesis is true.

The threshold for the p-value, α, is specified in advance (usually 5%, or 1%, which is more conservative). When the p-value is less than or equal to α, the null hypothesis is rejected in favor of the alternative hypothesis. When it is higher than α, the null hypothesis is retained.
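As a rough illustration of the definition and the α cutoff, here is a minimal Python sketch. The scenario is an assumption chosen for simplicity and is not taken from the comic or the cited papers: 60 heads observed in 100 coin flips, with the null hypothesis that the coin is fair. The function name `one_sided_p_value` and the constant `ALPHA` are hypothetical.

```python
from math import comb

def one_sided_p_value(heads: int, flips: int) -> float:
    """P(X >= heads) for X ~ Binomial(flips, 0.5).

    This is the chance of a result at least as extreme as the observed one,
    assuming the null hypothesis (a fair coin) is true.
    """
    # Count the equally likely flip sequences with at least `heads` heads.
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips

ALPHA = 0.05  # pre-specified threshold (assumed here; 1% would be stricter)

p = one_sided_p_value(60, 100)
print(f"p = {p:.4f}")
print("reject the null hypothesis" if p <= ALPHA else "retain the null hypothesis")
```

In this made-up case the p-value comes out below 5%, so the null hypothesis would be rejected at α = 5% but retained at α = 1%, which shows how much the verdict depends on the chosen cutoff.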

The conventional value of α was proposed by Fisher and is arbitrary.

The use of p-values as a measure of statistical significance is frequently criticized, for example in Hubbard & Lindsay. Randall has demonstrated this problem in the past in 882: Significant.
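One of the criticisms, and the one 882: Significant plays on, is that running many tests at the same α makes a spurious "significant" result likely. The sketch below works through that arithmetic. It assumes the tests are independent, and the function name `chance_of_false_positive` is hypothetical; the figure of 20 tests matches the 20 jelly bean colors in that comic.

```python
def chance_of_false_positive(alpha: float, tests: int) -> float:
    """Probability that at least one of `tests` independent tests
    falls below `alpha` purely by chance, when no real effect exists."""
    # Each test avoids a false positive with probability (1 - alpha).
    return 1.0 - (1.0 - alpha) ** tests

for m in (1, 5, 20):
    print(m, round(chance_of_false_positive(0.05, m), 3))
```

With 20 tests at α = 5%, the chance of at least one false positive is roughly 64%, so a single "significant" result out of 20 is not surprising on its own.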

Overfeeding of laboratory rodents compromises animal models

Keenan et al. make this case. Additionally, the word model takes on two meanings. In one sense, "model" can refer to a scientific description that makes sense of a phenomenon; in another sense, "model" can refer to an individual whose job it is to demonstrate fashions, typically fashionable outfits. Fashion models are notorious for being exceptionally thin, and so overfeeding would compromise their job as a model.

Replication study fails to reproduce many published results

A replication study is a study designed to duplicate the results of a previous study by using the same methods for a different set of subjects and experimenters. It aims to recreate the results to gain confidence in the results of the previous study as well as ensure that the findings of the previous study are transferable to other similar areas of study.

Randall is probably referring to this recent study described in Nature: Over half of psychology studies fail reproducibility test. It might also be a reference to at least 3 studies mentioned here: http://www.jove.com/blog/2012/05/03/studies-show-only-10-of-published-science-articles-are-reproducible-what-is-happening. There is also irony in the phrasing of the title because in biology replication is a form of reproduction.

Another possible interpretation of this headline is that a replication study, which may have successfully replicated the results of the specific study it was designed for, failed to reproduce the published results of many other unrelated studies. The headline is quite vague as to which results have been considered in this study.

Controlled trials show Bunsen burners make things colder

The theme of this comic is that commonly accepted scientific methods can be unreliable, and the joke here is that a Bunsen burner, a device intended to heat things, is newly discovered to always cool things instead, which would be absurd.

In theory, yes: putting a Bunsen burner underneath an object that is already hotter than the flame would slowly equalize the temperatures of flame and object, cooling the object. Given that a Bunsen burner burns at between 1000 K and 2000 K, there is probably some methodological error if the testing materials were already much hotter than the flame (more than 2000 K). It's also possible that the "controlled trial" involved a Bunsen burner that was not lit but was turned on to allow gas to flow; it would then have a cooling effect as the gas expanded from the line pressure to atmospheric pressure. Another alternative theory is that a cold substance, such as cold water or frigid air, was fed through the burner against a warmer object.
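The equalization argument can be sketched with a toy model. The code below is not real combustion physics: it assumes the object's temperature simply relaxes toward a fixed flame temperature at a constant fractional rate per step, and the function name `relax_temperature`, the rate and the step count are all hypothetical.

```python
def relax_temperature(object_k: float, flame_k: float,
                      rate: float = 0.1, steps: int = 50) -> list[float]:
    """Return the object's temperature (in kelvin) after each step,
    assuming it closes a fixed fraction of the gap to the flame each step."""
    temps = [object_k]
    for _ in range(steps):
        # Heat flows from hot to cold, so the sign of the change
        # depends only on which of the two is hotter.
        object_k += rate * (flame_k - object_k)
        temps.append(object_k)
    return temps

FLAME_K = 1500.0  # assumed flame temperature within the 1000-2000 K range

print(round(relax_temperature(300.0, FLAME_K)[-1]))   # room-temperature object warms up
print(round(relax_temperature(3000.0, FLAME_K)[-1]))  # 3000 K object cools down
```

Under these assumptions the same burner heats the cooler object and cools the hotter one, which is the only sense in which the headline could be literally true for a lit flame.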

Alternatively, a trial could be set up to test something against a Bunsen burner on the one hand, and an even hotter flame on the other hand. As compared to that hotter flame, the Bunsen burner would not heat up the tested material as much, resulting in something being made "colder" than the alternative.

As in the previous headline, the key to understanding the joke here is to examine the headline's ambiguity, as no clue is given about how the trials were controlled.

(Title text) Careful mathematical analysis demonstrates small-scale irregularities in Gaussian distribution

This is another joke based on a premise that is obviously untrue. The Gaussian distribution is a mathematical construct that is generally known as the bell curve or the normal distribution. As it is an ideal mathematical construction, by definition, it cannot have any irregularities, just as the equation y = 2x + 1 cannot have small-scale irregularities. The joke probably alludes to the fact that many types of observations are initially modeled as a Gaussian distribution, though on careful observation the actual distribution of outcomes will often deviate from a pure Gaussian distribution.

In addition, an experiment to test a Gaussian distribution will have a finite sample size, so its results will never match the Gaussian distribution exactly. A possible paper would conclude that the result is "approximately a normal distribution" with "small-scale irregularities". A news reporter without knowledge of statistics could easily misread such a paper as decisively finding errors in the mathematical definition (rather than random error inherent in any experiment).
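The finite-sample point can be illustrated with a short simulation. This is a sketch under assumptions: the bin edges, sample sizes, seed and the function names `normal_cdf` and `bin_deviations` are hypothetical choices, not anything from the comic.

```python
import math
import random

def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bin_deviations(n_samples: int, edges: list[float], seed: int = 1574) -> list[float]:
    """Observed minus ideal fraction of samples in each bin [lo, hi)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    deviations = []
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(lo <= s < hi for s in samples) / n_samples
        expected = normal_cdf(hi) - normal_cdf(lo)
        deviations.append(observed - expected)
    return deviations

EDGES = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
for n in (100, 10_000):
    worst = max(abs(d) for d in bin_deviations(n, EDGES))
    print(f"n = {n}: largest bin deviation = {worst:.4f}")
```

The deviations are "small-scale irregularities" in the sample, not in the Gaussian distribution itself, and they typically shrink as the sample size grows.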


[Five panels, each with the top part of a scientific article, where only the title is legible. Below is the list of authors and subheading and text in unreadable wiggles.]
Many Commercial Antibody-Based Immunoassays Are Unreliable
Problems With the p-Value as an Indicator of Significance
Overfeeding of Laboratory Rodents Compromises Animal Models
Replication Study Fails to Reproduce Many Published Results
Controlled Trials Show Bunsen Burners Make Things Colder



Sentence case, or down style, is one method, preferred by many print and online publications and recommended by the Publication Manual of the American Psychological Association. The only two rules are the two rules mentioned above: Capitalize the first word and all proper nouns. Everything else is in lowercase. http://www.dailywritingtips.com/rules-for-capitalization-in-titles/ 12:30, 7 September 2015 (UTC)

Problems with the p-value as an indicator of significance

The p-value alone can never be an indicator of significance. However, it is still often used as the only indicator, because a full set of parameters (including sample size, test setup, etc.) can't easily be packed into a single number. There's a nice article in Nature about this problem: [1] I can also recommend a story about (ab-)using hacked p-values to get maximum publicity. I hope this helps :-) -- 12:41, 7 September 2015 (UTC)

In this section, I really want to reword the p-value explanation that "one can assume that the event observed 'exists'." Except where it's an event indirectly observed through a chained effect (unseeable gas molecules observed through Brownian motion, unstable particles through detection of their decay particles, prehistoric meteorite impact through a geological/chemical fingerprint, etc.), I think it should be more that "this (directly observed) event was directly linked to the presumed cause rather than spontaneous and random, at least w.r.t. the presumed cause being tested". But writing it better than I did just now. 19:36, 7 September 2015 (UTC)

I think the joke is that these newspapers are talking about how bad science is, and yet they manage to come up with a stupid story about Bunsen burners, presumably being too scientifically illiterate to know the problem. Timband (talk) 12:55, 7 September 2015 (UTC) Although reading the other comments, it's a much better joke if the Bunsen Burner story is actually true, because that makes all of them about journalists not realising that they are highlighting their own ignorance. Timband (talk) 16:05, 7 September 2015 (UTC)

See Significant for another comic on p-values.--Henke37 (talk) 14:22, 7 September 2015 (UTC)

One journal, Basic and Applied Social Psychology (vol. 37 pages 1–2, 2015), went so far as to ban p-values entirely. So, anti-p-value sentiment does seem to be on the rise. --scjphysicist (talk) 01:10, 12 September 2015 (UTC)

Controlled trials show Bunsen burners make things colder

Actually, I can easily imagine a way to use a Bunsen burner to make something colder. Involving an unlit Bunsen burner that has been placed in the freezer for a couple hours, for example. Nowhere in the headline is there any mention of a flame. --Svenman (talk) 12:59, 7 September 2015 (UTC)

Actually, there was a (badly formatted and badly placed, probably therefore now removed) comment on the explanation page earlier which pointed out that feeding a Bunsen burner from a propane bottle will cause the pressure, and therefore the temperature, in the bottle to decrease. That is a lot less contrived than my original idea. --Svenman (talk) 13:37, 7 September 2015 (UTC)
That was me. Trying to get my 2 cents in on my phone before I forgot. http://www.propane101.com/propaneregulatorfreezing.htm as an example. Mattiep (talk) 13:45, 7 September 2015 (UTC)
Thermodynamics actually doesn't guarantee that a lit Bunsen burner always heats up a cold object. It just tells us that the probability of it doing so is so high that you can trust any number of controlled trials to be unable to find a counterexample. --Gunterkoenigsmann (talk) 12:09, 29 December 2020 (UTC)
Correct me if I'm wrong here, but doesn't the burning flame from a Bunsen burner cause the temperatures of the flame and the target object to equalize? Sure, in most cases that results in a temperature increase in the target object, but I don't see why that would be true in all high temperature cases. The comment about "reducing the rate of heat loss in 2000K+ temp objects" would only be true if the gas (assuming any atmosphere at all) surrounding the target object was cooler than the flame from the Bunsen burner. This gets worse in a perfect vacuum. If a 5000K object was in a perfect vacuum and somebody set a lit Bunsen burner (assuming the tip had an oxygen source) to spray across the target object, then the flame would get hotter as it touched the hotter object and the object would cool as the two temperatures attempted to equalize. No reduction of heat loss would happen. Can we remove the comment about "reducing the rate of heat loss in 2000K+ temp objects"? Harodotus (talk) 22:20, 7 September 2015 (UTC).
Found an article backing up my previous comment and, lacking any objection for several hours, reversed the note in the article.[2] Harodotus (talk) 23:58, 7 September 2015 (UTC)
Bunsen burners hasten the heat death of the universe, making things colder generally. Showing that in "controlled trials" seems like a challenge for a type 2 civilization, though. 08:30, 8 September 2015 (UTC)

I think the joke is in the wording of the headlines. The fact that a replication study fails to reproduce can be seen as a contradiction. Overfeeding rodents leads to fat rodents. This compromises their ability to function as animal (runway) models. I haven't figured out the other ones yet. But that's 'cause I'm dumb :-). Alva. (talk) (please sign your comments with ~~~~)

It's way simpler than that - The joke is that people outside of sciences (with no understanding really of how to science) will report basically anything that sounds shocking or exciting, especially if it proves those nerdy, scary scientists wrong! So Randall gives us a bunch of possible headlines that to a layman read like real, scary news about science, but to scientists this is stuff that is generally well known and understood. The last one is just taking it a step further for credulous news editors - They've been lying to us all this time! 13:33, 7 September 2015 (UTC)
I think it's even simpler than that: the title is "Trouble for Science" and it shows a series of misleading headlines about misleading (i.e.: invalidated) scientific studies. The implication is "Trouble for Journalism". 14:21, 7 September 2015 (UTC)
I agree. All of the titles are poorly written. All immunoassays are antibody-based, so saying many commercial antibody-based immunoassays are unreliable is redundant, implying they have no idea what an immunoassay is. Problems with the p-value as an indicator of significance implies that there is some significant error in the use of a tool to measure significance of error, which leads one to wonder how they figured that out. If you don't know what a p-test is, the title is paradoxical. The last title would make someone assume that the controlled trials are using turned on bunsen burners to make things colder, but could mean almost anything, such as a bunsen burner being turned off the entire time, or a bunsen burner placed inside of a freezer, or even that people consider using bunsen burners in an experiment makes the experiment cool (or sweet or groovy or whatever). (talk) (please sign your comments with ~~~~)
I would appreciate someone adding info about what an immunoassay is. Teleksterling (talk) 22:53, 8 September 2015 (UTC)

I generally agree, but would say if you DO know what a p-test is, the title is paradoxical. If you don't know what a p-test is, the title is meaningless. Miamiclay (talk) 07:05, 8 September 2015 (UTC)

This comic may be in reference to Monsanto's latest ailments. (talk) (please sign your comments with ~~~~)

Replication study fails to reproduce many published results
Upon reading that specific headline, the rational behavior would be to question the veracity of all the other headlines before and after. I could see a paper picking up on that sensationalist-looking headline and ignoring the fact it casts doubt on whatever else they published. Ralfoide (talk) 14:56, 8 September 2015 (UTC)

Maybe I'm missing something obvious, but what is the irony in the first headline? Djbrasier (talk) 00:54, 9 September 2015 (UTC)

From [3]: "When a substance undergoes a phase transition (changes from one state of matter to another) it usually either takes up or releases energy. For example, when water evaporates, the kinetic energy expended as the evaporating molecules escape the attractive forces of the liquid is reflected in a decrease in temperature. The amount of energy required to induce the transition is more than the amount required to heat the water from room temperature to just short of boiling temperature, which is why evaporation is useful for cooling. " That could explain the Bunsen burner making things colder (i.e. having less kinetic energy)

About Gaussian irregularities. Using a computer and floating point numbers, someone would see irregularities on a Gaussian distribution. That amounts to sampling the curve with a small but finite precision. Computing the value at any given point could lead to rounding errors and would be seen as irregularities. (talk) (please sign your comments with ~~~~)

That's like saying a crack in your telescope glass has revealed new stars. 23:20, 11 September 2015 (UTC)

Gregory Chaitin makes a case for using experimentally observed mathematical relations to increase the expressiveness of mathematics beyond the limits of purely deductive axiomatic methods. If this trend is adopted, it might conceivably develop that a set of foundations that support what would then be known as the "normal distribution" could have significant irregularities which would result in either adoption of this new effect, or changing the foundational proposition from which the effect is derived, or both. Randall's headline may be predictive of the type of thing that may be seen as more mathematicians explore conjectures aided by computer computations using numeric and symbolic congruences. [Comet] 20:51, 9 September 2015 (UTC)

I think everyone is over-thinking this comic. In each headline, the question is "Well if that's the case, how did they prove it?" In other words, every test would have most likely made use of the technique that they studied in the study.

Anti-bodies-I don't know anything about this topic, so I can't explain the irony that I hypothesize to be there.

P-values-Presumably the researchers started with the null hypothesis that p-values are a good indicator of significance. They then disproved it with p<0.05.

Lab rats-They proved that animal studies are compromised. They undoubtedly used animals to conduct this experiment

Replication study-They couldn't replicate the results. To show that this is a robust phenomenon, other researchers should be able to replicate their results.

Bunsen burners-In their controlled experiment, they found that bunsen burners cool things down. But since bunsen burners are the heat-source of choice for many scientific investigations, they were probably the control heat source as well as the test.

Gaussian curve-The bell curve has irregularities in it. Assuming that these irregularities are independent, their effect is modelled by a Gaussian curve (ie the average irregularity in the faulty Gaussian curve will form a Gaussian distribution per the central limit theorem)

In each case, the joke is that the study results discredit the method that would have been used to prove the result. CAS 23:37, 11 September 2015 (UTC)

There's another interpretation. All of these articles are headlines in newspapers. Reporters will only bother to write and publish news articles about highly controversial or exciting results, framed in the most inflammatory way, regardless of their reliability or applicability. So we have carnival barkers in the news media cherry-picking and misrepresenting results they really don't understand.

But most scientists are also dependent on having a steady stream of published, novel results so they can get their grant money from the government. Which means "sexy" results that are publishable and impactful, i.e. worthy of mention in the non-scientific press. So of course we have sloppy methods and irreproducible results: those are the methods most likely to produce the kind of excitingly counter-intuitive results that get published and catch the notice of the mainstream media. Disciplined labs that publish properly vetted results will hit dry periods when their results are unexciting or their theories don't check out, and their grant money will dry up, and they will fall apart. 14:34, 15 September 2015 (UTC)

I think the bunsen burner part might be a reference to a demonstration a teacher once did. I can't find the reference, but when her students came in she showed them a metal plate next to a lit bunsen burner. The students observed that the side closest to the flame was colder, and she asked them to write down what they thought was going on. They wrote non-answers like, "because of heat conduction," and none of them came anywhere close to guessing the correct answer, which was simply that the teacher turned the metal plate around just before they came in. Shanek (talk) 16:46, 15 September 2015 (UTC)

I figured that this comic was mostly making a joke about how often newspapers describe things as "Trouble for Science!"... when most of the things being reported are merely niggles in one narrow area of one scientific field. Whereas this is a list of things which actually *would be* "trouble for science" in that they would invalidate huge areas of scientific "knowledge". A few of them are real, most are not. 06:52, 23 September 2015 (UTC)

A Bunsen burner could be used to drive an absorption chiller (https://en.wikipedia.org/wiki/Absorption_refrigerator). In that case it could be said to indirectly "make things colder." (talk) (please sign your comments with ~~~~)