
{{comic
| number    = 1574
| date      = September 7, 2015
| title     = Trouble for Science
| image     = trouble_for_science.png
| titletext = Careful mathematical analysis demonstrates small-scale irregularities in Gaussian distribution
}}

==Explanation==
The comic highlights several well-publicized scientific critiques, published around the time of the comic, that raise questions about commonly accepted scientific methods. For scientists, these critiques serve as reminders of the dangers of overconfidence in any method, and should lead those who have naively accepted results to remember that any scientific conclusion is by its very nature tentative and limited by the reliability of its methods. However, popular press reporting of these papers may give a general public of modest scientific literacy the impression that science is in trouble, as implied by the title. Some of these methodological issues and shortcomings are well known in the scientific community, but the methods remain, for better or worse, the best toolkit science has at its disposal today. The comic exaggerates the trend with the last (fictional) headline, which suggests that Bunsen burners in fact have a cooling effect. This is of course absurd, but if true it would overturn one more fundamental scientific belief. Additionally, each headline contains irony or a double meaning for comic effect.

The titles of five scientific articles are shown:

;Many commercial antibody-based immunoassays are unreliable
This claim is true. See Kebaneilwe Lebani, [http://espace.library.uq.edu.au/view/UQ:352531 Antibody Discovery for Development of a Serotyping Dengue Virus NS1 Capture Assay], 2014. In this PhD thesis, 11 references are given.

;Problems with the p-value as an indicator of significance
In empirical research, one is usually interested in effects / results / relationships in a population. However, for practical reasons, only smaller subsets of populations are available to the researcher. These are called samples. Usually an effect of interest is tested using a sample. The purpose of hypothesis testing is to determine whether the observed effect (or lack of effect) in a sample is a random artifact of our particular sample, or whether there is a good chance that it also exists in the population.

Generally, a null hypothesis states that there is no effect in the population, while the alternative hypothesis states that there is an effect.

P-values are used in hypothesis testing. The p-value is the probability of observing an effect / result / relationship at least as strong as the one in your sample data, given that no such effect / result / relationship exists in the population. It is based on the sample data and the particular statistic (such as sample average, t, or F). A statistic is the result of a calculation based on the sample. A p-value can be calculated for each statistic of interest. Formally, the p-value is the probability of observing a test statistic at least as extreme as the one based on the sample data, given that the null hypothesis is true.

The threshold for the p-value cutoff, α, is pre-specified (usually 5%, or 1%, which is more conservative). When the p-value is lower than or equal to α (that is, a result at least this extreme would turn up with probability at most α if the null hypothesis were true), the null hypothesis is rejected in favor of the alternative hypothesis. When it is higher than α, the null hypothesis is retained. Note that the p-value is not the probability that the null hypothesis is true, nor the probability that the result is a coincidence.

The conventional value of ''α'' was proposed by [http://web.lru.dk/sites/lru.dk/files/lru/docs/kap9/kapitel_9_126_On_the_origins.pdf Fisher] and is arbitrary.
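
The definitions above can be made concrete with a small worked example. The following is only an illustrative sketch, not something taken from the comic or the cited papers: it assumes a hypothetical experiment in which a coin is flipped 20 times and lands heads 16 times, takes "the coin is fair" as the null hypothesis, and uses a one-sided exact binomial test. The function name <code>binomial_p_value</code> and the numbers are made up for illustration.

```python
from math import comb


def binomial_p_value(n_flips, n_heads, p_null=0.5):
    """One-sided p-value: probability of seeing n_heads or more heads
    in n_flips flips if each flip lands heads with probability p_null."""
    return sum(
        comb(n_flips, k) * p_null**k * (1 - p_null) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )


ALPHA = 0.05  # pre-specified threshold (the conventional 5%)

# Hypothetical data: 16 heads out of 20 flips.
p_value = binomial_p_value(20, 16)
reject_null = p_value <= ALPHA

print(f"p-value = {p_value:.4f}, reject null hypothesis: {reject_null}")
```

Under these assumptions the p-value is about 0.006, which is below α = 0.05, so the null hypothesis of a fair coin would be rejected. With a stricter or looser α the same data could lead to a different decision, which is part of why the arbitrary threshold is criticized.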

The use of p-values as a measure of statistical significance is frequently criticized, for example in [http://wiki.bio.dtu.dk/~agpe/papers/pval_notuseful.pdf Hubbard and Lindsay]. Randall has demonstrated this problem in the past in [[882: Significant]].
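
One way to see the problem shown in [[882: Significant]] is to compute how likely at least one false positive becomes when many tests are run. The sketch below assumes the tests are independent and that no real effect exists in any of them; the function name is hypothetical, and the figures of 20 tests at α = 0.05 mirror the 20 jelly bean colors tested in that comic.

```python
def chance_of_any_false_positive(alpha, n_tests):
    """Probability that at least one of n_tests independent tests
    falls below the threshold alpha when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests


# 20 independent tests, each at the conventional 5% threshold.
print(f"{chance_of_any_false_positive(0.05, 20):.3f}")
```

Under these assumptions the chance is roughly 64%, so a single "significant" result among 20 attempts is more likely than not to appear by chance alone.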
;Overfeeding of laboratory rodents compromises animal models
[http://tpx.sagepub.com/content/24/6/757.full.pdf Keenan et al.] make this case. Additionally, the word "model" takes on two meanings. In one sense, a model can refer to a scientific description that makes sense of a phenomenon; in another sense, a model can refer to a person whose job is to display fashionable outfits. Fashion models are notorious for being exceptionally thin, so overfeeding would compromise their work as models.

;Replication study fails to reproduce many published results
A [https://explorable.com/replication-study replication study] is a study designed to duplicate the results of a previous study by using the same methods with a different set of subjects and experimenters. It aims to build confidence in the results of the previous study and to check that its findings are transferable to other similar areas of study.

Randall is probably referring to this recent study described in Nature: [http://www.nature.com/news/over-half-of-psychology-studies-fail-reproducibility-test-1.18248 Over half of psychology studies fail reproducibility test.] It might also be a reference to at least three studies mentioned here: http://www.jove.com/blog/2012/05/03/studies-show-only-10-of-published-science-articles-are-reproducible-what-is-happening. There is also irony in the phrasing of the title because, in biology, replication is a form of reproduction.

Another possible interpretation of this headline is that a replication study, which may have successfully replicated the results of the specific study it was designed for, failed to reproduce the published results of many other unrelated studies. The headline is quite vague as to which results have been considered in this study.

;Controlled trials show Bunsen burners make things colder
The theme of this comic is that commonly accepted scientific methods can be unreliable, and the joke here is that a Bunsen burner, a device intended to heat things, is newly discovered to always cool things instead, which would be absurd.

In theory, placing a Bunsen burner under an object that is already hotter than the flame would slowly bring the object's temperature down toward that of the flame, which counts as cooling. Given that a Bunsen burner flame burns between 1000 {{w|Kelvin|K}} and 2000 K, the test materials would have had to start above 2000 K, which suggests a methodological error. It is also possible that the "controlled trial" involved a Bunsen burner that was not lit but was turned on to let gas flow; the gas would have a cooling effect as it expanded from line pressure to atmospheric pressure. Another possibility is that a cold substance, such as cold water or frigid air, was fed through the burner against a warmer object.

Alternatively, a trial could be set up to test something against a Bunsen burner on the one hand, and an even hotter flame on the other hand. As compared to that hotter flame, the Bunsen burner would not heat up the tested material as much, resulting in something being made "colder" than the alternative.

As in the previous headline, the key to understanding the joke here is to examine the headline's ambiguity, as no clue is given about ''how'' the trials were controlled.

;Careful mathematical analysis demonstrates small-scale irregularities in Gaussian distribution
This is another joke built on a premise that is obviously untrue. The {{w|Gaussian function|Gaussian distribution}} is a mathematical construct that is generally known as the bell curve or the normal distribution. As it is an ideal mathematical construction, it cannot by definition have any irregularities, just as the equation y = 2x + 1 cannot have small-scale irregularities. The joke probably alludes to the fact that many types of observations are initially modeled as a Gaussian distribution, though on careful observation the actual distribution of outcomes will often deviate from a pure Gaussian distribution.

In addition, any experiment that samples from a Gaussian distribution has a finite sample size, so the observed distribution will not be exactly Gaussian. A paper reporting such an experiment might conclude that the result is "approximately a normal distribution" with "small-scale irregularities". A news reporter without knowledge of statistics could easily misread this as the paper finding errors in the mathematical definition itself, rather than the random error inherent in any experiment.
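
The finite-sample effect can be illustrated with a short simulation. This is a sketch under assumed settings, not a reconstruction of any real paper: it draws 1000 pseudo-random samples from a standard normal distribution, counts how many fall in each unit-wide bin between -3 and 3, and compares the counts with what the ideal Gaussian predicts. The function names, bin edges, sample size, and seed are all arbitrary choices made for illustration.

```python
import math
import random


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bin_deviations(n_samples, edges, seed=0):
    """Return (low, high, observed, expected) for each bin between edges."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    rows = []
    for low, high in zip(edges, edges[1:]):
        observed = sum(1 for s in samples if low <= s < high)
        expected = n_samples * (normal_cdf(high) - normal_cdf(low))
        rows.append((low, high, observed, expected))
    return rows


rows = bin_deviations(1000, [-3, -2, -1, 0, 1, 2, 3])
for low, high, observed, expected in rows:
    print(f"[{low:+d}, {high:+d}): observed {observed:4d}, "
          f"expected {expected:7.1f}, difference {observed - expected:+6.1f}")
```

The observed counts are whole numbers while the expected counts are not, so some difference in every bin is unavoidable. These differences are the "small-scale irregularities" of the sample, not of the Gaussian distribution itself.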

==Transcript==
:[Five panels, each with the top part of a scientific article, where only the title is legible. Below is the list of authors and subheading and text in unreadable wiggles.]

:Many Commercial Antibody-Based Immunoassays Are Unreliable

:Problems With the p-Value as an Indicator of Significance

:Overfeeding of Laboratory Rodents Compromises Animal Models

:Replication Study Fails to Reproduce Many Published Results

:Controlled Trials Show Bunsen Burners Make Things Colder

{{comic discussion}}
[[Category:Science]]
[[Category:Biology]]
[[Category:Chemistry]]
[[Category:Math]]
[[Category:Physics]]