Editing 1574: Trouble for Science

The comic highlights the fact that several well-publicized scientific critiques have recently been published that raise questions about some commonly accepted scientific methods. For scientists, these critiques serve as reminders of the dangers of overconfidence in any method, hopefully leading those who have naively accepted results to remember that any scientific conclusion is by its very nature tentative and limited by methodological reliability. However, popular press reporting of these papers may give a general public of modest scientific literacy the impression that science might be in trouble, as implied by the title. Some of these methodological issues and shortcomings are well known in the scientific community, but the methods are, for better or worse, the best toolkit science has at its disposal today. The concern is greatly exaggerated by the last (fictional) headline, which suggests that Bunsen burners in fact have a cooling effect. This is of course absolutely ridiculous, but it would nevertheless drastically overturn one more fundamental scientific belief. Additionally, each headline contains irony or a double meaning for comical effect.
  
The titles of five scientific articles are shown:
  
 
;Many commercial antibody-based immunoassays are unreliable
 
This statement is true. See Kebaneilwe Lebani, [http://espace.library.uq.edu.au/view/UQ:352531 Antibody Discovery for Development of a Serotyping Dengue Virus NS1 Capture Assay], 2014. This Ph.D. thesis gives 11 references.
  
 
;Problems with the p-value as an indicator of significance
 
In empirical research, one is usually interested in effects, results, and relationships in a population. However, for practical reasons, only smaller subsets of populations, called samples, are available to the researcher. Usually, an effect of interest is tested using a sample. The purpose of hypothesis testing is to determine whether the observed effect (or lack of effect) in a sample is a random artifact of that particular sample, or whether there is a good chance that it also exists in the population.
  
Generally, a null hypothesis states that there is no effect in the population while the alternative hypothesis states that there is an effect.
  
P-values are used in hypothesis testing. Informally, the p-value is the probability of observing an effect, result, or relationship at least as strong as the one in your sample data, given that no such effect, result, or relationship exists in the population. It is based on the sample data and the particular statistic (such as the sample average, t, or F). A statistic is the result of a calculation based on the sample. A p-value can be calculated for each statistic of interest. Formally, the p-value is the probability of observing a test statistic equal to or greater than (more generally, at least as extreme as) the one based on the sample data, given that the null hypothesis is true.
  
The threshold for the p-value cutoff, α, is pre-specified (usually 5%, or the more conservative 1%). When the p-value is lower than or equal to α, the null hypothesis is rejected in favor of the alternative hypothesis. When it is higher than α, the null hypothesis is retained.
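
The procedure described above can be illustrated with a small worked example. The sketch below is only an illustration and is not taken from the comic or the cited papers: it assumes a hypothetical coin-flip experiment (9 heads in 10 flips), takes "the coin is fair" as the null hypothesis, and uses the number of heads as the test statistic. The function names <code>binomial_p_value</code> and <code>reject_null</code> are made up for this example.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: probability of seeing `heads` or more heads
    in `flips` flips, assuming the null hypothesis (P(heads) = p_null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis when the p-value is at or below alpha."""
    return p_value <= alpha


# Hypothetical data: 9 heads in 10 flips of a coin assumed to be fair.
p = binomial_p_value(heads=9, flips=10)
print(f"p-value = {p:.4f}")                                # 11/1024, about 0.0107
print("reject at alpha = 5%:", reject_null(p, alpha=0.05))  # True
print("reject at alpha = 1%:", reject_null(p, alpha=0.01))  # False
```

The same data lead to opposite conclusions at the two common thresholds, which is one reason the choice of α matters so much.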
  
 
The value used for ''α'' was proposed by [http://web.lru.dk/sites/lru.dk/files/lru/docs/kap9/kapitel_9_126_On_the_origins.pdf Fisher] and is arbitrary.
  
The use of p-values as a measure of statistical significance is frequently criticized, for example in [http://web.archive.org/web/20161021014340/http://wiki.bio.dtu.dk/~agpe/papers/pval_notuseful.pdf Hubbard & Lindsay]. Randall has demonstrated this problem in the past in [[882: Significant]].
 
;Overfeeding of laboratory rodents compromises animal models
 
  
[http://tpx.sagepub.com/content/24/6/757.full.pdf Keenan et al.] make this case. Additionally, the word "model" takes on two meanings. In one sense, "model" can refer to a scientific description that makes sense of a phenomenon; in another sense, "model" can refer to a person whose job is to display fashions, typically fashionable outfits. Fashion models are notorious for being exceptionally thin, so overfeeding would compromise their work as models.
  
 
;Replication study fails to reproduce many published results
 
A [https://explorable.com/replication-study replication study] is a study designed to duplicate the results of a previous study by using the same methods with a different set of subjects and experimenters. It aims to recreate the results in order to gain confidence in the previous study and to ensure that its findings are transferable to other similar areas of study.
  
 
Randall is probably referring to this recent study described in Nature: [http://www.nature.com/news/over-half-of-psychology-studies-fail-reproducibility-test-1.18248 Over half of psychology studies fail reproducibility test.] It might also be a reference to at least 3 studies mentioned here: http://www.jove.com/blog/2012/05/03/studies-show-only-10-of-published-science-articles-are-reproducible-what-is-happening. There is also irony in the phrasing of the title because in biology replication is a form of reproduction.
 
 
As in the previous headline, the key to understanding the joke here is to examine the headline's ambiguity, as no clue is given about ''how'' the trials were controlled.
 
  
;(Title text) Careful mathematical analysis demonstrates small-scale irregularities in Gaussian distribution
 
This is another joke based on a premise that is obviously untrue. The {{w|Gaussian function|Gaussian distribution}} is a mathematical construct that is generally known as the bell curve or the normal distribution. As it is an ideal mathematical construction, it cannot, by definition, have any irregularities, just as the equation y = 2x + 1 cannot have small-scale irregularities. The joke probably alludes to the fact that many types of observations are initially modeled as a Gaussian distribution, though on careful observation the actual distribution of outcomes will often deviate from a pure Gaussian distribution.
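
The difference between the ideal curve and real observations can be sketched in a few lines of code. This is only an illustration under stated assumptions: the sample size, the seed, the bin layout, and the helper names <code>gaussian_pdf</code> and <code>histogram_density</code> are all arbitrary choices made for this example. The formula itself is perfectly smooth, while a finite sample drawn from it shows small deviations in every bin.

```python
import math
import random


def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Ideal Gaussian density: a closed-form formula with no irregularities."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def histogram_density(samples, lo: float, hi: float, bins: int):
    """Estimate the density from data by counting samples in equal-width bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            # min() guards against a rounding edge case right at the upper bound.
            counts[min(int((s - lo) / width), bins - 1)] += 1
    return [c / (len(samples) * width) for c in counts]


rng = random.Random(1574)  # arbitrary seed, used only for repeatability
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]

lo, hi, bins = -3.0, 3.0, 12
width = (hi - lo) / bins
density = histogram_density(samples, lo, hi, bins)

for i, observed in enumerate(density):
    center = lo + (i + 0.5) * width
    # The ideal value at the bin centre approximates the bin average.
    print(f"x={center:+.2f}  ideal={gaussian_pdf(center):.4f}  observed={observed:.4f}")
```

Any irregularities in the output belong to the finite sample, not to the Gaussian distribution itself.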
  
 
[[Category:Math]]
[[Category:Physics]]
[[Category:Statistics]]
[[Category:Scientific research]]
 
