1478: P-Values

 
By the standard significance level, analyses with a ''p''-value less than .05 are said to be 'statistically significant'. Although the difference between .04 and .06 may seem minor, the practical consequences can be major. For example, scientific journals are much more likely to publish statistically significant results. In medical research, billions of dollars of sales may ride on whether a drug shows statistically significant benefits or not. A result which does not show the proper significance can ruin months or years of work, and might inspire desperate attempts to 'encourage' the desired outcome.
 
When performing a comparison (for example, seeing whether listening to various types of music can influence test scores), a properly designed experiment includes an ''experimental group'' (of people who listen to music while taking tests) and a ''control group'' (of people who take tests without listening to music), as well as a ''{{w|null hypothesis}}'' that "music has no effect on test scores". The test scores of each group are gathered, and a series of statistical tests are performed to produce the ''p''-value. In a nutshell, this is the probability that the observed difference (or a greater difference) in scores between the experimental and control groups could occur due to random chance, if the experimental stimulus has no effect. For a more drastic example, an experiment could test whether wearing glasses affects the outcome of coin flips - there would likely be some amount of difference between the coin results when wearing glasses and not wearing glasses, and the ''p''-value essentially tests whether this difference is small enough to be attributed to random chance, or whether it can be said that wearing glasses actually had a significant effect on the results.
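The coin-flip example above can be sketched as a permutation test in Python. The head counts are purely illustrative (not from any real experiment), and a permutation test is just one of several ways to obtain a ''p''-value:

```python
import random

random.seed(0)

# Hypothetical, illustrative data: 1 = heads, 0 = tails, 100 flips per condition.
glasses = [1] * 55 + [0] * 45     # 55 heads while wearing glasses
no_glasses = [1] * 50 + [0] * 50  # 50 heads without glasses

observed = abs(sum(glasses) / len(glasses) - sum(no_glasses) / len(no_glasses))

# Under the null hypothesis (glasses have no effect), the group labels are
# arbitrary, so shuffle them repeatedly and count how often a difference at
# least as large as the observed one arises purely by chance.
pooled = glasses + no_glasses
n = len(glasses)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n))
    if diff >= observed:
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f}, p-value: {p_value:.3f}")
```

With these made-up numbers, a 5-point difference in heads over 100 flips per group occurs quite often by chance alone, so the resulting ''p''-value comes out well above 0.05.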
  
 
If the ''p''-value is low, then the null hypothesis is said to be ''rejected'', and it can be fairly said that, in this case, music does have a significant effect on test scores. Otherwise, if the ''p''-value is too high, the result is said to ''fail to reject'' the null hypothesis, meaning that it is not necessarily counter-evidence, but rather that more data is needed. The standard and generally accepted significance threshold for experiments is ''p'' < 0.05, which is why all values below that number in the comic are marked at least "significant".
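The conventional decision rule described above can be written as a one-line function (the function name and the default threshold of 0.05 are illustrative):

```python
def interpret(p, alpha=0.05):
    # Conventional rule: reject the null hypothesis only when p falls
    # below the chosen significance level alpha; otherwise fail to reject.
    return "reject null hypothesis" if p < alpha else "fail to reject null hypothesis"

print(interpret(0.04))  # reject null hypothesis
print(interpret(0.06))  # fail to reject null hypothesis
```

Note that "fail to reject" is not the same as "accept": a high ''p''-value says only that the data are compatible with chance, not that the null hypothesis is true.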
 
