https://www.explainxkcd.com/wiki/api.php?action=feedcontributions&user=162.158.126.171&feedformat=atomexplain xkcd - User contributions [en]2024-03-28T22:25:32ZUser contributionsMediaWiki 1.30.0https://www.explainxkcd.com/wiki/index.php?title=1999:_Selection_Effect&diff=2206981999: Selection Effect2021-11-12T03:57:08Z<p>162.158.126.171: Fixed capitalization of final sentence.</p>
<hr />
<div><noinclude>:''"1999", this comic's number, redirects here. For the comic named "1999", see [[855: 1999]].''</noinclude><br />
<br />
{{comic<br />
| number = 1999<br />
| date = May 28, 2018<br />
| title = Selection Effect<br />
| image = selection_effect.png<br />
| titletext = fMRI testing showed that subjects who don't agree to participate are much more likely to escape from the machine mid-scan.<br />
}}<br />
<br />
==Explanation==<br />
<br />
The title refers to an effect that arises in scientific fields when, instead of investigating the whole population (e.g. ''all'' cancer patients or ''all'' trees), only a subset is analysed. This is common practice, as analysing every specimen is often impractical. However, special care needs to be taken when selecting the sample to ensure that it accurately represents the general population. Otherwise the results are misleading and do not reflect reality. For example, if 1000 people are asked how many cars they own but all of them live in a city, the results cannot be generalised to the whole country. This is called {{w|selection bias}}. If non-human subjects are studied, this can be avoided by randomising the selection process, but this is not possible with humans, as they cannot be forced to participate in a study against their will. For example, if people are asked to participate in a study about their political views, it is likely that those who respond care about politics, while people with no clear opinion do not bother to respond. This is called {{w|self-selection bias}}.<br />
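<br />
The car-ownership example can be sketched as a small simulation. This is only an illustration: the population sizes and ownership figures below are invented, and the names <code>population</code> and <code>mean_cars</code> are hypothetical.<br />

```python
import random

# Hypothetical population (all numbers invented for illustration):
# 3,000 city dwellers owning 0 or 1 cars, 7,000 others owning 1 or 2.
population = (
    [("city", 0)] * 1500 + [("city", 1)] * 1500
    + [("elsewhere", 1)] * 3500 + [("elsewhere", 2)] * 3500
)

def mean_cars(people):
    """Average number of cars owned in a list of (home, cars) records."""
    return sum(cars for _, cars in people) / len(people)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Biased sample: 1000 respondents who all live in a city.
city_only = [p for p in population if p[0] == "city"]
biased_sample = rng.sample(city_only, 1000)

# Randomised sample: 1000 respondents drawn from the whole population.
random_sample = rng.sample(population, 1000)

true_mean = mean_cars(population)
biased_mean = mean_cars(biased_sample)
random_mean = mean_cars(random_sample)

print(f"whole population: {true_mean:.2f} cars per person")
print(f"city-only sample: {biased_mean:.2f} cars per person")
print(f"random sample:    {random_mean:.2f} cars per person")
```

Under these assumptions, the city-only sample should land well below the population average, while the randomised sample should land close to it.<br />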
<br />
[[Ponytail]] says that people who agree to be in a study at their lab are less likely to attempt to escape. The only way Ponytail could have come to this conclusion is if she compared those people to people who did not agree to be in the study. This implies that Ponytail has recently kidnapped people for a study, and that most of the people she kidnapped called the police, as one should do when being kidnapped. This makes sense, since if you agreed to the study, you know why you are there, while if you didn't, you may have been kidnapped. As Ponytail presents this as a finding, it appears that she was attempting to establish a protocol for randomised selection of human subjects and comparing it to the normal selection process.<br />
<br />
The comic shows Ponytail being allowed to present the results of this study at a conference; reputable scientific journals and conferences should not legitimize studies that clearly violate their ethical norms, such as by failing to obtain informed consent from human subjects before experimenting on them. Unfortunately, involuntary studies are sometimes published and presented, such as [http://america.aljazeera.com/articles/2014/6/30/facebook-ethics-labratsemotionalcontagion.html Facebook's 2014 emotional contagion study]. It is not clear how many people who did agree to participate may have attempted to call the police for assistance regardless; compare the {{w|Stanford Prison Experiment}}. This is similar to previous comics where obvious things are presented in obfuscated, scientific ways (e.g. [[1990: Driving Cars]]). Of course, any study of the way people behave when being kidnapped for scientific experiments would inherently involve kidnapping them. Therefore, there is no way this kind of research could be done in an ethical fashion.<br />
<br />
The title text refers to {{w|Functional magnetic resonance imaging|functional magnetic resonance imaging (fMRI)}}, a technique that measures brain activity. Of course, people who did not sign up are much more likely to resist and escape before the scan is complete.<br />
<br />
==Transcript==<br />
:[Ponytail stands on a podium giving a presentation in front of a chart with some box plots.]<br />
:Ponytail: Our research shows that compared to the overall population, people who agree to participate in scientific studies are significantly less likely to call the police to rescue them from our lab.<br />
<br />
{{comic discussion}}<br />
<br />
[[Category:Comics featuring Ponytail]]<br />
[[Category:Science]]<br />
[[Category:Psychology]]</div>162.158.126.171https://www.explainxkcd.com/wiki/index.php?title=2533:_Slope_Hypothesis_Testing&diff=2198242533: Slope Hypothesis Testing2021-10-26T11:49:58Z<p>162.158.126.171: /* Explanation */</p>
<hr />
<div>{{comic<br />
| number = 2533<br />
| date = October 25, 2021<br />
| title = Slope Hypothesis Testing<br />
| image = slope_hypothesis_testing.png<br />
| titletext = "What? I can't hear--" "What? I said, are you sure--" "CAN YOU PLEASE SPEAK--"<br />
}}<br />
<br />
==Explanation==<br />
{{incomplete|Created by a SCREAMING STATISTICALLY SIGNIFICANT STATISTICS STUDENT. Note: there's a name for when the bone in your ear pulls away after exposure to loud noise, could be thematic to reference it. There's probably also a name for the statistical mistake the comic demonstrates. Do NOT delete this tag too soon.}}<br />
"Slope hypothesis testing" is a method of testing whether the slope of a trendline fitted to a scatter plot differs significantly from zero.<br />
<br />
In this comic, [[Cueball]] and [[Megan]] are performing a study comparing student exam grades to the volume of their screams. Student A has the worst grade and the softest scream, but Student B has the ''best'' grade and Student C the ''loudest'' scream. A trendline has been plotted, indicating a positive correlation between grades and volume, but the p-value is high (0.586), so the trend is not statistically significant. The p-value depends on both how well the data fit the trendline and how many data points have been taken; the more data points and the better they fit, the lower the p-value and the more significant the result.<br />
<br />
Megan complains about the insignificance of their results, so Cueball suggests having each student scream into the microphone a few more times. The three students are still there, as they can be seen in the background; they look like school kids, and one of them is [[Science Girl]].<br />
<br />
Having the students scream again will not help, though, because it only provides more data on the screaming without providing more data on its relation to exam scores. The joke pokes fun at poor statistical practice that likely occurs in real research. The p-value is incorrectly recalculated based on the increased number of measurements. Each student still has exactly the same exam grade (presumably the same datum as before), and each student's screams fall within a narrow loudness range that is far from overlapping the others'. Megan is pleased by these results, but Cueball belatedly realizes this technique may not be scientifically valid.<br />
<br />
Measuring the same quantity several times can make that measurement more accurate, but it does not increase the number of data points with regard to another metric; the horizontally clustered points on the chart make this visually clear.<br />
<br />
The common p-value formulae assume the data points are statistically independent, that is, that the test score and volume measurement from one point don't reveal anything about those of the other points. By reusing the same exam scores separately across several measurements each, Cueball and Megan violate the independence assumption and invalidate their significance calculation. This is an example of pseudoreplication.<br />
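<br />
The effect of pseudoreplication on the naive p-value can be sketched as follows. This is a minimal illustration, not a reconstruction of the comic's data: the grades, loudness values and per-scream offsets are invented, and the helper names <code>slope_test</code> and <code>two_sided_t_pvalue</code> are hypothetical. The p-value comes from the usual t-test on a least-squares slope, with the t-distribution integrated numerically so that only the standard library is needed.<br />

```python
import math

def two_sided_t_pvalue(t, df, steps=20000):
    """Two-sided p-value for Student's t, by Simpson integration of its density."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += pdf(i * h) * (4 if i % 2 else 2)
    central_area = total * h / 3  # probability of 0 <= T <= |t|
    return max(0.0, 1 - 2 * central_area)

def slope_test(xs, ys):
    """Least-squares slope and its naive p-value (assumes independent points)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, two_sided_t_pvalue(slope / std_err, n - 2)

# Invented values: one (grade, loudness in dB) pair per student A, B, C.
grades = [65, 95, 80]
loudness = [87.0, 92.0, 93.0]
slope_3, p_3 = slope_test(grades, loudness)

# "More data": five screams per student, each paired with the same grade.
offsets = [-0.4, -0.2, 0.0, 0.2, 0.4]
rep_grades = [g for g in grades for _ in offsets]
rep_loudness = [db + off for db in loudness for off in offsets]
slope_15, p_15 = slope_test(rep_grades, rep_loudness)

print(f"3 points:  slope={slope_3:.3f}, p={p_3:.3f}")
print(f"15 points: slope={slope_15:.3f}, p={p_15:.4f}")
```

With these invented numbers the slope is unchanged, but the p-value drops from well above 0.05 to below it, purely because the formula is told there are 15 independent points. One way to avoid the problem is to average each student's screams first, which leaves three points again.<br />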
<br />
In current AI, there's a push toward "few-shot learning", where only a few data items are used to form conclusions, rather than the usual millions of them. This comic illustrates the danger of using such approaches without understanding them in depth.<br />
<br />
Additionally, a common theme in some research is the discovery of correlations that do not survive independent reproduction. This is because randomness with too few samples produces apparent correlations, and Randall has repeatedly made comics about this hopeful error.<br />
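<br />
How easily small samples produce apparent correlations can be checked with a short simulation. This is a sketch under stated assumptions: both variables are drawn as independent standard normal noise, the threshold of 0.9 is arbitrary, and the function names are hypothetical.<br />

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strong_correlation_rate(n, trials, rng, threshold=0.9):
    """Fraction of trials in which unrelated data show |r| above the threshold."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # independent of xs by construction
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rate_n3 = strong_correlation_rate(3, 2000, rng)
rate_n30 = strong_correlation_rate(30, 2000, rng)

print(f"|r| > 0.9 with 3 points:  {rate_n3:.1%} of trials")
print(f"|r| > 0.9 with 30 points: {rate_n30:.1%} of trials")
```

With only three points, a sizable fraction of completely unrelated data sets should show a very strong correlation, while with thirty points this should essentially never happen.<br />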
<br />
In the title text, Megan and Cueball are yelling over each other, each asking the other to speak up, presumably because the screaming experiment has left them both hard of hearing.<br />
<br />
==Transcript==<br />
{{incomplete transcript|Do NOT delete this tag too soon.}}<br />
:[Three points labeled "Student A", "Student B" and "Student C" in a scatter plot with axes labeled "Stats exam grade" (60-100) and "Scream loudness (decibel)" (86-94) with a trend line]<br />
:[A line goes from the trend line to a text box with the text:]<br />
:β=1.94 <br />
:p=0.586<br />
<br />
:[In a frameless panel, Megan (holding a piece of paper) and Cueball are facing each other with three kids in the background]<br />
:Megan: Darn, not significant.<br />
:Cueball: We need more data. Have them each try yelling in to the mic a few more times.<br />
<br />
:[The same scatter plot as in the first panel except with more points for each of the students with slightly different decibel values, and the text in the text box changed to:]<br />
:β=1.94<br />
:p=0.037* <br />
:<nowiki>*</nowiki>Significant!<br />
<br />
:[Similar panel to the second one]<br />
:Megan: Perfect!<br />
:Cueball: Are you ''sure'' we're doing slope hypothesis testing right?<br />
<br />
{{comic discussion}}<br />
<br />
[[Category:Comics featuring Megan]]<br />
[[Category:Comics featuring Cueball]]<br />
[[Category:Comics featuring Science Girl]] <!-- The other two kids are also, well, kids, and thus not Hairy or Megan --><br />
[[Category:Science]]<br />
[[Category:Charts]]</div>162.158.126.171