Editing 2494: Flawed Data


The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 4: Line 4:
 
| title    = Flawed Data
 
| title    = Flawed Data
 
| image    = flawed_data.png
 
| image    = flawed_data.png
| titletext = We trained it to produce data that looked convincing, and we have to admit, the results look convincing!
+
| titletext = We trained it to produce data that looked convincing, and we have to admit the results look convincing!
 
}}
 
}}
  
 
==Explanation==
 
==Explanation==
 
+
{{incomplete|Created by a CONVINCING AI. Please mention here why this explanation isn't complete. Do NOT delete this tag too soon.}}
This is another comic about the right and wrong ways to conduct research when the data is inadequate.
 
 
 
In the first frame, [[Cueball]] presents a report on a poster (two graphs with data points and possible fitted curves), admitting that all of the data is actually flawed. He doesn't explain how it is flawed: whether it contradicts some expected outcome or revelation, or whether there was a systematic error in the data-gathering process.
 
 
 
From there, three different reactions to this realization are shown, ordered by how good a decision each one represents.
 
 
 
;Good
 
In the first scenario, Cueball states they are no longer sure about the conclusions they had drawn from the flawed data. This is, of course, the scientifically appropriate decision. The less reliable the data, the less reliable the conclusions that can be drawn from it. Ideally, flawed data would be discarded altogether, but there are situations in which better data is not available, so a compromise may be to draw tentative conclusions while making clear that they are uncertain because of issues with the data.
 
 
 
;Bad
 
In the second scenario, Cueball explains that after heavy manipulation ("doing a lot of math") of their flawed data, they decided it was actually fine. There are a number of methods that can be used to manipulate or "clean" data, with varying levels of complexity and reliability. Some of these methods may be valid in certain situations, but applying them only after the initial analysis failed is highly suspect. The likelihood, in such a case, is that the researchers tried different methods of data manipulation, one after another, until they found one that gave the results they wanted. This is highly subject to the biases of the researchers (both conscious and unconscious) and is much less likely to result in accurate conclusions. Unfortunately, this approach occurs in research more often than it should, and [[Randall]] is making clear that it's "bad".
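
As a rough illustration of why trying one "cleaning" method after another is suspect, the following sketch compares a single analysis chosen in advance with the best result picked from many alternatives. Everything in it is a hypothetical example and not taken from the comic: the two groups are pure noise drawn from the same distribution (so the true effect is zero), and the helper names <code>drop_extremes</code>, <code>mean_difference</code> and <code>honest_and_cherry_picked</code> are invented for this illustration.

```python
import random
import statistics


def mean_difference(a, b):
    """Apparent effect: difference between the means of two samples."""
    return statistics.mean(a) - statistics.mean(b)


def drop_extremes(sample, low, high):
    """Crude 'outlier removal': drop the `low` smallest and `high` largest values."""
    ordered = sorted(sample)
    return ordered[low:len(ordered) - high]


def honest_and_cherry_picked(seed=0, n=30, max_drop=2):
    """Return (effect from the pre-chosen analysis, largest effect found by searching)."""
    rng = random.Random(seed)
    # Assumption for the sketch: both groups come from the SAME distribution,
    # so any apparent difference between them is noise.
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]

    # Analysis decided in advance: use all of the data, no trimming.
    honest = mean_difference(a, b)

    # "Lots of math": try every combination of trimming rules on each group
    # and keep whichever one produces the largest apparent effect.
    rules = [(lo, hi) for lo in range(max_drop + 1) for hi in range(max_drop + 1)]
    candidates = [
        mean_difference(drop_extremes(a, *rule_a), drop_extremes(b, *rule_b))
        for rule_a in rules
        for rule_b in rules
    ]
    cherry_picked = max(candidates, key=abs)
    return honest, cherry_picked


if __name__ == "__main__":
    honest, cherry_picked = honest_and_cherry_picked()
    print(f"pre-chosen analysis: {honest:+.3f}")
    print(f"best of many tries:  {cherry_picked:+.3f}")
```

Because the untrimmed analysis is itself one of the candidates, the cherry-picked effect can never be smaller in magnitude than the honest one, even though the data contains no real effect at all. Real data cleaning is more sophisticated than this caricature, but the same selection problem applies whenever the method is chosen after seeing which result it gives.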
 
 
 
;Very bad
 
In the third and final scenario, Cueball explains that they scrapped all the flawed data. However, instead of gathering new data by correctly redoing the research, measurements, or tests, they trained an {{w|Artificial Intelligence}} (AI) to generate "better" data, guided by nothing but a desire to match a target outcome. This is of course not real data, but only a simulation of data that selectively sieves statistical noise for desirable qualities. And since they are probably looking for a specific result, they are training the AI to generate data that supports it. This approach is "very bad": it not only produces no useful science, but also means that future researchers will be working from entirely artificial data. Doing so would be destructive to science and would be considered highly unethical by any research body or association. The only purpose of such a method would be to convince others that you had proven something interesting, rather than to determine what is true (although the researchers might at least gain some experience in AI programming). AI is a recurring [[:Category:Artificial Intelligence|theme]] on xkcd.
 
 
 
The title text comments on the results of the very bad approach, and makes clear that the researchers got the data they were looking for: ''We trained it to produce data that looked convincing, and we have to admit, the results look convincing!'' The AI was trained to produce data that looks convincing, so the fact that its output looks convincing says nothing about whether the conclusions are true.
 
  
 
==Transcript==
 
==Transcript==
Line 30: Line 14:
 
:Cueball: We realized all our data is flawed.
 
:Cueball: We realized all our data is flawed.
  
:[The next three panels each have a label in a frame that overlaps the top of the panel's frame. The poster can no longer be seen in the rest of the panels.]
+
:[The next three panels each have a label in a frame that overlaps the top of the panel's frame. The poster can no longer be seen in the rest of the panels. Cueball has taken the stick down.]
 
:Label: Good
 
:Label: Good
:[Cueball has taken the stick down.]
 
 
:Cueball: ...So we're not sure about our conclusions.
 
:Cueball: ...So we're not sure about our conclusions.
  
 +
:[Cueball holds the pointer almost as in the first panel.]
 
:Label: Bad
 
:Label: Bad
:[Cueball holds the pointer almost as in the first panel.]
 
 
:Cueball: ...So we did lots of math and then decided our data is actually fine.
 
:Cueball: ...So we did lots of math and then decided our data is actually fine.
  
 +
:[Cueball holds the pointer so it points upwards. He also lifts his other hand a bit.]
 
:Label: Very bad
 
:Label: Very bad
:[Cueball holds the pointer so it points upwards. He also lifts his other hand a bit.]
 
 
:Cueball: ...So we trained an AI to generate better data.
 
:Cueball: ...So we trained an AI to generate better data.
  
Line 49: Line 32:
 
[[Category:Statistics]]
 
[[Category:Statistics]]
 
[[Category:Artificial Intelligence]]
 
[[Category:Artificial Intelligence]]
[[Category:Scientific research]]
 
