2494: Flawed Data

{{comic
| number    = 2494
| date      = July 26, 2021
| title     = Flawed Data
| image     = flawed_data.png
| titletext = We trained it to produce data that looked convincing, and we have to admit the results look convincing!
}}

==Explanation==
{{incomplete|Created by a flawed but CONVINCING AI. Please mention here why this explanation isn't complete. Do NOT delete this tag too soon.}}
This is another comic about the right and wrong ways to do research when your data is inadequate.

This time [[Cueball]] openly admits that his team has realized that all of their data is flawed. In the first panel he presents this in front of a poster showing two graphs with data points and possible fitted curves.

From there, three different reactions to this realization are shown, ordered from the best decision to the worst.

;Good
In the first scenario Cueball admits that they are no longer sure about the conclusions they had drawn from the flawed data. That is, they cannot really draw any conclusions, which is the right (good) decision when you realize that your data is not valid.

;Bad
In the second scenario Cueball explains that after doing a lot of math on (that is, manipulating) their flawed data, they decided it was actually fine. Since the data is flawed, no amount of math will make it true, so continuing to use it while hiding the flaws under lots of math is a bad approach. Alternatively, they may have looked for mathematically supported reasons why their bad data is correct, effectively changing the model and the expected outcome so that the bad data fits well. While statistical analysis can legitimately be used to discard "flaws" such as outliers in a data set, it is not valid to do this only after the results failed to match your expectations. Since there are many different statistical methods and tests, trying one after another will almost guarantee that you eventually get the outcome you are after, even if the data is flawed.
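
The last point can be made concrete with a little arithmetic. The following is only a minimal sketch, assuming that each test is independent and uses the conventional 5% significance threshold (both are assumptions made for illustration, and the function name is hypothetical; none of this comes from the comic). Under those assumptions, the chance that at least one of ''k'' tests reports a "significant" result on data with no real effect is 1 − 0.95<sup>''k''</sup>:

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    comes out 'significant' when there is no real effect at all.

    alpha is the per-test significance threshold (assumed to be 5%).
    """
    return 1 - (1 - alpha) ** num_tests


# Show how quickly the chance grows as more tests are tried.
for k in (1, 5, 20, 60):
    print(f"{k:>2} tests: {chance_of_false_positive(k):.0%}")
```

With a single test the chance of a spurious "success" is 5%, but after 20 independent attempts it is already about 64%, which is why choosing the analysis after seeing the results is not considered valid.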

;Very bad
In the third and final scenario Cueball explains that they scrapped all the flawed data. But instead of gathering new data through research, measurements or tests, they trained an {{w|Artificial Intelligence}} (AI) to generate better data. This is of course not real data, just a simulation of data. And since they are probably looking for a specific result, they could train the AI to generate data that supports it. This has nothing to do with research into the problem they are actually studying and is thus very bad. They do gain some insights into programming the AI (see [[2173: Trained a Neural Net]]). AI is a recurring [[:Category:Artificial Intelligence|theme]] on xkcd.

The title text describes the result of the very bad approach and makes clear that they got the data they were looking for: ''We trained it to produce data that looked convincing, and we have to admit the results look convincing!'' Of course, if they ask the AI for data that supports their theory in a way that looks convincing, and the training succeeds, that is exactly what they get back.
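
The circularity of the "very bad" approach can be illustrated with a toy example. This is only a sketch under stated assumptions: the "AI" is replaced by a hypothetical function, <code>generate_convincing_data</code>, that scatters points around whatever straight line the researchers hoped to find, and the hoped-for slope of 2.0 is an invented value. The comic does not say what kind of model or data Cueball's team used.

```python
import random


def generate_convincing_data(slope, intercept, n=50, noise=0.5, seed=0):
    """Hypothetical stand-in for the 'AI': it emits points scattered
    around the line the researchers were hoping to find."""
    rng = random.Random(seed)
    xs = [i / 5 for i in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, noise) for x in xs]
    return xs, ys


def fit_slope(xs, ys):
    """Ordinary least-squares estimate of the slope of y against x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


hoped_for_slope = 2.0  # invented value: the result the team wants to see
xs, ys = generate_convincing_data(hoped_for_slope, intercept=1.0)

# The "analysis" recovers a slope close to the one that was fed in,
# which says nothing about the real world.
print(f"hoped for {hoped_for_slope}, 'measured' {fit_slope(xs, ys):.2f}")
```

The fitted slope lands close to the hoped-for value simply because the generator was told to produce it, so the "convincing" result only confirms what was put in.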

==Transcript==
:[Cueball is pointing a stick at a poster hanging behind him while addressing an unseen audience. There are two graphs on the poster with data points and fitting curves.]
:Cueball: We realized all our data is flawed.

:[The next three panels each have a label in a frame that overlaps the top of the panel's frame. The poster can no longer be seen in the rest of the panels. Cueball has lowered the stick.]
:Label: Good
:Cueball: ...So we're not sure about our conclusions.

:[Cueball holds the pointer almost as in the first panel.]
:Label: Bad
:Cueball: ...So we did lots of math and then decided our data is actually fine.

:[Cueball holds the pointer so it points upwards. He also lifts his other hand a bit.]
:Label: Very bad
:Cueball: ...So we trained an AI to generate better data.

{{comic discussion}}

[[Category:Comics featuring Cueball]]
[[Category:Science]]
[[Category:Statistics]]
[[Category:Artificial Intelligence]]