2494: Flawed Data
| title = Flawed Data
| image = flawed_data.png
| titletext = We trained it to produce data that looked convincing, and we have to admit the results look convincing!
}}
==Explanation==
{{incomplete|Created by A FLAWED BUT CONVINCING AI. Please mention here why this explanation isn't complete. Do NOT delete this tag too soon.}}
This is another comic about the right and wrong ways to do research when the data is inadequate.
;Good
In the first scenario, Cueball states that they are no longer sure about the conclusions they had drawn from the flawed data. When the data cannot be trusted, drawing no conclusion from it is the only valid conclusion.
;Bad
In the second scenario, Cueball explains that after heavy manipulation ("did lots of math") of their flawed data, they decided it was actually fine. Since the data is flawed, manipulating it will not make it any more valid. Adjusting, or "massaging", data is a dubious approach unless each adjustment can be independently justified. Searching for reasons why bad data is actually correct, or pruning only those "bad" elements whose removal preserves the status quo, reinforces any biases toward the existing model and the expected outcome: the "cleaned" data then fits the model well, but just as erroneously. Statistical analysis can legitimately be used to discard flaws (e.g. outliers) in a data set, or to assign lower confidence to certain lines of evidence, but it is not valid to choose such corrections after seeing that the results don't match expectations. And since there are many different statistical methods and tests, trying one after another and selecting post hoc whichever proves most useful makes it very likely that one will eventually confirm the expected outcome, even (or especially) if the data was unreliable.
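To put a rough number on the "try test after test" problem: if there is no real effect (every null hypothesis is true) and each test is run at the conventional significance level α = 0.05, the chance that at least one of k independent tests still comes out "significant" is 1 - (1 - α)^k, which for k = 20 is 1 - 0.95^20 ≈ 0.64. The Python sketch below checks that formula with a small simulation. It assumes the tests are independent and uses the standard fact that, under a true null hypothesis, a continuous test's p-value is uniformly distributed; real analyses of one data set are correlated, so the actual figure will differ. The function names are illustrative and not taken from any library.

```python
import random


def familywise_error(alpha: float, k: int) -> float:
    """Chance that at least one of k independent tests is "significant"
    at level alpha when there is no real effect at all."""
    return 1 - (1 - alpha) ** k


def simulate(alpha: float, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check of familywise_error (assumes independent tests).

    Under a true null hypothesis a continuous test's p-value is uniform on
    [0, 1), so each "analysis" is modelled as one random draw. A trial counts
    as a spurious confirmation as soon as any of the k tries gives p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    alpha, k = 0.05, 20
    print(f"formula:    {familywise_error(alpha, k):.3f}")
    print(f"simulation: {simulate(alpha, k, trials=100_000):.3f}")
```

Increasing k pushes the probability of a spurious "confirmation" toward 1, which is why corrections such as Bonferroni's (testing each hypothesis at α/k) exist.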
;Very bad
In the third and final scenario, Cueball explains that they scrapped all the flawed data. However, instead of gathering new data by correctly redoing the research, measurements or tests, they trained an {{w|Artificial Intelligence}} (AI) to generate "better" data from nothing but a desire to match a target outcome. This is of course not real data, just a simulation of data: statistical noise selectively sieved for desirable qualities. And since they are presumably looking for a specific result, they are training the AI to generate data that supports it. This has nothing to do with the problem they are actually researching, which makes it very bad. They may, however, gain some insights into programming the AI (see [[2173: Trained a Neural Net]]). AI is a recurring [[:Category:Artificial Intelligence|theme]] on xkcd.
The title text describes the results of the very bad approach, and the statement ''We trained it to produce data that looked convincing, and we have to admit the results look convincing!'' makes clear that they got exactly the data they were looking for. The AI was of course trained to provide data that supports their initial theory, which is why they find the results so convincing.
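The comic does not say how the AI was built. As a crude, hypothetical stand-in for "selectively sieving statistical noise for desirable qualities", the Python sketch below simply draws pure noise, in which x and y are independent, until one sample happens to show the desired correlation. This is plain rejection sampling, not a real machine-learning model, and the names `sieve_noise` and `noise_sample` and the threshold r ≥ 0.6 are illustrative assumptions. The accepted sample looks convincing by construction, while fresh samples from the same process show no correlation on average.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def noise_sample(rng, n):
    """x and y are drawn independently, so the true correlation is 0."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    return xs, ys


def sieve_noise(target_r, n, seed=0, max_tries=1_000_000):
    """Hypothetical 'data generator': keep drawing pure noise until a sample
    looks convincing, i.e. reaches at least the desired correlation."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        xs, ys = noise_sample(rng, n)
        if pearson_r(xs, ys) >= target_r:
            return xs, ys
    raise RuntimeError("no sample met the target")


if __name__ == "__main__":
    xs, ys = sieve_noise(target_r=0.6, n=10)
    print(f"generated data: r = {pearson_r(xs, ys):.2f}")
    # Fresh data from the same process, with no sieving applied.
    rng = random.Random(1)
    fresh = [pearson_r(*noise_sample(rng, 10)) for _ in range(2000)]
    print(f"fresh samples:  mean r = {statistics.fmean(fresh):.2f}")
```

The generated data set passes exactly the check it was selected to pass, just as in the title text, yet it carries no information about any real relationship between x and y.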
==Transcript==
:Cueball: We realized all our data is flawed.
:[The next three panels each have a label in a frame that overlaps the top of the panel's frame. The poster can no longer be seen in any of these panels. Cueball has lowered the pointer.]
:Label: Good
:Cueball: ...So we're not sure about our conclusions.
:[Cueball holds the pointer almost as in the first panel.]
:Label: Bad
:Cueball: ...So we did lots of math and then decided our data is actually fine.
:[Cueball holds the pointer so it points upwards. He also lifts his other hand a bit.]
:Label: Very bad
:Cueball: ...So we trained an AI to generate better data.
[[Category:Statistics]]
[[Category:Artificial Intelligence]]