Editing 2054: Data Pipeline

  
 
==Explanation==
 
In the first panel, [[Cueball]] shows [[Ponytail]] and [[White Hat]] a data pipeline he has built that, as he puts it, <nowiki>'collects and processes all the information we need'</nowiki>. This implies that the three are working on a project that requires data processing. Ponytail assumes that the pipeline is an unstable mess of scripts that will stop working correctly as soon as it receives any unexpected input. Cueball tries to deny this, but his hesitation (including his use of the word "might") effectively concedes that she is very likely right, although he seems to hope otherwise. Ponytail then seems somewhat impressed and starts to say so, but she is interrupted by Cueball, who tells her that the system has just malfunctioned and collapsed. He adds that he can patch it, which suggests that this cycle of patching and collapsing could repeat indefinitely, or at least until every problem has been patched. [[:Category:Code Quality|Knowing Cueball's code, though,]] it seems more likely that he can't patch it.
{{incomplete|Please direct all data pipelines to the explanation below and only mention here why it isn't complete. Do NOT delete this tag too soon.}}
  
In the title text, Ponytail or White Hat proceeds to question how such an important system can run on such a small computer. However, Cueball makes it worse by saying he uses his phone due to the better connection. While this might give the pipeline more uptime, it also means its system resources are far more limited.
  
This comic can be logically connected to the Code Quality series ([[1513: Code Quality]], [[1695: Code Quality 2]] and [[1833: Code Quality 3]]), which similarly shows Cueball's ineptitude at coding and Ponytail's exasperation with it, though here Cueball shows a higher level of competence by having produced something useful, albeit fragile. However, Ponytail does not see the actual code in this case, and there are no issues with, or comments on, coding syntax as there are in the Code Quality series.
  
Cueball's hesitant response in this comic has some similarities to [[410: Math Paper]].
It is quite common for people who spend most of their time coding for enjoyment to try to automate absolutely everything. Whenever a programmer sees a rote task, they think, "Why is a human doing this when the time could be spent making a computer do it automatically, forever?" Unfortunately, in the absence of strong artificial intelligence, one of the places where this approach begins to break down is in aggregating information from multiple sources.
 
People tend to publish their data through a variety of channels, and because many of them are not programmers and do not share a programmer's regard for consistency and machine-readability, the data ends up in completely different formats. Some data is only available in print. Some data is only available as photographs. Some data is only available as written reports. A certain kind of nerd will see this situation and become excited, seeing an opportunity to automate something that nobody else thinks is worth the effort. They begin writing scripts that process each of the different formats the data comes in, and eventually get the whole thing working! They can then, in theory, make a number of mind-numbing data-processing jobs obsolete.
 
Google put a lot of energy into this challenge on many fronts during the 2000s, making data more processable everywhere and possibly hastening the advent of strong artificial intelligences, which would thrive on information that has already been digitized. A notable project was Google Books, in which libraries were scoured for non-digital material that was then painstakingly scanned. Additionally, organizations have come under increasing pressure to offer their information in standardized formats that can all be processed in the same way. This continued pressure is producing more and more results, but because the standards must be implemented by humans who gain little immediate benefit from the process, adherence to the guidelines is rarely universal.
 
The workaround of building many small programs that each handle the quirks of one source is the domain of "scraping": downloading information that was intended to be presented to a human, running it through software that has been pre-programmed with the patterns to expect, and then normalizing and making use of the data.
Anybody who has attempted this as a lone individual quickly realizes that as soon as a data source changes in the smallest way, the output becomes garbage. Often it becomes garbage in a way that is laborious to hunt down and understand, and the problem may not even be noticed. This would be tragic for a corporation relying on the results, and would act like a Trojan horse, destroying it from the inside.
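The fragility described above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration and does not come from the comic: the "Temperature: 21.5 C" line format, the <code>READING</code> pattern, and the function names <code>parse_reading</code> and <code>parse_all</code> are all hypothetical. The idea is simply that a scraper which rejects unexpected input loudly makes a format change visible, rather than letting it silently turn into garbage.

```python
import re

# Hypothetical pattern for one source's format, e.g. "Temperature: 21.5 C".
READING = re.compile(r"^Temperature:\s*(-?\d+(?:\.\d+)?)\s*C$")


def parse_reading(line: str) -> float:
    """Return the temperature in Celsius, or raise if the format is unexpected."""
    match = READING.match(line.strip())
    if match is None:
        # Failing loudly keeps a silent format change from becoming garbage data.
        raise ValueError(f"unexpected format: {line!r}")
    return float(match.group(1))


def parse_all(lines):
    """Split the input into parsed values and rejected lines for later inspection."""
    values, rejected = [], []
    for line in lines:
        try:
            values.append(parse_reading(line))
        except ValueError:
            rejected.append(line)
    return values, rejected
```

If the hypothetical source one day switched to a line such as "Temp = 70F", that line would end up in the rejected list instead of being misread, which at least makes the collapse noticeable.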
  
 
==Transcript==
 
{{incomplete transcript|Do NOT delete this tag too soon.}}
 
:[Cueball is standing with an open laptop, showing it to Ponytail and White Hat.]
 
 
:Cueball: Check it out - I made a fully automated data pipeline that collects and processes all the information we need.
 
