Editing 1425: Tasks


Latest revision Your text
Line 8: Line 8:
  
 
==Explanation==
 
[[Cueball]] appears to be asking [[Ponytail]] to write an app that determines if a given picture is (1) taken in a national park, and (2) a picture of a bird. The first question is generally harder for a human to answer, but easy for an app that has access to location information and a {{w|geographic information system}} (GIS). The second one is easy for a human but much harder for a computer. This illustrates {{w|Moravec's paradox}} from the 1980s in a modern context. By the 1950s computers were useful for tasks like {{w|trajectory optimization}}, {{w|automated theorem proving|generating novel mathematical proofs}}, and {{w|English_draughts#Computer_players|the game of checkers}}, so such high-level computation and reasoning tasks that were hard for humans turned out to be relatively easy for them. On the other hand, it turns out to be hard to "give them the skills of a one-year-old when it comes to perception", as Moravec wrote.
+
[[Cueball]] appears to be asking [[Ponytail]] to write an app that determines if a given picture is (1) taken in a national park, and (2) a picture of a bird. The first question is generally harder for a human to answer, but easy for an app that has access to location information and a {{w|geographic information system}} (GIS). The second one is easy for a human but much harder for a computer. This illustrates {{w|Moravec's paradox}} from the 1980s in a modern context. By the 1950s computers were useful for tasks like {{w|trajectory optimization}}, {{w|Logic Theorist|generating novel mathematical proofs}} and {{w|English_draughts#Computer_players|the game of checkers}}, so such high-level computation and reasoning tasks that were hard for humans turned out to be relatively easy for them. On the other hand it turns out to be hard to "give them the skills of a one-year-old when it comes to perception", as Moravec wrote.
  
 
In order to determine whether the user is in a national park, Ponytail plans to determine the user's location using the mobile device. This location will then be cross checked with a {{w|geographic information system}} (GIS) which will be able to determine whether the coordinates lie within a national park boundary.
 
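The "easy GIS lookup" can be sketched roughly as a point-in-polygon test. This is only an illustration, not what Ponytail or any particular GIS actually does: the function names and the rectangular <code>PARK_BOUNDARY</code> are made up for the example, and a real system would use surveyed park boundaries and handle map projections, holes in polygons, and points lying exactly on an edge.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees

# Hypothetical rectangular "park" boundary, NOT real park data.
PARK_BOUNDARY: List[Point] = [
    (-111.0, 44.0),
    (-110.0, 44.0),
    (-110.0, 45.0),
    (-111.0, 45.0),
]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: cast a ray east from the point and count
    how many polygon edges it crosses (odd means inside)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def photo_taken_in_park(lon: float, lat: float,
                        boundary: List[Point] = PARK_BOUNDARY) -> bool:
    """Return True if the photo's coordinates fall inside the boundary."""
    return point_in_polygon((lon, lat), boundary)
```

Calling <code>photo_taken_in_park(-110.5, 44.5)</code> checks a point inside the example rectangle, while <code>photo_taken_in_park(-109.5, 44.5)</code> checks one just east of it. The whole test is a few lines of arithmetic, which is why Ponytail only asks for a few hours.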
  
Determining whether an image is of a given kind of natural object is far more difficult. This task falls into the area of {{w|computer vision}}. One of the goals in computer vision is to detect and classify objects within an image. This is a very challenging task for a number of reasons.
+
Determining whether an image is of a given kind of natural object is far more difficult. This task falls into the area of {{w|computer vision}}. One of the goals in computer vision is to detect and classify objects within an image. This is a very challenging task for a number of reasons.  
  
Firstly, humans use size, edge-assignment, movement, and stereoscopic vision when looking at a scene (not a picture of a thing, but the thing itself) to discern individual objects and then {{w|Figure-ground (perception)|categorize them as foreground or background}}. A photograph, however, is a static, monoscopic image that can only provide size and edge-assignment clues. Humans are only able to discern objects from background in photographs by comparing the photo against all of the things they've seen and everything they've learned about those things over the course of their life and {{w|Visual perception|identifying matching patterns}}.
+
:Firstly, humans use size, edge-assignment, movement, and stereoscopic vision when looking at a scene (not a picture of a thing, but of the thing itself) to discern individual objects and then categorize them as foreground or background.<ref>{{w|Figure-ground_(perception)}}</ref> A photograph, however, is a static, monoscopic image that can only provide size and edge-assignment clues. Humans are only able to discern objects from background in photographs by comparing the photo against all of the things they've seen and everything they've learned about those things over the course of their life and identifying matching patterns.<ref>{{w|Visual_perception}}</ref> Presumably, today's computers do not have nearly the processing power or wealth of data available as the human mind.
  
Secondly, the quality of the photograph will have an impact on a computer's ability to match patterns. For example, the object in the photograph might be partially visible or occluded. In the case of a living bird, additional complications arise from the variations among individual birds of the same species and differences in pose (flying, perching in a tree, etc.). Differentiating between visually similar objects can result in false positives. For example, is it a photo of a bird in flight or a plane <s>(or superman!)</s>? Ponytail's estimate of 5 years may be overly optimistic (see [[678: Researcher Translation]]).
+
:Secondly, the quality of the photograph will have an impact on a computer's ability to match patterns. For example, the object in the photograph might be partially visible or occluded. In the case of a living bird, additional complications arise from the variations among individual birds of the same species and differences in pose (flying, perching in a tree, etc.). Differentiating between visually similar objects can result in false positives. For example, is it a photo of a bird in flight or a plane (or superman!)? Ponytail's estimate of 5 years may be overly optimistic (see [[678: Researcher Translation]]).
  
The state-of-the-art algorithms for solving this kind of task (as of this comic's publishing) use local features (e.g. {{w|Scale-invariant feature transform|SIFT}} or {{w|Speeded up robust features|SURF}} in combination with a {{w|support vector machine}}) or a {{w|convolutional neural network}}.
+
Today's state-of-the-art algorithms for solving this kind of task mostly use local features (e.g. {{w|Scale-invariant feature transform|SIFT}} or {{w|SURF}} in combination with a {{w|support vector machine}} or {{w|convolutional neural network}}).
  
The subtitle refers to "CS", which is a common abbreviation for "{{w|Computer Science}}", of which {{w|artificial intelligence}} and {{w|computer vision}} are sub-disciplines.
+
The subtitle refers to "CS", which is a common acronym for "{{w|Computer Science}}", of which {{w|artificial intelligence}} and {{w|computer vision}} are sub-disciplines.
  
The title text mentions [http://dspace.mit.edu/bitstream/handle/1721.1/6125/AIM-100.pdf The Summer Vision Project] and {{w|Marvin Minsky}} of MIT. In the summer of 1966, he asked his undergraduate student {{w|Gerald Jay Sussman}} to [http://szeliski.org/Book/ "spend the summer linking a camera to a computer and getting the computer to describe what it saw"]. {{w|Seymour Papert}} drafted the plan, and it seems that Sussman was joined by {{w|Bill Gosper}}, {{w|Richard Greenblatt (programmer)|Richard Greenblatt}}, {{w|Leslie Lamport}}, Adolfo Guzman, Michael Speciner, John White, Benjamin, and Henneman (in case the multiple Wikipedia links don't give it away, this is a sizable cross-section of the AI researchers of the period). The project schedule allocated one summer for the completion of this task. The required time was obviously significantly underestimated, since dozens of research groups around the world are still working on this topic today.
+
The title text mentions [http://dspace.mit.edu/bitstream/handle/1721.1/6125/AIM-100.pdf The Summer Vision Project] and {{w|Marvin Minsky}} of MIT. In the summer of 1966, he asked his undergraduate student {{w|Gerald Jay Sussman}} to "spend the summer linking a camera to a computer and getting the computer to describe what it saw" ([http://szeliski.org/Book/]). {{w|Seymour Papert}} drafted the plan, and it seems that Sussman was joined by {{w|Bill Gosper}}, {{w|Richard Greenblatt (programmer)|Richard Greenblatt}}, {{w|Leslie Lamport}}, Adolfo Guzman, Michael Speciner, John White, Benjamin, and Henneman. The project schedule allocated one summer for the completion of this task. The required time was obviously significantly underestimated, since dozens of research groups around the world are still working on this topic today.
  
A month after this comic came out, {{w|Flickr}} [http://code.flickr.net/2014/10/20/introducing-flickr-park-or-bird/ responded] with a [http://parkorbird.flickr.com/ prototype online tool] to do something similar to what the comic describes, using its automated-tagging software. According to them, the bird solution "took us less than 5 years to build, though it's definitely a hard problem, and we've still got room for improvement".
+
A month this comic came out, {{w|Flickr}} [http://code.flickr.net/2014/10/20/introducing-flickr-park-or-bird/ released] a [http://parkorbird.flickr.com/ prototype online tool] to do just what this comic describes, using its automated-tagging software to answer the bird question.
  
 
==Transcript==
 
:[Ponytail sitting at a computer with Cueball standing behind her.]
+
:[Ponytail sitting at a computer with Cueball standing behind her]
 
:Cueball: When a user takes a photo, the app should check whether they're in a national park...
 
 
:Ponytail: Sure, easy GIS lookup. Gimme a few hours.
 
 
:Cueball: ...and check whether the photo is of a bird.
 
 
:Ponytail: I'll need a research team and five years.
 
 
:[Caption below the panel:]
 
 
:In CS, it can be hard to explain the difference between the easy and the virtually impossible.
 
  
 +
==References==
 +
<references/>
  
 
{{comic discussion}}
 
 
[[Category:Comics featuring Cueball]]
 
 
[[Category:Comics featuring Ponytail]]
 
[[Category:Artificial Intelligence]]
 
[[Category:Programming]]
 
[[Category:Photography]]
 
[[Category:Comics featuring real people]]
 
