Editing 1313: Regex Golf

Jump to: navigation, search

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 10: Line 10:
 
The comic talks about {{w|regular expressions}}, which are a way to specify textual patterns. Given a regular expression, one can search for the pattern it specifies inside a text string. If the pattern is found, it's said that the pattern "matches" the string; if it's not found, it's said it doesn't match. The title of the comic and the first panel is based on "[https://regex.alf.nu/ regex golf]", which is a discipline of "{{w|code golf}}", a game in which programmers attempt to solve a given programming problem using as few characters as possible, analogous to the number of {{w|golf}} shots it takes to reach the goal. In regex golfing, the programmer is given two sets of text fragments, and tries to write the shortest possible regular expression which would match all elements of one set, while at the same time not matching any element from the other set. The day after this comic was released, Randall mentioned he got distracted by https://regex.alf.nu, a website with a regexp golf game, while researching for the ''[[what if? (blog)|what if?]]'' article ''{{what if|78|T-rex Calories}}''. Regular expressions have been mentioned on xkcd in [[:Category:Regex|many other comics]].
 
The comic talks about {{w|regular expressions}}, which are a way to specify textual patterns. Given a regular expression, one can search for the pattern it specifies inside a text string. If the pattern is found, it's said that the pattern "matches" the string; if it's not found, it's said it doesn't match. The title of the comic and the first panel is based on "[https://regex.alf.nu/ regex golf]", which is a discipline of "{{w|code golf}}", a game in which programmers attempt to solve a given programming problem using as few characters as possible, analogous to the number of {{w|golf}} shots it takes to reach the goal. In regex golfing, the programmer is given two sets of text fragments, and tries to write the shortest possible regular expression which would match all elements of one set, while at the same time not matching any element from the other set. The day after this comic was released, Randall mentioned he got distracted by https://regex.alf.nu, a website with a regexp golf game, while researching for the ''[[what if? (blog)|what if?]]'' article ''{{what if|78|T-rex Calories}}''. Regular expressions have been mentioned on xkcd in [[:Category:Regex|many other comics]].
  
In the regex golf challenge Megan faced, the two sets are the subtitles of the (then-extant) films from the ''{{w|Star Wars}}'' and ''{{w|Star Trek}}'' franchises. Her regex must match all ''Star Wars'' subtitles, and must not match any ''Star Trek'' subtitle. {{w|Subtitle (titling)|Subtitles}} are the secondary titles of the movies, after the ''"Star Trek: "'' or ''"Star Wars Episode N: "''. For example, in ''Star Wars Episode I: The Phantom Menace'', the subtitle is ''The Phantom Menace''. In the first panel, she has created a 12-character regex solving the challenge.
+
The regex golf challenge Megan faces consists of matching all subtitles of (then extant) ''{{w|Star Wars}}'' films, while not matching any subtitle of ''{{w|Star Trek}}'' movies. {{w|Subtitle (titling)|Subtitles}} are the secondary titles of the movies, after the ''"Star Trek: "'' or ''"Star Wars Episode N: "''. For example, in ''Star Wars Episode I: The Phantom Menace'', the subtitle is ''The Phantom Menace''. In the first panel, she created a 12-character regex solving the challenge.
  
Then she moved on to building a tool which would automatically build such a regex for arbitrary lists of text, which could be described as {{w|meta}}- regex golfing. But as she has lost this tool, she needs to search through her files and chooses a tool called "{{w|grep}}" to find it. This tool uses regexes, implying that she needs a regular expression that would find any code that appears to be a regex golf generator, which leads to another "meta-" layer of abstraction. At the end, Megan notes this sequence of meta-meta-... might go to infinity and Cueball quips that she now has "infinite problems" as a result of her efforts; Megan retorts that she already had "infinite problems" because she's geeky enough to run meta-versions of programs on themselves, and stubborn enough to continue on until she fails, to the exclusion of all else. This also seems to be a reference to a famous quote by {{w|Jamie Zawinski}} (see also ''[[1171: Perl Problems]]''):
+
Then she moved on to building a tool which would automatically build such a regex for arbitrary lists of text, which could be described as {{w|meta}}- regex golfing. But as she has lost this tool, she needs to search through her files and chooses a tool called "{{w|grep}}" to find it. This implies that she needs a regular expression that would find any code that appears to be a regex golf generator, which leads to another "meta-" layer of abstraction. At the end, Megan notes this sequence of meta-meta-... might go to infinity and Cueball quips that she now has "infinite problems" as a result of her efforts; Megan retorts that she already had "infinite problems" because she's geeky enough to run meta-versions of programs on themselves, and stubborn enough to continue on until she fails, to the exclusion of all else. This also seems to be a reference to a famous quote by {{w|Jamie Zawinski}} (see also ''[[1171: Perl Problems]]''):
  
 
{{Quote|Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.}}
 
{{Quote|Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.}}
  
 
===Regular expressions===
 
===Regular expressions===
The first regex Megan uses is <code>/m | [tn]|b/</code>, said to match ''Star Wars'' subtitles but not ''Star Trek''. The forward slashes <code>/</code> just mark the start and end of the regex. The <code>|</code> character means "or", so the regex matches any string that contains the patterns "<code>m </code>", "<code> [tn]</code>" or "<code>b</code>" (including the spaces). The square brackets match one of the enclosed characters, meaning that "<code> [tn]</code>" matches either "<code> t</code>" or "<code> n</code>" (that is, a space followed by a letter "t" or "n").
+
The first regex Megan uses is <code>/m | [tn]|b/</code>, said to match ''Star Wars'' subtitles but not ''Star Trek''. The forward slashes <code>/</code> just mark the start and end of the regex. The <code>|</code> character means "or", so the regex matches any string that contains the patterns "<code>m </code>", "<code> [tn]</code>" or "<code>b</code>" (including the spaces). The square brackets match one of the enclosed characters, meaning that "<code> [tn]</code>" matches either "<code> t</code>" or "<code> n</code>". The regex is apparently case-insensitive, because it wouldn't work otherwise.
 
 
(Note that in this explanation, the regex has been written in lower case. In general, regular expressions are case-sensitive—that is, they treat upper- and lower-case letters as different. Regexes ''can'' be set to ignore case, and given that [[Randall]]'s comic lettering is in all-caps, we can assume that this is done here.)
 
  
 
The Star Wars subtitles match the parts of the regex in the following way:
 
The Star Wars subtitles match the parts of the regex in the following way:
* "The Phanto<u>m </u>Menace" is matched by "<code>m </code>".
+
*"The Phanto<u>m </u>Menace" is matched by "<code>m </code>".
* "Attack of<u> t</u>he Clones" is matched by "<code> [tn]</code>".
+
*"Attack of<u> t</u>he Clones" is matched by "<code> [tn]</code>".
* "Revenge of<u> t</u>he Sith" is matched by "<code> [tn]</code>".
+
*"Revenge of<u> t</u>he Sith" is matched by "<code> [tn]</code>".
* "A<u> N</u>ew Hope" is matched by "<code> [tn]</code>".
+
*"A<u> N</u>ew Hope" is matched by "<code> [tn]</code>".
* "The Empire Strikes <u>B</u>ack" is matched by "<code>b</code>".
+
*"The Empire Strikes <u>B</u>ack" is matched by "<code>b</code>".
* "Return of<u> t</u>he Jedi" is matched by "<code> [tn]</code>".
+
*"Return of<u> t</u>he Jedi" is matched by "<code> [tn]</code>".
 
 
On the other hand, no ''Star Trek'' subtitle contains an M followed by a space, a T or an N preceded by a space, or any B, so the regex does not match any of them. Note that in the first six ("Original series") films, all subtitles start with "The", but as the "T" is the first character, it is not preceded by a space. Here is the list that Megan probably used:
 
* Original series:
 
** The Motion Picture
 
** The Wrath of Khan
 
** The Search For Spock
 
** The Voyage Home
 
** The Final Frontier
 
** The Undiscovered Country
 
* The Next Generation:
 
** Generations
 
** First Contact
 
** Insurrection
 
** Nemesis
 
* Reboot series:
 
** (No subtitle)
 
** Into Darkness
 
  
The animated film ''Star Wars: The Clone Wars'' was released before this comic. If it were included, it would not be matched by Megan's regex. ("<code> [tn]</code>" does not match, for the same reason as for the ''Star Trek'' films: the T is the start of the subtitle, and is not preceded by a space.) None of the subtitles of ''Star Wars'' films released since this comic ("The Force Awakens", "The Last Jedi", and "The Rise of Skywalker") match this regex, either.
+
Note that if the animated film "Star Wars: The Clone Wars" were included, it would not be matched by "<code> [tn]</code>" because the T is the start of the subtitle and is not preceded by a space. None of the "Star Wars:" films titles announced since this comic ("The Force Awakens", "The Last Jedi", and "The Rise of Skywalker") match this regex. On the other hand, none of the Star Trek subtitles contains an M followed by a space, a T or an N preceded by a space, or any B, so the regex does not match any of them. Note that in the original series all subtitles start with a "T" but it's the first character so it's not preceded by a space.
  
''Star Trek Beyond'', which was released after this comic, would incorrectly match the regex since it is the first ''Star Trek'' title to contain a "b". However, since ''Star Trek Into Darkness'' and ''Star Trek Beyond'' both lack a colon in their titles, it is [[1167: Star Trek into Darkness|debatable]] whether they can truly be considered to have subtitles.
+
Here is the list that Megan probably used:
 +
*Original series:
 +
**The Motion Picture
 +
**The Wrath of Khan
 +
**The Search For Spock
 +
**The Voyage Home
 +
**The Final Frontier
 +
**The Undiscovered Country
 +
*The Next Generation:
 +
**Generations
 +
**First Contact
 +
**Insurrection
 +
**Nemesis
 +
*Reboot series:
 +
**Into Darkness
  
In the last panel ("...and beyond"), Megan uses the regular expression <code>/(meta-)*regex golf/</code> to describe her problem. <code>*</code> means "zero or more" of the preceding character/group (parentheses <code>()</code> group characters). So this regex matches "regex golf", "meta-regex golf", "meta-meta-regex golf", etc. In a way this is regex golf in itself, matching all levels of meta-regex golf while not matching anything else.
+
"Star Trek Beyond", which was released after this comic, would incorrectly match the regex since it is the first "Star Trek" title to contain a "b". However, since "Star Trek Into Darkness" and "Star Trek Beyond" both lack a colon in their titles, it is [[1167: Star Trek into Darkness|debatable]] whether they can truly be considered to have subtitles. In the last panel "and beyond" Megan uses the regular expression <code>/(meta-)*regex golf/</code> to describe her problem. <code>*</code> means "zero or more" of the preceding character/group (parentheses <code>()</code> group characters). So this regex matches "regex golf", "meta-regex golf", "meta-meta-regex golf", etc. In a way this is regex golf in itself, matching all levels of meta-regex golf while not matching anything else.
  
In the title text, there is a long regex that is the solution of another regex golf challenge: matching the last names of all elected US presidents but not their opponents. Note that the list of opponents includes some people who were previously or later became presidents, or who share a last name with someone who was president, so taken literally this is impossible. To make this work, the list of opponents must exclude any names of presidents. The regular expression itself works in a very similar way to the ''Star Wars''/''Trek'' one, including several different patterns separated by <code>|</code>. Each elected president matches one pattern while each opponent matches none.
+
In the title text, there is a long regex that is the solution of another regex golf challenge: matching the last names of all elected US presidents but not their opponents. Note that the list of opponents include some people who were previously or later became presidents, or whose last name matches that of another person who was president, so taken literally this is impossible. To make this work the list of opponents must exclude any names of presidents. The regular expression itself works in a very similar way to the Star Wars/Trek one, including several different patterns separated by <code>|</code>. Each elected president matches one pattern while each opponent matches none.
  
 
The regex does not match either of the presidents elected since the comic’s release ("Trump" and "Biden"), and thus would need to be updated. The regex does match Hillary Clinton's last name, but because a person with the same last name (Bill Clinton) was president, this isn't a mistake. There was already a losing opponent called George Clinton who ran in 1792 and 1812.
 
The regex does not match either of the presidents elected since the comic’s release ("Trump" and "Biden"), and thus would need to be updated. The regex does match Hillary Clinton's last name, but because a person with the same last name (Bill Clinton) was president, this isn't a mistake. There was already a losing opponent called George Clinton who ran in 1792 and 1812.
  
Here is a list of elected presidents and the patterns they match:
+
Here is a list of elected president and the patterns they match:
 
{| class="wikitable"
 
{| class="wikitable"
 
!Number
 
!Number
Line 223: Line 218:
 
|}
 
|}
  
Four presidents ({{w|John Tyler}}, {{w|Millard Fillmore}}, {{w|Chester A. Arthur}}, and {{w|Gerald Ford}}) are omitted because they were never ''elected'' president. Each of them became president after the resignation or death of their predecessor. (Five other presidents—Coolidge, Truman, Theodore Roosevelt, and both Andrew and Lyndon B. Johnson—also succeeded to the office, but then went on to win a later presidential election.)
+
Some presidents are missing because they weren't elected but became presidents after the resignation/death of their formers.
  
 
And here is a list of how many unique last names are matched by each expression:
 
And here is a list of how many unique last names are matched by each expression:
Line 274: Line 269:
  
 
==Trivia==
 
==Trivia==
The idea that the winning president could be picked due to the letter structure of their name (as opposed to how well they campaigned) was the main twist in the 2006 movie {{w|Man of the Year (2006 film)|''Man of the Year''}}.
+
The idea that the winning president could be picked due to the letter structure of their name (as opposed to how well they campaigned) was the main twist in the 2006 movie {{w|Man of the Year (2006 film)|''Man of the Year''}}
  
 
Theoretically, such a 'pick' could easily used (played straight, or subverted) in [[2383: Electoral Precedent 2020]], or in a further update.<!-- although the current situation (early 2024) could not adopt it, without adding non-regex specifications to the argument (incumbant/returning/convictions/diagnosis, e.g.), given both likely options are proven already 'electable'... but might be worth reanalysing this statement once we get beyond all Primaries/surprises/etc and know for certain it's not Haley v. Harris, or whatever. Idea left open to revisit this idea at some future date when it could directly work! -->
 
Theoretically, such a 'pick' could easily used (played straight, or subverted) in [[2383: Electoral Precedent 2020]], or in a further update.<!-- although the current situation (early 2024) could not adopt it, without adding non-regex specifications to the argument (incumbant/returning/convictions/diagnosis, e.g.), given both likely options are proven already 'electable'... but might be worth reanalysing this statement once we get beyond all Primaries/surprises/etc and know for certain it's not Haley v. Harris, or whatever. Idea left open to revisit this idea at some future date when it could directly work! -->

Please note that all contributions to explain xkcd may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see explain xkcd:Copyrights for details). Do not submit copyrighted work without permission!

To protect the wiki against automated edit spam, we kindly ask you to solve the following CAPTCHA:

Cancel | Editing help (opens in new window)