Editing 1313: Regex Golf

Jump to: navigation, search

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 7: Line 7:
 
}}
 
}}
  
==Explanation==
+
== Explanation ==
The comic talks about {{w|regular expressions}}, which are a way to specify textual patterns. Given a regular expression, one can search for the pattern it specifies inside a text string. If the pattern is found, it's said that the pattern "matches" the string; if it's not found, it's said it doesn't match. The title of the comic and the first panel is based on "[https://regex.alf.nu/ regex golf]", which is a discipline of "{{w|code golf}}", a game in which programmers attempt to solve a given programming problem using as few characters as possible, analogous to the number of {{w|golf}} shots it takes to reach the goal. In regex golfing, the programmer is given two sets of text fragments, and tries to write the shortest possible regular expression which would match all elements of one set, while at the same time not matching any element from the other set. The day after this comic was released, Randall mentioned he got distracted by https://regex.alf.nu, a website with a regexp golf game, while researching for the ''[[what if? (blog)|what if?]]'' article ''{{what if|78|T-rex Calories}}''. Regular expressions have been mentioned on xkcd in [[:Category:Regex|many other comics]].
+
The comic talks about {{w|regular expressions}}, which are a way to specify textual patterns. One can later search for the pattern inside a text string: if the pattern is found it’s said that the pattern “matches” the string; if it's not found, it's said they do not match.
  
In the regex golf challenge Megan faced, the two sets are the subtitles of the (then-extant) films from the ''{{w|Star Wars}}'' and ''{{w|Star Trek}}'' franchises. Her regex must match all ''Star Wars'' subtitles, and must not match any ''Star Trek'' subtitle. {{w|Subtitle (titling)|Subtitles}} are the secondary titles of the movies, after the ''"Star Trek: "'' or ''"Star Wars Episode N: "''. For example, in ''Star Wars Episode I: The Phantom Menace'', the subtitle is ''The Phantom Menace''. In the first panel, she has created a 12-character regex solving the challenge.
+
The title of the comic and the first panel is based on “[http://regex.alf.nu/ regex golf]”, which is a discipline of {{w|code golf}}”, a game in which programmers attempt to solve a given programming problem using as few characters as possible, analogous to the number of {{w|golf}} shots it takes to reach the goal. In regex golfing, the programmer is given two sets of text fragments, and he/she tries to write the shortest possible regular expression which would match all elements of one set, while at the same time not matching any element from the other set.
  
Then she moved on to building a tool which would automatically build such a regex for arbitrary lists of text, which could be described as {{w|meta}}- regex golfing. But as she has lost this tool, she needs to search through her files and chooses a tool called "{{w|grep}}" to find it. This tool uses regexes, implying that she needs a regular expression that would find any code that appears to be a regex golf generator, which leads to another "meta-" layer of abstraction. At the end, Megan notes this sequence of meta-meta-... might go to infinity and Cueball quips that she now has "infinite problems" as a result of her efforts; Megan retorts that she already had "infinite problems" because she's geeky enough to run meta-versions of programs on themselves, and stubborn enough to continue on until she fails, to the exclusion of all else. This also seems to be a reference to a famous quote by {{w|Jamie Zawinski}} (see also ''[[1171: Perl Problems]]''):
+
The regex golf challenge Megan faces consists of matching all subtitles of ''{{w|Star Wars}}'' episodes, while not matching any subtitle of ''{{w|Star Trek}}'' episodes. ({{w|Subtitle (titling)|Subtitles}} are the secondary titles of the movies, after the ''“Star Trek: ”'' or ''“Star Wars Episode N: ”''. For example, in ''Star Wars Episode I: The Phantom Menace'', the subtitle is ''The Phantom Menace''.) In the first panel, she created a 12-character regex solving the challenge.
  
{{Quote|Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.}}
+
Then she moved on to building a tool which would automatically build such a regex for arbitrary lists of text, which could be described as {{w|meta}}- regex golfing. But as she has lost this tool, she needs to search through her files (using a tool called “{{w|grep}}”, i.e. ''grepping'') to find it, needing a regular expression that would find any code that appears to be a regex golf generator; which makes another “meta-” layer of abstraction. At the end, Megan notes this sequence of meta-meta-... might go to infinity and Cueball quips that she now has “infinite problems” as a result of her efforts; Megan retorts that she already had "infinite problems" because she's geeky enough to run meta-versions of programs on themselves, and stubborn enough to continue on until she fails, to the exclusion of all else. This also seems to be a reference to a famous quote (see also ''[[1171: Perl Problems]]''):
 +
 
 +
<blockquote>''Some people, when confronted with a problem, think “I know, I'll use regular expressions.Now they have two problems.''</blockquote>
  
 
===Regular expressions===
 
===Regular expressions===
The first regex Megan uses is <code>/m | [tn]|b/</code>, said to match ''Star Wars'' subtitles but not ''Star Trek''. The forward slashes <code>/</code> just mark the start and end of the regex. The <code>|</code> character means "or", so the regex matches any string that contains the patterns "<code>m </code>", "<code> [tn]</code>" or "<code>b</code>" (including the spaces). The square brackets match one of the enclosed characters, meaning that "<code> [tn]</code>" matches either "<code> t</code>" or "<code> n</code>" (that is, a space followed by a letter "t" or "n").
+
The first regex Megan uses is <code>/m | [tn]|b/</code>, said to match ''Star Wars'' subtitles but not ''Star Trek''.
  
(Note that in this explanation, the regex has been written in lower case. In general, regular expressions are case-sensitive—that is, they treat upper- and lower-case letters as different. Regexes ''can'' be set to ignore case, and given that [[Randall]]'s comic lettering is in all-caps, we can assume that this is done here.)
+
The forward slashes <code>/</code> just mark the start and end of the regex. The <code>|</code> character means “or”, so the regex matches any string that contains the patterns “<code>m </code>”, “<code> [tn]</code>” or “<code>b</code>” (including the spaces). The square brackets match one of the enclosed characters, meaning that “<code> [tn]</code>” matches either “<code> t</code>” or “<code> n</code>”. The regex is apparently case-insensitive, because it wouldn't work otherwise.
  
 
The Star Wars subtitles match the parts of the regex in the following way:
 
The Star Wars subtitles match the parts of the regex in the following way:
* "The Phanto<u>m </u>Menace" is matched by "<code>m </code>".
+
* “The Phanto<u>m </u>Menace” is matched by <code>m </code>.
* "Attack of<u> t</u>he Clones" is matched by "<code> [tn]</code>".
+
* “Attack of<u> t</u>he Clones” is matched by <code> [tn]</code>.
* "Revenge of<u> t</u>he Sith" is matched by "<code> [tn]</code>".
+
* “Revenge of<u> t</u>he Sith” is matched by <code> [tn]</code>.
* "A<u> N</u>ew Hope" is matched by "<code> [tn]</code>".
+
* “A<u> N</u>ew Hope” is matched by <code> [tn]</code>.
* "The Empire Strikes <u>B</u>ack" is matched by "<code>b</code>".
+
* “The Empire Strikes <u>B</u>ack” is matched by <code>b</code>.
* "Return of<u> t</u>he Jedi" is matched by "<code> [tn]</code>".
+
* “Return of<u> t</u>he Jedi” is matched by <code> [tn]</code>”.
 +
Note that the animated film “Star Wars: The Clone Wars” is not included.
  
On the other hand, no ''Star Trek'' subtitle contains an M followed by a space, a T or an N preceded by a space, or any B, so the regex does not match any of them. Note that in the first six ("Original series") films, all subtitles start with "The", but as the "T" is the first character, it is not preceded by a space. Here is the list that Megan probably used:
+
On the other hand, none of the Star Trek subtitles contains an M followed by a space, a T or an N preceded by a space, or any B, so the regex does not match any of them. Note that in the original series all subtitles start with a “T” but it's the first character so it's not preceded by a space.
 +
 
 +
Here is the list that Megan probably used:
 
* Original series:
 
* Original series:
 
** The Motion Picture
 
** The Motion Picture
Line 43: Line 48:
 
** Nemesis
 
** Nemesis
 
* Reboot series:
 
* Reboot series:
** (No subtitle)
+
** ''the one without a subtitle''
 
** Into Darkness
 
** Into Darkness
  
The animated film ''Star Wars: The Clone Wars'' was released before this comic. If it were included, it would not be matched by Megan's regex. ("<code> [tn]</code>" does not match, for the same reason as for the ''Star Trek'' films: the T is the start of the subtitle, and is not preceded by a space.) None of the subtitles of ''Star Wars'' films released since this comic ("The Force Awakens", "The Last Jedi", and "The Rise of Skywalker") match this regex, either.
+
In the last panel “and beyond” Megan uses the regular expression <code>/(meta-)*regex golf/</code> to describe her problem. <code>*</code> means “zero or more” of the preceding character/group (parentheses <code>()</code> group characters). So this regex matches “regex golf”, “meta-regex golf”, “meta-meta-regex golf”, etc. In a way this is regex golf in itself, matching all levels of meta-regex golf while not matching anything else.
 
 
''Star Trek Beyond'', which was released after this comic, would incorrectly match the regex since it is the first ''Star Trek'' title to contain a "b". However, since ''Star Trek Into Darkness'' and ''Star Trek Beyond'' both lack a colon in their titles, it is [[1167: Star Trek into Darkness|debatable]] whether they can truly be considered to have subtitles.
 
 
 
In the last panel ("...and beyond"), Megan uses the regular expression <code>/(meta-)*regex golf/</code> to describe her problem. <code>*</code> means "zero or more" of the preceding character/group (parentheses <code>()</code> group characters). So this regex matches "regex golf", "meta-regex golf", "meta-meta-regex golf", etc. In a way this is regex golf in itself, matching all levels of meta-regex golf while not matching anything else.
 
  
In the title text, there is a long regex that is the solution of another regex golf challenge: matching the last names of all elected US presidents but not their opponents. Note that the list of opponents includes some people who were previously or later became presidents, or who share a last name with someone who was president, so taken literally this is impossible. To make this work, the list of opponents must exclude any names of presidents. The regular expression itself works in a very similar way to the ''Star Wars''/''Trek'' one, including several different patterns separated by <code>|</code>. Each elected president matches one pattern while each opponent matches none.
+
In the title text, there is a long regex that is the solution of another regex golf challenge: matching the last names of all elected US presidents but not their opponents. Note that the list of opponents include some people who were previously or later became presidents, so taken literally this is impossible. To make this work the list of opponents must exclude anyone who was also president. The regular expression itself works in a very similar way to the Star Wars/Trek one, including several different patterns separated by <code>|</code>. Each elected president matches one pattern while each opponent matches none.
  
The regex does not match either of the presidents elected since the comic’s release ("Trump" and "Biden"), and thus would need to be updated. The regex does match Hillary Clinton's last name, but because a person with the same last name (Bill Clinton) was president, this isn't a mistake. There was already a losing opponent called George Clinton who ran in 1792 and 1812.
+
Here is a list of elected president and the patterns they match:
 
 
Here is a list of elected presidents and the patterns they match:
 
 
{| class="wikitable"
 
{| class="wikitable"
 
!Number
 
!Number
Line 187: Line 186:
 
|-
 
|-
 
|35
 
|35
|[[John F. Kennedy|John F. Kenn<u>ed</u>y]]
+
|{{w|John F. Kennedy|John F. Kenn<u>ed</u>y}}
 
|<code>[ae]d</code>
 
|<code>[ae]d</code>
 
|-
 
|-
Line 223: Line 222:
 
|}
 
|}
  
Four presidents ({{w|John Tyler}}, {{w|Millard Fillmore}}, {{w|Chester A. Arthur}}, and {{w|Gerald Ford}}) are omitted because they were never ''elected'' president. Each of them became president after the resignation or death of their predecessor. (Five other presidents—Coolidge, Truman, Theodore Roosevelt, and both Andrew and Lyndon B. Johnson—also succeeded to the office, but then went on to win a later presidential election.)
+
Note that some presidents are missing because they weren't elected but became presidents after the resignation/death of their formers. This regular expression must be modified slightly, because it also matches {{w|John C. Fremont|John C. Fremo<u>nt</u>}}, the loser to James Buchanan in 1856.
  
And here is a list of how many unique last names are matched by each expression:
+
== Trivia ==
 +
There are now at least four comics that references regular expressions. The other three are:
  
{| class="wikitable"
+
[[208: Regular Expressions]], [[224: Lisp]] and [[1171: Perl Problems]].
!Expression
 
!Match count
 
|-
 
| bu
 
| 3
 
|-
 
| [rn]t
 
| 3
 
|-
 
| [coy]e
 
| 3
 
|-
 
| [mtg]a
 
| 7
 
|-
 
| j
 
| 3
 
|-
 
| iso
 
| 1
 
|-
 
| n[hl]
 
| 2
 
|-
 
| [ae]d
 
| 2
 
|-
 
| lev
 
| 1
 
|-
 
| sh
 
| 1
 
|-
 
| [lnd]i
 
| 4
 
|-
 
| [po]o
 
| 3
 
|-
 
| ls
 
| 1
 
|}
 
  
Randall's regular expression does ''not'' match presidential opponents Pinckney, King, Clay, Cass, Scott, Douglas, McClellan, Seymour, Greeley, Tilden, Hancock, Blaine, Bryan, Parker, Hughes, Cox, Davis, Smith, Landon, Willkie, Dewey, Stevenson, Goldwater, Humphrey, McGovern, Mondale, Dukakis, Dole, Gore, Kerry, McCain, or Romney.  However, it must be modified slightly, because it ''does'' match {{w|John C. Fremont|John C. Fremo<u>nt</u>}}, the runner-up to James Buchanan in 1856, as discussed by {{w|Peter Norvig}} at [http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb xkcd 1313: Regex Golf]. Note that Norvig provides a small amount of Python code which actually plays regex golf with arbitrary lists, and found a shorter solution than Randall's for the ''Star Wars'' vs ''Star Trek'' game (<code>/ t|p.*e/</code>).
+
Additionally, regular expressions are mentioned in title text of [[1277: Ayn Random]].
  
==Trivia==
+
Also, Randall mentions [http://regex.alf.nu/ a website with a regexp golf game] he got distracted by while researching for the [http://what-if.xkcd.com/78/ 78th] episode of [[what if?]] (which was published one day after this comic).
The idea that the winning president could be picked due to the letter structure of their name (as opposed to how well they campaigned) was the main twist in the 2006 movie {{w|Man of the Year (2006 film)|''Man of the Year''}}.
 
  
Theoretically, such a 'pick' could easily used (played straight, or subverted) in [[2383: Electoral Precedent 2020]], or in a further update.<!-- although the current situation (early 2024) could not adopt it, without adding non-regex specifications to the argument (incumbant/returning/convictions/diagnosis, e.g.), given both likely options are proven already 'electable'... but might be worth reanalysing this statement once we get beyond all Primaries/surprises/etc and know for certain it's not Haley v. Harris, or whatever. Idea left open to revisit this idea at some future date when it could directly work! -->
+
Peter Norvig wrote code in Python that plays regex golf with arbitrary lists in [http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb xkcd 1313: Regex Golf], checking this xkcd solution.
  
==Transcript==
+
== Transcript ==
:[Caption at top of panel:]
 
 
:Regex golf:
 
:Regex golf:
 
:[Megan is sitting at a laptop. Cueball is standing behind her.]
 
:[Megan is sitting at a laptop. Cueball is standing behind her.]
Line 286: Line 242:
 
:Cueball: Cool.
 
:Cueball: Cool.
  
:[Caption at top of panel:]
 
 
:Meta-regex golf:
 
:Meta-regex golf:
:[A close-up of Megan at her laptop.]
 
 
:Megan: So I wrote a program that plays regex golf with arbitrary lists...
 
:Megan: So I wrote a program that plays regex golf with arbitrary lists...
:Cueball (offscreen): Uh oh...
+
:Cueball: Uh oh...
  
:[Caption at top of panel:]
 
 
:Meta-meta-regex golf:
 
:Meta-meta-regex golf:
:[Megan typing at her laptop.]
 
 
:Megan: ...But I lost my code, so I'm grepping for files that look like regex golf solvers.
 
:Megan: ...But I lost my code, so I'm grepping for files that look like regex golf solvers.
:[Cueball facepalming.]
 
  
:[Caption at top of panel:]
 
 
:...And beyond:
 
:...And beyond:
:[Another closeup of Megan at her laptop.]
 
 
:Megan: Really, this is all /(meta-)*regex golf/.
 
:Megan: Really, this is all /(meta-)*regex golf/.
 
:Cueball: Now you have ''infinite'' problems.
 
:Cueball: Now you have ''infinite'' problems.
Line 306: Line 255:
  
 
{{comic discussion}}
 
{{comic discussion}}
 
 
[[Category:Comics featuring Cueball]]
 
[[Category:Comics featuring Cueball]]
 
[[Category:Comics featuring Megan]]
 
[[Category:Comics featuring Megan]]
[[Category:Regex]]
+
[[Category:Computers]]
[[Category:Star Trek]]
 

Please note that all contributions to explain xkcd may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see explain xkcd:Copyrights for details). Do not submit copyrighted work without permission!

To protect the wiki against automated edit spam, we kindly ask you to solve the following CAPTCHA:

Cancel | Editing help (opens in new window)