Editing 1313: Regex Golf
Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.
The edit can be undone.
Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision | Your text | ||
Line 7: | Line 7: | ||
}} | }} | ||
− | ==Explanation== | + | == Explanation == |
− | The comic talks about {{w|regular expressions}}, which are a way to specify textual patterns. | + | The comic talks about {{w|regular expressions}}, which are a way to specify textual patterns. One can later search for the pattern inside a text string: if the pattern is found it’s said that the pattern “matches” the string; if it's not found, it's said they do not match. |
− | + | The title of the comic and the first panel is based on “[http://regex.alf.nu/ regex golf]”, which is a discipline of “{{w|code golf}}”, a game in which programmers attempt to solve a given programming problem using as few characters as possible, analogous to the number of {{w|golf}} shots it takes to reach the goal. In regex golfing, the programmer is given two sets of text fragments, and he/she tries to write the shortest possible regular expression which would match all elements of one set, while at the same time not matching any element from the other set. | |
− | + | The regex golf challenge Megan faces consists of matching all subtitles of ''{{w|Star Wars}}'' episodes, while not matching any subtitle of ''{{w|Star Trek}}'' episodes. ({{w|Subtitle (titling)|Subtitles}} are the secondary titles of the movies, after the ''“Star Trek: ”'' or ''“Star Wars Episode N: ”''. For example, in ''Star Wars Episode I: The Phantom Menace'', the subtitle is ''The Phantom Menace''.) In the first panel, she created a 12-character regex solving the challenge. | |
− | {{ | + | Then she moved on to building a tool which would automatically build such a regex for arbitrary lists of text, which could be described as {{w|meta}}- regex golfing. But as she has lost this tool, she needs to search through her files (using a tool called “{{w|grep}}”, i.e. ''grepping'') to find it, needing a regular expression that would find any code that appears to be a regex golf generator; which makes another “meta-” layer of abstraction. At the end, Megan notes this sequence of meta-meta-... might go to infinity and Cueball quips that she now has “infinite problems” as a result of her efforts; Megan retorts that she already had "infinite problems" because she's geeky enough to run meta-versions of programs on themselves, and stubborn enough to continue on until she fails, to the exclusion of all else. This also seems to be a reference to a famous quote (see also ''[[1171: Perl Problems]]''): |
+ | |||
+ | <blockquote>''Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.''</blockquote> | ||
===Regular expressions=== | ===Regular expressions=== | ||
− | The first regex Megan uses is <code>/m | [tn]|b/</code>, said to match ''Star Wars'' subtitles but not ''Star Trek'' | + | The first regex Megan uses is <code>/m | [tn]|b/</code>, said to match ''Star Wars'' subtitles but not ''Star Trek''. |
− | + | The forward slashes <code>/</code> just mark the start and end of the regex. The <code>|</code> character means “or”, so the regex matches any string that contains the patterns “<code>m </code>”, “<code> [tn]</code>” or “<code>b</code>” (including the spaces). The square brackets match one of the enclosed characters, meaning that “<code> [tn]</code>” matches either “<code> t</code>” or “<code> n</code>”. The regex is apparently case-insensitive, because it wouldn't work otherwise. | |
The Star Wars subtitles match the parts of the regex in the following way: | The Star Wars subtitles match the parts of the regex in the following way: | ||
− | * | + | * “The Phanto<u>m </u>Menace” is matched by “<code>m </code>”. |
− | * | + | * “Attack of<u> t</u>he Clones” is matched by “<code> [tn]</code>”. |
− | * | + | * “Revenge of<u> t</u>he Sith” is matched by “<code> [tn]</code>”. |
− | * | + | * “A<u> N</u>ew Hope” is matched by “<code> [tn]</code>”. |
− | * | + | * “The Empire Strikes <u>B</u>ack” is matched by “<code>b</code>”. |
− | * | + | * “Return of<u> t</u>he Jedi” is matched by “<code> [tn]</code>”. |
+ | Note that the animated film “Star Wars: The Clone Wars” is not included. | ||
− | On the other hand, | + | On the other hand, none of the Star Trek subtitles contains an M followed by a space, a T or an N preceded by a space, or any B, so the regex does not match any of them. Note that in the original series all subtitles start with a “T” but it's the first character so it's not preceded by a space. |
+ | |||
+ | Here is the list that Megan probably used: | ||
* Original series: | * Original series: | ||
** The Motion Picture | ** The Motion Picture | ||
Line 43: | Line 48: | ||
** Nemesis | ** Nemesis | ||
* Reboot series: | * Reboot series: | ||
− | ** | + | ** ''the one without a subtitle'' |
** Into Darkness | ** Into Darkness | ||
− | + | In the last panel “and beyond” Megan uses the regular expression <code>/(meta-)*regex golf/</code> to describe her problem. <code>*</code> means “zero or more” of the preceding character/group (parentheses <code>()</code> group characters). So this regex matches “regex golf”, “meta-regex golf”, “meta-meta-regex golf”, etc. In a way this is regex golf in itself, matching all levels of meta-regex golf while not matching anything else. | |
− | |||
− | |||
− | |||
− | In the last panel | ||
− | In the title text, there is a long regex that is the solution of another regex golf challenge: matching the last names of all elected US presidents but not their opponents. Note that the list of opponents | + | In the title text, there is a long regex that is the solution of another regex golf challenge: matching the last names of all elected US presidents but not their opponents. Note that the list of opponents include some people who were previously or later became presidents, so taken literally this is impossible. To make this work the list of opponents must exclude anyone who was also president. The regular expression itself works in a very similar way to the Star Wars/Trek one, including several different patterns separated by <code>|</code>. Each elected president matches one pattern while each opponent matches none. |
− | + | Here is a list of elected president and the patterns they match: | |
− | |||
− | Here is a list of elected | ||
{| class="wikitable" | {| class="wikitable" | ||
!Number | !Number | ||
Line 187: | Line 186: | ||
|- | |- | ||
|35 | |35 | ||
− | | | + | |{{w|John F. Kennedy|John F. Kenn<u>ed</u>y}} |
|<code>[ae]d</code> | |<code>[ae]d</code> | ||
|- | |- | ||
Line 223: | Line 222: | ||
|} | |} | ||
− | + | Note that some presidents are missing because they weren't elected but became presidents after the resignation/death of their formers. This regular expression must be modified slightly, because it also matches {{w|John C. Fremont|John C. Fremo<u>nt</u>}}, the loser to James Buchanan in 1856. | |
− | + | == Trivia == | |
+ | There are now at least four comics that references regular expressions. The other three are: | ||
− | + | [[208: Regular Expressions]], [[224: Lisp]] and [[1171: Perl Problems]]. | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | Additionally, regular expressions are mentioned in title text of [[1277: Ayn Random]]. | |
− | + | Also, Randall mentions [http://regex.alf.nu/ a website with a regexp golf game] he got distracted by while researching for the [http://what-if.xkcd.com/78/ 78th] episode of [[what if?]] (which was published one day after this comic). | |
− | |||
− | + | Peter Norvig wrote code in Python that plays regex golf with arbitrary lists in [http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb xkcd 1313: Regex Golf], checking this xkcd solution. | |
− | ==Transcript== | + | == Transcript == |
− | |||
:Regex golf: | :Regex golf: | ||
:[Megan is sitting at a laptop. Cueball is standing behind her.] | :[Megan is sitting at a laptop. Cueball is standing behind her.] | ||
Line 286: | Line 242: | ||
:Cueball: Cool. | :Cueball: Cool. | ||
− | |||
:Meta-regex golf: | :Meta-regex golf: | ||
− | |||
:Megan: So I wrote a program that plays regex golf with arbitrary lists... | :Megan: So I wrote a program that plays regex golf with arbitrary lists... | ||
− | :Cueball | + | :Cueball: Uh oh... |
− | |||
:Meta-meta-regex golf: | :Meta-meta-regex golf: | ||
− | |||
:Megan: ...But I lost my code, so I'm grepping for files that look like regex golf solvers. | :Megan: ...But I lost my code, so I'm grepping for files that look like regex golf solvers. | ||
− | |||
− | |||
:...And beyond: | :...And beyond: | ||
− | |||
:Megan: Really, this is all /(meta-)*regex golf/. | :Megan: Really, this is all /(meta-)*regex golf/. | ||
:Cueball: Now you have ''infinite'' problems. | :Cueball: Now you have ''infinite'' problems. | ||
Line 306: | Line 255: | ||
{{comic discussion}} | {{comic discussion}} | ||
− | |||
[[Category:Comics featuring Cueball]] | [[Category:Comics featuring Cueball]] | ||
[[Category:Comics featuring Megan]] | [[Category:Comics featuring Megan]] | ||
− | [[Category: | + | [[Category:Computers]] |
− |