1313: Regex Golf
Title text: /bu|[rn]t|[coy]e|[mtg]a|j|iso|n[hl]|[ae]d|lev|sh|[lnd]i|[po]o|ls/ matches the last names of elected US presidents but not their opponents.
This comic revolves around a set of increasingly complicated regular expressions, which are patterns used to search through text for blocks of text matching the pattern. There is a saying in professional programming that goes like this (see 1171):
- Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
The comic exemplifies this as Megan's problems grow increasingly more convoluted - originally she was writing regex as a game, then moved on to automatically building regex on arbitrary lists of text, to searching through her files for code that appears to be a regex golf generator. At the end, Cueball quips that she now has "Infinite Problems" as a result of her efforts, tying back to the saying above.
Code golf is a game where by programmers attempt to solve a problem using as few characters as possible, analogous to the number of golf shots it takes to reach the goal.
Regular expressions are a way to specify textual patterns. One can later search for the pattern inside a text string: if the pattern is found it's said that the pattern "matches" the string; if it's not found, it's said they do not match. The "Regex Golf" challenge is to make a regular expression that matches all of the strings in one group and none of the strings in another. As in "Code golf" the challenge is to find the shortest possible Regex that does this.
The first regex Megan uses is /m | [tn]|b/, said to match Star Wars subtitles but not Star Trek. Subtitles are the secondary titles of the movies, after the "Star Trek: " or "Star Wars Episode N: ". For example, in "Star Wars Episode I: The Phantom Menace" the subtitle is "The Phantom Menace".
The forward slashes just mark the start and end of the regex. The | character means "or", so the regex matches any string that contains the patterns "m ", " [tn]" or "b" (including the spaces). The square brackets match one of the enclosed characters, meaning that " [tn]" matches either " t" or " n". The regex is apparently case-insensitive, because it wouldn't work otherwise.
The Star Wars subtitles match the parts of the regex in the following way:
- "The Phantom Menace" is matched by "m ".
- "Attack of the Clones" is matched by " [tn]".
- "Revenge of the Sith" is matched by " [tn]".
- "A New Hope" is matched by " [tn]".
- "The Empire Strikes Back" is matched by "b".
- "Return of the Jedi" is matched by " [tn]".
Note that the animated film "Star Wars: The Clone Wars" is not included.
On the other hand, none of the Star Trek subtitles contains an M followed by a space, a T or an N preceded by a space, or any B, so the regex does not match any of them. Note that in the original series all subtitles start with a "T" but it's the first character so it's not preceded by a space.
Here is the list that Megan probably used:
- Original series:
- The Motion Picture
- The Wrath of Khan
- The Search For Spock
- The Voyage Home
- The Final Frontier
- The Undiscovered Country
- The Next Generation:
- First Contact
- Reboot series:
- the one without a subtitle
- Into Darkness
"Grepping" refers to using the Unix/Linux command line tool "grep", which is short for "Globally search a Regular Expression and Print", thus continuing to use regular expressions in search for the lost files.
In the last panel "and beyond" Megan uses the regular expression "/(meta-)*regex golf/" to describe her problem. * means "zero or more" of the preceding character/group (parentheses group characters). So this regex matches "regex golf", "meta-regex golf", "meta-meta-regex golf", etc. In a way this is regex golf in itself, matching all levels of meta-regex golf while not matching anything else.
In the title text there is a long regex that is the solution of another regex golf challenge: matching the last names of all elected US presidents but not their opponents. Note that the list of opponents include some people who were previously or later became presidents, so taken literally this is impossible. To make this work the list of opponents must exclude anyone who was also president. The regular expression itself works in a very similar way to the Star Wars/Trek one, including several different patterns separated by |. Each elected president matches one pattern while each opponent matches none. Here is a list of elected president and the patterns they match:
|John Quincy Adams||[ae]d|
|Martin Van Buren||bu|
|William Henry Harrison||iso|
|James K. Polk||[po]o|
|Ulysses S. Grant||[rn]t|
|Rutherford B. Hayes||[coy]e|
|James A. Garfield||[mtg]a|
|William Howard Taft||[mtg]a|
|Warren G. Harding||[lnd]i|
|Franklin D. Roosevelt||[po]o|
|Harry S. Truman||[mtg]a|
|Dwight D. Eisenhower||n[hl]|
|John F. Kennedy||[ae]d|
|Lyndon B. Johnson||j|
|George H. W. Bush||bu|
|George W. Bush||bu|
Note that some presidents are missing because they weren't elected but became presidents after the resignation/death of their formers. This regular expression must be modified slightly, it also matches John C. Fremont, the loser to James Buchanan in 1856.
There are now at least four comics that references regular expressions. The other three are:
- Regex golf:
- [Megan is sitting at a laptop. Cueball is standing behind her.]
- Megan: You try to match one group but not the other.
- Megan: /m | [tn]|b/ matches Star Wars subtitles but not Star Trek.
- Cueball: Cool.
- Meta-regex golf:
- Megan: So I wrote a program that plays regex golf with arbitrary lists...
- Cueball: Uh oh...
- Meta-meta-regex golf:
- Megan: ...But I lost my code, so I'm grepping for files that look like regex golf solvers.
- ...And beyond:
- Megan: Really, this is all /(meta-)*regex golf/.
- Cueball: Now you have infinite problems.
- Megan: No, I had those already.
add a comment! ⋅ add a topic (use sparingly)! ⋅ refresh comments!