Most programming languages have the concept of a string literal, which is simply text between delimiters, usually quotes. For example, "Hello, world" is a string literal. The text being represented is Hello, world without the quotes; the quotes are written only to mark the beginning and end of the string. This is a problem when the text itself contains a quote, as in "This is a "quoted" string". The quotes around the word "quoted" are intended to be part of the text, but the language processor will likely mistake the first of them for the end of the string. The result would be two strings with the word quoted outside of them (probably resulting in a syntax error).
To avoid this problem, an escape character (usually a backslash) is prepended to non-string-terminating quotes. So, the previous text would be written as "This is a \"quoted\" string". The language processor will substitute every occurrence of \" with only the quote character, and the string terminates at the quote character which does not immediately follow a backslash. In this case the resulting text string would be This is a "quoted" string as intended.
However, the problem now is that the intended text might contain a backslash itself. For example, the text "C:\" will now be interpreted as an unterminated string containing a quote character. To avoid this, literal backslashes also are escaped with a second backslash, i.e. instead of "C:\" we write "C:\\", where the language processor interprets \\ as one single backslash and the quote terminates the string to give C:\ as the output.
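As a minimal sketch of the two escapes described so far, the following Python snippet builds both strings and checks what they actually contain. Python is used here only as a convenient example language, and the variable names are illustrative.

```python
# Escaped quotes: \" inside a double-quoted literal stands for one quote character.
quoted = "This is a \"quoted\" string"

# Escaped backslash: \\ stands for one backslash, so the final quote ends the literal.
drive = "C:\\"

print(quoted)      # This is a "quoted" string
print(drive)       # C:\
print(len(drive))  # 3
```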
This doubling of backslashes happens in most programming and scripting languages, but also in other syntactic constructs such as regular expressions. So, when several of these languages are used in conjunction, backslashes pile up exponentially (each layer has to double the number of slashes). See example of a backslash explosion and alternatives to avoid this below.
The backslash explosion in the title text is about a bash command (which uses the backslash to escape arguments) invoking the grep utility which searches for text following a pattern specified by means of a regular expression (which also uses the backslash to escape special characters). This leads to 3 backslashes in a row in the command, which could easily become 7 backslashes in a row if the text being searched for also contains a backslash.
Even advanced users who completely understand the concept often have a hard time figuring out exactly how many backslashes are required in a given situation. It is hopelessly frustrating to carefully calculate the exact number of backslashes and then notice that there's a mistake, so the whole thing doesn't work. At some point, it becomes easier to just keep throwing backslashes in until things work than to reason out what the correct number is.
It's unclear whether the regular expression in the title text is valid or not. A long discussion about the validity of the expression has occurred here on this explanation's talk page. The fact that many editors of the site, often themselves extremely technically qualified, can't determine whether the expression is valid or not, adds a meta layer to the joke of the comic. This is an example of nerd sniping (oh, the irony\!\!\!\).
Entries in the list
- The first four examples have names that are (somewhat) based on what they actually produce:
- Backslash: 1 backslash appropriately named
- Real backslash: 2 backslashes are labeled correctly as they do indeed refer to an escaped backslash.
- Real real backslash: 3 backslashes would refer to an escaped backslash followed by an unescaped one. The first two backslashes would combine to make a real backslash while the third one would combine with the character following it to form an escape sequence. The name does thus not make a lot of sense, as this is two escape sequences and not a single "very real" one.
- Actual backslash, for real this time: 4 backslashes form one single backslash escaped twice (the first escaping produces two backslashes, the second escaping doubles each of the backslashes). This is so common that even the documentation for the Python regular expression library has a section called Regular expression operations that mentions "\\\\" explicitly. In this case, the backslash has to be escaped once for being part of a regular expression and then once more as the regular expression is inside a Python string. This is named in reference to the fact that the previous examples didn't contain enough escaping.
- The remaining five examples of backslashes have more and more occult names (explanations) and do not refer to any more real uses of backslash escapes:
- Elder backslash: 5 backslashes would be a doubly-escaped backslash plus an unescaped one. The word Elder has many associations, most of them from fantasy media. The most prominent is the Elder Days, the first Ages of Middle-earth in The Silmarillion, the more-or-less prequel to The Lord of the Rings. More recently it has been used in the Harry Potter universe, where the Deathly Hallow called the Elder wand, made from Elder wood, is a very important part of the last book, Harry Potter and the Deathly Hallows. Other examples are the Elder Gods of the Cthulhu Mythos as well as various 'Elder' magical items and beings in the Dungeons and Dragons mythologies.
- Backslash which escapes the screen and enters your brain: The name for 6 backslashes is a play on the word "escape": the backslash is an "escape character", but it is obviously not supposed to "escape the screen" and enter your brain. This could also be understood as the programmer getting backslashes on the brain when going beyond the Elder backslash domain...
- Backslash so real it transcends time and space: 7 backslashes go further than escaping the screen, as they now transcend both time and space.
- Backslash to end all other text: 8 backslashes would be a triply-escaped backslash (same as 4 backslashes but with an additional escaping layer). It is said to "end all other text", i.e. there should never be any more text if someone uses eight in a row. But there could be more, as indicated in the last example.
- The true name of Ba'al, the Soul-Eater: ∞ backslashes (11 are shown but followed by "..." to indicate that they continue forever). If you could write an infinite number of backslashes it would actually be The true name of Ba'al, the Soul-Eater. This indicates that if you continue misusing backslashes like this you will end up devoured by a demon, for instance Beelzebub, for being so thoughtless... Ba'al has been mentioned before in the title text of 1246: Pale Blue Dot and in 1419: On the Phone.
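The four-backslash case from the list above can be checked directly. The sketch below assumes Python's standard re module; the sample text and variable names are made up for illustration.

```python
import re

# Sample text containing exactly one backslash: a, \, b.
text = "a\\b"

# Two layers of escaping: the string literal turns \\\\ into \\,
# and the regex engine then reads \\ as "match one literal backslash".
pattern = "\\\\"

matches = re.findall(pattern, text)
print(len(pattern))  # 2 (characters actually passed to the regex engine)
print(len(matches))  # 1 (the single backslash in text)
```

Writing the pattern with only two backslashes in the source would hand the regex engine a lone backslash, which it rejects as an incomplete escape.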
Backslash explosion and alternatives
- The word echo is the PHP command for writing something
- The first quote starts the string
- RegExp( - including the open parenthesis - is written literally
- \" following that is a literal quote to be written
- The first two slashes produce one single slash
- And so on until 8 backward slashes are written
- The next \" produces a literal quote character
- ).test(str); is written literally
- The next quote finishes the string.
- The final semicolon terminates the echo command
So, the presented scenario has escalated from a simple test for \\ to no less than seventeen backslashes in a row without stepping out of the most common operations.
If we go a bit further and try to write a Java program that outputs our PHP script, we'd have:
Here, we have 35 backslashes in a row: the first 34 produce the 17 we need in our PHP script, and the last one is for escaping the quote character. (This comes closer to The true name of Ba'al, the Soul-Eater).
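The counts of 17 and 35 can be reproduced with a rough simulation. This is only a sketch under stated assumptions: each layer is modeled as "double every backslash, then put a backslash before every double quote", the helper names escape_layer and longest_backslash_run are hypothetical, and the System.out.println wrapper is a guess at what the Java line would look like rather than the original code.

```python
import re


def escape_layer(text: str) -> str:
    """Escape text for embedding in a double-quoted string literal."""
    # Double the backslashes first, then escape the quotes,
    # so the backslashes added for the quotes are not doubled again.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def longest_backslash_run(text: str) -> int:
    """Return the length of the longest run of consecutive backslashes."""
    return max((len(run) for run in re.findall(r"\\+", text)), default=0)


target = "\\\\"                       # the text to find: two backslashes
regex = target.replace("\\", "\\\\")  # regex escaping doubles each backslash
js = 'RegExp("' + escape_layer(regex) + '").test(str);'
php = 'echo "' + escape_layer(js) + '";'
java = 'System.out.println("' + escape_layer(php) + '");'  # assumed wrapper

for name, source in [("regex", regex), ("js", js), ("php", php), ("java", java)]:
    print(name, longest_backslash_run(source))
# regex 4
# js 8
# php 17
# java 35
```

The odd counts come from the closing quote: its own escaping backslash sits directly after the doubled run, giving 16 + 1 and then 34 + 1.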
Some programming languages provide alternative matching string literal delimiters to limit situations where escaping of delimiters is needed. Often, one can begin and end a string with either a single quote or a double quote. This allows one to write 'This is a "quoted" string' if double quote marks are intended in the string literal or "This is a 'quoted' string" if single quote marks are intended. Both kinds of delimiters can't be used in the same string literal, but if one needs to construct a string containing both kinds of quote marks one can often concatenate two string literals, each of which uses a different delimiter.
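A small Python sketch of the alternative-delimiter approach follows. Python is just one language that accepts both quote styles, and the variable names and the third sample sentence are invented for illustration.

```python
# Single-quoted literal: double quotes inside need no escaping.
double_inside = 'This is a "quoted" string'

# Double-quoted literal: single quotes inside need no escaping.
single_inside = "This is a 'quoted' string"

# Both kinds of quote: concatenate two literals that use different delimiters.
both = 'She said "hi" and ' + "it's fine"

print(double_inside == "This is a \"quoted\" string")  # True
print(both)  # She said "hi" and it's fine
```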
Another feature that seems to be popular in modern programming languages is to provide an alternative syntax for string delimiters designed specifically to limit leaning toothpick syndrome. For example, in Python, a string literal starting with r" is a "raw string" in which no escape processing is done, with similar semantics for a string starting with @" in C#. This allows one to write r"C:\Users" in Python or @"C:\Users" in C# without the need to escape the backslash. This does not allow one to embed the terminating delimiter in the middle of the string and prevents the use of the backslash to encode the newline character as \n, but comes in handy when writing a string encoding of a regular expression in which the backslash is escaping one or more other punctuation characters or a shorthand character class (e.g., \s for a whitespace character). For example, when looking for an anchor tag in HTML, I may encode the regular expression as <[Aa]\s[^>]*>. If I express this regular expression as a raw string literal, my code looks like r"<[Aa]\s[^>]*>" instead of "<[Aa]\\s[^>]*>". The point here is that leaning toothpick syndrome is such a real problem that it has influenced programming language implementations.
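The raw-string spelling can be compared with the escaped one in a few lines of Python. This is a sketch only; the sample HTML and the variable names are hypothetical.

```python
import re

# Raw and escaped spellings of the same pattern from the text.
raw_pattern = r"<[Aa]\s[^>]*>"
escaped_pattern = "<[Aa]\\s[^>]*>"
print(raw_pattern == escaped_pattern)  # True

# Hypothetical sample input containing one anchor tag.
html = '<p>See <a href="https://example.com">this</a>.</p>'
match = re.search(raw_pattern, html)
print(match.group(0))  # <a href="https://example.com">

# The raw Windows path keeps its single backslash.
path = r"C:\Users"
print(len(path))  # 8
```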
- [A list of the names of different numbers of backslashes. After each "item" there is a gray line leading to the text describing that item. Because the descriptions are aligned with one another, the lines become shorter as the sequence of backslashes becomes longer, until there is just a line the length of a single hyphen for the last item. There are 1 to 8 backslashes and then 11 plus "..." in the last entry.]
- \------------ Backslash
- \\----------- Real backslash
- \\\---------- Real real backslash
- \\\\---------- Actual backslash, for real this time
- \\\\\--------- Elder backslash
- \\\\\\-------- Backslash which escapes the screen and enters your brain
- \\\\\\\------- Backslash so real it transcends time and space
- \\\\\\\\------ Backslash to end all other text
- \\\\\\\\\\\...- The true name of Ba'al, the Soul-Eater
Note on Title Text
The title text when first published was
I searched my .bash_history for the line with the highest ratio of special characters to regular alphanumeric characters, and the winner was: cat out.txt | grep -o "\\\[[(].*\\\[\])][^)\]]*$" ... I have no memory of this and no idea what I was trying to do, but I sure hope it worked.
It was changed within a few days to
I searched my .bash_history for the line with the highest ratio of special characters to regular alphanumeric characters, and the winner was: cat out.txt | grep -o "[[(].*)][^)]]*$" ... I have no memory of this and no idea what I was trying to do, but I sure hope it worked.
The original title text seems to be more relevant to the comic, but the revised title text seems to make more sense as a legitimate command line due to the way backslashes are interpreted in regular expressions. See the Discussion below for much more on the topic.