Editing 1638: Backslashes


The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 4: Line 4:
 
| title    = Backslashes
 
| title    = Backslashes
 
| image    = backslashes.png
 
| image    = backslashes.png
| titletext = I searched my .bash_history for the line with the highest ratio of special characters to regular alphanumeric characters, and the winner was: cat out.txt | grep -o "[[(].*[])][^)]]*$" ... I have no memory of this and no idea what I was trying to do, but I sure hope it worked.
+
| titletext = I searched my .bash_history for the line with the highest ratio of special characters to regular alphanumeric characters, and the winner was: cat out.txt | grep -o "\\\[[(].*\\\[\])][^)\]]*$" ... I have no memory of this and no idea what I was trying to do, but I sure hope it worked.
 
}}
 
}}
  
 
==Explanation==
 
==Explanation==
Most programming languages use the concept of a {{w|String literal|string}} literal, which is just text between delimiters, usually quotes. For example, "Hello, world" is a string literal. The text being represented is ''Hello, world'' without the quotes. However, the quotes are also written to mark the beginning and end of the string. This is a problem when the text itself contains a quote, as in "This is a "quoted" string". The quotes around the word "quoted" are intended to be part of the text, but the {{w|Lexical analysis|language processor}} will likely mistake the first of them for the end of the string. The result would be two strings with ''quoted'' between them, outside any string (probably resulting in a syntax error).
+
{{incomplete|Need rewriting of the entries in the list and a thorough analysis of the title text}}
  
To avoid this problem, an {{w|Escape character|escape character}} (usually a backslash) is prepended to non-string-terminating quotes. So, the previous text would be written as "This is a \"quoted\" string". The language processor will substitute every occurrence of \" with only the quote character, and the string terminates at the quote character which does not immediately follow a backslash. In this case the resulting text string would be ''This is a "quoted" string'' as intended.
+
Most {{w|Formal language|formal languages}} use the concept of a {{w|String literal|string}}, which is just text between delimiters, usually quotes. For example, "Hello, world" is a string. The text being represented is "Hello, world" without the quotes; however, the quotes are also written to mark the beginning and end of the string. This is a problem when the text itself contains a quote, as in "This is a "quoted" string". The quotes around the word "quoted" are intended to be part of the text, but the {{w|Lexical analysis|language processor}} will likely mistake the first of them for the end of the string.
  
However, the problem now is that the intended text might contain a backslash itself. For example, the text "C:\" will now be interpreted as an unterminated string containing a quote character. To avoid this, literal backslashes also are escaped with a second backslash, i.e. instead of "C:\" we write "C:\\", where the language processor interprets \\ as one single backslash and the quote terminates the string to give ''C:\'' as the output.
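
As a minimal sketch of the two escapes described above, the following Python snippet shows the escaped literals producing the intended text. Python is chosen only for illustration, and the variable names are made up for this example.

```python
# Escaped quotes: \" inside the literal becomes a plain " in the text.
quoted = "This is a \"quoted\" string"

# Escaped backslash: \\ inside the literal becomes a single \ in the text.
drive = "C:\\"

print(quoted)      # This is a "quoted" string
print(drive)       # C:\
print(len(drive))  # 3
```

The literal <code>"C:\\"</code> is four characters between the quotes in the source, but the resulting text has only three.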
+
To avoid this problem, an {{w|Escape character|escape character}} (usually a backslash) is prepended to non-string-terminating quotes. So, the previous text would be written as "This is a \"quoted\" string". The language processor will substitute every occurrence of \" with only the quote character, and the string terminates at the quote character which does not immediately follow a backslash. However, the problem now is that the intended text might contain a backslash itself. For example, the text "C:\" will now be interpreted as an unterminated string containing a quote character. To avoid this, literal backslashes also are escaped with a second backslash, i.e. instead of "C:\" we write "C:\\", where the language processor interprets \\ as one single backslash and the quote terminates the string.
  
This doubling of backslashes happens in most programming and scripting languages, but also in other syntactic constructs such as {{w|Regular expression|regular expressions}}. So, when several of these languages are used in conjunction, backslashes pile up exponentially (each layer has to double the number of slashes). See example of a backslash explosion and alternatives to avoid this [[#Backslash explosion and alternatives|below]].
+
This doubling of backslashes happens in most programming and scripting languages, but also in other syntactic constructs such as {{w|Regular expression|regular expressions}}. So, when several of these languages are used in conjunction, backslashes pile up exponentially (each layer has to double the number of slashes). A reasonable example would be a {{w|PHP}} script in a web server which writes {{w|JavaScript}} code to be run in the client. If the JavaScript code has to output a smiley for scratching one's head (i.e. <code>r:-\</code> ), it would look like this:
 
+
  document.write ("r:-\\");
This kind of backslash explosion is known as {{w|Leaning toothpick syndrome}}, and can happen in [[1313: Regex Golf|many situations]]. Below is an explanation of all the [[#Entries in the list|entries in the comic]].
 
 
 
The backslash explosion in the title text is about a {{w|Bash (Unix shell)|bash}} command (which uses the backslash to escape arguments) invoking the {{w|grep}} utility which searches for text following a pattern specified by means of a regular expression (which also uses the backslash to escape special characters). This leads to 3 backslashes in a row in the command, which could easily become 7 backslashes in a row if the text being searched for also contains a backslash.
 
 
 
Even advanced users who completely understand the concept often have a hard time figuring out exactly how many backslashes are required in a given situation. It is hopelessly frustrating to carefully calculate the exact number of backslashes and then notice that there's a mistake, so the whole thing doesn't work. At some point, it becomes easier to just keep throwing backslashes in until things work than to reason out what the correct number is.
 
 
 
It's unclear whether the regular expression in the title text is valid or not. A long discussion about the validity of the expression has occurred here on this explanation's [[Talk:1638: Backslashes|talk page]]. The fact that many editors of the site, often themselves extremely technically qualified,{{Citation needed}}<sup>{{Citation needed}}</sup> can't determine whether the expression is valid or not, adds a meta layer to the joke of the comic. This is an example of [[356: Nerd Sniping|nerd sniping]] (oh, the irony\!\!\!\).
 
 
 
===Entries in the list===
 
*The first four examples have names that are (somewhat) based on what they actually produce:
 
**'''Backslash''': 1 backslash appropriately named
 
**'''Real backslash''': 2 backslashes are labeled correctly as they do indeed refer to an escaped backslash.
 
**'''''Real'' real backslash''': 3 backslashes would refer to an escaped backslash followed by an unescaped one. The first two backslashes would combine to make a ''real backslash'' while the third one would combine with the character following it to form an {{w|Escape sequence|escape sequence}}. The name thus does not make much sense, as this is two escape sequences and not a single "very real" one.
 
**'''Actual backslash, for real this time''': 4 backslashes form one single backslash escaped twice (the first escaping produces two backslashes, the second escaping doubles each of the backslashes). This is so common that even the documentation for the {{w|Python (programming language)|Python}} regular expression library ([https://docs.python.org/3/library/re.html Regular expression operations]) mentions "\\\\" explicitly. In this case, the backslash has to be escaped once for being part of a regular expression and then ''each'' of these once more as the regular expression needs to be written inside a Python string. The name refers to the fact that the previous examples didn't contain enough escaping.
 
*The remaining five examples of backslashes have more and more occult names (explanations) and do not refer to any more real uses of backslash escapes:
 
**'''Elder backslash''': 5 backslashes would be a doubly-escaped backslash plus an unescaped one. The reference to {{w|Elder}} in the comic has many meanings. It has become known through fantasy media, most prominently through the {{w|Elder Days}}, which are the first Ages of {{w|Middle-earth}} in {{w|The Silmarillion}}, the more-or-less prequel to {{w|The Lord of the Rings}}. More recently it has been used in the {{w|Harry Potter}} universe, where the ''Deathly Hallow'' called the ''{{w|Magical_objects_in_Harry_Potter#Deathly_Hallows|Elder wand}}'', made from {{w|Sambucus|Elder wood}}, is a very important part of the last book ''{{w|Harry Potter and the Deathly Hallows}}''. Other examples are the {{w|Elder Gods}} of the {{w|Cthulhu Mythos}} as well as various 'Elder' magical items and beings in the {{w|Dungeons and Dragons}} mythologies.
 
**'''Backslash which escapes the screen and enters your brain''': 6 backslashes. The name is a play on the word "escape": the backslash is an "escape character", but it is obviously not supposed to be "escaping the screen" and entering your brain. This could also be understood as the programmer getting backslashes on their mind when they go beyond the ''Elder backslash'' domain...
 
**'''Backslash so real it transcends time and space''': 7 backslashes go further than escaping the screen, as they now {{w|Transcendence (philosophy)|transcend}} both {{w|Spacetime|time and space}}.
 
**'''Backslash to end all other text''': 8 backslashes would be a triply-escaped backslash (same as 4 backslashes but with an additional escaping layer). It is said to "end all other text", i.e. there should never be any more text if someone uses eight in a row. But there could be more as indicated in the last example.
 
**'''The true name of Ba'al, the Soul-Eater''': {{w|Infinity|∞ backslashes}} (11 are shown but followed by "..." to indicate that they continue forever). If you could write an infinite number of backslashes it would actually be ''The true name of {{w|Baal|Ba'al}}, the {{w|Soul eater (folklore)|Soul-Eater}}''. This indicates that if you continue misusing backslashes like this you will end up devoured by a demon, for instance {{w|Beelzebub}}, for being so thoughtless... Ba'al has been mentioned before in [[1419: On the Phone]] and in the title text of [[1246: Pale Blue Dot]].
 
 
 
===Backslash explosion and alternatives===
 
A reasonable example of a backslash explosion would be a {{w|PHP}} script on a web server which writes {{w|JavaScript}} code with a {{w|Regular Expression}} to be run on the client. If the JavaScript code has to test a string to see if ''it'' has a double-backslash, the Regular Expression to do so would be:
 
  \\\\
 
where the first two backslashes represent a single backslash and the second two also represent a single backslash, so this searches for two consecutive backslashes.
 
And the JavaScript would be:
 
RegExp("\\\\\\\\").test(str);
 
where every two backslashes mean just one backslash in the string, so the 8 backslashes in JavaScript become 4 backslashes in the Regular Expression.
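
The same two layers (string literal, then regular expression) can be sketched in Python, which escapes string literals in much the same way. This is an illustrative analogue rather than the JavaScript from the example, and <code>has_double_backslash</code> is a hypothetical helper name.

```python
import re

# Eight backslashes in the source become four in the string, which the
# regex engine then reads as "two literal backslashes in a row".
pattern = re.compile("\\\\\\\\")


def has_double_backslash(text: str) -> bool:
    """Return True if text contains two consecutive backslashes."""
    return pattern.search(text) is not None


one = "a\\b"    # the text a\b  (one backslash)
two = "a\\\\b"  # the text a\\b (two backslashes)

print(len(pattern.pattern))       # 4
print(has_double_backslash(one))  # False
print(has_double_backslash(two))  # True
```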
 
 
However, since this JavaScript code is to be written through a PHP script, the PHP code would be:
 
However, since this JavaScript code is to be written through a PHP script, the PHP code would be:
  echo "RegExp(\"\\\\\\\\\\\\\\\\\").test(str);";
+
  echo "document.write (\"r:-\\\\\");";
 
where:
 
where:
 
* The word <code>echo</code> is the PHP command for writing something
 
* The word <code>echo</code> is the PHP command for writing something
 
* The first quote starts the string
 
* The first quote starts the string
* The <code>RegExp(</code> - including the open parenthesis - is written literally
+
* The <code>document.write (</code> (including the open parenthesis) is written literally
 
* The <code>\"</code> following that is a literal quote to be written
 
* The <code>\"</code> following that is a literal quote to be written
 +
* The <code>r:-</code> is written literally
 
* The first two slashes produce one single slash
 
* The first two slashes produce one single slash
* And so on until 8 backward slashes are written
+
* The next two slashes produce another single slash
 
* The next <code>\"</code> produces a literal quote character
 
* The next <code>\"</code> produces a literal quote character
* The <code>).test(str);</code> is written literally
+
* The close parenthesis and the semicolon are to be written literally
 
* The next quote finishes the string.
 
* The next quote finishes the string.
 
* The final semicolon terminates the <code>echo</code> command
 
* The final semicolon terminates the <code>echo</code> command
So, the presented scenario has escalated from a simple test for <code>\\</code> to no fewer than seventeen backslashes in a row without stepping out of the most common operations.
 
  
If we go a bit further and try to write a {{w|Java (programming language)|Java}} program that outputs our PHP script, we'd have:
+
So, the presented scenario has escalated from a simple <code>r:-\</code> smiley to no fewer than five backslashes in a row without stepping out of the most common operations. If we go a bit further and try to write a {{w|Java (programming language)|Java}} program that outputs our PHP script, we'd have:
  System.out.println("echo \"RegExp(\\\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\").test(str);\";");
+
  System.out.println ("echo \"document.write (\\\"r:-\\\\\\\\\\\");\";");
Here, we have 35 backslashes in a row: the first 34 produce the 17 we need in our PHP script, and the last one is for escaping the quote character. (This comes closer to ''The true name of Ba'al, the Soul-Eater'').
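
The counts of 4, 8, 17 and 35 backslashes can be checked with a short Python sketch. It assumes a simplified model in which every layer is a double-quoted string literal that escapes only backslashes and double quotes (real PHP and Java have further rules); <code>escape</code> and <code>longest_backslash_run</code> are hypothetical helper names.

```python
import re


def escape(text: str) -> str:
    """Escape text for a double-quoted literal: backslashes first, then quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def longest_backslash_run(text: str) -> int:
    """Length of the longest run of consecutive backslashes in text."""
    runs = re.findall(r"\\+", text)
    return max((len(run) for run in runs), default=0)


# Each layer wraps the escaped previous layer in its own statement.
regex = "\\" * 4  # the regular expression \\\\ (matches two backslashes)
js = 'RegExp("' + escape(regex) + '").test(str);'
php = 'echo "' + escape(js) + '";'
java = 'System.out.println("' + escape(php) + '");'

for name, source in [("regex", regex), ("JavaScript", js),
                     ("PHP", php), ("Java", java)]:
    print(name, longest_backslash_run(source))
# regex 4
# JavaScript 8
# PHP 17
# Java 35
```

Each layer doubles the run of backslashes, and from the PHP layer on one more is added for the escaped quote that follows the run.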
+
Here, we have 11 backslashes in a row: the first 10 produce the 5 we need in our PHP script, and the last one is for escaping the quote character.
 +
 
 +
This kind of backslash explosion is known as {{w|Leaning toothpick syndrome}}, and can happen in many situations. The one in the title text is about a {{w|Bash (Unix shell)|bash}} command (which uses the backslash to escape arguments) invoking the {{w|grep}} utility which searches for text following a pattern specified by means of a regular expression (which also uses the backslash to escape special characters). This leads to 3 backslashes in a row in the command, which could easily become 7 backslashes in a row if the text being searched for also contains a backslash. Even advanced users who completely understand the concept often have a hard time figuring out exactly how many backslashes are required in a given situation. It is hopelessly frustrating to carefully calculate the exact number of backslashes and then notice that there's a mistake, so the whole thing doesn't work. At some point, it becomes easier to just keep throwing backslashes in until things work than to reason out what the correct number is.
 +
 
 +
=== Entries in the list ===
 +
The first four entries with 1-4 backslashes make sense: a "real backslash" is when the program is told to find one of these using two backslashes. If you need to find such a double "real backslash" you would need a third, hence the first ''real'' is written in italics. And if you want to write a regular expression to find a backslash, and you're working in a language like Java, you need four backslashes--the regular expression itself would require two backslashes, each of which would need to be expressed as doubled backslashes in the string literal. (The use of four backslashes is specifically mentioned in the documentation for the Python regular expression library [https://docs.python.org/2/library/re.html https://docs.python.org/2/library/re.html].)
 +
 
 +
Using 5-7 backslashes continues the trend. The name for five is a reference to {{w|Elder}}, which has many meanings. It has become known through fantasy media; examples are the {{w|Elder Gods}} of the Cthulhu Mythos, various 'Elder' magical items and beings in the {{w|Dungeons and Dragons}} mythologies, and the {{w|Elder Days}}, which are the first Ages of {{w|Middle-earth}} in {{w|The Silmarillion}}, the more-or-less prequel to {{w|The Lord of the Rings}}. More recently it has been used in the {{w|Harry Potter}} universe with the {{w|Magical_objects_in_Harry_Potter#Deathly_Hallows|Elder wand}} made from {{w|Sambucus|Elder wood}}.
 +
 
 +
Using 6 backslashes will cause them to escape the computer and enter your brain and using 7 backslashes makes it ''so real it {{w|Transcendence (philosophy)|transcends}} {{w|Spacetime|time and space}}.''
  
Some programming languages provide alternative matching string literal delimiters to limit situations where escaping of delimiters is needed. Often, one can begin and end a string with either a single quote or a double quote. This allows one to write <code>'This is a "quoted" string'</code> if double quote marks are intended in the string literal or <code>"This is a 'quoted' string"</code> if single quote marks are intended. Both kinds of delimiters can't be used in the same string literal, but if one needs to construct a string containing both kinds of quote marks one can often concatenate two string literals, each of which uses a different delimiter.
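
In Python, for instance, the alternative delimiters look like this. It is a small illustrative sketch, and the variable names are made up.

```python
# Double quotes inside a single-quoted literal need no escaping, and vice versa.
double_inside = 'This is a "quoted" string'
single_inside = "This is a 'quoted' string"

# Both kinds of quote mark: concatenate literals that use different delimiters.
both = 'She said "it' + "'s fine" + '"'

print(double_inside)  # This is a "quoted" string
print(single_inside)  # This is a 'quoted' string
print(both)           # She said "it's fine"
```

None of the three literals needs a backslash.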
+
The list gives names for all numbers of backslashes from 1 up to 8, but then the last entry has 11 slashes followed by "..." to indicate they continue forever. This is: ''The true name of {{w|Baal (demon)|Ba'al}}, the {{w|Soul eater (folklore)|Soul-Eater}}''. This indicates that if you continue misusing backslashes like this you will end up devoured by a demon, for instance {{w|Beelzebub}}, for being so thoughtless... Ba'al has been [[1419: On the Phone|mentioned]] [[1246: Pale Blue Dot|before]] in {{xkcd}}.
  
Another feature that seems to be popular in modern programming languages is to provide an alternative syntax for string delimiters designed specifically to limit leaning toothpick syndrome. For example, in Python, a string literal starting with <code>r"</code> is a "raw string"  [http://en.wikipedia.org/wiki/String_literal#Raw_strings] in which no escape processing is done, with similar semantics for a string starting with <code>@"</code> in C#. This allows one to write <code>r"C:\Users"</code> <!-- Note: In Python, backslashes can still escape the closing delimiter. r"C:\" is a SyntaxError. --> in Python or <code>@"C:\Users"</code> in C# without the need to escape the backslash. This does <em>not</em> allow one to embed the terminating delimiter in the middle of the string and prevents the use of the backslash to encode the newline character as <code>\n</code>, but comes in handy when writing a string encoding of a regular expression in which the backslash is escaping one or more other punctuation characters or a shorthand character class (e.g., <code>\s</code> for a whitespace character). For example, when looking for an anchor tag in HTML, developers may encode the regular expression as <code>&lt;[Aa]\s[^&gt;]*&gt;</code>. If they express this regular expression as a raw string literal, the code looks like  <code>r"&lt;[Aa]\s[^&gt;]*&gt;"</code> instead of <code>"&lt;[Aa]\\s[^&gt;]*&gt;"</code>. The point here is that "leaning toothpick syndrome" is such a real problem that it has influenced programming language implementations.
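
The raw-string behaviour described above can be checked with a brief Python sketch. The sample HTML is invented for illustration.

```python
import re

# The raw literal and the escaped literal denote the same pattern text.
raw = r"<[Aa]\s[^>]*>"
escaped = "<[Aa]\\s[^>]*>"
print(raw == escaped)  # True

# Either spelling finds the opening anchor tag.
match = re.search(raw, '<p><a href="x">link</a></p>')
print(match.group(0))  # <a href="x">

# A Windows-style path without doubled backslashes.
windows_path = r"C:\Users"
print(len(windows_path))  # 8
```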
+
=== Title text ===
 +
It's unclear whether the regular expression in the title text is valid or not. A long discussion about the validity of the expression has occurred here on explainxkcd.com. The fact that many editors of the site, often themselves extremely technically qualified (citation needed for any evidence of this), can't determine whether the expression is valid or not adds a meta layer to the joke of the comic.  This is probably an example of [[356: Nerd Sniping|nerd sniping]] (oh, the irony!!!).
  
 
==Transcript==
 
==Transcript==
Line 79: Line 61:
 
:\\\\\\\\\\\...<font color="gray">-</font> The true name of Ba'al, the Soul-Eater
 
:\\\\\\\\\\\...<font color="gray">-</font> The true name of Ba'al, the Soul-Eater
  
==Note on Title Text==
+
{{comic discussion}}
The title text when first published was
 
 
 
I searched my .bash_history for the line with the highest ratio of special characters to regular alphanumeric characters, and the winner was: cat out.txt &#124; grep -o "\\\[[(].*\\\[\])][^)\]]*$" ... I have no memory of this and no idea what I was trying to do, but I sure hope it worked.
 
 
 
It was changed within a few days to
 
 
 
I searched my .bash_history for the line with the highest ratio of special characters to regular alphanumeric characters, and the winner was: cat out.txt &#124; grep -o "[[(].*[])][^)]]*$" ... I have no memory of this and no idea what I was trying to do, but I sure hope it worked.
 
 
 
The original title text seems to be more relevant to the comic, but the revised title text seems to make more sense as a legitimate command line due to the way backslashes are interpreted in regular expressions. See the Discussion below for much more on the topic.
 
  
{{comic discussion}}
 
 
[[Category:Regex]]
 
[[Category:Regex]]
 
[[Category:Programming]]
 
[[Category:Programming]]
 
[[Category:Ba'al, the Soul Eater]]
 
[[Category:Ba'al, the Soul Eater]]
