2700: Account Problems
Title text: My password is just every Unicode codepoint concatenated into a single UTF-8 string.
Cueball asks Ponytail to help him because he can't log in to his account. Having tried to fix Cueball's tech issues in the past, Ponytail replies with dread. Cueball promises that "It's a normal problem this time", and Ponytail agrees to look at it. But then Cueball reveals that he included a null string terminator character in his password when he created his account, and now he can't log in.
In computer systems, every "character" (letter, digit, punctuation mark, etc.) is represented as an integer. For example, the lowercase letter 'a' is represented as the number 97, and the digit '1' as the number 49 (in both the ASCII and Unicode character encodings). A "string" is a sequence of characters and can be used to store arbitrary text (for example names, messages and passwords). Strings can be arbitrarily long, so some mechanism must be used to record their length. One approach is to store the length explicitly, as in Pascal strings. Another is to mark the end of the string with a specific character, usually the null character (represented as the number 0); such strings are called null-terminated strings and are used by the C programming language. Both approaches have advantages and disadvantages. A limitation of null-terminated strings is that they cannot represent text containing embedded null characters. This is usually not a problem, because ordinary text never contains null characters. However, if a null character somehow ends up in a string, it causes problems: any code that uses that string will assume the null character marks the end of the string, so the string is effectively cut off.
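A minimal Python sketch of the two approaches described above, using hypothetical helper names. It stores strings as UTF-8 bytes and, for brevity, assumes a one-byte length prefix (so at most 255 bytes) for the Pascal-style version. Reading the null-terminated copy back stops at the embedded null, while the length-prefixed copy survives intact.

```python
# Hypothetical helpers contrasting null-terminated and length-prefixed strings.

def to_null_terminated(text: str) -> bytes:
    """Encode text as UTF-8 and append a 0 byte, as a C string is stored."""
    return text.encode("utf-8") + b"\x00"


def read_null_terminated(buf: bytes) -> str:
    """Read up to (not including) the first 0 byte, as C string functions do."""
    end = buf.index(b"\x00")
    return buf[:end].decode("utf-8")


def to_length_prefixed(text: str) -> bytes:
    """Store the byte length in a single leading byte (Pascal-style)."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("too long for a one-byte length prefix")
    return bytes([len(data)]) + data


def read_length_prefixed(buf: bytes) -> str:
    """Read exactly as many bytes as the length prefix says."""
    length = buf[0]
    return buf[1 : 1 + length].decode("utf-8")


password = "correct\x00horse"  # a password with an embedded null character

print(read_null_terminated(to_null_terminated(password)))  # prints: correct
print(read_length_prefixed(to_length_prefixed(password)) == password)  # prints: True
```

The trade-off is visible in the code: the length-prefixed form can hold any byte, but its length field limits how long a string can be.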
Account registration systems often place requirements on passwords in an attempt to encourage users to pick stronger passwords. For example, they might require the password to include at least one "special character" (such as !@#$%^&*). Cueball misunderstood this requirement as referring to characters such as the null character (which is more accurately called a control character). Cueball somehow managed to type the null character as part of his password (on some systems the null character can be typed using certain keyboard shortcuts, such as Alt+0 using the number pad), but the software running the registration system was poorly written and could not cope with this: it allowed him to create an account with that password, but when he tried to log in with the same password, the system didn't accept it.
It's unclear how that particular situation might arise in real software, but here is a similar situation that can easily happen in practice: Suppose a website's registration form allows the user's new password to have up to 20 characters, but due to a programmer error the login page only accepts passwords with up to 18 characters. If the user picks a medium-length password (say with 12 characters), all is well. But if the user picks a password with 20 characters, they will find themselves in the same position as Cueball, being able to register but not able to log in. Some additional situations are described below.
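The 20-versus-18-character scenario can be sketched in a few lines of Python. Everything here is hypothetical: the limits, the function names and the in-memory store. Unsalted SHA-256 is used only to keep the sketch short; real systems should use a deliberately slow, salted password hash.

```python
import hashlib
import hmac

# Hypothetical limits from the scenario: registration allows up to 20
# characters, but a bug makes the login form allow only 18.
REGISTER_MAX_LEN = 20
LOGIN_MAX_LEN = 18

stored_hashes: dict[str, str] = {}  # username -> password hash (toy store)


def hash_password(password: str) -> str:
    # Unsalted SHA-256 keeps the sketch short; it is not a safe password hash.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> bool:
    if len(password) > REGISTER_MAX_LEN:
        return False
    stored_hashes[username] = hash_password(password)
    return True


def login(username: str, password: str) -> bool:
    if len(password) > LOGIN_MAX_LEN:  # the bug: stricter than register()
        return False
    expected = stored_hashes.get(username)
    return expected is not None and hmac.compare_digest(expected, hash_password(password))


print(register("alice", "a" * 12), login("alice", "a" * 12))  # prints: True True
print(register("bob", "b" * 20), login("bob", "b" * 20))      # prints: True False
```

The bug is simply that the two code paths enforce different rules; keeping a single shared limit (or a single shared validation function) for both forms avoids it.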
The title text describes a password which is "just" every Unicode codepoint concatenated into a single string. Unicode is a standard for representing characters from many writing systems; it had 149,186 characters as of the time of this comic (with new characters being added over time), while its full codespace contains 1,114,112 codepoints (U+0000 to U+10FFFF). A password consisting of every codepoint would be extremely long; it would be impractical to type by hand, and far too long for pretty much any account registration system. (A "codepoint" is the number assigned to a character, and UTF-8 is a common encoding that represents each Unicode codepoint as a sequence of one to four bytes.) Also, since Unicode includes the null character, the password would have the same issue as Cueball's password. Further, if the account registration system treats the null character as a string terminator (as in C), the password would be equivalent to an empty password (assuming it contains the codepoints in order, starting with the null character, U+0000).
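As a rough check of "extremely long", the following Python sketch builds the title-text password. It assumes the surrogate codepoints U+D800 to U+DFFF are skipped, because they cannot appear in well-formed UTF-8 (Python's strict UTF-8 encoder rejects them), which leaves 1,112,064 of the 1,114,112 codepoints.

```python
# Build the title-text password: every Unicode codepoint, in order, as UTF-8.
# Assumption: surrogates (U+D800 to U+DFFF) are skipped, since UTF-8 cannot
# encode them.
all_codepoints = "".join(
    chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF
)
password = all_codepoints.encode("utf-8")

print(len(all_codepoints))  # prints: 1112064 (codepoints)
print(len(password))        # prints: 4382592 (bytes, about 4.4 MB)

# A C-style reader stops at the first null byte, which here is byte 0,
# so the password is effectively empty.
c_view = password.split(b"\x00", 1)[0]
print(len(c_view))  # prints: 0
```

The byte count follows from UTF-8's variable width: 128 one-byte, 1,920 two-byte, 61,440 three-byte and 1,048,576 four-byte codepoints.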
- [Cueball carries an open laptop over to Ponytail, holding it in both hands. A box filling the screen contains several lines of text. Ponytail is sitting in an office chair at her desk with her laptop. She has turned her head away from her computer to look at Cueball's screen.]
- Cueball: Can you help me with my account?
- Ponytail: Oh no.
- [Cueball holds his laptop up in front of Ponytail, who has turned her chair to face him, with her hands in her lap. Her desk is not drawn.]
- Cueball: No no, I promise it's a normal problem this time.
- Ponytail: Okay. Fine. What is it?
- [Cueball holds both hands out, palms up, towards Ponytail, who is sitting with his laptop in her lap, typing on it.]
- Cueball: I included a null string terminator as part of my password, and now I can't-
- Ponytail: How?!
- Cueball: They said to use special characters!
- User input containing unsafe characters has previously appeared in the famous comic 327: Exploits of a Mom.
- Here are some additional situations where passwords with special characters might stop working:
- The registration form allows passwords to contain null characters, but the login form strips them (for example because it was written by a different developer or team, or because it has been changed over time). When Cueball tries to log in, the login form removes the null characters, so the submitted password can never match the stored one, which still contains a null character.
- The password system accepts Unicode characters at first, but is later changed to only accept ASCII passwords. Users who included non-ASCII characters like é or ö in their password become locked out of their account because they are no longer allowed to submit those characters.
- Passwords containing non-ASCII characters are problematic in general, because it might not be possible to type them on the keyboard available when logging in. For example, on macOS a logged-in user can change their password to one that contains emojis, but the keyboard on the login screen does not have good support for typing emojis.
- A business network may have multiple systems that connect to a central database of usernames and passwords. If the systems have different password handling rules, a user might find that some of the systems don't support their password (for example because the password contains a character which is forbidden on a particular system).
- There are several techniques that can be used to safely handle passwords and other user inputs that might contain unsafe characters such as the null character:
- Validate: Check whether the user input contains unsafe characters and, if it does, display an error message to the user.
- Sanitize: Remove unsafe characters from the user input to prevent them from causing problems.
- Encode/quote/escape: Replace each unsafe character with an appropriate sequence of characters (depending on the context). For example, a null character can be included in a URL by encoding it as %00. This technique is not very relevant to password handling, but is relevant for example when including user input in generated web pages or passing user inputs to database queries.
- For the specific case of null characters: Use a string representation that supports null characters (e.g. Pascal strings), and be very careful not to pass such strings to functions that can't handle embedded null characters.
- Failure to handle strings containing null characters correctly can result in security vulnerabilities. For example, including a null character in crafted input may allow a user to read or write files that they are not supposed to be able to access.
- In C, a string is usually stored in a block of memory (a buffer) that is allocated with a known size. The longest string such a buffer can hold is one character shorter than the buffer, since the last position is needed for the null terminator. Standard library functions that operate on strings, such as those that return the length of a string or compare two strings, look for the terminator to know where the string ends. This is risky: if the terminator is somehow overwritten by some other value, a function that assumes there is still a stopping point may read far beyond the intended region of memory before it happens to find an unrelated terminator or is otherwise forced to stop. This can have serious security implications, as well as causing bugs and crashes. Safer code therefore uses versions of the string functions that take a maximum allowed length. For example, the strlen() function takes a pointer to a string, counts the characters until it encounters a null terminator, and returns that number: the length of the string, not including the terminator. The strnlen() function takes a pointer to a string and a maximum length, and counts characters until it either finds a terminator or reaches the maximum.
- The number of this xkcd comic is 2700. Read as two concatenated octal numbers, \27 and \00, it represents the ETB (End of Transmission Block) control character followed by the null character; both could cause problems when processed by legacy systems (e.g. mainframe computers). Read as two hexadecimal numbers, 0x27 and 0x00, it represents the ' (apostrophe) character followed by the null character, a sequence that could lead to SQL injection if it is placed unescaped inside an SQL command.
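The null-stripping scenario from the list of additional situations can be reproduced with a toy Python sketch (hypothetical function names, with unsalted SHA-256 standing in for a proper password hash). Because register() keeps the null character and login() removes it before hashing, nothing typed at the login form can reproduce the stored hash.

```python
import hashlib
import hmac

stored_hashes: dict[str, bytes] = {}  # username -> password hash (toy store)


def hash_password(password: str) -> bytes:
    # Unsalted SHA-256 keeps the sketch short; it is not a safe password hash.
    return hashlib.sha256(password.encode("utf-8")).digest()


def register(username: str, password: str) -> None:
    # Registration keeps the password exactly as submitted, nulls included.
    stored_hashes[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    # A different code path "cleans" the input by removing null characters.
    cleaned = password.replace("\x00", "")
    expected = stored_hashes.get(username)
    return expected is not None and hmac.compare_digest(expected, hash_password(cleaned))


register("cueball", "special\x00chars")
print(login("cueball", "special\x00chars"))  # prints: False
print(login("cueball", "specialchars"))      # prints: False
```

Passwords without null characters are unaffected, which is why a bug like this can go unnoticed until someone like Cueball comes along.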
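The three general techniques from the list (validate, sanitize, encode) might look like this in Python. What counts as "unsafe" is an assumption made for the sketch (here, the ASCII control characters U+0000 to U+001F and U+007F), and the function names are hypothetical; urllib.parse.quote is the standard-library percent-encoder. For passwords, sanitizing is only safe if exactly the same cleaning is applied at registration and at login; otherwise it causes the null-stripping mismatch described in the trivia.

```python
from urllib.parse import quote


def is_unsafe(ch: str) -> bool:
    """Assumption for this sketch: ASCII control characters are unsafe."""
    return ord(ch) < 32 or ord(ch) == 127


def validate(text: str) -> None:
    """Validate: raise an error describing any unsafe characters found."""
    bad = [f"U+{ord(ch):04X}" for ch in text if is_unsafe(ch)]
    if bad:
        raise ValueError("input contains control characters: " + ", ".join(bad))


def sanitize(text: str) -> str:
    """Sanitize: silently drop unsafe characters."""
    return "".join(ch for ch in text if not is_unsafe(ch))


def encode_for_url(text: str) -> str:
    """Encode: percent-encode everything except unreserved characters."""
    return quote(text, safe="")


user_input = "pass\x00word"
try:
    validate(user_input)
except ValueError as err:
    print(err)                     # prints: input contains control characters: U+0000
print(sanitize(user_input))        # prints: password
print(encode_for_url(user_input))  # prints: pass%00word
```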
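The file-access risk mentioned in the trivia is often called null byte injection. The following Python sketch only simulates it: a hypothetical check that allows names ending in .txt runs on a string that may contain nulls, and the name is then handed to a C-style layer (simulated by a hypothetical helper) that stops at the first null. Python's own open() refuses such paths with a ValueError, so the real problem arises only where a check and a null-terminated consumer disagree.

```python
def is_allowed(filename: str) -> bool:
    """Hypothetical check: only .txt files may be read."""
    return filename.endswith(".txt")


def c_style_view(filename: str) -> str:
    """Simulate a C function, which sees only the text before the first null."""
    return filename.split("\x00", 1)[0]


def safer_is_allowed(filename: str) -> bool:
    """Reject embedded nulls before applying the same check."""
    return "\x00" not in filename and filename.endswith(".txt")


requested = "secret.conf\x00.txt"
print(is_allowed(requested))        # prints: True, so the check is fooled
print(c_style_view(requested))      # prints: secret.conf, the name C would use
print(safer_is_allowed(requested))  # prints: False
```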
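The difference between strlen() and strnlen() described in the trivia can be simulated in Python over a bytearray standing in for C memory. The buffer size, layout and contents are invented for the example, and the Python functions only mimic the C ones; unlike real C, Python raises IndexError instead of silently reading past the end of the array.

```python
BUF_SIZE = 8  # hypothetical 8-byte buffer at the start of "memory"

# Simulated memory: the buffer holds "hello" and its terminator, and is
# followed by unrelated data that happens to end in a 0 byte.
memory = bytearray(b"hello\x00\x00\x00" + b"SECRETDATA\x00")


def strlen(mem: bytearray, start: int = 0) -> int:
    """Count bytes until a 0 byte, with no upper bound (like C's strlen)."""
    n = 0
    while mem[start + n] != 0:  # Python raises IndexError where C reads on
        n += 1
    return n


def strnlen(mem: bytearray, start: int, maxlen: int) -> int:
    """Count bytes until a 0 byte or until maxlen bytes have been examined."""
    n = 0
    while n < maxlen and mem[start + n] != 0:
        n += 1
    return n


print(strlen(memory), strnlen(memory, 0, BUF_SIZE))  # prints: 5 5

# A bug fills all 8 bytes of the buffer, overwriting the terminator.
memory[0:BUF_SIZE] = b"ABCDEFGH"
print(strlen(memory))                # prints: 18, reading into SECRETDATA
print(strnlen(memory, 0, BUF_SIZE))  # prints: 8, stops at the buffer's end
```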
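The two readings of 2700 in the last trivia item can be checked with Python's int() parser, splitting the number into the digit pairs 27 and 00 and parsing each pair in the given base.

```python
digits = "2700"

# Split into the pairs "27" and "00" and parse each in the given base.
as_octal = bytes([int(digits[:2], 8), int(digits[2:], 8)])
as_hex = bytes([int(digits[:2], 16), int(digits[2:], 16)])

print(list(as_octal))  # prints: [23, 0], i.e. ETB followed by NUL
print(list(as_hex))    # prints: [39, 0], i.e. the apostrophe ' followed by NUL
```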
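The trivia notes that an unescaped ' can lead to SQL injection, and the list of techniques mentions passing user input to database queries. A short sketch with Python's built-in sqlite3 module (the table and data are made up) shows why placeholders are generally preferred over pasting input into the SQL text.

```python
import sqlite3

# An in-memory database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("cueball",), ("ponytail",)])

user_input = "nobody' OR '1'='1"

# Unsafe: the ' in the input ends the string literal, and the rest becomes SQL.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder passes the input as data, never as SQL syntax.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(len(unsafe_rows))  # prints: 2, every user matched
print(safe_rows)         # prints: []
```

With a placeholder the database driver handles the value itself, so no escaping rules have to be reimplemented by hand.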