1700: New Bug

Explain xkcd: It's 'cause you're dumb.
Jump to: navigation, search
New Bug
There's also a unicode-handling bug in the URL request library, and we're storing the passwords unsalted ... so if we salt them with emoji, we can close three issues at once!
Title text: There's also a unicode-handling bug in the URL request library, and we're storing the passwords unsalted ... so if we salt them with emoji, we can close three issues at once!

Explanation[edit]

Cueball asks if an off-panel character can look at his bug report. The person asks if it's a "normal one," and not a "horrifying" one which proves that the entire project is "broken beyond repair and should be burned to the ground." This implies that there have been reports of the "horrifying" variety in the past.

Cueball promises that it is a normal one but it turns out that the server crashes when a user's password is a resolvable URL, which implies that the server is in some way attempting to resolve passwords as if they were URLs. A resolvable URL is one that is syntactically correct and refers to a find-able and accessible resource on the internet (i.e. does not return a 404 error or equivalent when resolved). Therefore a resolvable URL is a fully qualified domain name or a valid IP address that points to a valid server, and it can optionally specify a resource that exists on that server. Normally there is no reason for a system to treat a password as if it were a URL โ€” and testing if a password is a resolvable URL would be a horrible thing to do as it would involve sending the password over the internet in a (at the time the comic was written) most likely completely unencrypted format.

Also, Cueball specifically states that the server is crashing, rather than his application. While this could be an example of misused terminology on the part of Cueball or Randall, given Cueball's history (for example causing the most basic console commands to fail in 1084: Server Problem or other tech issues as seen in 1586: Keyboard Problems) his choice of terms is probably accurate. In the context of web services the server refers to either the computer itself or the program that responds to web requests and executes the user's (i.e. Cueball's) application. Cueball would be in charge of building the application. The importance of this distinction is that a typical system has safe guards in place at many levels to prevent a misbehaving application from crashing anything other than itself. So for his application to crash the server (either the computer itself or the server software hosting his application) would require his application to be operating in a way far outside of the normal, which has been the case for Cueball in previous comics. Alternatively, the project might include its own server software without the safeguards. In either case it is clear that Cueball's issue is far from normal, for which reason the off-panel person gives up and decides that burning the project to the ground is the only solution, telling Cueball I'll get the lighter fluid.

In the title text, another two issues with Cueball's program are mentioned, together with a possible solution that would fix all three problems at once. The second problem is a unicode-handling bug in the URL request library, and the third is that the passwords are stored unsalted. The proposed solution is to salt the passwords with emoji (unicode, multi-byte characters), which is claimed to solve all three issues at once. Salting passwords means that random characters are added to the password before it is cryptographically-secured and stored in the database. Salting increases security in the event that the database is compromised by ensuring that users with the same password will not have the same password hash. This makes some attacks that can be used to crack hash databases, such as rainbow tables, effectively impossible. Salting passwords with emoji can potentially "fix" these bugs in different ways. First, emoji and other unicode characters are not valid characters in URLs. As a result the salted-passwords will no longer be resolvable URLs. This will presumably circumvent (but not actually fix) the bug that causes the server to crash. In addition, the passwords will now be salted, increasing security. There is no obvious way that this would actually fix a unicode-handling bug in the URL request library. Given Cueball's general approach to problems like this, the best explanation is probably that he hasn't "fixed" the bug but rather that it is no longer a bug because he is relying on its behavior to help fix these other issues, i.e. the classic it's not a bug, it's a feature.

The title text shows that his general approach to problems is not to actually fix bugs but to work around them and even rely on them for other behavior. This approach to software development makes for terrible code, which is likely how Cueball got into this trouble in the first place. Therefore the title text shows that he still has yet to learn from his mistakes, further supporting the suggestion to just burn the whole thing down.

In the title text of the first, using emoji in variable names is mentioned. Emoji has since then become a recurrent theme on xkcd.

In 1349: Shouldn't Be Hard, Cueball is also programming and finding it very difficult, although he thinks it should be easy. An off-panel person suggests burning the computer down with a blowtorch, much like the off-panel person in this one suggests burning the whole project (including the computer) to the ground with lighter fluid. In the next comic, with multiple storylines 1350: Lorenz, one story line results in a computer being burned with a blow torch.

Exactly one thousand comics later, Cueball is still running into technical problems with abnormal passwords.

Interestingly, the 2021 vulnerability Log4Shell could be triggered when a specially crafted URL was logged with the Log4j framework. This could lead to a crash (as in the comic) or the computer being taken over by the attacker. However, the contents of a password field should never be logged, so this still would indicate a major problem with the design of Cueball's project.

Transcript[edit]

[Cueball sits at his desk in front of his computer leaning back and turning away from it to speak to a person off-panel.]
Cueball: Can you take a look at the bug I just opened?
Off-panel voice: Uh oh.
[Zoom out and pan to show only Cueball sitting on his chair facing away from the computer, which is now off-panel. The person speaking to him is still of panel even though this panel is much broader.]
Off-panel voice: Is this a normal bug, or one of those horrifying ones that prove your whole project is broken beyond repair and should be burned to the ground?
[Zoom in on Cueball's head and upper torso.]
Cueball: It's a normal one this time, I promise.
Off-panel voice: OK, what's the bug?
[Back to a view similar to the first panel where Cueball has turned towards the computer and points at the screen with one hand.]
Cueball: The server crashes if a user's password is a resolvable URL.
Off-panel voice: I'll get the lighter fluid.


comment.png add a comment! ⋅ comment.png add a topic (use sparingly)! ⋅ Icons-mini-action refresh blue.gif refresh comments!

Discussion

I'm new. For the explanation: A bug, as in a computer (programming) bug, can be reported and tracked, and many systems allow collaboration on the reporting and tracking of problems, or bugs, in code, and their solutions. Cueball reported a problem (bug) he found in the code, which presumably caused the server (program)—which he wrote as part of his project—to try to read the passwords as URLs before storing them. This exposes serious cross-site scripting attacks and other serious security vulnerabilities, and since handling password and user account information usually requires a lot of programming, this would be difficult to fix, which is why the character off-panel suggests burning the project down, as that would be much easier, and would solve any security problems, much more quickly than fixing the bug would. The comment text refers to Cueball's horrid solution to a horrid problem: Instead of solving the problem that is causing the server to read passwords as URLs, he can instead leverage a known problem in the programme which reads URLs which prevents it from reading a particular way of representing text in binary form, by adding a few characters to the user's password that the URL-reading program can't read. This would also "salt" the user's password, which is a security technique that makes passwords harder to figure out when they are stored properly. Cueball thinks this would solve the original problem, and two other problems at the same time, the second problem being the fact that user's passwords aren't salted (a security problem). The third solved problem is difficult to deduce.  Zyzygy 05:40, 29 June 2016 (UTC)

The third bug is the unicode handling, which would need to be solved in order to salt passwords with emoji since these are unicode only character. Although I'm not sure if salting with emoji really increases security since as a rule i'd say nobody uses emoji in their passwords. 162.158.85.123 06:34, 29 June 2016 (UTC)
Password:ย ๐Ÿ‘๐ŸŽ๐Ÿ”‹ฮ  141.101.98.131 10:11, 29 June 2016 (UTC)
That is a really funny password, but is it strong ennough? :-) --Kynde (talk) 20:22, 29 June 2016 (UTC)
Actually, nobody using emoji in their password would be reason salting with emoji is MORE effective. Salting doesn't really increase security of single password, but it does increase security of whole password database, because you can hash some string - like, 123456 and check whole database for users having that as password. If every password is salted with different emoji, this strategy will not work, because while you KNOW which emoji is used - the salt is stored unhashed with the password hash - it's always different so you need to compute new hash for every line in password database. Hashing takes MUCH more time than just comparing strings. And how it's even more effective? Because someone might actually get multiple databases and search for entries with same salt, hoping there will be enough of them to be worth it. And salt with emoji likely wouldn't be so common ... -- Hkmaly (talk) 09:54, 29 June 2016 (UTC)

From 108.162.221.22 (long rant)

Two comments: first, the explanation on password salting is incorrect. The current version says "Salting passwords increases security by adding random data to the passwords which primarily helps defend against dictionary attacks.". Password salting only protects against particular kinds of (common) attacks in specific situations. Most importantly, it is designed to protect passwords only in the event ofa database breach, when a malicious user has gained direct access to the database itself. Password salting provides no protection when brute force attacks (aka dictionary attacks) are directed at the application itself, as the application automatically takes hashes into account. Instead, proper password salting randomizes the hash for each password, ensuring that if two users have the same password, they will not have the same hash. This makes it much more difficult to guess passwords through attack vectors like lookup tables, reverse lookup tables, and rainbow tables. However, because the salt has to be stored with the password (otherwise the application would not be able to make sense of the hash itself), password salting does not secure passwords against dictionary attacks even in the event that a malicious user has managed to acquire the database itself. I will update the explanation with a brief description of what password salts do.
Finally, I think there is a big misunderstanding throughout this explanation. In a web services context the "server" (referenced in the comic) is a very different thing than the application that a programmer builds. A server can refer to either the computer itself or the software that is responsible for responding to web requests and executing the actual application. In a professional context, the application (which is what cueball would be building) would never be referred to as the "server". It is possible that this is a mis-use of terminology on the part of cueball or Randall, but I suspect that the term was used properly and intentionally. The reason is because if cueball's application is crashing the *server*, it takes the level of incompetence up to completely new (and unusual) levels, in much the same way that he has done in the past. Normally the programming language used to build the application, the software hosting the application, and the operating system itself have a number of safe guards in place to ensure that if an application misbehaves, the only thing that crashes is the application itself. For cueball's application to break through all those safeguards and crash the server itself (either the operating system or the web server software) would require cueball to have developed a program that operates *well* outside the bounds of normal procedures. Just for reference, as someone who has been building web software for over 15 years, I wouldn't even know where to start to crash the server from within an application. It would probably have to involve either exploiting a previously unknown bug in the programming language or some *very* poorly designed system calls. Special:Contributions/108.162.221.22 15:11, 29 June 2016โ€Ž 108.162.221.22 (Rememeber to sign your comments)
If you by 'server' means Apache it is not completely unexpected that a sloppy coded extension to Perl or PHP could crash part of the server โ€“ and I still maintain mod_perl code that does 'fancy' stuff. I wouldn't be surprised if Cueball still wrote web-application as he did when mod_perl was the hot stuff. Today it is a common setup to have a chain of servers. In the front nginx for SSL termination, maybe an application level firewall filtering out spooky requests, then Varnish for caching and load-balancing and finally the application server running the actual web-application โ€“ all layers implementing the HTTP protocol. Which of these are 'the server'? At least it is often easy for the application developer to make the last server in the chain unresponsive (i.e. crashed). Pmakholm (talk) 12:08, 1 July 2016 (UTC)
Writing an extension to your language of choice would probably be a good way of crashing the server, although I would say that writing extensions to the language itself is not a common thing for most people to do. I suppose I'm arguing from experience (which is not always accurate) but in years of PHP and python programming I've never once had to write a language extension, nor did I ever need to for my very complicated thesis work. So I would say that the general point still stands: if Cueball is crashing any part of the server, he is doing things very wrong or at least very different.Cmancone (talk) 12:58, 1 July 2016 (UTC)
mod_perl (and it likes) are not language extensions (as they do not extend the language) but are plugins that extends the capabilities of the server (so as to be able to execute perl) 162.158.255.113 22:03, 5 July 2016 (UTC)

Regarding those last two points: sorry for my long unsigned rant. Didn't realize I wasn't logged in. Still haven't figured out how to sign comments. Gonna try it this time. Cmancone (talk) 16:51, 29 June 2016 (UTC)

You made it ;-) --Kynde (talk) 20:22, 29 June 2016 (UTC)

Explanation says "There is no reason for password handling code to access urls" but that is somewhat wrong -- Password handling code frequently perform heuristics on the password to assess the strength, for example checking if part of the password is a dictionary word -- similar heuristics could be done to check thatthe password is not a URL, such as "xkcd.com" applying DNS and other internet resources as an extention of the concept of "dictionary". Spongebob (talk) 16:11, 29 June 2016 (UTC)

I think http://xkcd.com/correct/horse/battery/staple/ would be a perfectly fine password, even though it is also an URL โ€“ but a heuristic that just looks at the length of the password and if it only contains alphanumeric characters would probably be fooled. Trying to detect the scheme used to generate the password could be helpful in choosing a relevant heuristic for deciding the password strength. Ont the other hand, I would consider it very bad to actually test whether the URL is resolvable in any way that leaks information about the password to the outside. Pmakholm (talk) 11:11, 1 July 2016 (UTC)
leaking password information to the outside would be bad, but then again any implementation of a password URL resolving scheme would just add emoji salting 162.158.255.113 22:03, 5 July 2016 (UTC)


Currently the explanation says: "Finally, emoji will often include unicode characters, which means that, if one can effectively salt passwords with emoji, then the passwords should be able to be stored in unicode (although that *probably* doesn't require anything outside the Base Multilingual Plane, so that might not need full unicode support after-all)." I'm fairly convinced that this doesn't make sense and is incorrect. Regardless of what character encoding the password is in, hashing will convert the entire thing into binary. This binary is then typically stored as a base64-encoded string in the database. Ergo, it doesn't matter whether the original password strings were in unicode or not: they will be stored in the database as ascii (or binary), not unicode. I'm going to go ahead and remove this comment from the explanation. I'm pretty certain that there isn't enough information in the comic to figure out why salting passwords with emoji would fix a unicode-handling bug in the URL request library. So I suspect that there is no explanation there: either Cueball is entirely confused and his statement makes no sense, or there is simply not enough information given to help us understand why this solution might fix the problem. However, I'm not going to make any updates to the explanation about this yet, because perhaps I'm missing something someone else will notice. Cmancone 12:50, 29 June 2016 (ETC)

I suspect that the salting with emoji is to ensure that the password does not resolve as a URL (since the library cannot understand the encoding), and thus the issue of the crash is resolved, the issue of unsalted passwords is solved, and the issue of the unicode handling bug is "solved" by virtue of it now being a feature relied on by the system. 173.245.56.68 20:50, 29 June 2016 (UTC)

I removed from the explanation the discussion about how Cueball's system might be checking passwords to see if they are resolvable URLs as a check against weak passwords. The problem with this explanation is that if that is where the crash was happening, then salting the password with emoji would not fix the crash. For salting to fix the crash (as Cueball suggests it will) requires that the crash be happening during the hashing process, not during password validation. The reason is because password validation is performed on the original password itself, while only hashing happens on the salted password. So for salting to fix the crash it must be happening during hashing, not validation. If the bug is happening while checking passwords for strength then Cueball's suggestion of fixing it by adding in a salt will not actually fix the crash at all. It could be that Cueball is simply completely wrong about everything, but I think it makes more sense to go with an explanation where the title text didn't just get everything wrong.Cmancone (talk) 13:12, 30 June 2016 (UTC)

In the nomenclature I am familiar with mailto:[email protected] is a URL. If you accept that mailto (and other protocols) are also URLs some of the description is untrue. Fortunately the untrue bits are also unnecessary and can be deleted or generalized.--108.162.219.11 04:51, 1 July 2016 (UTC)

Definitely a sequel to 1084108.162.221.92 08:27, 1 July 2016 (UTC)

The explanation of 'Resolveable URL' is very confusing and mostly wrong, it is mixing up technologies like HTTP and DNS and is confusing FQDN's with URLs. Though admittedly, the term 'resolveable URL' is a bit of a misnomer by itself. URLs are typically not resolved, they contain an FQDN that is resolved via DNS to an IP(v6) address and optionally port. The remainder of the URL can be used to identify a resource on that server, but how this is done and signaled is quite application/protocol dependent (and shouldn't be called 'resolving'). So if you hit a 404 the FQDN actually resolved but the HTTP resource could not be found. A non-resolveable URL would give a browser error like 'unknown host'. 162.158.83.198 09:59, 1 July 2016 (UTC)

Feel free to adjust the text as desired. I think in this case it is best to understand "Resolvable URL" as Cueball meant it, which (I think) is as any valid URL you might stick in your web browser. I don't think he meant any of the more technical possible definitions. In that sense a resolvable URL would be something that points to a server, potentially followed by a resource on the server. At least, that is how I would think to describe it. Feel free to give it a go. -- Cmancone (talk) (please sign your comments with ~~~~)

Around the time this comic was published, Firefox linked on its start-page to https://advocacy.mozilla.org/en-US/encrypt/codemoji/2, where a webapp Codeempji is available. With this webapp one can use emojis to scramble messages with a Caesar-cipher like method. This might be the reason for the emoji reference in the title text.162.158.85.147 09:58, 4 July 2016 (UTC)

A good addition could be that the passwords were most probably stored in plaintext, which is even worse. Jonsku99 (talk) 12:30, 26 July 2016 (UTC)


I think the real horror of this bug has not been addressed in the description above: In order to check if a password is a resolvable domain name it has to be sent to a domain name server - and the connection to the next domain name server is traditionally completely unencrypted => The system's ability to check if the password is a resolvable URL implies that all passwords might be known to everyone who has access to a piece of the internet infrastructure. Things cannot get much worse to a sysadmin than this - except perhaps for later finding the leaked passwords for sale in a public place, that is. Gunterkoenigsmann (talk) 07:10, 15 October 2016 (UTC)

Correct me if I'm wrong, but doesn't DNS resolution only send the domain name to the DNS server, not the entire URL? 108.162.219.174 21:31, 24 July 2021 (UTC)