1247: The Mother of All Suspicious Files

Explain xkcd: It's 'cause you're dumb.
Revision as of 10:32, 8 December 2022 by (talk) (Explanation)
Jump to: navigation, search
The Mother of All Suspicious Files
Better change the URL to 'https' before downloading.
Title text: Better change the URL to 'https' before downloading.


Modern operating systems try to intercept malicious files before they can be downloaded. This comic depicts a dialog box requiring the user to confirm if they want to download a potentially dangerous file — and it turns out the file being downloaded is absolutely filled with a truly absurd number of file extensions. Many of the extensions used inside there indicate executable code; multiple file extensions are sometimes used to disguise a trojan program as a document. The sheer number of extensions in the comic wouldn't just look out of place on a safe file, it's also far more than an actual computer virus would bother to have, thus the humor.

The first part of the suspicious file's name is, an IP address that hosted JavaScript malware during a recent attack on the Tor anonymity network, with a very long file title.

You can also see common download syntax for a pirated movie, Hackers, likely included to appear malicious to anyone skimming but is actually a movie about hackers, making it a benign reference rather than malicious. It is described as "_BLURAY_CAM", which contradicts itself ("_BLURAY" would imply it was ripped from a copy on Blu-ray Disc, while "_CAM" would mean it was copied by pointing a camera at the screen in the cinema). "_BLURAY_CAM" would probably indicate a search-keyword-stuffed fake copy; fake pirated media often contain viruses (although this is more likely to be a problem with newer media, before the first real pirated copy appears).

The URL contains the path "~tilde/pub/cia-bin/etc". The first part is a public folder of a user named "tilde" (which is also the name for the ~ symbol), "cgi-bin" is a common folder on a web server for server-side executables (Randall changes the name to "cia-bin"), and "etc" is a standard folder for configuration files – normally never accessible through a web server. The program "init.dll" isn't executable at all, it's a Dynamic-link library which can't be run standalone, and is rarely referenced in URLs (even though such syntax is still being employed, even on reputable websites (Google search) or here at eBay, indicating the webserver is a Microsoft ASP server). The question mark indicates the start of a parameter list, and in this case we have only one named "FILE".

The "Save" button is greyed out, suggesting that it is disabled; you can click only the "Cancel" button. For security reasons, some browsers (like Firefox) disable the "Save" button for a few seconds before enabling it. This prevents users from accidentally accepting a download while entering input, like a malicious CAPTCHA.

The complete content sent to the server, starting with "/~tilde..." and ending with "...out.exe", is exactly 256 characters long. On HTML 3 specifications you have a limitation of 1024 characters, whereas later HTML specifications don't have this limit; it just depends on the web server's capabilities. But posting parameters directly at the URL is still a worse choice.

The content of the parameter is shown here:

  • __ (underscore underscore) — used in the C programming language to denote that a symbol is really not for public consumption.
  • autoexec.bat — a batch file which is automatically run during startup on MS-DOS and Windows operating systems, and was often modified by viruses, which added malicious code to be run on each boot.
  • My%20OSX%20Documents — referencing Apple's OS X operating system (%20 is a representation of a space in a URL, i.e. it reads as "My OSX Documents").
  • install.exe — a typical installer.
  • .rar — a compressed archive file type.
  • .ini — a configuration file type.
  • .tar — a file archive popular in Unix and Unix-like operating systems. tar has been mentioned before.
  • .doçx.docx is an Office Open XML file, i.e. a word processing format used by Microsoft Word 2007 and above, but has no cedilla (¸). The addition of a cedilla may be a reference to exploits that rely on rare characters being mistaken for more common ones that look similar, such as the IDN homograph attack.
  • .phphphp — a play on PHP files, a kind of server-based web page file type. PHP originally stood for "Personal Home Page" but was later redefined as the recursive abbreviation "PHP: Hypertext Preprocessor".
  • .xhtml — another web page file type.
  • .tml — stands for Transducer Markup Language, an XML-based markup language that specifies how to capture, time-tag and describe sensor data.
  • .xtl — possibly a play on XHTML.
  • .txxt — a play on .txt file types.
  • 0DAY.HACK — a reference to a zero-day exploit. (overlaps with the next entry)
  • HACK.ERS_(1995)_BLURAY_CAM-XVID — a reference to the 1995 Hackers movie, but pirated movies would either be a BLURAYRIP/DVDRIP or CAM, but not both at the same time unless you used a camera to record a Blu-Ray movie as it played.
  • .exe — an executable file type used by Microsoft Windows.
  • [SCR] — a tag used by movie pirates to denote a 'Screener', the DVD copy of films given to critics prior to theater release. Usually the highest quality available at the time, rare, and thus good bait for a virus-laden download. ".scr" is also the extension for screensaver files, really just an exe file with a different extension and one of the classical ways to distribute infected files.
  • Lisp — programming language.
  • .msi — an installation file used by Microsoft Installer.
  • .lnk — an extension used by Microsoft Windows for shortcuts. The extension is normally hidden to the user.
  • .lnk.zda.gnn — references to Link, Zelda, and Ganon, important characters from The Legend of Zelda video game franchise.
  • wrbt.obj — A reference to the line of code Dennis Nedry used in Jurassic Park to shut down key systems.
  • .o — The extension for a linker file, an intermediary created when compiling C code.
  • .h — The file extension of a header file in C code.
  • .swfShockwave Flash file type.
  • .dpkg — The Debian package management, although the package files use the file suffix .deb.
  • .app — an application on the Mac OS X operating system.
  • .zip — compressed archive file type.
  • .co — the top-level domain (TLD) for Colombia, but marketed as a global domain. Some countries use .co.TLD for general use, e.g. .co.uk in the United Kingdom. But the TLD .gz does not exist and thus .co.gz is invalid.
  • .gz — a compressed file using GNU zip.
  • .a.out — Default filename when creating an executable on Linux or other Unix-like operating systems if none was specified for the compiler.

The title text suggests changing from http to https, as if encrypting a suspicious file before downloading it is somehow better than downloading it unencrypted. http (Hypertext Transfer Protocol) and https (Hypertext Transfer Protocol – Secure) are the two common protocols for getting web pages and web downloads. http is the simple download, whereas https adds an SSL encryption layer so the item being downloaded cannot be viewed unencrypted by anyone except the end recipient. Changing http to https is a common suggestion to improve security when browsing the web from an insecure network (such as a public WiFi hotspot) to avoid surveillance or hijacking to a malicious website; Google automatically switches to https for all mail accounts and is starting to do so with searches. The end recipient will still get whatever nasties were in the original, however — encrypting it doesn't change the content at all.

The IP address referenced in the comic,, was, at the time this article was authored, being used by the shellcode of a JavaScript zero-day exploit for the Tor Browser Bundle being run by the FBI to phone home over the clearnet [1] and deanonymize visitors to websites on Freedom Hosting that are serving child pornography. [2]

As the last extension in the file is .exe, a Windows computer would run the file like an application. Usually, it is not safe to run unknown .exe files.


[Browser download warning box containing the following text.]
This type of file can harm your computer! Are you sure you want to download:ÇX.PHPHPHP.XHTML.TML.XTL.TXXT.0DAY.HACK.ERS_(1995)_BLURAY_CAM-XVID.EXE.TAR.[SCR].LISP.MSI.LNK.ZDA.GNN.WRBT.OBJ.O.H.SWF.DPKG.APP.ZIP.TAR.TAR.CO.GZ.A.OUT.EXE
[Cancel and Save buttons (Save button disabled)]

comment.png add a comment! ⋅ comment.png add a topic (use sparingly)! ⋅ Icons-mini-action refresh blue.gif refresh comments!


LNK and ZDA...Link and Zelda? 13:43, 5 August 2013 (UTC)

http://www.ip-tracker.org/locator/ip-lookup.php?ip=, some place in the USA. Looks random, but still... - Actually this IP hosted some javascript that exploited some FF17 weaknesses on Windows NT during the last LEA TOR raid.

The IP address geolocates to a Starbucks just outside the beltway in Washington. DC. [[Special:Contributions/{{{1}}}|{{{1}}}]] ([[User talk:{{{1}}}|talk]]) (please sign your comments with ~~~~)

Someone mentioned you see the word Hackers as well as a pirated movie... In fact the pirated movie is the 1995 movie named Hackers. Edited it to make the reference clear. -- Sonofaresiii (talk) (please sign your comments with ~~~~)

I am missing DMG or other "Mac" suspect executable -- (talk) (please sign your comments with ~~~~)

WRBT.OBJ.O.H WhiteRabbit.obj from Jurassic Park. Not sure about the O.H Andym (talk) 14:56, 5 August 2013 (UTC)

Fixed .O.H - these are file extensions with C compilers and C headers, respectively.BlackHatm

.tar.gz stands for tarred and gzipped (archive) files; here .co. was introduced to make it look like a domain name .obj can also be a http://en.wikipedia.org/wiki/Relocatable_Object_Module_Format cia-bin is a play on cgi-bin Sebastian -- 15:06, 5 August 2013 (UTC)

After the reference to the FBI in the (currently) final paragraph I was thinking of adding something like the following:

This would also 'explain' the initial directory structure of "/PUB/CIA-BIN/ETC", something like an FTP /pub/ directory for publicly open files, and conflating the CIA with /cgi-bin/ as a somewhat common location for dynamic web-pages, then /etc/ which is another Linux/Unix directory reference, strangely stored underneath a doubley-referenced 'tilde' directory, what with ~foo as the root directory generally redirecting to the home directory for user "foo". These are all usually lower-case (and case-sensitive), but if the INIT.DLL has anthing to do with it it might mean it's an uppercase-dominated and yet actually case-insensitive Windows-based system, with that Windows Dynamically Linked Library as a dynamic responder.

...but I've rushed that and it looks messy/may have errors in it, so feel free to clean it up if it inspires you. Or not... 16:34, 5 August 2013 (UTC)

I think [SCR] actually refers to a screener. -- (talk) (please sign your comments with ~~~~)

> Agreed. The capitalization and brackets are the standard formatting in pirated movie titles, and before a movie release, Screeners (much better quality than theater cams) are excellent bait on fake downloads. Updated in the wiki. Daemonf (talk) 23:09, 5 August 2013 (UTC)

Of course, if the user is on Windows, the only extension that matters is the last one which is ".exe" - an executable. Hax (talk) 16:43, 5 August 2013 (UTC)

I edited the line on the 'save' button being greyed out. This doesn't change with HTTPS, but is instead a modern browser feature preventing a user from agreeing indiscriminately or with a mistaken click. I hope I didn't step on anybody's toes. 00:12, 6 August 2013 (UTC)

That's incorrect, the web server can identify if it's a secure connection or not and render the content of the page depending on this.--Dgbrt (talk) 06:48, 6 August 2013 (UTC)

What is the joke?That the prescence of a huge number of extensions makes this file extremely suspicious?And the punch is that he is suggesting a secure connection to download this file?-- 01:24, 6 August 2013 (UTC)

Yes, this is a joke. I it is a comic. 05:06, 7 August 2013 (UTC)
The joke is that there is no panacea for (TMoASF-downloading user) stupidity, and it can only be fought with education. (talk) (please sign your comments with ~~~~)

"...CO - looks like a top-level domain. Many countries use .co.tld in front of their main TLD, e.g. .co.uk...." Aha! I always thought co.uk meant "Cornwall, United Kingdom." And I couldn't figure out why all their domains were mediated through Cornwall. Every day, I meet a new opportunity to feel clueless... -- (talk) (please sign your comments with ~~~~)

Well... traditionally many of our transatlantic communications links (starting back with telegraph cables) came ashore at Porthcurno, in Cornwall. However, that's just a historic co-inky-dink. ;)
(I was going to say as how most of the secondary-level-domains to .uk were two-letters (as opposed to .com.au, for example), with a few exceptions that I could name, except that I've just looked it up and found that .co(mpany) and .ac(ademic) are actually the only historic ones (and .me.uk coming along later). Of course, those (the first two) were the only ones that ever bothered me (along with things under the .gb TLD), twenty years or so ago. And now we've got Police and Parliament as verbose SLDs. Not that this is relevent, but I just find it interesting to be reminded of how much things seem to have changed...) 05:37, 9 August 2013 (UTC)

On a scale of 'party' to 'judge' in the 'Sketchiness' scale ( http://www.explainxkcd.com/wiki/index.php?title=Sketchiness ), how sketchy is this file? Greyson (talk) 13:38, 6 August 2013 (UTC)

"UNIX" with all caps is a trademark, whereas "Unix" is probably what you should use when you refer to the family of OSes. -- (talk) (please sign your comments with ~~~~)

It is probably important to explain that:

  • The information in that dialog box gives absolutely no indication of what the file being downloaded will be actually named (let alone what is inside), including its extension: it can just as well be downloaded into a harmless text file named "anything.txt" by default. The server may as well just silently ignore anything after "?", especially if INIT.DLL library does not expect an argument named "FILE". (If INIT.DLL even exists - see below)
  • Normally, with such a syntax (DLL-file in a cgi-bin-like directory with a "?" query), "init.dll" should be viewed as benign, as it should execute on the server instead of itself getting downloaded. If it was called "init.exe" nothing would have changed. However if the server is not configured to execute that file, by default it will in fact get simply downloaded as a file, and in that case everything after "?" is completely ignored by the server.
  • Please note that "normally" the URL indicates that on the server somewhere there are nested folders called "~TILDE","PUB","CIA-BIN" and "ETC", last one with a file "INIT.DLL" inside. However it is a perfectly normal practice for a server to interpret that part of URL arbitarily, i.e. there may be only one file named "whatever.php" on the server, and it then may be configured to execute or allow download of that file when anyone tries to request whatever is mentioned in the comic, or request anything at all from that server, or even if anyone types more than 10 "A" letters after the server name, whatever.
  • Thus, the point of this comic is largely that the depicted warning message is almost completely useless: unless a user can somehow make sure that they trust this particular URL, there is no way to know if the file being downloaded could or could not be malicious by looking at its extension because that extension is not displayed.

Could anyone please rephrase some of the above and add it into the article? Because I am new here and dislike digging through all the guidelines before posting. Leftload (talk) 18:52, 6 August 2013 (UTC)

I also noticed that the requested path ("/~TILDE(...)A.OUT.EXE") is exactly 255 characters long. This could be a joke on 255-character path limitation on windows, however the actual file path should have ended with "?", and even if it somehow did not, there would be no extra space for the drive letter then. Leftload (talk) 18:52, 6 August 2013 (UTC)

I think a significant and unexplained element of the joke is the fact that by switching to https, the download would not be scanned by many anti-virus gateway products on the market, because the scanner is unable to inspect the content within the encrypted stream. By clicking on "Save" (if it weren't greyed out) without switching to https, the file is likely to be scanned for virus/malware signatures. By switching to https, this scanning is not available.

Also, I think the 255 character size is important, either as an attempt to overflow a buffer, or as as a means to bypass a scanner (as some scanning systems limit their scope to only the start of a file, where virus signatures are generally found, to maintain throughput). Perhaps if the Windows filename limit is 255 characters, then a 256 character filename might not be detected as having a .EXE extension, thus bypassing a gateway scanner. 09:19, 7 August 2013 (UTC)

When you save the file to your file system it is not encrypted any more. The virus scanner will test this file. The length of the file name is 250 characters because "FILE=" is not part of the name.--Dgbrt (talk) 10:48, 7 August 2013 (UTC)
I think by “anti-virus gateway“ he means something like a web proxy that scans all your traffic. That’s quite common in bigger networks – and quite annoying sometimes … Quoti (talk) 11:19, 7 August 2013 (UTC)

I think the grayed out save button is a reference to Firefox behavior, which doesn't let you to immediately save the file after dialogue pops up. 16:46, 8 August 2013 (UTC)

No, this behavior is only valid for installing Add-ons, etc. When downloading a simple file you can save it immediately. But maybe there is an Add-on to change this.--Dgbrt (talk) 18:46, 8 August 2013 (UTC)
No, I'm 100% positive that in Firefox 3 without any add-ons, you were able to click save only after you clicked the window. Might have changed with new versions, but I'm sure it lasted several(teen) versions at least. 21:46, 10 August 2013 (UTC)

the '0day' could also be a reference to 0day warez, especially because it's right before the warezed movie reference... 12:31, 10 August 2013 (UTC)

I think it's supposed to add to the following "HACK.ERS", as 0-Day Hackers is a network security buzzword. --I Should Get Out More (talk) 14:06, 11 October 2013 (UTC)