1301: File Extensions

==Explanation==
Computer file names often end in {{w|file extension}}s like ".ppt" or ".exe". These extensions are a holdover from early operating systems like {{W|DOS}}, in which filenames had a maximum of eight characters followed by a period and an extension of up to three characters. The operating system used the extension to determine the file type, so that it would know how to handle the file (e.g. which program could open it). Newer operating systems and file systems accept filenames longer than eight characters and extensions longer than three characters, although most extensions are still three characters long.
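As a rough illustration of the DOS convention described above, the following Python sketch checks whether a name fits the 8.3 pattern (one to eight characters, a single period, and at most three characters of extension). It deliberately ignores other DOS rules, such as the restricted character set, and the function name `is_8_3_name` is purely hypothetical.

```python
def is_8_3_name(filename: str) -> bool:
    """Return True if filename fits the basic DOS 8.3 length pattern.

    Only lengths and the single period are checked; real DOS also limited
    which characters could appear in a name.
    """
    base, dot, ext = filename.rpartition(".")
    if not dot:
        # No period at all: the whole name is the base and there is no extension.
        base, ext = filename, ""
    # 1-8 characters before the period, at most 3 after it, and no second period.
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in base


for name in ["README.TXT", "REPORT.DOC", "photo.jpeg", "presentation.ppt"]:
    print(name, is_8_3_name(name))
```

Under this check, "photo.jpeg" fails because of its four-character extension, which is the kind of name DOS could not have stored as-is.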
  
Many extensions originate as formats proprietary to particular pieces of software, although software by other developers may later be designed to read them. For example, .doc is a Microsoft Word document, but because of that program's popularity, many word processors can open .doc files. Other common extensions, such as .jpg or .gif images, are not tied to any one piece of software and may be handled by many programs. In either case, a file's extension is generally a good indicator of what type of data the file contains.
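One way to see an extension acting as a type label is Python's standard `mimetypes` module, which maps extensions to MIME types. This is a minimal sketch: `guess_type` looks only at the name (the file does not even need to exist), and its table can be extended by the operating system's own MIME configuration, so results for less common extensions may vary between machines.

```python
import mimetypes

# guess_type() decides purely from the extension; no file is opened.
for name in ["paper.pdf", "comic.png", "photo.jpg", "photo.jpeg", "notes.xyzzy"]:
    mime_type, _encoding = mimetypes.guess_type(name)
    print(f"{name:12} -> {mime_type}")
```

Both .jpg and .jpeg map to image/jpeg, since they name the same format, while an unrecognized extension yields None.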
Because of this, and because certain programs tend to be used only by certain kinds of people, a file's extension can hint at how trustworthy its content is likely to be.
  
Certain file types are prevalent for certain uses, with some being almost exclusive to one use, while others are in general use and might contain almost anything. Here, [[Randall]] presents a series of file extensions and rates the reliability of the information they typically contain, from most reliable to least.
  
*{{w|.tex}} files are source files for the programs {{w|TeX}} and {{w|LaTeX}}, which are used heavily, and almost exclusively, by academics, especially in mathematics and the hard sciences. .tex pretty much means serious business, and Randall does not expect anyone to use such a format for anything other than reliable information.
*{{w|.pdf}} files use Adobe's '''p'''ortable '''d'''ocument '''f'''ormat ("portable" in the sense that a document looks the same on any device or system), which is frequently used for publication. Companies use them for official documentation. Thus, a .pdf file is likely to be some type of final product or polished work. Further, .tex files are generally compiled into .pdf files in order to make them readable, and it would be strange to trust a .tex file without trusting the .pdf to which it compiles. For example, many academic journals in math and the hard sciences accept a submission's .tex file, then compile it and publish the resulting .pdf. On the other hand, software that produces .doc and .xls(x) files (described below) now tends to have a built-in or add-on "Export to PDF" feature, which promises a slightly more read-only document whose layout does not depend on the reader's software or locale settings. A .pdf might therefore come, in good faith or otherwise, from a less professional author ''trying'' to look a little more serious about the copy they distribute.
 
*{{w|.csv}} files contain '''c'''omma-'''s'''eparated '''v'''alues: tables of information whose fields are delimited by commas. They often consist of computer-generated raw data (from, say, a scientific experiment or a database).
 
*{{w|.txt}} files contain only plain text, no "rich text" or anything fancy. Programmers often use them for README files. The txt format indicates that the creator prioritizes recording the information over making the information visually appealing, although {{w|ASCII art}} images or multiline 'bannering' of text might be included by some authors.
 
*{{w|.svg}} files use the '''s'''calable '''v'''ector '''g'''raphics format, widely used for diagrams, such as on Wikipedia.
 
*{{w|.xls}} and {{w|.xlsx}} files are spreadsheets used and created by the program Microsoft Excel, part of a bundle of applications known as Microsoft Office (also supported by compatible free software such as LibreOffice). These applications are very commonly used, especially for business, finance and data analysis tasks. {{w|.xls}} is a binary format used for Excel versions up to 2003, while {{w|.xlsx}} is a ZIPped XML-based format used for Excel versions 2007 and later.
 
*{{w|.doc}} files are a rich-text document format used and created by the program {{w|Microsoft Word}}, another application in the Microsoft Office bundle. As with .xls, almost anyone with access to Microsoft Office could easily make one of these. While Excel is generally used for creating tables and presenting data, Word could be used for any text-based document. Thus, Word documents tend to be far more prevalent and casually created than Excel documents, which is presumably why Randall doesn't trust them as much.
 
*{{w|.png}} files are a bitmap image format designed for the Internet. They enjoy wide popularity for providing crisp, full-color images with lossless (reversible) compression. Almost all xkcd comics, this diagram included, use PNG. But since anyone can create an image, for instance with a simple online drawing tool that saves as .png, Randall rates this type as not very trustworthy.
 
*{{w|.ppt}} files are used and created by the program {{w|Microsoft PowerPoint}}; as with the other two Office applications, almost anyone could easily make one of these. As they are usually used for presentations rather than documents, the information in them may be arranged differently, possibly to "dumb down" the content, or in marketing materials or talks in which the author may not be very objective. Further, several years ago, PowerPoint presentations were sometimes included instead of plain images as attachments in e-mail forwards containing inaccurate information. These emails still occasionally circulate, and may be the source of Randall's distrust.
 
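*{{w|.jpg}} files are another image format, using lossy compression that is good for storing photos but not so good for many other things: it is prone to visible compression artifacts, so storing numerical or textual information in a JPEG file is typically a bad idea. Photographs in general are also prone to manipulation, hence Randall's low score for this file format.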
 
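*{{w|.jpeg}} files are the same format as .jpg files. However, the four-letter extension breaks the old three-character convention, which suggests the file was named manually, perhaps by someone less tech-savvy, rather than created automatically, making it even less reliable.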
 
*{{w|.gif}} files are yet another bitmap image format, notable for supporting short animations. GIF was once ''the'' Internet image file format until PNG gradually replaced it. Since GIF was for a long time the only widely supported image format capable of animation, it is often used for things like silly clips of cats falling into boxes, or annoying, blinking advertisements claiming that "you're the '''[[570|100,000,000]]th VISITOR!'''". GIFs are also created by Internet trolls, such as those on 4chan.org, to feed misinformation to gullible gamers and other computer users. For example, a December 2013 [http://mashable.com/2013/12/09/xbox-one-hoax-4chan-backward-compatible/ Xbox One hoax GIF] contained instructions that supposedly made the Xbox One backward compatible with Xbox 360 games, but would actually make the console inoperable.
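Because an extension is only part of a file's name, it can be wrong or deliberately misleading. Many of the binary formats in the list above also begin with a fixed signature (a "magic number"), so a program can inspect the contents instead. The Python sketch below recognizes a few well-known signatures; the function names `sniff_format` and `sniff_file` and the small signature table are illustrative assumptions, not a complete detector. Note that .xlsx (like .docx and .pptx) is a ZIP archive, while the older .xls, .doc and .ppt share one compound-file signature, so the signature alone cannot tell those apart. Text formats such as .tex, .csv, .txt and .svg have no fixed signature at all.

```python
# A small, illustrative table of leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (e.g. .xlsx, .docx, .pptx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. .xls, .doc, .ppt)"),
]


def sniff_format(data: bytes) -> str:
    """Guess a format from the first bytes of a file, ignoring its name."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown (possibly plain text such as .txt, .csv, .tex or .svg)"


def sniff_file(path: str) -> str:
    """Read just enough of the file at path to compare against SIGNATURES."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))


# Bytes that start like a GIF are identified as a GIF, whatever the file is called.
print(sniff_format(b"GIF89a\x01\x00\x01\x00"))
```

With a real file, `sniff_file("cats.jpg")` would report "GIF image" if the supposed JPEG were actually a GIF.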
 
  
Note that while the .xls/.xlsx, .doc and .ppt formats were originally tied to Microsoft Office, open source suites such as OpenOffice and LibreOffice, as well as a number of Android apps, can now edit such files. These programs also run on systems other than Windows, such as Linux, which may make such files even more widespread and easier to create than before.
 
 
The title text refers to how .txt files contain only plain text and nothing else, meaning that any alignment (such as for indentation, tables, or {{w|Justification (typesetting)|justification}}) would have to be performed manually by adding in spaces or tabs. Anyone who would go through such an effort to improve their text's readability is likely to be trustworthy, and almost by definition, the opinion presented would be justified.
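For a sense of what "hand-aligning" involves, the sketch below does the same job automatically: it pads each column of a small plain-text table with spaces so the columns line up in a monospaced font. The function name `align_columns` and the sample rows (the approximate, arbitrary-unit scores from this page's transcript) are illustrative; a human doing this in a text editor would type the same spaces by hand.

```python
def align_columns(rows: list[list[str]]) -> str:
    """Pad every cell with spaces so that each column starts at the same offset.

    Text cells are left-justified; the last column is right-justified so that
    numbers line up on their final character.
    """
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = []
    for row in rows:
        cells = [cell.ljust(w) for cell, w in zip(row[:-1], widths)]
        cells.append(row[-1].rjust(widths[-1]))
        lines.append("  ".join(cells))
    return "\n".join(lines)


# Approximate scores from this page's transcript (arbitrary units).
print(align_columns([
    ["Extension", "Score"],
    [".tex", "+100"],
    [".txt", "+67"],
    [".gif", "-36"],
]))
```

Every output line ends up the same width, which is what makes such a file pleasant to read in a plain text editor.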
 
  
 
==Transcript==
:[Caption above the bar chart:]
 
 
:Trustworthiness of Information by File Extension
 
:[A vertical line runs down the chart. Gray bars extend to its right (positive) or left (negative), charting trustworthiness. No units or figures are given. For ease of comprehension, this transcript arbitrarily designates the highest score as [+100]; subsequent scores are estimates based on the size of their bars.]
:[+100]: .tex
:[+89]: .pdf
:[+85]: .csv
:[+67]: .txt
:[+65]: .svg
:[+49]: .xls/.xlsx
:[+21]: .doc
:[+15]: .png
:[+14]: .ppt
:[+3]: .jpg
:[-8]: .jpeg
:[-36]: .gif
 
 
 
==Trivia==
 
The various extensions are, for the most part, abbreviations of the file type.
 
*.tex isn't short for anything: {{w|TeX}} (that lowercase e is very important) is in fact the full name of the program.
 
*.pdf is an acronym for Portable Document Format
 
*.csv is an acronym for Comma-Separated Values
 
*.txt is short for "text"; the three-character limit of the 8.3 format meant the vowel was dropped
 
*.svg is an acronym for Scalable Vector Graphics
 
*.xls is short for eXceL Sheet (Microsoft Excel's icon likewise shows an "X" rather than an "E")
 
*The extra x in .xlsx (.docx and .pptx) refers to the upgrade from binary to ZIPped '''X'''ML for those formats
 
*.doc is short for DOCument
 
*.ppt is short for PowerPoinT presentation
 
*.png is an acronym for Portable Network Graphics
 
*.jpg is short for .jpeg; the three-character limit of the 8.3 format again meant the vowel was removed
 
*.jpeg is an acronym for Joint Photographic Experts Group, the organization that created the standard
 
*.gif is an acronym for Graphics Interchange Format
 
  
 
{{comic discussion}}
 
[[Category:Charts]]
[[Category:Bar charts]]
 
[[Category:Computers]]
 
