Editing 2739: Data Quality
Latest revision | Your text | ||
Line 10: | Line 10: | ||
==Explanation== | ==Explanation== | ||
+ | {{incomplete|Created by a SUPERIOR FELINE. There should probably be a table of the values, explaining what each value means, maybe a second table of how the Title Text elements fit each data point - Do NOT delete this tag too soon.}} | ||
<!-- Specifically "No Idea If There's A Character Limit LMAO": please refrain from removing any more Incomplete tags by yourself and so quickly, and please check your Talk page! And please remove this comment once you've read it. :) --> | <!-- Specifically "No Idea If There's A Character Limit LMAO": please refrain from removing any more Incomplete tags by yourself and so quickly, and please check your Talk page! And please remove this comment once you've read it. :) --> | ||
− | Digital data can be compressed to make transmission and/or storage more efficient; some {{w|compression algorithms}} discard some | + | Digital data are transferred in bits, and {{w|data loss}} is the process by which some of these bits are lost or altered during data transport. Data can also be compressed to make transmission and/or storage more efficient; some {{w|compression algorithms}} discard some data to improve the compression (this can be acceptable in audio or visual data, since the difference may be hard for humans to perceive). |
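The difference between lossless and lossy compression can be sketched in a few lines of Python. This is only an illustration: <code>zlib</code> stands in for lossless compression in general, and the "lossy" function is a made-up toy quantizer (the names <code>lossless_round_trip</code>, <code>lossy_round_trip</code> and the <code>step</code> parameter are assumptions for this example, not anything from the comic or from a real codec).

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE); every byte comes back."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(samples, step=16):
    """Toy 'lossy' codec (hypothetical): round each sample down to a multiple
    of `step`. Fewer distinct values are easier to compress, but the original
    values cannot be recovered afterwards."""
    return [(s // step) * step for s in samples]


text = b"your cat is so cute! " * 20
assert lossless_round_trip(text) == text      # nothing was lost
print(len(text), len(zlib.compress(text)))    # repetitive data shrinks a lot

samples = [3, 18, 100, 255]
print(lossy_round_trip(samples))              # [0, 16, 96, 240]
```

Real lossy formats such as JPEG are far more selective about what they discard, but the principle is the same: the decoded result is close to the original, not identical to it.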
− | This comic shows a chart in the form of a line, increasing quality from | + This comic shows a chart in the form of a line, moving in increasing quality from most lossy to most lossless. However, the highest-quality item, "better data", uses a different sense of the term "quality". In the context of data transmission or compression, "quality" refers to how accurately the result represents the original. Here, Randall instead means the data's more general excellence. |
− | The title text uses your cat as an example of this range of losses (or, in the case of the latter reaches of the graph, gains) in the | + The title text uses your cat as an example of this range of losses (or, in the case of the latter reaches of the graph, gains) in the data. The most lossy is an exclamation about how cute your cat is, which is ephemeral and carries very little specific, transferable information about your cat. The example then progresses to your cat's chip ID; presumably your cat has been microchipped, and the last four digits (commonly used to identify a sensitive number without revealing it in full) or the entire chip ID provide a still-uninformative yet slightly improved way of identifying your cat. A drawing of your cat and a photo of your cat would portray the cat reasonably well, while a clone of your cat and (of course) your actual cat would be the best way of gaining data about your cat. However, as in the actual comic, the final, most lossless (in this case, with the most gain) form of data transfer has nothing to do with your cat, but is simply Randall's better cat. Randall apparently presents this as the pinnacle of cat data. |
=== Details === | === Details === | ||
Line 22: | Line 23: | ||
|- | |- | ||
! Item | ! Item | ||
− | |||
! Explanation | ! Explanation | ||
− | |||
− | |||
− | |||
− | |||
|- | |- | ||
| {{w|Bloom filter}} | | {{w|Bloom filter}} | ||
− + | A Bloom filter is a probabilistic data structure that can efficiently say whether an element is probably already in a set, while it can say "element is not in set" with 100% accuracy. If a Bloom filter is used to compress the contents of a book, the Bloom filter can re-tell the story - just by guessing. |
− | | A Bloom filter is a probabilistic data structure that can efficiently say whether an element is | ||
|- | |- | ||
| {{w|Hash table}} | | {{w|Hash table}} | ||
− + | A hash table allows you to find data very fast. Randall probably means hashing the contents of entire books. Calculating a hash value for an entire book means that there is (most probably) a unique relationship between the book and a hash value - e.g. "58b8893b2a116d4966f31236eb2c77c4172d00e9". This means the book will always yield this exact hash value, and in practice this hash value can only mean this book (though it's impossible to reconstruct the book's content from a hash value). It is highly efficient, but meaningless as a representation of the book's content. |
− | | A hash table allows you to find data very fast. Randall probably means hashing the contents of entire books. Calculating a hash value for an entire book means that there is (most probably) a unique relationship between the book and a hash value - e.g. " | ||
|- | |- | ||
| {{w|JPEG|JPG}}, {{w|GIF}}, {{w|MPEG-1|MPEG}} | | {{w|JPEG|JPG}}, {{w|GIF}}, {{w|MPEG-1|MPEG}} | ||
− + | Image and video formats that are considered 'lossy'. JPG (or "JPEG") format and the MPEG {{w|MPEG-2|group}} {{w|Advanced Video Coding|of}} formats typically use a range of data-compression methods that save space by selectively fudging (thus losing) what details they can of the image (and audio, where appropriate), to make disproportionate gains in compression; best used for real world images (and films) where real-world 'noise' can afford to be replaced by a more compressible version, without too much obvious change. |
− | Image and video formats that are considered 'lossy'. JPG (or "JPEG") format and the MPEG {{w|MPEG-2|group}} {{w|Advanced Video Coding|of}} formats typically use a range of data-compression methods that save space by selectively fudging (thus losing) what details it can of the image (and audio, where appropriate), to make disproportionate gains in compression; best used for real world images (and films) where real-world 'noise' can afford to be replaced by a more compressible | + GIF compression is not 'lossy' in the same way, i.e. whatever it is asked to encode can be faithfully decoded, but Randall may consider its limitations (it can only write images of 256 unique hues, albeit that these can come from anywhere across the whole 16,777,216-color 24-bit "True color" range, plus transparency) to be a form of loss, as conversion from a more sophisticated format (e.g. PNG, below) could lose many of the subtle shades of the original and produce an inferior image. For this reason, GIF became a format best left to render diagrams and other computer-generated imagery with swathes of identical pixels and mostly sharp edges (and to utilise the optional transparent mask). Alternatively, he may just have included it as a joke/nerd-snipe. |
− | GIF compression is not 'lossy' in the same way, i.e. whatever it is asked to encode can be faithfully decoded, but Randall may consider its limitations (it can only write images of 256 unique hues, albeit that these can come from anywhere across the whole 65,536 "True color" range, plus transparency) to be a form of loss, as conversion from a more sophisticated format (e.g. PNG, below) could lose many of the subtle shades of the original and produce an inferior image. For this reason, GIF format | ||
|- | |- | ||
− | | {{w|PNG}}, {{w|ZIP (file format)|ZIP}}, {{w|TIFF}}, {{w|WAV}} | + | | {{w|PNG}}, {{w|ZIP (file format)|ZIP}}, {{w|TIFF}}, {{w|WAV}} |
− + | A series of formats that store data losslessly. PNG and TIFF are image formats that are suitable for photos without resorting to reduced accuracy in order to assist compression. WAV is an audio format (usually holding uncompressed audio) that also does not arbitrarily sacrifice 'unnecessary' details, unlike the more recently developed {{w|MP3|MPEG Audio Layer III}}, which has become the de facto consumer audio format for many. |
− | | A series of formats using lossless compression. PNG and TIFF are image formats that are suitable for photos | + | ZIP is a generic compression algorithm(/format) that can be used to store any other digital file, for exact decompression later on, although any file(s) already compressed in some way are not likely to compress significantly more. |
− | ZIP is a generic compression algorithm ( | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
|- | |- | ||
− | | | + | | Parity bits for error detection |
− | | + | In the number 135, the sum of digits is 9. So, the number 135 could be written as "135-9", with the extra digit playing the role that parity bits play for binary data. If the number was altered, the check digit could tell you so (in some cases). But a change from "135" to "153" could not be detected that way, since the digit sum is unchanged. There are more reliable means to detect errors: checksums such as CRC-32, and hash functions such as MD5 (now considered obsolete for security purposes) and the much more modern {{w|Secure Hash Algorithm|SHA}} family. |
− | |||
|- | |- | ||
− | | | + | | Parity bits for error correction |
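− | | + | With enough additional (redundant) data, errors can be not only detected but also located and corrected, restoring the original data. {{w|Hamming code}}s, for example, add several parity bits arranged so that any single flipped bit can be identified and flipped back. |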
− | |||
|} | |} | ||
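Two rows of the table above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production implementation: the class <code>BloomFilter</code>, its parameters <code>size_bits</code> and <code>num_hashes</code>, and the helpers <code>digit_sum</code> and <code>check</code> are all hypothetical names chosen for this example, and SHA-256 is used merely as a convenient way to derive several bit positions per item.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical sizes): can report 'definitely not in
    the set' or 'probably in the set', but never stores the items."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "certainly absent"; True only means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


def digit_sum(number):
    """Digit-sum check value, as in the table's "135-9" example."""
    return sum(int(ch) for ch in number)


def check(tagged):
    """Verify a string of the form '<digits>-<digit sum>'."""
    number, _, expected = tagged.partition("-")
    return digit_sum(number) == int(expected)


words = BloomFilter()
words.add("cat")
print(words.might_contain("cat"))   # True: added items are never missed
print(check("135-9"))               # True: data intact
print(check("136-9"))               # False: the change is detected
print(check("153-9"))               # True: swapped digits go unnoticed
```

The last line shows the weakness described in the table: a digit sum cannot notice reordering, which is one reason checksums like CRC-32 or hash functions are preferred for real error detection.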
==Transcript== | ==Transcript== | ||
− | :[A line chart is shown with eight unevenly-spaced ticks each one with a label beneath the line. Above the middle of the line there is a dotted vertical line with a word on either side of this divider. Above the chart there is a big caption with an arrow beneath it | + | :[A line chart is shown with eight unevenly-spaced ticks each one with a label beneath the line. Above the middle of the line there is a dotted vertical line with a word on either side of this divider. Above the chart there is a big caption with an arrow pointing right beneath it.] |
:<big>Data Quality</big> | :<big>Data Quality</big> | ||
:Lossy ┊ Lossless | :Lossy ┊ Lossless |