2739: Data Quality
 
GIF compression is not 'lossy' in the same way, i.e. whatever it is asked to encode can be faithfully decoded, but Randall may consider its limitations to be a form of loss: a GIF can contain at most 256 unique colors (albeit chosen from anywhere across the 16,777,216 shades of the 24-bit "true color" range, plus transparency), so conversion from a more capable format (e.g. PNG, below) could lose many of the subtle shades of the original and produce an inferior image. For this reason, the GIF format is best left to diagrams and other computer-generated imagery with swathes of identical pixels and mostly sharp edges (and to utilize the optional transparent mask), for which JPEG compression would create prominent image artefacts. Alternatively, he may just have included it as a joke/nerd-snipe.
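The palette limitation can be demonstrated in a few lines of Python. This is a minimal sketch, assuming the third-party Pillow imaging library (version 9 or later) is installed; the filename is arbitrary:

<pre>
from PIL import Image  # third-party Pillow library (assumed installed)

# Build a 256x256 gradient containing 65,536 distinct 24-bit colours.
im = Image.new("RGB", (256, 256))
im.putdata([(x, y, (x + y) // 2) for y in range(256) for x in range(256)])
print(len(im.getcolors(maxcolors=1 << 24)), "colours before")  # 65536

# Reduce to an adaptive 256-colour palette, as any GIF encoder must.
quantized = im.convert("P", palette=Image.Palette.ADAPTIVE, colors=256)
print(len(quantized.getcolors()), "colours after")             # <= 256

quantized.save("gradient.gif")  # lossless from this point onward
</pre>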
 
|-
| {{w|PNG}}, {{w|ZIP (file format)|ZIP}}, {{w|TIFF}}, {{w|WAV}}, raw data
 
| photo of your cat
 
| A series of formats using lossless compression. PNG and TIFF are image formats that are suitable for photos without (necessarily) resorting to reduced accuracy to assist compression. WAV is an audio format that likewise does not sacrifice 'unnecessary' details, unlike the more recently developed {{w|MP3|MPEG Audio Layer III}}, which has become the de facto consumer audio format for many.
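As a minimal sketch of what 'lossless' means in practice, the following Python (standard library only; the filename is arbitrary) writes samples to a WAV file and reads back exactly the same bytes:

<pre>
import wave

samples = bytes(range(256)) * 100   # arbitrary bytes standing in for audio

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(44100)   # CD sample rate
    w.writeframes(samples)

with wave.open("tone.wav", "rb") as r:
    assert r.readframes(r.getnframes()) == samples   # bit-for-bit identical
</pre>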
 
ZIP is a generic archive format (conventionally compressed with the {{w|DEFLATE}} algorithm) that can be used to store any other digital files. Anything put within a ZIP file can be exactly decompressed into its original state later on, although any file already compressed in some way (such as any of the image formats mentioned in this comic, or another ZIP) is unlikely to compress significantly further.
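Both properties are easy to see with Python's standard-library zipfile module; a minimal sketch, with arbitrary filenames:

<pre>
import zipfile, zlib

original = b"photo of your cat, hypothetically " * 1000

with zipfile.ZipFile("demo.zip", "w", compression=zipfile.ZIP_DEFLATED) as z:
    z.writestr("cat.txt", original)                      # compresses well
    z.writestr("cat.deflated", zlib.compress(original))  # already compressed

with zipfile.ZipFile("demo.zip") as z:
    assert z.read("cat.txt") == original                 # exact round trip
    for info in z.infolist():
        print(info.filename, info.file_size, "->", info.compress_size)
# cat.txt shrinks dramatically; cat.deflated barely changes size.
</pre>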
 
|-
| Raw data + parity bits for error detection
 
| clone of your cat
 
| In the number 135, the sum of its digits is 9, so the number could be transmitted as "1359", slightly increasing the amount of data that needs to be sent but with the advantage that, if the number were tampered with, the parity digit may be able to tell you that an error has occurred (possibly that the parity digit itself was the one miswritten). However, a change from "1359" to "1539" could not be detected by this method: the digits still sum to the extracted parity digit, so the first three digits would wrongly be presumed 'correct'.
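A minimal sketch of this scheme in Python (the digit sum is taken modulo 10 here, so the check is always a single digit):

<pre>
def add_check_digit(digits: str) -> str:
    # Append the digit sum (mod 10) as a trailing check digit: "135" -> "1359".
    return digits + str(sum(int(d) for d in digits) % 10)

def looks_ok(message: str) -> bool:
    # Split off the check digit and see whether the rest still sums to it.
    *data, check = message
    return sum(int(d) for d in data) % 10 == int(check)

print(add_check_digit("135"))  # 1359
print(looks_ok("1359"))        # True  - intact
print(looks_ok("1759"))        # False - a changed digit is caught
print(looks_ok("1539"))        # True  - a transposition slips through
</pre>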
 
However it is done, if the check indicates a problem then you can only seek a new copy (of the data, and/or of the parity or hash itself) and hope that this resolves the problem.
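For example, a stored file might be verified against a known-good SHA-256 digest. A minimal Python sketch, where KNOWN_GOOD_DIGEST and fetch_fresh_copy() are hypothetical stand-ins for however the trusted digest and the replacement copy are obtained:

<pre>
import hashlib

def is_intact(path: str, expected_hex: str) -> bool:
    # Hash the file's current contents and compare with the trusted digest.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected_hex

# if not is_intact("cat.png", KNOWN_GOOD_DIGEST):   # hypothetical digest
#     fetch_fresh_copy("cat.png")                   # hypothetical refetch;
#                                                   # the hash cannot repair
</pre>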
 
|-
| Raw data + parity bits for error ''correction''
 
| your actual cat
 
| With extra error-correction data, there are ways to immediately restore the original data from the additional information supplied. One method is to 'overlap' multiple error-detection parities such that any small enough corruption of the data (including of the parity bits themselves) can be reconstructed to the correct original value by cross-comparison between all parity bits and the supposed data. One of the first modern methods was {{w|Hamming(7,4)}}, invented around 1950, a balanced approach designed to handle the error conditions typically encountered at the time, and one which has inspired even contemporary electronic methods of maintaining data integrity. Another practical application of error-correction bits is found in {{w|QR_code#Error_correction|QR Codes}}, which use {{w|Reed–Solomon error correction}}.
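A minimal sketch of Hamming(7,4) in Python, using the standard positional layout (parity bits at positions 1, 2 and 4): any single flipped bit, parity bits included, is located by the parity "syndrome" and flipped back.

<pre>
def encode(d: list[int]) -> list[int]:
    # Codeword positions 1..7; parity at 1, 2, 4; data at 3, 5, 6, 7.
    c = [0, 0, 0, d[0], 0, d[1], d[2], d[3]]  # c[0] is unused padding
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]

def decode(code: list[int]) -> list[int]:
    c = [0] + list(code)
    # XOR the positions of all set bits: for a valid codeword this
    # "syndrome" is 0; after a single bit-flip it names the bad position.
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos
    if syndrome:
        c[syndrome] ^= 1                 # flip the erroneous bit back
    return [c[3], c[5], c[6], c[7]]

word = encode([1, 0, 1, 1])
word[2] ^= 1                             # corrupt any one of the seven bits...
assert decode(word) == [1, 0, 1, 1]      # ...and the data is still recovered
</pre>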
