Difference between revisions of "1718: Backups"

(Transcript: add transcript)
(Explanation)
  
 
==Explanation==
 
{{incomplete|Needs to be cleaned up and reorganized. Also needs a transcript.}}
  
Here, Cueball is depicted at his laptop, exploring the strange paths his files take from storage to storage. His laptop (presumably the one he is using) is sending its files to a server, which sends its files to ''another'' server, which in turn syncs files back to his laptop. Apparently the amount of stored data grows each time the files complete the cycle, and that growth compounds, which leads to exponential growth of the information being stored. Cueball, initially alarmed, calms down when he realizes that ''this'' exponential growth is slower than that of Moore's Law. Moore's Law is an observation in computer science that the number of transistors we can fit on a chip doubles approximately every two years (a figure often quoted as 18 months). And so, as long as Cueball keeps at the forefront of information storage density, he will never run out of room. Someone else in the house tells him, in reaction to his realization, that he is why they can't have nice things.
  
 
==Transcript==
 

Revision as of 16:28, 10 August 2016

Backups
Title text: Maybe you should keep FEWER backups; it sounds like throwing away everything you've done and starting from scratch might not be the worst idea.

Explanation

This explanation may be incomplete or incorrect: Needs to be cleaned up and reorganized

Here, Cueball is depicted at his laptop, tracing a cyclic path along which his files are being copied from storage to storage. His laptop (presumably the one he is using) is sending its files to a server, which sends its files to another server, which in turn syncs back to his laptop. Cueball determines that this setup leads to exponential growth, which implies that each node in the cycle simply copies files over to the next without any effort to avoid duplicates. If every file made the full trip, each completed cycle would create a second copy of the same set and double the storage space required. Since Cueball mentions only "some folders" and "certain folders", the growth per cycle is presumably smaller, but it still compounds.

Cueball, initially alarmed, calms down when he realizes that this exponential growth is slower than that of Moore's Law. Moore's Law is an observation in computer science that the number of transistors we can fit on a chip doubles approximately every two years (a figure often quoted as 18 months). And so, as long as Cueball keeps at the forefront of information storage density, he will never run out of room.
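
As a rough illustration of the comparison Cueball makes, the sketch below computes how much the data may grow on each trip around the loop while still doubling more slowly than Moore's Law. The function names, the one-month cycle length, and the 24-month doubling period are assumptions chosen for illustration; none of them come from the comic, and the doubling period can be set to 18 months instead.

```python
import math

# Assumption: hardware capacity doubles every 24 months.
# Pass doubling_months=18.0 to use the other commonly quoted figure.
MOORE_DOUBLING_MONTHS = 24.0


def doubling_time_months(growth_per_cycle, cycle_months=1.0):
    """Months until the stored data doubles.

    growth_per_cycle is the fraction added on each trip around the
    backup loop (0.02 means the data grows by 2% per cycle).
    """
    return cycle_months * math.log(2) / math.log(1 + growth_per_cycle)


def max_growth_per_cycle(cycle_months=1.0, doubling_months=MOORE_DOUBLING_MONTHS):
    """Largest per-cycle growth that exactly matches the doubling period."""
    return 2 ** (cycle_months / doubling_months) - 1


def slower_than_moore(growth_per_cycle, cycle_months=1.0,
                      doubling_months=MOORE_DOUBLING_MONTHS):
    """True if the backups take longer to double than the hardware does."""
    return doubling_time_months(growth_per_cycle, cycle_months) > doubling_months


if __name__ == "__main__":
    limit = max_growth_per_cycle()
    print(f"Break-even growth per monthly cycle: {limit:.2%}")
    for rate in (0.02, 0.05, 1.0):
        months = doubling_time_months(rate)
        verdict = "slower" if slower_than_moore(rate) else "not slower"
        print(f"{rate:.0%} per cycle doubles in {months:.1f} months ({verdict} than Moore's Law)")
```

Under these assumptions, a loop that fully duplicates everything each month (a rate of 1.0) doubles every cycle and is far faster than Moore's Law, which is why the comic only makes sense if a small fraction of the data completes the loop.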

The phrase "[this is] why we can't have nice things" is often used in response to incidents where someone abuses a feature meant to benefit people and ultimately causes the feature to break down. In the comic, the person off-screen is commenting on the fact that Cueball is not using advances in storage capacity in a responsible manner. That is, rather than using the increased capacity to store more useful information, he is simply using it as a workaround to avoid having to make his backup strategy more efficient.

Transcript

[Cueball is sitting at a desk, working on a laptop.]

Cueball: Wait. My laptop is backing up some folders to this server...

[Cueball scratches his chin in thought.]

Cueball: ...which is backing up its archives to that server...
Cueball: ...and that server is syncing certain folders over to my laptop...

[Cueball clicks something on his laptop keyboard.]

Keyboard: Click click click

[Cueball is back to working normally on his laptop. A voice speaks to him from offscreen.]

Cueball: ...but the exponential growth is slightly slower than Moore's law, so whatever.
Offscreen voice: Oh my God.
Offscreen voice: You are why we can't have nice things.



Discussion

I think this makes more sense if only a small portion of all files from the laptop completes the ENTIRE loop. If the total percentage of files which complete the entire loop is 0.0004%, and he backs up once a month, that should give him exponential growth slightly smaller than Moore's Law. At 18 months, his total file size would be about 168% of the original. 172.68.58.245 22:03, 10 August 2016 (UTC)

"Cueball: Wait. My laptop is backing up some folders to this server..." Because of that I agree with you. It's saying "some" folders are being backed up. The wording heavily implies that not everything on the computer is being backed up, just a part. 141.101.98.61
Even if all the files do make the round trip they might use good deduplication. If all the files round trip but only the changes and a few kilobytes of metadata per file are duplicated then the growth can be exponential. This is only true if none of the backups are compressed or encrypted, though. 108.162.219.232 (talk) (please sign your comments with ~~~~)


Also, the title text may refer to the fact that, often, when you lose a project and have to start over from scratch, the project becomes so much better. 162.158.133.102 01:55, 11 August 2016 (UTC)


This happens. It can really surprise you when the exponential curve is flat enough. We had a case where we kept a log of the backups on a server that was itself backed up. This went fine for years, until at some point we ran out of backup space and found that backups of the logs of backups consumed over 99% of our disk space. 162.158.87.11 10:04, 11 August 2016 (UTC)

Tee hee! This is why the first thing I exclude from backup is the log directory, or the whole /var tree (with a few selected exceptions, like /var/spool/cron/crontabs - this is a royally misplaced location, it should go under /etc). The logs that need to be kept are sent to a log server, online, by the logger daemon itself. If there's no log server (small systems), at least send the logs to the backup location during log rotation. -- 162.158.203.151 18:59, 11 August 2016 (UTC)
I once managed to back up / to the backup disk at /media/Backup Disk. D'oh. Backupception. --162.158.150.228 12:17, 11 August 2016 (UTC)

I think there should be an explanation of why this setup leads to exponential growth. IMO, it is linear, or polynomial of degree 2 at most. Let's assume the notebook contains only one file: /A.txt. After one backup cycle there are two files: /A.txt and /backups/A.txt. After the next one, there are three: /A.txt, /backups/A.txt and /backups/backups/A.txt. Thus the number of files only grows linearly. Only the path information grows faster: the total number of additional directories in the files' paths grows with the square of the number of cycles (it's the sum of all integers from 1 to the cycle count). Can anybody explain the exponential growth? Epaminaidos (talk) 06:44, 12 August 2016 (UTC)

The number of files grows exponentially if each cycle backs up a percentage of the data rather than a fixed amount. --162.158.83.228 07:31, 12 August 2016 (UTC)
Can you elaborate on this? I don't get it. Epaminaidos (talk) 09:50, 12 August 2016 (UTC)
I guess most backup systems keep older backups. First, there's /A.txt. Next, there's /A.txt and /backup/2016-08-12/A.txt. Third, there's /A.txt, /backup/2016-08-12/A.txt, /backup/2016-08-13/A.txt and /backup/2016-08-13/backup/2016-08-12/A.txt. --SlashMe (talk) 09:38, 12 August 2016 (UTC)
Cueball is talking about "syncing folders", not about a backup-system that keeps old versions. Epaminaidos (talk) 09:50, 12 August 2016 (UTC)
 ????? The first two panels say they are creating back-ups. 108.162.210.196 12:35, 12 August 2016 (UTC)
Actually, there are two backup systems and one sync involved. --SlashMe (talk) 13:17, 12 August 2016 (UTC)
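
The two models argued over in this thread can be compared with a small simulation. This is only a sketch under stated assumptions: the function names and the day1, day2 snapshot labels are made up for illustration, `mirror_cycle` follows the overwrite-style example with /backups, and `snapshot_cycle` follows the example where older dated backups are kept.

```python
def mirror_cycle(paths):
    """Overwrite-style copy: /backups is replaced by a copy of the previous tree."""
    live = {p for p in paths if not p.startswith("/backups/")}
    return live | {"/backups" + p for p in paths}


def snapshot_cycle(paths, label):
    """Snapshot-style copy: the whole tree, old snapshots included, is copied
    into a new dated folder and nothing is ever deleted."""
    return paths | {f"/backup/{label}{p}" for p in paths}


def simulate(cycles):
    """Return (cycle, mirror_file_count, snapshot_file_count) for each cycle."""
    mirror = {"/A.txt"}
    snapshots = {"/A.txt"}
    rows = []
    for i in range(1, cycles + 1):
        mirror = mirror_cycle(mirror)
        snapshots = snapshot_cycle(snapshots, f"day{i}")
        rows.append((i, len(mirror), len(snapshots)))
    return rows


if __name__ == "__main__":
    for cycle, mirror_count, snapshot_count in simulate(6):
        print(cycle, mirror_count, snapshot_count)
```

In this sketch the mirror model adds one file per cycle, while the snapshot model doubles the file count every cycle, which is the exponential case. Which model matches Cueball's setup is not stated in the comic.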