Difference between revisions of "1718: Backups"

(Created by dgbrtBOT)
 
(Made the first explanation. Needs some re-organizing and elaboration.)
Line 8:

==Explanation==
{{incomplete|Created by a BOT - Please change this comment when editing this page.}}
+
{{incomplete|Needs to be cleaned up and reorganized. Also needs a transcript.}}
 +
 
 +
Here, Cueball is depicted on his laptop, tracing the strange paths his files take from one storage location to another. His laptop (presumably the one he is using) sends its files to a server, which sends its files to ''another'' server, which in turn syncs files back to his laptop. This continuous flow of information apparently adds data on every cycle, and because each cycle copies what the previous cycles added, the growth compounds. The result is exponential growth in the amount of information stored. Cueball is alarmed at first, but calms down when he realizes that ''this'' exponential growth is slower than that of Moore's Law. Moore's Law is the observation that the number of transistors on an integrated circuit doubles roughly every two years (a period of 18 months is also often quoted); here it is loosely applied to storage capacity. So, as long as Cueball keeps up with the latest storage technology, he will never run out of room. Someone else in the house responds to this realization by telling him that he is why they can't have nice things.
  
 
==Transcript==

Revision as of 14:05, 10 August 2016

Backups
Title text: Maybe you should keep FEWER backups; it sounds like throwing away everything you've done and starting from scratch might not be the worst idea.

Explanation

This explanation may be incomplete or incorrect: Needs to be cleaned up and reorganized. Also needs a transcript.
If you can address this issue, please edit the page! Thanks.

Here, Cueball is depicted on his laptop, tracing the strange paths his files take from one storage location to another. His laptop (presumably the one he is using) sends its files to a server, which sends its files to another server, which in turn syncs files back to his laptop. This continuous flow of information apparently adds data on every cycle, and because each cycle copies what the previous cycles added, the growth compounds. The result is exponential growth in the amount of information stored. Cueball is alarmed at first, but calms down when he realizes that this exponential growth is slower than that of Moore's Law. Moore's Law is the observation that the number of transistors on an integrated circuit doubles roughly every two years (a period of 18 months is also often quoted); here it is loosely applied to storage capacity. So, as long as Cueball keeps up with the latest storage technology, he will never run out of room. Someone else in the house responds to this realization by telling him that he is why they can't have nice things.
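
The comparison with Moore's Law can be illustrated with a minimal sketch. It assumes one sync cycle per month, a fixed fractional growth per cycle, and storage capacity that doubles every 18 months; the function name, the starting sizes and the growth rates are hypothetical and are not taken from the comic.

```python
def first_month_out_of_room(data, capacity, growth_per_cycle,
                            doubling_months=18, horizon_months=1200):
    """Return the first month in which data exceeds capacity, or None.

    Assumes one cycle per month. None only means "not within the horizon".
    """
    capacity_growth = 2 ** (1 / doubling_months)  # monthly capacity factor
    for month in range(1, horizon_months + 1):
        data *= 1 + growth_per_cycle   # files compound on every cycle
        capacity *= capacity_growth    # storage grows with Moore's Law
        if data > capacity:
            return month
    return None


# Hypothetical figures: 100 GB of files on a 1000 GB disk.
print(first_month_out_of_room(100, 1000, 0.03))  # slower than capacity growth
print(first_month_out_of_room(100, 1000, 0.05))  # faster than capacity growth
```

Under these assumptions, any per-cycle growth below 2^(1/18) - 1 (about 3.9% per month) never catches up with capacity, while a higher rate eventually fills the disk no matter how much room there is at the start.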

Transcript

This transcript is incomplete. Please help editing it! Thanks.



Discussion

I think this makes more sense if only a small portion of all files from the laptop complete the ENTIRE loop. If the total percentage of files which complete the entire loop is 0.0004%, and he backs up once a month, that should give him exponential growth slightly slower than Moore's Law. At 18 months, his total file size would be about 168% of the original. 172.68.58.245 22:03, 10 August 2016 (UTC)

"Cueball: Wait. My laptop is backing up some folders to this server..." Because of that I agree with you. It says "some" folders are being backed up. The wording heavily implies that not everything on the computer is being backed up, just a part. 141.101.98.61
Even if all the files do make the round trip, they might use good deduplication. If all the files round trip, but only the changes and a few kilobytes of metadata per file are duplicated, then the growth can still be exponential. This is only true if none of the backups are compressed or encrypted, though. 108.162.219.232 (talk) (please sign your comments with ~~~~)
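
The figures in this thread can be checked with a rough sketch. It assumes the simplest possible model, in which the total grows by a fixed fraction on each monthly cycle; the helper names are hypothetical.

```python
def total_after(cycles, growth_per_cycle):
    """Size relative to the original after compounding for `cycles` cycles."""
    return (1 + growth_per_cycle) ** cycles


def growth_per_cycle_for(total, cycles):
    """Per-cycle growth needed to reach `total` times the original size."""
    return total ** (1 / cycles) - 1


# Per-month growth equivalent to doubling every 18 months.
moore_monthly = growth_per_cycle_for(2.0, 18)

# Per-month growth that would give 168% of the original after 18 months.
implied_monthly = growth_per_cycle_for(1.68, 18)

print(f"Doubling in 18 months: {moore_monthly:.2%} per month")
print(f"168% after 18 months:  {implied_monthly:.2%} per month")
```

In this model, 168% after 18 months corresponds to roughly 2.9% growth per month, against roughly 3.9% per month for an 18-month doubling. How that maps onto the percentage of files completing the loop depends on details of the setup that the comic does not give.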


Also, the title text may refer to the fact that often, when you lose a project and have to start over from scratch, the project becomes much better. 162.158.133.102 01:55, 11 August 2016 (UTC)


This happens. It can really surprise you when the exponential curve is flat enough. We had a case where we kept a log of the backups on a server that was itself backed up. This went fine for years, until at some point we ran out of backup space and found that backups of the logs of backups consumed over 99% of our disk space. 162.158.87.11 10:04, 11 August 2016 (UTC)

Tee hee! This is why the first thing I exclude from backup is the log directory, or the whole /var tree (with a few selected exceptions, like /var/spool/cron/crontabs; this is a royally misplaced location, it should go under /etc). The logs that need to be kept are sent to a log server, online, by the logger daemon itself. If there's no log server (small systems), at least send the logs to the backup location during log rotation. -- 162.158.203.151 18:59, 11 August 2016 (UTC)
I once managed to back up / to the backup disk at /media/Backup Disk. D'oh. Backupception. --162.158.150.228 12:17, 11 August 2016 (UTC)

I think there should be an explanation of why this setup leads to exponential growth. IMO, it is linear, or polynomial of degree 2 at most. Let's assume the notebook contains only one file: /A.txt. After one backup cycle there are two files: /A.txt and /backups/A.txt. After the next one, there are three: /A.txt, /backups/A.txt and /backups/backups/A.txt. Thus the number of files grows only linearly. Only the path information grows faster: the number of additional directories in the files' paths grows with the square of the number of cycles (it's the sum of all integers from 1 to the cycle count). Can anybody explain the exponential growth? Epaminaidos (talk) 06:44, 12 August 2016 (UTC)

The number of files grows exponentially if each cycle backs up a percentage of the data rather than a fixed amount. --162.158.83.228 07:31, 12 August 2016 (UTC)
Can you elaborate on this? I don't get it. Epaminaidos (talk) 09:50, 12 August 2016 (UTC)
I guess most backup systems keep older backups. First, there's /A.txt. Next, there's /A.txt and /backup/2016-08-12/A.txt. Third, there's /A.txt, /backup/2016-08-12/A.txt, /backup/2016-08-13/A.txt and /backup/2016-08-13/backup/2016-08-12/A.txt. --SlashMe (talk) 09:38, 12 August 2016 (UTC)
Cueball is talking about "syncing folders", not about a backup-system that keeps old versions. Epaminaidos (talk) 09:50, 12 August 2016 (UTC)
 ????? The first two panels say they are creating back-ups. 108.162.210.196 12:35, 12 August 2016 (UTC)
Actually, there are two backup systems and one sync involved. --SlashMe (talk) 13:17, 12 August 2016 (UTC)
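
The two models discussed in this thread can be compared with a small sketch. It is only an illustration: the function names and snapshot labels are hypothetical, the "mirror" model overwrites a single /backups folder with a copy of the current tree, and the "snapshot" model keeps every earlier backup and adds a new labelled copy of the whole tree on each cycle. Neither is claimed to match the setup in the comic.

```python
def mirror_cycle(paths, originals):
    """Replace /backups with a copy of the current tree (old backup overwritten)."""
    return set(originals) | {"/backups" + p for p in paths}


def snapshot_cycle(paths, label):
    """Keep everything and add a labelled copy of the whole current tree."""
    return set(paths) | {f"/backup/{label}" + p for p in paths}


originals = {"/A.txt"}
mirror = set(originals)
snapshot = set(originals)
mirror_counts, snapshot_counts = [], []

for cycle in range(1, 6):
    mirror = mirror_cycle(mirror, originals)
    snapshot = snapshot_cycle(snapshot, f"day{cycle}")
    mirror_counts.append(len(mirror))
    snapshot_counts.append(len(snapshot))

print(mirror_counts)    # [2, 3, 4, 5, 6]
print(snapshot_counts)  # [2, 4, 8, 16, 32]
```

With a single overwritten mirror the file count grows by one per cycle, which is the linear case; when old snapshots are kept and then copied again, the count doubles every cycle, which is the exponential case.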