Talk:1909: Digital Resource Lifespan

Revision as of 17:39, 1 November 2017 by (talk)

Even PDFs can be broken, which is why we have PDF/A (archive) - a subset of PDF that has no external dependencies and thus should last forever. JakubNarebski

To clarify: .PDF files are *frequently* created with content such as fonts (or really anything other than the actual text) referenced within the document but not *embedded* within the document. This is usually done to reduce file size, but it's usually not advisable. Whether it's a .pdf or a .ppt or a .exe it is best to keep your dependencies embedded whenever possible!
.PDF files (or any files) can of course also suffer from corruption (checksum failures, CRC errors, etc.), and PDF/A does not provide any redundancy to recover from it; always make an extra copy on another drive (ideally both off-site and locally). 06:07, 1 November 2017 (UTC)
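The "extra copy" advice above only helps if the copy is actually intact, so here is a minimal sketch of checking a backup against its original by comparing SHA-256 digests. The file names and the helper names `sha256_of`, `copies_match` and `demo` are made up for illustration, and the "PDF" is just placeholder bytes, not a real document.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True if both files have identical contents (same digest)."""
    return sha256_of(original) == sha256_of(copy)


def demo():
    """Copy a placeholder file, verify it, then simulate corruption."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "document.pdf"            # hypothetical file name
        backup = Path(tmp) / "backup" / "document.pdf"   # hypothetical backup location
        original.write_bytes(b"%PDF-1.4 placeholder contents")
        backup.parent.mkdir()
        shutil.copy2(original, backup)
        intact = copies_match(original, backup)

        # Simulate bit rot by flipping one bit in the backup.
        data = bytearray(backup.read_bytes())
        data[0] ^= 0x01
        backup.write_bytes(bytes(data))
        damaged = copies_match(original, backup)
        return intact, damaged
```

A digest only detects damage; it cannot repair it, which is why a second (or third) copy is still needed.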

CD scratched, new computer has no CD drive anyway. - First, you can still buy an external CD-ROM drive, for example one connected via a USB cable. Second, you can try to recover data from a scratched CD with tools such as ddrescue (free and OSS) or IsoBuster (shareware). --JakubNarebski (talk) 17:51, 30 October 2017 (UTC)

Scratches on the data layer of any optical disc destroy that data. There is also the consideration that the plastics of the majority of optical discs degrade with time and heat. Some optical media are designed to resist such scratching or corruption, like the commercially available M-Disc, or laser etching of data into a crystal, as in a 5D disc. Even then, the stored data must be in an ISO format to be readable, and the equipment that reads the media needs to be maintained. I have often told people that their data is never safe unless there is a constant effort to copy it, check the copies for quality, and make multiple backups on multiple modern media as often as humanly possible. All forms of digital media can fail; even the extended warranty on a high-end HDD will not cover the data lost, and most EULAs for cloud storage will say the same.
Pressed commercial CD-ROMs carry their information between two 0.6 mm thick plastic discs which are glued together, which makes them pretty resilient against scratches on either side – just remove some material with abrasive methods like toothpaste. Often the glue is the bigger issue with low-quality pressings in the long run. This is in contrast to recordable CDs, which are coated with the reflective layer on top of a single disc. –TisTheAlmondTavern (talk) 12:24, 31 October 2017 (UTC)
Or cheaper than an external drive, borrow a friend's computer and copy the CD onto the cloud somewhere. --Angel (talk) 18:39, 30 October 2017 (UTC)
Yet something affected by that would just as likely be affected by "Broken on new OS, not updated". For example, I've got a multimedia encyclopedia which runs on Win 3.11, and thus can't run on 64-bit Windows.
Ehrm... You do realise the limitation is the other way around, right? You can't run a 64-bit application on 32-bit Windows, but 64-bit Windows can perfectly well run 32-bit apps. Though Win 3.11 is far enough back that it might actually be a fun challenge to see if it runs :D 10:57, 31 October 2017 (UTC)
You cannot: Win 3.1(.1) was a 16-bit operating system, and 64-bit editions of Windows do not include the 16-bit compatibility layer (only the 32-bit editions still ship it). --DaB. (talk) 19:18, 31 October 2017 (UTC)

Interestingly, static .PDF files are intended to be electronic equivalents of printed books - an electronic microfiche if you will RIIW - Ponder it (talk) 18:57, 30 October 2017 (UTC)

I'm wondering if data on an older, static, website would still be readable. Would likely still be there (or on, but might be suffering progressive link rot. Also a little surprised that the start of microfilm is so recent; I remember the library having microfilm readers (that nobody ever used) when I was young enough to spend ages staring at a machine, trying to determine its purpose. Guess it depends on the subject, when it was put into that format. --Angel (talk) 18:39, 30 October 2017 (UTC)

Angel, note both the "My" in the title and the left arrows implying that the resources (like books) were around before Randall had access. RIIW - Ponder it (talk) 18:57, 30 October 2017 (UTC)

Should those white left arrows be noted in the transcript? The gray right arrows are implied by "past", perhaps something like "Before 1980-past 2020" 17:39, 1 November 2017 (UTC)

"Only to realized? - 23:08, 30 October 2017 (UTC)

[Subject] wiki, anyone? Wikis have rather detailed analyses of even obscure topics in my line of work/study. --Nialpxe, 2017. (Arguments welcome) (P.S. just to be clear I mean wikis maintained by researchers and professionals in [Subject] field, not Wikipedia)

There's a wealth of thought about exactly this problem by librarians; the Library of Congress has some recommendations along with a database evaluating over a hundred formats along a variety of axes: is the format documented openly? Is it widely used? Is it inherently transparent to inspection even if the specification is lost? Can it contain its own metadata? What sort of external dependencies does it have? Is it patent-encumbered, and are there technical access restrictions like DRM? (tl;dr, images as TIFF, text as EPUB or PDF/A, sound as WAV. They're very conservative.) 05:07, 31 October 2017 (UTC)

Note that digital data have a big advantage over books when dealing with larger quantities. The amount of work needed to preserve one printed book is the same no matter how many books you have, so it's a thousand times more when you have a thousand books. Meanwhile, the amount of work needed to preserve, for example, a collection of digital images doesn't really depend on the collection size. Let's say that the format used is going out of use: you can automatically convert all the images fairly quickly. Of course, it is harder with applications ... -- Hkmaly (talk) 08:23, 31 October 2017 (UTC)
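To illustrate the point about automatic conversion, here is a rough sketch of a batch converter: the human effort is writing one conversion function, and the loop applies it to however many files there are. Everything named here is an assumption for illustration: the `.old` and `.new` suffixes, the helpers `convert_collection` and `demo_batch`, and especially `placeholder_convert`, which merely stands in for a real image decoder/encoder.

```python
import tempfile
from pathlib import Path


def convert_collection(root, old_suffix, new_suffix, convert):
    """Apply `convert` (bytes -> bytes) to every file under `root`
    ending in `old_suffix`, writing the result next to the original
    with `new_suffix`. Originals are kept, not deleted."""
    converted = []
    # sorted() collects the matches first, so files written during the
    # loop cannot be picked up by the directory walk.
    for source in sorted(Path(root).rglob("*" + old_suffix)):
        target = source.with_suffix(new_suffix)
        target.write_bytes(convert(source.read_bytes()))
        converted.append(target)
    return converted


def placeholder_convert(data):
    """Hypothetical stand-in for a real format conversion."""
    return b"NEWFORMAT" + data


def demo_batch(count=1000):
    """Create `count` dummy files and convert them all in one call."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(count):
            (Path(tmp) / ("image_%d.old" % i)).write_bytes(b"pixels")
        results = convert_collection(tmp, ".old", ".new", placeholder_convert)
        return len(results)
```

Machine time still grows with the collection, of course; it is the human effort that stays roughly constant. A careful archive would also verify the converted files before discarding any originals.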

Software not running after an OS update is such a Mac problem. Linux updates would break closed software if it were commonly available, but open source can be recompiled, and Windows maintains a scary amount of backwards compatibility, so only system-admin or DRM-crippled software ever stops working. 10:54, 31 October 2017 (UTC)

I must strongly disagree there; Networking features have been known to break following Windows updates, & Android is *terribly* prone to breaking apps or even removing what may be considered core system features with an OS update. Search "kitkat sd", for just one good example. Even Linux can turn into dependency hell when repositories change their branch structure. Then there's the incredible variety of different hardware which only a specific version of Windows with specific hardware once supported: I still can't get an affordable analog serial port adapter that will work with my favorite flight controller. 06:37, 1 November 2017 (UTC)

Here in the UK, the library access would also have ended some time in the last few years... 11:33, 31 October 2017 (UTC)

Nothing lasts forever (or at least that's what seems to be true for anything observed by humanity). Data becomes corrupted and lost over time and usage, and books become damaged and lost over time and usage. Not to mention, thousands of books were burned during the Nazi regime. Human minds are inevitably subject to corrupted memories as well. We lose information all the time, and we try to recover what remains. However, it is also worth mentioning that our digital technology is still pretty young compared to books and other sources of information. Information used to be recorded on papyrus, tablets (I understand that this contradicts my point as some tablets have stood the test of time), etc. Some of the earliest Chinese inks were created with soot and animal glue. The first (attempts at) photographs required hours of light exposure and would fade away quickly. Over time, we discovered ways to improve upon these sources of information. The same could apply to our digital information today. We are essentially in the "papyrus" phase of electronic technology (one could argue with other descriptions, but this isn't significant to my statements). In time, we may achieve more successful long-term solutions to maintaining original data. There are so many avenues for the advancement of technology, and those avenues continue to multiply with each step. At this time, we just need to continue to work on our projects and experiments for the progress of humanity. NAE (talk) 14:29, 31 October 2017 (UTC)

Randall did a good job frightening me this Halloween... 02:10, 1 November 2017 (UTC)

I wonder if Randall is aware of digital archiving solutions such as those provided by Preservica, formerly part of Tessella plc. Their solutions are aimed at precisely this problem. Their library/museum clients include "the MoMA, the Frick Collection, the Museum of Fine Arts Houston, Yale Library, The National Library of Australia, The Royal Danish Library, The Philadelphia Museum of Art, McNay Art Museum, DC Public Library and the University of Manchester" and their archive clients include "15 leading pan-national and national archives, 18 US state archives, major corporate archives at BT, HSBC, Unilever and the Associated Press". 03:32, 1 November 2017 (UTC)

Randall forgot my personal favorite: UTF-8 formatted .txt files. Since 1993 & counting, never had an issue opening one. I still have my first copy of The Anarchist's Cookbook, copied from a Kaypro II running CP/M on a 5-1/4" floppy to an 8088 XT running MS-DOS on a 30 MB hard drive to an IBM PS/2 286 on a 20 MB hard drive to an Asus 486 on a 3.5" floppy to a 1.2 GHz Pentium on a 100 MB Zip drive to a Core 2 Duo on a CD-R to an i7 system on a 128 GB solid state drive, which was finally backed up to a 1 TB hard drive & archived, as there's a newer copy to carry around. That original file still opens just fine on any PC I've ever used (including mobile).

Also, I believe Linus Torvalds once said (talking about code, but it applies to anything sufficiently desirable) "Only wimps use tape backup, real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)" I can certainly attest to that. I once made a torrent of all the Star Trek I'd accumulated (i.e., all the Star Trek ever) & uploaded that. Two years later an old hard drive died & I was able to recover all 200+ GB in a little over 6 hours, simply by downloading my own torrent from other seeds. Thanks Trekkies! 07:22, 1 November 2017 (UTC)