Difference between revisions of "2304: Preprint"

(Removed the bit about LaTeX because it's not mentioned in the comic and neither necessary nor useful for the explanation. Also, LaTeX isn't a file format.)
Line 14:
  [[Randall]] suggests that, instead of explaining that the paper was in {{w|preprint}}, or unpublished or submitted to a preprint server and not peer-reviewed, the newscaster could simply say it was a {{w|PDF}}. PDF (Portable Document Format) is a file format for documents developed by Adobe to be used independent of application software, hardware and operating systems.
  Randall proceeds to lists several benefits of using "PDF":
- * The use of terms such as "preprint" makes statement about its publication status, which might be based on inaccurate information or even be in the process of changing as the news goes out; in contrast, proclaiming it to be a PDF is an unambiguously factual statement.
+ * The use of terms such as "preprint" makes statement about its publication status, which might be based on inaccurate information or even be in the process of changing as the news goes out; in contrast, proclaiming it to be a PDF is an unambiguously factual statement.  Additionally, "preprint" and "peer review" and related terminology are not familiar to most people, who are not academics.
  * Referring to the PDF directly also prevents individuals from making assumptions that the one responsible knows and has verified what they're doing - or, in contrast, that the information is automatically false based on the grounds that it hasn't yet been officially published.
  * The comic finishes with a jab at the PDF format itself, proclaiming that no ordinary person would ''voluntarily'' choose a PDF file as their medium of communication.  Ordinary people use the default file format of whatever word processor or text editor they use, but PDF files are not very convenient to edit, so they're generally only used for final versions of documents that are ready to print or distribute, following a dedicated export or conversion process.

Revision as of 17:54, 9 May 2020

Preprint
Title text: DOWNSIDES: Adobe people may periodically email your newsroom to ask you to call it an 'Adobe® PDF document,' but they'll reverse course once they learn how sarcastically you can pronounce the registered trademark symbol.

Explanation

This explanation may be incomplete or incorrect: Created by an ADOBE® PDF DOCUMENT. Explain the different terminology used by the newscaster, expand upon benefit points. Do NOT delete this tag too soon.
If you can address this issue, please edit the page! Thanks.

This comic is about how the media report on research papers that have not been peer-reviewed. The newscaster depicted is attempting to report breaking news based on information in a study; however, the study in question has not been formally published. This leaves either the newscaster or her scriptwriters unsure how to refer to the study, which the comic represents with alternative introduction lines that have been scribbled out.

Randall suggests that, instead of explaining that the paper is a preprint, or unpublished, or submitted to a preprint server and not peer-reviewed, the newscaster could simply say it is a PDF. PDF (Portable Document Format) is a document file format developed by Adobe to be used independently of application software, hardware and operating systems. Randall proceeds to list several benefits of using "PDF":

  • The use of terms such as "preprint" makes a statement about the paper's publication status, which might be based on inaccurate information or even be in the process of changing as the news goes out; in contrast, calling it a PDF is an unambiguously factual statement. Additionally, "preprint", "peer review" and related terminology are unfamiliar to most people, who are not academics.
  • Referring to the PDF directly also keeps the audience from assuming that the authors know what they are doing and have verified their work or, conversely, that the information must be false simply because it has not yet been officially published.
  • The comic finishes with a jab at the PDF format itself, proclaiming that no ordinary person would voluntarily choose a PDF file as their medium of communication. Ordinary people use the default file format of whatever word processor or text editor they use, but PDF files are not very convenient to edit, so they're generally only used for final versions of documents that are ready to print or distribute, following a dedicated export or conversion process.

The title text makes fun of what is incorrectly believed to be the official name of the PDF format; PDF is now an open international standard (ISO 32000-1), and the only PDF files that are "Adobe Acrobat files" or "Adobe PDF" files are those created using Adobe Systems' software. Further, Adobe does not use the ® designation in conjunction with PDF. (See Adobe Trademark Guidelines, 1 Nov. 2014)

Since so many applications can create and even edit PDF files, implying a connection with Adobe every time someone talks about one is preposterous, and one could sarcastically pronounce the registered trademark symbol to show contempt for the fact that it is a registered trademark.

Transcript

This transcript is incomplete. Please help editing it! Thanks.

[A newscaster is sitting at a desk. Several of her opening sentences are scribbled over, indicating revisions to her script.]

Newscaster (scribbled out): According to a new preprint…
Newscaster (scribbled out): …An unpublished study…
Newscaster (scribbled out): According to a new paper uploaded to a preprint server but which has not undergone peer review…
Newscaster: According to a new PDF…
Inset graphic: Breaking NEWS
Caption: Benefits of just saying "a PDF":
  • Avoids implications about publication status
  • Immediately raises questions about author(s)
  • Still implies "this document was probably prepared by a professional, because no normal human trying to communicate in 2020 would choose this ridiculous format."



Discussion

I was going to mention the TeX format(/family), but someone got in there before me. So how about if it's a .wp4 document? ;) 141.101.107.84 01:40, 9 May 2020 (UTC)

But now the LaTeX reference is removed, anyway. 162.158.158.163 16:14, 9 May 2020 (UTC)

Why is this comic labeled as a Saturday comic? I don't know what timezone you use, but it was posted Friday, well before midnight UTC. 172.69.69.204 02:15, 9 May 2020 (UTC)

I'm pretty sure that's just an error. The date for the comic in the archive is "2020-5-8", which is today (Friday). Comic #2303 correctly has the "Wednesday comic" category, and the archive lists its date as 2020-5-6 (which is Wednesday). ...And I've fixed it now. The category is automatically generated based on the date listed in the Template:Comic infobox at the top of the article; someone incorrectly entered it as "May 9, 2020" instead of "May 8, 2020". --V2Blast (talk) 02:53, 9 May 2020 (UTC)
'Someone' == DgbrtBOT; and thus probably based off the time() it thinks it is, upon autocreating the base article, rather than any human erring. Depending on the home system's timezone, it probably was Saturday for DB, if not for Randall. Maybe an offset/correction/relocali(s|z)ation should be put into the code, but it seems to normally work out Ok and this comic might have been just over a threshold... (edit: Wiki time in history seems to be UTC, for me at least - I'm in UTC+1/BST but as an IP-editor I haven't made any setting changes to my personal login that I don't have. DgbrtBOT piped up at 22:48, which at UTC+2 or more (Central Europe Daylight Savings, which matches what I recall of knowing about that entity, or anywhere more Easterly) would have been 'tomorrow', and I didn't spot the new comic until at least those dozen minutes after that which occurred before my own clocks ticked past midnight. Given that Randall is (usually?) in UTC-5, or UTC-4 when daylight savings is established, maybe Dgbrt needs a special offset of -6 hours (or go directly via localtime() with the best current known Munroevian locale specified) in calculating things. Or we can let the community smooth these things out like we just did when a possible late-evening update causes this to be an issue?) 162.158.155.62 03:17, 9 May 2020 (UTC)
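The date arithmetic in the comment above can be checked with a short sketch. It only illustrates the offsets mentioned there and is not DgbrtBOT's actual code: the function name `local_date_and_weekday` is hypothetical, the 22:48 UTC creation time and the UTC+2 and UTC-4 offsets are taken from the comment, and fixed offsets are an assumption standing in for real time-zone rules.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def local_date_and_weekday(moment, utc_offset_hours):
    """Return (ISO date, weekday name) of `moment` at a fixed UTC offset."""
    local = moment.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.date().isoformat(), WEEKDAYS[local.weekday()]

# Page creation time reported in the comment: 22:48 UTC on 8 May 2020.
created_utc = datetime(2020, 5, 8, 22, 48, tzinfo=timezone.utc)

# Assumed bot locale (UTC+2) versus assumed author locale (UTC-4).
bot_view = local_date_and_weekday(created_utc, 2)
author_view = local_date_and_weekday(created_utc, -4)

print("bot:", bot_view)
print("author:", author_view)
```

Under these assumptions the same instant falls on different calendar days for the two offsets, which is consistent with the wrong category described above.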

Is "sarcastically pronouncing the registered trademark symbol" meant as pronouncing it "arr" in the way pirates talk? Bischoff (talk) 15:00, 9 May 2020 (UTC)

I would expect professional news anchors can come up with something even more sarcastic. -- Hkmaly (talk) 01:08, 10 May 2020 (UTC)
Perhaps they'd go with something like "R in a circle" or "Circled R" (pronounced "Circledar"). PotatoGod (talk) 17:27, 10 May 2020 (UTC)
Perhaps we can use a little of both and create a new standard for sarcastically pronouncing it as "circled, arrr!" Iggynelix (talk) 12:05, 11 May 2020 (UTC)
ReGiStErEd TrAdEmArK! 108.162.216.128 20:34, 11 May 2020 (UTC)
I thought it was meant to be read as "Ado-bear" - but then again, English is not my first language:)

In 2020 I use pdf to put documents with tables onto a website, because html exports from editors are voluminous and brittle. 162.158.6.118 10:32, 10 May 2020 (UTC)

As someone who regularly takes tables from PDF in order to put them into spreadsheets for further use, some people don't do me any favours by that method. Among the problems, if the table setter didn't pay attention to the column widths then the copied-out text of two adjacent cells that don't appear to overlap each other will interlace at a character level and need editing back to separate entities. And then there's the inconsistencies of Header rows atop the table and/or atop the next new page the table splits over. I could run a quick script on (X)HTML tables, and get it perfectly for my needs. CSV, or even TabSV, would actually be my preferred transport format (i.e. no format, just pure layout without even spanned/merged cells, and I can redo what needs redoing on the final redo), but I can't ever seem to get them to do that for me despite having the data almost in that form prior to the PDFing... Grrrr. 162.158.159.142 11:30, 10 May 2020 (UTC)
I feel your pain. I receive pdf documents from a financial professional, where an A4 landscape page seems to have about five two-column-wide tables side-by-side, and I'm still deciding what kind of manipulation to do, to get it into CSV and do some analysis. 162.158.6.232 10:21, 12 May 2020 (UTC)
If the PDFing hasn't ruined the groupings/precedence, like it often does, try mouse-selecting each table, to copy and paste into notepad or equivalent. Sometimes that works well enough to create tab delimited elements (other times, it line-feeds between columns as well as rows, but still can be reconstructed) and then that'll paste into a spreadsheet (or be parsable with a script) better than any Paste Special (using "no textformat" options?) straight into a grid. Sometimes you need to fiddle a bit with the notepad text, but depending on the data that might be doable with a few choice find+replace runs, perhaps upon consecutive table-pastings to save you time repeating yourself. Or not. 162.158.158.163 00:08, 13 May 2020 (UTC)
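For the case described above where the pasted text does come out tab-delimited, a minimal sketch of the "parsable with a script" step might look like this. The function name `pasted_table_to_csv` and the sample rows are hypothetical, and the sketch assumes one table row per line with tabs between cells; it does not handle the interlaced-cell or line-feed-between-columns problems mentioned in this thread.

```python
import csv
import io

def pasted_table_to_csv(pasted: str) -> str:
    """Convert tab-delimited text (one table row per line) into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in pasted.splitlines():
        if not line.strip():
            continue  # skip blank lines left over from the paste
        # Split on tabs and trim stray spaces around each cell.
        writer.writerow([cell.strip() for cell in line.split("\t")])
    return out.getvalue()

# Hypothetical sample of text pasted from a PDF table into a text editor.
sample = "Item\tAmount\nRent, flat\t1200\n\nFood\t300\n"
csv_text = pasted_table_to_csv(sample)
print(csv_text)
```

The `csv` module quotes any cell that contains a comma, so cells such as "Rent, flat" survive the conversion as a single field.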

I think Randall's last point (no unprofessional humans use PDFs in 2020) is very wrong. Especially due to the coronavirus, all college classes have switched to online assignment submissions, and the teachers only accept PDF submissions (although, annoyingly, they give the original template files in .doc format!) I would NOT trust random college students' assignment submissions as a reputable information source! PotatoGod (talk) 17:22, 10 May 2020 (UTC)