Difference between revisions of "2347: Dependency"

Explain xkcd: It's 'cause you're dumb.
  
 

Revision as of 20:44, 28 April 2024

Dependency
Title text: Someday ImageMagick will finally break for good and we'll have a long period of scrambling as we try to reassemble civilization from the rubble.

Explanation

Technology architecture is often illustrated by a stack diagram, in which components drawn higher up depend on the components drawn beneath them. This is analogous to a physical tower of blocks, in which higher blocks rest on lower blocks. The stack in this cartoon bears a striking resemblance to a physical block tower, suggesting that the whole tower will lose its balance if a critical piece is removed. Here that piece sits near the bottom and is labeled as being maintained by a single semi-anonymous person, located somewhere relatively unimportant, who does the work for their own unknown reasons without fame or acknowledgement. A stack diagram is not meant to communicate physical balance, which makes this a humorously absurd extension of a well-known diagram style.

ImageMagick, mentioned in the title text, is a popular standalone utility, released in 1990, that converts images between graphics file formats and performs many other image transformations. While there are also numerous libraries and APIs for performing these tasks within larger programs, ImageMagick is so popular and easy to use that many programs either use its API or simply shell out to ImageMagick to perform a necessary transformation. They therefore depend on ImageMagick, and would break if ImageMagick were to disappear.
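As a rough illustration of what "shelling out" means, the sketch below shows how a program might hand an image conversion to an external ImageMagick executable instead of doing the work itself. It is a minimal sketch under assumptions: the helper names `build_convert_command` and `convert_image` are hypothetical, and it assumes the command-line tool is installed as `magick` (ImageMagick 7) or `convert` (older installs; on Windows, `convert` may instead be an unrelated system tool).

```python
import shutil
import subprocess
from typing import List


def build_convert_command(src: str, dst: str, executable: str = "magick") -> List[str]:
    """Build the argument list for a plain format conversion.

    ImageMagick infers the output format from the extension of dst.
    """
    return [executable, src, dst]


def convert_image(src: str, dst: str) -> bool:
    """Shell out to ImageMagick; return True only if the conversion succeeded."""
    # Assumption: the tool is on PATH under one of these two names.
    executable = shutil.which("magick") or shutil.which("convert")
    if executable is None:
        # The dependency is missing, so the calling program cannot do its job.
        return False
    result = subprocess.run(build_convert_command(src, dst, executable), check=False)
    return result.returncode == 0
```

The calling program contains no image-handling code of its own, so if the external tool disappears or changes its behavior, `convert_image` simply stops working.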

Background and Examples

Taking code reusability and modularization to their logical extreme has long been a tenet for programmers. Programming began as a slow task on very memory-constrained systems, with punch cards and days of delay before a bug was discovered, so reuse made things possible that otherwise wouldn't be. Once systems became small, fast, and able to hold a lot of data, the ability to provide higher and higher degrees of automation made reusable libraries a huge engine behind the development of technology. By outsourcing what would seem like basic functions, such as string manipulation, to other libraries, developers waste less time reinventing the wheel, or so the philosophy goes (Beret Guy's business does so literally in 2140: Reinvent the Wheel). Thus many tiny packages, many of which contained only one function, became popular dependencies. This was especially true in Unix and Linux, where an entire program is commonly used for one small task, and programs exist to tie others together into powerful shell scripts.

Node.js (a platform for JavaScript) and Python are two modern ecosystems providing huge centralized stashes of libraries, where developers of the world can stand on the shoulders of all the small useful libraries they make for each other to build new ones that are more and more powerful, and also more and more prone to sudden unexpected bugs somewhere in the dependency chain. JavaScript was designed to be an easy-to-use front-end scripting language, not the core back-end language that users of Node.js and its npm package manager have made it. While in theory such a system may sound good for developers, who need to write and maintain fewer lines of code, systems which are highly optimized are also highly susceptible to rapid changes. For example, the famous left-pad incident in the npm package manager left many major and minor web services which depended on the package unable to build. A disgruntled developer unpublishing 11 lines of code was able to break everybody's build, because everyone was using it.
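To give a sense of how small such a dependency can be, here is a minimal Python sketch of what a left-padding helper does. It is an illustration only, not the original JavaScript package; the name `left_pad` and its signature are assumptions made for this example.

```python
def left_pad(value: object, length: int, fill: str = " ") -> str:
    """Pad the text form of value on the left until it is at least length long."""
    text = str(value)
    missing = length - len(text)
    if missing <= 0:
        # Already long enough: return the text unchanged.
        return text
    # Prepend one copy of the fill string for each missing character.
    return fill * missing + text


print(left_pad("foo", 5))    # prints "  foo"
print(left_pad(17, 5, "0"))  # prints "00017"
```

Python's built-in `str.rjust` covers the same ground for single-character fills, which shows how little code stood between many projects and a broken build.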

In 2014, the Heartbleed bug revealed that a significant portion of the internet was vulnerable to attack due to a bug in OpenSSL, a free and open-source library facilitating secure communication. One headline at the time demonstrated this comic in real life: "The Internet is Being Protected by Two Guys Named Steve". The aforementioned Steves were overworked, underfunded, and largely unknown volunteers whose efforts nevertheless underpinned the security of major websites throughout the world. Randall provided a concise, helpful explanation of the bug in 1354: Heartbleed Explanation.

In 2020, the sole maintainer of the library core-js, used by 75% of the top 100 websites to polyfill new JavaScript features for old browsers and depended on by many popular libraries such as Babel, ran over two dark-clothed drunk pedestrians, one of whom was lying down, at night in Russia while speeding in front of a crossing. He had quit his previous jobs to be able to maintain core-js and so did not have enough money to settle, and he was sentenced to 18 months in an open prison ("колония-поселение").

Leading up to 2024, a user account going by the name Jia Tan gained the trust of xz's (one and only) maintainer. Over the course of 3 years, Jia Tan cleverly inserted a patch into xz that allowed a remote user to gain root-level access via the common ssh protocol. This compromised version of xz was released in March 2024. Another programmer, Andres Freund, found the backdoor before the compromised version was widely distributed.

The current model of libraries and open-source development (topics which Randall has addressed extensively in the past) relies heavily on the free and continued dedication of unpaid hobbyists. Though some major projects such as Linux may be able to garner enough attention to build an organization, many smaller projects, which are in turn reused by larger projects, may be maintained by only one person, either the founder or someone else who has taken up the torch. Maintaining a library requires extensive knowledge of the library itself, of its use cases, and of the broader community around it. That knowledge usually belongs to maintainers who have spent years at the task, so they cannot be easily replaced. As a result, there are many abandoned projects on the internet as people move on to greener pastures. Far from the days of backwards compatibility, that's usually not a problem, unless a project happens to be far up the dependency chain, as illustrated, in which case there may be a crisis down the road for both the developers and the users down the chain.
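The idea of a project sitting far up the dependency chain can be made concrete with a small sketch. The Python example below walks a dependency graph to find everything that would be affected if one package broke. The package names in `EXAMPLE_GRAPH` and the function `transitive_dependents` are hypothetical and exist only for illustration; real package managers record this information in their own formats.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical graph: each package maps to the packages it depends on directly.
EXAMPLE_GRAPH: Dict[str, List[str]] = {
    "web-app": ["framework", "image-tool"],
    "framework": ["tiny-lib"],
    "image-tool": ["tiny-lib"],
    "tiny-lib": [],
}


def transitive_dependents(depends_on: Dict[str, List[str]], target: str) -> Set[str]:
    """Return every package that directly or indirectly depends on target."""
    # Invert the graph so each package maps to the packages that use it.
    used_by: Dict[str, Set[str]] = {}
    for package, dependencies in depends_on.items():
        for dependency in dependencies:
            used_by.setdefault(dependency, set()).add(package)

    # Breadth-first walk upward from the target through its users.
    affected: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


print(sorted(transitive_dependents(EXAMPLE_GRAPH, "tiny-lib")))
# prints ['framework', 'image-tool', 'web-app']
```

In this toy graph, the smallest package affects every other one, which is the situation the comic draws as a single tiny block holding up the tower.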


Transcript

[A tower of blocks is shown. The upper half consists of many tiny blocks balanced on top of one another to form smaller towers, labeled:]
All modern digital infrastructure
[The blocks rest on larger blocks lower down in the image, finally on a single large block. This is balanced on top of a set of blocks on the left, and on the right, a single tiny block placed on its side. This one is labeled:]
A project some random person in Nebraska has been thanklessly maintaining since 2003



Discussion

I worked for the Linux Foundation on the Core Infrastructure Initiative supporting OpenSSL and other projects. The one that scared me was Expat the XML parser maintained by two people on alternate Sunday afternoons assuming no other distractions. We did get funding for a test suite. Joe Biden was a supporter of LF and CII and was going to host a fund raiser for us at the White House until a perverse result.141.101.98.222 22:46, 17 August 2020 (UTC)

Are you trying to tell me that Biden and Harris weren't for CALEA, DEITYBOUNCE, and similar backdoors just like all the feds? When will they discover how to stop sending money overseas? https://blog.risingstack.com/controlling-node-js-security-risk-npm-dependencies/ 172.69.34.18 07:37, 25 August 2020 (UTC)

In the explanation, is "far from the days of backwards compatibility" a reference to something specific? I thought quite a few things made today were still backwards compatible, or am I mistaken? Zowayix (talk) 18:26, 30 August 2020 (UTC)

Relevance of Imagemagick?

Could someone perhaps add to the explanation an explanation of how this applies to Imagemagick (as mentioned in the title text)? —108.162.219.174 22:58, 17 August 2020 (UTC)

I don't use it myself, but it is a very versatile standalone utility that does a lot through command-line (batched) processing or can be accessed through actual API interface (I use GIMP tools that way, in automation, when not using it directly as a manual interface, but I understand there's a lot of love out there for IM). There's potentially untold uses for that, hidden in the background of other applications. If it disappeared or changed in just the wrong way, could perhaps half the CAPTCHA dialogues suddenly break? Could a self-driving car company find its vehicles are suddenly blind? We might suddenly have so many fewer Doge memes! (Wow! Much up-to-datedness! So topical!).
In Randall's (or his characters') world, that is. In our world, I see someone mentioned Leftpad in the Explanation, which probably needs more Explanation (or else wikilinking) but is an interesting thing that actually happened in our world, albeit not quite armagg3don for society... 162.158.154.131 23:22, 17 August 2020 (UTC)
ImageMagick is the de facto standard for image processing. Since the '90s, engineers were either adding support for new formats to ImageMagick or adding new language bindings for ImageMagick. This resulted in a single library that is available on almost every server and desktop platform and can read and write almost every image format. Using ImageMagick is sometimes unwieldy, e.g. on Node.js it actually spawns a sub-process to run ImageMagick. But it is still the de facto (and the only practical) choice in most cases.--Deepjoy (talk) 00:24, 18 August 2020 (UTC)
I would put emphasis on the "almost every image format" ... there are lot of alternative image libraries, but most only support handful of formats (often just jpeg, png and gif). Meanwhile, I suspect not even Gimp supports as many formats as ImageMagick ... and, of course, Gimp is not really usable as library OR for shelling-out. -- Hkmaly (talk) 23:43, 18 August 2020 (UTC)
The massive reliance on ImageMagick was recognized in 2002 by the developers of GraphicsMagick who needed to guarantee a stable version of ImageMagick and created their own fork. So while almost everyone uses and depends on ImageMagick (or think they are using ImageMagick when they are actually using GM) there is an actively maintained alternative. -- 162.158.159.48 17:10, 21 August 2020 (UTC)

from the late 2010s onwards?

I'm pretty sure re-use and modularization was a thing long before then. Maybe it got more popular in the 2010s, but it's been around since at least the '70s.

The ideal of reusable code libraries has been around nearly forever, but except for some popular Fortran statistics libraries I don't think it achieved widespread adoption until much later, e.g. CPAN. Barmar (talk) 03:25, 18 August 2020 (UTC)

The timezone database (https://en.wikipedia.org/wiki/Tz_database#History) has been around since 1986. libc in various forms has been around as long as C has. Reuse and modularity is a fundamental principle of software engineering, and not an invention of the last few years. I'd just remove any mention of date.

I think it's relatively recent that you can delete a file from one Web server and everything on the internet breaks. Dependencies are one thing, dependency on live updated resources is new. Because it's rather a bad idea. Incidentally overall... I think today's comic needs to be explained slower. Most people in the world are very unfamiliar with these concepts. Although coronavirus responses have taught a lot of us about "supply chains" that put stuff into shops for us to buy. Robert Carnegie 141.101.69.87 10:18, 18 August 2020 (UTC)
While libc in various forms has been around as long as C has, it was never SINGULAR. Every version of C compiler had its own version of the C library maintained by different people. Even now there are alternatives to GNU libc. The timezone database might be a better example. Also, reuse and modularity is a fundamental principle, but reusing code maintained by someone else in a project with a bigger staff than that of such code is relatively recent. -- Hkmaly (talk) 23:48, 18 August 2020 (UTC)

German Television referencing this comic to illustrate the Log4j dependency (at around 1:11)

This has happened before

It may be worth mentioning a case where this actually happened, like https://www.theregister.com/2016/03/23/npm_left_pad_chaos/ 141.101.97.101 01:03, 18 August 2020 (UTC)

That was only a problem for those who tried to compile against network versions, instead of having a local copy. One of the dumbest and laziest things you can do as a programmer. Not to mention that you could just copy the code directly into one of your files or just writing your own routine. SDSpivey (talk) 02:04, 20 August 2020 (UTC)
Speaking as a SecDevOps person, another risky thing programmers do out of ignorance is host static local copies of code repositories without a good update and security review plan to make sure the static copy gets regular testing and updates as security and bugfixes are published to the source. Still another risk is writing your own library to reinvent the wheel and making the same mistakes the maintainer of the wheel solved six major versions ago. I would be careful throwing terms like "dumb" and "lazy" around. Every one of those solutions, including your proposals, *also* can be risky if implemented without proper expertise and forethought. There is no 'best' practice here, just risks and advantages that make it so that there is no single one-size-fits-all solution 108.162.212.145 13:30, 27 August 2020 (UTC)

One particularly big risk that instantly came to mind is the timezone database, which is maintained by volunteers yet underpins basically everything: https://en.wikipedia.org/wiki/Tz_database#Maintenance

I remember hearing about this a few years back at a Linux Foundation conference - the NTP daemon was underfunded (as I recall) and the one person maintaining it was struggling to pay bills. Losing NTP breaks an awful lot of things.... 162.158.107.167 19:48, 18 August 2020 (UTC)

I see this was problem in 2016 ... I'm not able to find any update on the situation ... -- Hkmaly (talk) 00:10, 19 August 2020 (UTC)
Nice long interview with Harlan Stenn, author/maintainer of NTP. RandalSchwartz (talk) 05:56, 19 August 2020 (UTC)
I work with a E100k robot that keeps breaking on account of [Atomic Parsley]. Everyone is very amused at this Kev (talk) 13:32, 20 August 2020 (UTC)

Some random person in Nebraska

Is the reference to a random person in Nebraska totally arbitrary, or is it a reference to someone in particular?

Also, it would be good to have examples of heavily used projects with very small (especially one person) maintainer teams. OpenSSL definitely comes to mind, from what I have read. Stevage (talk) 01:49, 18 August 2020 (UTC)

Nebraska came up in 1667, "Algorithms" as well.162.158.79.33 02:22, 18 August 2020 (UTC)

Nebraska is... Well, I'm sure some Nebraskonians might have a more fully-fleshed out and accurate opinion of its subtleties, depth of culture(s?) and Deity-given geographic artisanship but viewed from further afield it is one of the contenders for "miles and miles of not much going on", or similar, peopled by people that largely live within that promise.
It may be just a meme of such a generality, as a brief look at a list of people from Nebraska tends to support the hypothesis that the ones who became significant (Astaire, Brando, Carson...) probably did so only once they left.
OTOH, there are (at least) four computing pioneers/developers mentioned among them, creator or authors of significant 'products', and maybe one of these matches the (intellectual) dependency meme quite well - other than being written in Massachusetts. Or this one, though that might have been LA-baked, maybe?
I learnt some interesting things when investigating this issue, just now. Cheers! 108.162.229.142 09:54, 18 August 2020 (UTC)
I feel like Nebraska is mentioned just because it's the most flyover-sounding flyover state name? Or is it actually home to some well known library maintainer? -- 162.158.119.199 (talk) (please sign your comments with ~~~~)
Another good example might be left-pad. It actually caused a big issue in 2016 when the developer took it offline and a whole bunch of projects and websites broke. Numbermaniac (talk) 07:41, 22 August 2020 (UTC)

Microservices reference

Microservices reference is not related to this comic, as ImageMagick is monolith application. Also microservices are way of operating and deploying web services, not utility apps. 162.158.103.177 07:56, 18 August 2020 (UTC)

ImageMagick is a library. -- Hkmaly (talk) 23:50, 18 August 2020 (UTC)

The Thirty Million Line Problem

See The Thirty Million Line Problem. Randall's drawing looks like a house of cards on the verge of collapse. In the video, Casey talks about how the lack of a "hardware ISA" causes critical software (like OSes and browsers) to bloat like crazy (a "hardware ISA" would be a standard for how hardware works, just like the x86 ISA is a standard for how an x86 CPU works, that both AMD and Intel agree on). Also, he mentions how fragile and broken software is due to this "Thirty Million Line" bloat. 162.158.107.167 19:48, 18 August 2020 (UTC)

Based on related discussion, that's a VERY bad video: he may have a point, but it takes VERY long time before he gets to it. I'm not going to watch it that long myself. -- Hkmaly (talk) 00:03, 19 August 2020 (UTC)
This reminds me of that old joke: If carpenters built buildings the same way programmers made programs, the first woodpecker that came along would destroy civilization. 162.158.106.160 (talk) 14:29, 19 August 2020 (please sign your comments with ~~~~)
(Known as "Weinberg's Law" from 1971 "Psychology of Computer Programming", G.M. Weinberg) 172.68.126.139 (talk) 23:37, 20 January 2024 (please sign your comments with ~~~~)
I thought the drawing looks more like the Jenga game, except the components are not simple rectangles. Barmar (talk) 16:31, 20 August 2020 (UTC)

"Famous" Left Pad Incident

The "famous" left-pad incident in JavaScript's package manager could use some elaboration for those of us for which it isn't. 162.158.107.89 02:42, 19 August 2020 (UTC)


  • Aaaaand that's why i'll never use kik 172.70.251.108 09:47, 17 June 2022 (UTC)
  • I feel blaming the "disgruntled" (a loaded term in itself) open-source developer for withdrawing from an "open source" platform that screwed him over in favor of corporate interests is a misrepresentation of the incident. 172.70.38.26 15:18, 26 June 2024 (UTC)

Log4j Zero-Day Vulnerability (CVE-2021-44228) Incident

On December 9, 2021, security researchers discovered a flaw in the code of a software library used for logging. The software library, Log4j, is built on a popular coding language, Java, that has widespread use in other software and applications used worldwide. This flaw in Log4j is estimated to be present in over 100 million instances globally. If exploited, it could permit a remote attacker to execute arbitrary code on vulnerable systems. This library had one maintainer who lived in the outback.

Loadsharers

There is an initiative by Eric Raymond targeted specifically to mitigate this problem.
Article: https://www.linuxjournal.com/content/loadsharers-funding-load-bearing-internet-person
Website: https://esr.gitlab.io/loadsharers/   — Smartchair (talk) 16:20, 19 August 2020 (UTC)


NTP

The Network Time Protocol is also a great example. --Slashme (talk) 21:50, 19 August 2020 (UTC)


Explain “maintenance”

What this article does a poor job of is explaining what software “maintenance” is. Software doesn’t usually disappear (despite the several cases mentioned in the article which are kind of beside the point). It also doesn’t rust or wear out like a car. But software usually needs to be continuously updated to fix security vulnerabilities or to keep it compatible with other software. Also it can get new features or bug fixes. And if the guy in Nebraska doesn’t do a good job of it, everyone has a problem.

Also worth mentioning is how the comic highlights the absurdity of this anarchic communism. Neither users (capitalism) nor the government (socialism) is paying these people. And somehow it works 95% of the time. Except when it doesn’t.

Duplicity

Saw this cartoon and immediately thought of the backup software Duplicity, which comes with Ubuntu (using Deja-Dup interface). Big shout-out to Kenneth Loafman for keeping it running! 108.162.238.124 16:06, 7 February 2024 (UTC)

xz Backdoor

The xz backdoor has brought up an even more disturbing ramification of this situation, which is that a malicious entity (e.g. a nation-state) can create a persona (or multiple), build trust with the random guy maintaining the library since 2003, eventually take over the project, then implant a backdoor that targets core software like OpenSSH. The only reason we just avoided one of the largest cyber incidents in history is because one guy running Debian Sid noticed sshd using a bit more CPU than normal while he was benchmarking something completely unrelated. The implications here are terrifying. 172.70.210.131 20:02, 30 March 2024 (UTC)

I wonder how many times that has already happened. Not *if*, but how many times. See also the title text of xkcd 2057 (Internal Monologues). 172.70.46.34 14:24, 8 May 2024 (UTC)

Concept of vulnerability thru dependency rests mostly on this cartoon

On July 11th, 2024, a version of this cartoon re-labelling "all modern digital infrastructure" to "every conversation about dependencies since 2020", and in which the lynchpin block is now labelled "this fucking comic" first appeared on the fediverse social networks. This variant turns the cartoon into a self-referencing one, where conceiving the vulnerability thru dependency rests mostly on the cartoon that illustrates this very concept. This mise en abyme was soon illustrated by a further variant of the cartoon making use of the Droste effect. --172.71.130.67 21:27, 11 July 2024 (UTC)