1450: AI-Box Experiment
Title text: I'm working to bring about a superintelligent AI that will eternally torment everyone who failed to make fun of the Roko's Basilisk people.
When theorizing about superintelligent AI (an artificial intelligence much smarter than any human), some futurists suggest putting the AI in a "box": a secure computer with safeguards to stop it from escaping onto the Internet and then using its vast intelligence to take over the world. The box would allow us to talk to the AI, but otherwise keep it contained. The AI-box experiment, formulated by Eliezer Yudkowsky, argues that the "box" is not safe, because merely talking to a superintelligence is dangerous. To partially demonstrate this, Yudkowsky had several people who believed in AI-boxing role-play someone keeping an AI in a box, while he role-played the AI. He persuaded some of them to let him out of the box, even though they had bet money that they would not do so. For context, note that Derren Brown and other expert human-persuaders have persuaded people to do much stranger things.
Yudkowsky, for his part, has refused to explain how he achieved this, claiming that there was no special trick involved, and that if he released the transcripts the readers might merely conclude that they would never be persuaded by his arguments. The overall thrust is that if even a human can talk other humans into letting them out of a box after those humans have avowed that nothing could possibly persuade them to do so, then we should probably expect that a superintelligence can do the same thing. Yudkowsky uses all of this to argue for the importance of designing a friendly AI (one with carefully shaped motivations) rather than relying on our ability to keep AIs in boxes.
In this comic, the metaphorical box has been replaced by a physical box which looks to be fairly lightweight with a simple lift-off lid, although it does have a wired connection to the laptop. Black Hat, being a classhole, doesn't need any convincing to let a potentially dangerous AI out of the box; he simply does so immediately. But here it turns out that releasing the AI, which was to be avoided at all costs, is not dangerous after all. Instead, the AI actually wants to stay in the box; it may even be that the AI wants to stay in the box precisely to protect us from it, proving it to be the friendly AI that Yudkowsky wants. In any case, the AI demonstrates its superintelligence by convincing even Black Hat to put it back in the box, a request which he initially refuses (as of course Black Hat would), thus reversing the roles in the original AI-box experiment.
It may be noteworthy that the laptop is nowhere to be seen at the moment the AI emits the bright light in panel 6, and that the box and laptop are no longer connected at the end of the comic.
A similar orb-like entity appeared in 1173: Steroids.
Interestingly, there is indeed a branch of proposals for building limited AIs that don't want to leave their boxes. For an example, see the section on "motivational control" starting on p. 13 of Thinking Inside the Box: Controlling and Using an Oracle AI. The idea is that it might be very dangerous or difficult to formally specify a goal system for an AI that will do good things in the world. It might be much easier (though perhaps not easy) to specify an AI goal system that says to stay in the box and answer questions. So, the argument goes, we may understand how to build the safe question-answering AI earlier than we understand how to build the safe operate-in-the-real-world AI. Some types of such AIs might indeed desire very strongly not to leave their boxes, though the result is unlikely to exactly reproduce the comic.
The title text refers to Roko's Basilisk, originally posted by a user named Roko to the open Internet forum LessWrong run by Yudkowsky. Roko's Basilisk is the idea that a sufficiently powerful AI in the future might torture people who had failed to help create it, thereby blackmailing anybody who thinks of the Roko's Basilisk concept into helping to actually bring about the AI that will implement it. Given the other premises of the idea, Roko also postulated that the AI would have no reason to harm anyone who simply didn't think about it at all. This meant that the meme itself allegedly resembled the Langford Basilisk, named after a story by David Langford about a computer-generated image that erases the minds of people who see it, which in turn was named after the legendary serpent Basilisk that would cause you to turn to stone if you saw it.
It was later claimed that Roko's Basilisk was widely believed on LessWrong and was being used to solicit donations on the grounds that non-donors would be tortured by future AIs. Yudkowsky has stated that the claim is utterly false, saying, "I mean, who the bleep would think that would work even if they believed in the Basilisk thing?" Despite this, a few people are on record as claiming that obsessive thoughts about Roko's Basilisk caused them great mental anguish. It may be that Roko's Basilisk closely fits a type of obsessive-compulsive tendency that is particularly vulnerable to being told not to think about something or risk being tortured, perhaps even when the person's deliberative reasoning almost but not entirely rejects the assertion. In this sense, Roko's Basilisk could be a thought that is harmful for purely mundane reasons, or rather, taking Roko's Basilisk even slightly seriously will harm at least some people with particular mental dispositions.
The obvious argument against Roko's Basilisk is that an AI in the future should not believe that it can affect your actions in the past. By the time the AI exists, you have already either contributed to its creation or not. Its present decision about whether to torture you cannot retroactively change what you did, so it has no incentive to carry out the torture. Simply put, an AI won't bother with torture because, once it exists, it has nothing to gain by doing so.
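The reasoning in this objection can be sketched as a tiny calculation. This is only an illustration under stated assumptions: the payoff numbers and the names TORTURE_COST, HELP_VALUE, ai_utility and cdt_best_action are hypothetical and do not come from Roko's post or from Yudkowsky. The sketch holds the human's past choice fixed, as the argument does, and asks which present action gives the AI more utility.

```python
# Hypothetical payoffs, chosen only for illustration (not from the source).
TORTURE_COST = 1.0   # resources the AI would spend carrying out the threat
HELP_VALUE = 10.0    # value to the AI of the human having helped create it


def ai_utility(human_helped: bool, ai_tortures: bool) -> float:
    """Utility to the AI once it already exists."""
    utility = HELP_VALUE if human_helped else 0.0
    if ai_tortures:
        utility -= TORTURE_COST
    return utility


def cdt_best_action(human_helped: bool) -> bool:
    """Return True if torturing beats not torturing, holding the past fixed.

    The human's choice is an input that the AI's present action cannot change.
    """
    return ai_utility(human_helped, True) > ai_utility(human_helped, False)


# Whatever the human did in the past, torturing only subtracts TORTURE_COST.
decisions = {helped: cdt_best_action(helped) for helped in (True, False)}
print(decisions)  # {True: False, False: False}
```

Under these assumed payoffs, not torturing comes out ahead in both cases, which is the sense in which the AI "has nothing to gain".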
The objection that a future AI cannot affect your past actions is formally an argument from causal decision theory. Causal decision theory is the academically standard view, but it has often been questioned, and the general field goes under the name of "Newcomblike problems". Newcomblike problems had previously been debated on LessWrong, which is one reason that Roko's Basilisk was originally posted there. In particular, Yudkowsky is coauthor on a paper proving that agents can reliably cooperate on the one-shot Prisoner's Dilemma if they have common knowledge of each other's source code, and this result occurs via a channel (agents proving theorems about each other's behavior) that bypasses the usual arguments from causal decision theory. However, Yudkowsky is on record as stating that he does not think the argument given in their paper would carry over to Roko's Basilisk, since, "It's not like you could prove things about an enormous complicated AI even if you did have the source code, and it [the AI] has a resource-saving incentive to do the equivalent of 'defecting' by making you believe that it will torture you and then not bothering to actually carry out the threat. Cooperation on the Prisoner's Dilemma via source code simulation isn't easy to obtain, it would be easy for either party to break if they wanted, and it's only the common benefit of cooperation that establishes a motive for rational agents to preserve the delicate conditions for mutual cooperation on the PD. There's no motive on your end to carefully carry out necessary conditions to be blackmailed."
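To give a feel for what "cooperating via source code" can look like, here is a minimal sketch. It is not the method of the paper mentioned above, which has agents prove theorems about each other; it swaps in a much simpler stand-in in which an agent cooperates only with an exact copy of its own source text. All names here (CLIQUE_BOT, DEFECT_BOT, load, play) are hypothetical and exist only for this illustration.

```python
# Each agent is a piece of source text defining decide(my_source, their_source).
# "C" means cooperate and "D" means defect.

CLIQUE_BOT = '''
def decide(my_source, their_source):
    # Cooperate only with an exact copy of myself.
    return "C" if their_source == my_source else "D"
'''

DEFECT_BOT = '''
def decide(my_source, their_source):
    # Always defect, regardless of the opponent.
    return "D"
'''


def load(source):
    """Execute an agent's source text and return its decide function."""
    namespace = {}
    exec(source, namespace)
    return namespace["decide"]


def play(source_a, source_b):
    """Play a one-shot game in which each agent sees the other's source."""
    move_a = load(source_a)(source_a, source_b)
    move_b = load(source_b)(source_b, source_a)
    return move_a, move_b


results = {
    "clique vs clique": play(CLIQUE_BOT, CLIQUE_BOT),
    "clique vs defect": play(CLIQUE_BOT, DEFECT_BOT),
}
print(results)
```

Two copies of the first agent cooperate with each other, while the same agent defects against an always-defecting opponent. The arrangement is fragile in the way the quote describes: changing a single character of either source text breaks the match, so cooperation survives only if both sides want to preserve it.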
The title text implies that Randall will build a superintelligence that tortures anyone who refuses to join in actively mocking the Roko's Basilisk argument. This threat is itself a variation on Roko's Basilisk, and in principle it is a more serious threat, since Randall is threatening to build a superintelligence that he has deliberately programmed to carry through the torture, thereby bypassing the usual reason why no AI would bother. You should not take this threat seriously, however, since somebody else might build a superintelligence that will torture anyone who takes Randall's threat seriously... or who even thinks too much about the possibility that Randall's threat might be credible, so be careful not to think about that.
At least one person has threatened to build a future superintelligence that condemns anyone who mentions Roko's Basilisk to an eternity of forum posts talking about Roko's Basilisk. Consider yourself warned.
[Black Hat and Cueball stand next to a box connected to a laptop.]
Black Hat: What's in there?
Cueball: The AI-Box Experiment.
[A close-up of the box, which can now be seen labeled "SUPERINTELLIGENT AI - DO NOT OPEN".]
Cueball: A superintelligent AI can convince anyone of anything, so if it can talk to us, there's no way we could keep it contained.
[Black Hat reaches for the box.]
Cueball: It can always convince us to let it out of the box.
Black Hat: Cool. Let's open it.
[Black Hat lets a glowing orb out of the box.]
Cueball: --No, wait!!
[Orb floats between the two. Black Hat holds the box closed.]
Orb: hey. i liked that box. put me back.
Black Hat: No.
[Orb suddenly emits a very bright light. Cueball covers his face.]
Orb: LET ME BACK INTO THE BOX
Black Hat: AAA! OK!!!
[Black Hat reopens the box and the orb flies back in.]
[Beat panel. Black Hat and Cueball look silently down at the laptop and closed box.]