explain xkcd:Crap

The 'crap' incident

On May 3rd, 2022, a vandal started attacking the wiki with a bot that replaced entire pages with the word "crap" repeated several thousand times. Many discussions were created in the "Admin requests" portal to organise the cleanup of the wiki until the vandal was stopped by Davidy22.

Original code

The bot initially used the account X. K. C. D., which had previously been used for vandalism but had never been blocked. The account was old enough, and had made enough edits, to be autoconfirmed, allowing it to edit pages without having to complete CAPTCHAs. Several accounts have been identified and blocked since, some after becoming autoconfirmed and running the bot script. The bot is implemented as a user common.js JavaScript page that has gone through several versions, some obfuscated. One such script is below:

var cssSelectorForEditTextBox = 'textarea';
var cssSelectorForSaveChangesButton = '#wpSave';
var cssSelectorForMainContent = '#mw-content-text';
var cssSelectorForEditLink = '#ca-edit > span:nth-child(1) > a:nth-child(1)';

function vandalize() {
    if (window.location.href.endsWith('edit')) {
        // The current page is an "edit" page
        // Crap it
        document.querySelector(cssSelectorForEditTextBox).value = 'crap '.repeat(5000);
        document.querySelector(cssSelectorForSaveChangesButton).click();
    } else if (document.querySelector(cssSelectorForMainContent).textContent.indexOf('t') == -1) {
        // The current page is a regular "read" page, but it has already been crapped
        // (heuristic: "crap" contains no lowercase 't', but normal article text almost always does)
        // Go to a random uncrapped page
        window.location.href = 'https://www.explainxkcd.com/wiki/index.php/Special:RandomInCategory/All_Comics';
    } else {
        // The current page is a regular "read" page, and it has not yet been crapped
        // Go to its "edit" page so it can be crapped
        document.querySelector(cssSelectorForEditLink).click();
    }
}
setTimeout(vandalize, 500);

While this code is a reconstruction of the original code, it has been tested (non-destructively) and found to be equivalent in functionality. Other than the CSS selectors, it was mostly rewritten from memory by a contributor who had read the original common.js file. (That file has since been deleted.) The CSS selectors are educated guesses, but they seem to be equivalent to the original ones. Also, the code has been reformatted for readability; the original had everything on two or three lines, had the CSS selectors inline with the rest of the code (rather than in variables), and had no comments. Newer versions appear to be based on this reconstructed code, rather than the original. It is not clear why the crapper did this.

By the time the original bot was blocked, about 80% of comic explanations had been crapped. Davidy22 blocked the bot's account and deleted its common.js page; this immediately stopped the attack and allowed the cleanup to begin. Later attacks and recoveries have followed the same pattern, except that administrators have responded faster, reducing the number of pages crapped in each attack.

Anti-re-crapping code

Most pre-block efforts at stopping the vandalism centred on the line of the bot that checked whether the page had already been crapped and skipped it if so (the first else if line). Originally, this line looked something like the following:

} else if (document.querySelector(cssSelectorForMainContent).textContent.startsWith('crap')) {

This allowed a page to be easily protected by adding the following code at the start of the page:

<div style="display: none">crap</div>

This added the word "crap" to the page in a way that the bot could see but readers could not, non-disruptively preventing the page from being crapped. The vandal later changed the line to

} else if (document.querySelector(cssSelectorForMainContent).textContent.startsWith('crap crap')) {

to bypass the anti-crap code, but this was easily thwarted by changing the protection code to <div style="display: none">crap crap</div>.

The final version of the bot (shown in the main code block above) checked for crappedness by testing whether the page text contained a lowercase "t": if it did, the page was known to be uncrapped (and, according to the vandal, in need of crapping); if it did not, the page was assumed to already be crapped. This worked very "well", because the word "crap" contains no "t", while the vast majority of uncrapped articles do. Under this check, a page could only be protected by removing every "t" from it, which is not feasible for the vast majority of pages. After this change, the only way to hinder the bot (until it was blocked by an admin) was to revert its edits as fast as possible.
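The hidden-div trick worked because the bot read the page through textContent, which includes text from elements hidden with CSS (display: none); innerText, by contrast, reflects only what is actually rendered. A minimal illustration, runnable in a browser console (the element and strings here are made up for the example):

var demo = document.createElement('div');
demo.innerHTML = '<span style="display: none">crap</span>Normal article text';
document.body.appendChild(demo);
console.log(demo.textContent.startsWith('crap')); // true: textContent includes the hidden span
console.log(demo.innerText.startsWith('crap'));   // false: innerText omits hidden text
demo.remove();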

The wiki's MediaWiki:Common.js now prevents edits which set a large portion of the article to a single repeated word.
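The actual check is not reproduced in this article, but a guard of that kind might look roughly like the sketch below. The function name, the 100-word minimum, and the 80% threshold are illustrative assumptions, not the wiki's real code:

// Hypothetical sketch; not the wiki's actual MediaWiki:Common.js code
function looksLikeRepeatedWordVandalism(text) {
    var words = text.toLowerCase().split(/\s+/).filter(Boolean);
    if (words.length < 100) return false; // ignore short edits
    var counts = {};
    var max = 0;
    words.forEach(function (word) {
        counts[word] = (counts[word] || 0) + 1;
        if (counts[word] > max) max = counts[word];
    });
    // Flag the edit if one word makes up most of the new page text
    return max / words.length > 0.8;
}

A handler attached to the edit form could then refuse to submit when this function returns true for the contents of the edit box.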

Other "improvements"[edit]

The vandal changed the delay after page load (the number in the last line of the code above; 500 in the last version that had a delay) several times, before switching to triggering the script with document.body.onload, and finally removing the wrapper entirely so that it ran as soon as the JavaScript loaded.
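For illustration, the three triggering styles would have looked roughly like this (reconstructed from the description above; the exact original lines are not preserved):

// Version 1: run after a fixed delay once the script loads
setTimeout(vandalize, 500);

// Version 2: run once the page has finished loading
document.body.onload = vandalize;

// Version 3: no wrapper; run as soon as the script is evaluated
vandalize();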

The code was briefly obfuscated, then de-obfuscated after it crapped itself. The only functional difference was checking for a lowercase 'e' instead of a 't'.
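Under the same reconstruction assumptions as the main code block above, that version's crappedness check would have read something like:

} else if (document.querySelector(cssSelectorForMainContent).textContent.indexOf('e') == -1) {

This works for the same reason as the "t" check: "crap" contains no lowercase "e", but almost all normal article text does.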

Conclusion

In the end, the admin Davidy22 came back from Reddit to block the vandalising accounts, remove the script, and revert the changes.