Main Page

Explain xkcd: It's 'cause you're dumb.
Jump to: navigation, search

Welcome to the explain xkcd wiki!
We have an explanation for all 2934 xkcd comics, and only 6 (0%) are incomplete. Help us finish them!

Latest comic

Go to this comic explanation

Bloom Filter
Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they're the right one you can never be sure.
Title text: Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they're the right one you can never be sure.


Ambox notice.png This explanation may be incomplete or incorrect: PROBABLY CREATED - Please change this comment when editing this page. Do NOT delete this tag too soon.

The comic is referring to a Bloom Filter, a data structure that is used for approximate membership queries and cardinality estimation using a bounded amount of memory. That is, after a series of objects are added to the bloom filter, given another object, the bloom filter can be queried to see if that object has already been added to it, with a chance of a false positive answer that depends on the size of the bloom filter. Or, the bloom filter can be queried for an approximate count of the objects that have been added to the bloom filter already.

A bloom filter uses a large bit array, and a number of hashing functions that produce indexes into this array. When a value is added to the set, it's hashed with each function, and the corresponding bits in the array are set to 1. To test if a value is in the set you hash it with all the functions, and check if all the bits are 1. If they are, the value may be in the set, but there can also be false positives because each hash collides with some other value in the set (assuming reasonable hash functions, a different element for each hash). But if any of the bits is 0, you know for sure the value is not in the set. The higher the ratio between the size of the bit array and the number of elements in the set, the smaller the false positive rate is (10 bits/element has about 1% false positives.

The joke in the comic is that Cueball has a 1-bit Bloom filter. When the set is empty, it accurately reports that any value is not in the set. But as soon as anything is added to the set, it has a very large false positive rate, since that single bit will be set and everything will hash to that index. Similarly the cardinality estimation is (correctly) 0 initially, but after the first addition the estimate will be "somewhere between 1 and infinity" which is not a terribly useful estimate.

There's also no point in having multiple hash functions for a 1-bit filter, since there's only one possible hash value.

The title text references how bloom filters are always accurate in saying that an element is not in the list (bloom filters are not correct), but you can never be sure if an element is actually in the list (when a bloom filter actually is correct), because of false positives.


Ambox notice.png This transcript is incomplete. Please help editing it! Thanks.
[Ponytail holds out her hand to Cueball, who is holding a paper with a 1 on it.]
Ponytail: Does your set contai-
Cueball: Yeah, probably.
[Caption below the panel:]
One-Bit Bloom Filter

Is this out of date? Clicking here will fix that.

New here?

Last 7 days (Top 10)

Lots of people contribute to make this wiki a success. Many of the recent contributors, listed above, have just joined. You can do it too! Create your account here.

You can read a brief introduction about this wiki at explain xkcd. Feel free to sign up for an account and contribute to the wiki! We need explanations for comics, characters, themes and everything in between. If it is referenced in an xkcd web comic, it should be here.

  • There are incomplete explanations listed here. Feel free to help out by expanding them!


Don't be a jerk!

There are a lot of comics that don't have set-in-stone explanations; feel free to put multiple interpretations in the wiki page for each comic.

If you want to talk about a specific comic, use its discussion page.

Please only submit material directly related to (and helping everyone better understand) xkcd... and of course only submit material that can legally be posted (and freely edited). Off-topic or other inappropriate content is subject to removal or modification at admin discretion, and users who repeatedly post such content will be blocked.

If you need assistance from an admin, post a message to the Admin requests board.