Editing Talk:936: Password Strength

Latest revision Your text
Line 83: Line 83:
 
* (Secondly: The "punctuation" should have 5, not 4 bits of entropy. There are 32 (2^5) ASCII punctuation characters (POSIX class [:punct:]). But I assume this is a lapse.)
 
* (Secondly: The "punctuation" should have 5, not 4 bits of entropy. There are 32 (2^5) ASCII punctuation characters (POSIX class [:punct:]). But I assume this is a lapse.)
 
Can someone enlighten me? --[[Special:Contributions/162.158.91.236|162.158.91.236]] 17:31, 19 September 2015 (UTC)
 
Can someone enlighten me? --[[Special:Contributions/162.158.91.236|162.158.91.236]] 17:31, 19 September 2015 (UTC)
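The arithmetic behind the 5-versus-4-bit question can be checked with a short sketch. It assumes each symbol is drawn uniformly at random from the alphabet, so the entropy per symbol is log2 of the alphabet size. The function name `bits` is hypothetical and used only for this illustration.

```shell
# bits N: print log2(N), the entropy in bits of one symbol drawn uniformly
# from an alphabet of N symbols. "bits" is a made-up helper name.
bits() { awk -v n="$1" 'BEGIN { print log(n) / log(2) }'; }

bits 32   # all 32 characters of POSIX [:punct:]  -> prints 5
bits 16   # only the 16 most common characters    -> prints 4
```

Under that uniformity assumption, the two readings differ by exactly one bit per punctuation character.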
:I had missed the sentence "Randall assumes only the 16 most common characters are used in practice (4 bits)". Hm. There is a huge list of real-world passwords out there, leaked from RockYou in 2009. After some processing to remove passwords containing characters that are not printable ASCII characters (ñ, £, ๅ, NBSP, EOT, ...), the list contains about 14329849 unique passwords from about 32585010 accounts (there are some garbage "passwords" like HTML code fragments). The following are the numbers of accounts using a password containing a particular printable character (one or more tokens of a particular type):
+
:I had missed the sentence "Randall assumes only the 16 most common characters are used in practice (4 bits)". Hm. There is a huge list of real-world passwords out there, leaked from RockYou in 2009. After some processing to remove UTF-8 passwords, the list contained about 14329849 unique passwords from about 32585010 accounts. The following are the numbers of accounts using a password containing some (ASCII) punctuation or space characters:
 
  <nowiki>
 
  <nowiki>
 
226673 .
 
226673 .
Line 91: Line 91:
 
104224 @
 
104224 @
 
95237 *
 
95237 *
92802   (space)
+
92802 (space)
 
60002 #
 
60002 #
 
36522 /
 
36522 /
Line 118: Line 118:
 
939 }
 
939 }
 
502 |
 
502 |
 
(NB: 1222815 accounts were using a password containing at least one of these.)
 
 
</nowiki>
 
</nowiki>
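A per-character count like the ones in the table above could be produced roughly as follows. This is only a sketch, not necessarily how the numbers were obtained: it assumes a tab-delimited `count<TAB>password` file such as the `ry` file built in the walkthrough, and `countchar` is a hypothetical helper name. The sample rows are made up and are not RockYou data.

```shell
# countchar CHAR FILE: sum the account counts (column 1) over all rows whose
# password (column 2) contains CHAR literally. "countchar" is a hypothetical
# helper, not part of the original walkthrough. CHAR is passed through the
# environment so that awk does not interpret backslashes in it.
countchar() {
  CH="$1" LC_ALL=C awk -F '\t' '
    index($2, ENVIRON["CH"]) { a += $1 }
    END { printf "%d %s\n", a, ENVIRON["CH"] }
  ' "$2"
}

# Toy input (made up, NOT RockYou data); "-" makes awk read standard input.
printf '5\ta.b\n3\tc@d\n2\tx.y@z\n1\tplain\n' | countchar . -    # prints "7 ."
printf '5\ta.b\n3\tc@d\n2\tx.y@z\n1\tplain\n' | countchar @ -    # prints "5 @"
```

On the real table, something like `for ch in . @ '*' '#' /; do countchar "$ch" ry; done | sort -rn` would give a ranked list in the same "count character" layout.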
 
:Sorry, I have no "citation". But you can play with the leaked RockYou password list yourself. Here is a way to reach that playground:
 
:Sorry, I have no "citation". But you can play with the leaked RockYou password list yourself. Here is a way to reach that playground:
Line 125: Line 123:
 
$ # Download the compressed list (57 MiB; I have no idea what "skullsecurity"
 
$ # Download the compressed list (57 MiB; I have no idea what "skullsecurity"
 
$ # is, it was simply the first find and I assume it's the said list):
 
$ # is, it was simply the first find and I assume it's the said list):
$ wget http://downloads.skullsecurity.org/passwords/rockyou-withcount.txt.bz2
+
$ wget 'http://downloads.skullsecurity.org/passwords/rockyou-withcount.txt.bz2'
  
$ # Decompress the list (243 MiB), or, to be more exact, it's a table:
+
$ # Decompress the list (243 MiB), or, more exactly speaking, it's a table:
 
$ bzip2 -dk rockyou-withcount.txt.bz2
 
$ bzip2 -dk rockyou-withcount.txt.bz2
  
Line 139: Line 137:
 
   49952 iloveyou
 
   49952 iloveyou
  
$ # The following command processes the table to remove lines with passwords
+
$ # The following command processes the table to remove lines having non-ASCII
$ # containing characters that are not printable ASCII characters (14541
+
$ # characters or non-printable ASCII characters in the password, and lines
$ # lines/passwords, 18038 accounts), and lines insisting that there were some
+
$ # insisting that there were some accounts with no password. Moreover, the
$ # accounts with no password (1 line, 340 accounts). Moreover, the command
+
$ # command removes every space character not belonging to a password, makes
$ # removes every space character not belonging to a password, makes the rows
+
$ # the rows tab-delimited and writes the result in a file called "ry"
$ # tab-delimited and writes the result in a file called "ry" (161 MiB; many
+
$ # (161 MiB).
$ # bloating spaces removed).
+
$ LC_ALL=C sed -nr 's/^ *([1-9][0-9]*) ([[:print:]]+)$/\1\t\2/p' rockyou-withcount.txt > ry
$ LC_ALL=C sed -n 's/^ *\([1-9][0-9]*\) \([[:print:]]\{1,\}\)$/\1\t\2/p' rockyou-withcount.txt >ry
 
  
 
$ # The following are shell functions to build commands. They will be explained
 
$ # The following are shell functions to build commands. They will be explained
$ # below using examples (I can not express myself well in this language).
+
$ # below using examples (I can not express myself well in this language).
 
$ counta() { LC_ALL=C awk 'BEGIN { FS = "\t"; p = 0; a = 0 } { if ($2 ~ /'"$(printf %s "$1" | sed 'sI/I\\/Ig')"'/) { p++; a += $1 } } END { print a " (" p ")" }' "$2" ;}
 
$ counta() { LC_ALL=C awk 'BEGIN { FS = "\t"; p = 0; a = 0 } { if ($2 ~ /'"$(printf %s "$1" | sed 'sI/I\\/Ig')"'/) { p++; a += $1 } } END { print a " (" p ")" }' "$2" ;}
 
$ countap() { LC_ALL=C awk 'BEGIN { FS = "\t"; p = 0; a = 0 } { if ($2 ~ /'"$(printf %s "$1" | sed 'sI/I\\/Ig')"'/) { p++; a += $1; print $0 } } END { print a " (" p ")" }' "$2" ;}
 
$ countap() { LC_ALL=C awk 'BEGIN { FS = "\t"; p = 0; a = 0 } { if ($2 ~ /'"$(printf %s "$1" | sed 'sI/I\\/Ig')"'/) { p++; a += $1; print $0 } } END { print a " (" p ")" }' "$2" ;}
Line 160: Line 157:
 
671599 (188855)
 
671599 (188855)
  
$ # The first operand of the above command is an extended regular expression
+
$ # The first operand of this command is an extended regular expression (ERE),
$ # (ERE). The second operand is a file, namely the previously generated file
+
$ # namely "love". The second operand of this command is a file, namely the
$ # called "ry", that is the (processed) table. The first number of the output
+
$ # above generated file called "ry", that is the (processed) table. The first
$ # means: "That many accounts were using a password matching the ERE." The
+
$ # number of the output means: "That many accounts were using a password
$ # second number inside parentheses means: "That many unique passwords match
+
$ # matching the ERE." The second number in parentheses means: "That many unique
$ # the ERE." If the first number is greater than the second number, some
+
$ # passwords match the ERE." If the first number is greater than the second
$ # accounts share the same password (we will see this clearly in one of the
+
$ # number, some accounts share the same password. We will see this clearly in
$ # examples below).
+
$ # some examples below.
  
 
$ # Count how many accounts were using a password containing at least one
 
$ # Count how many accounts were using a password containing at least one
Line 179: Line 176:
 
144 (45)
 
144 (45)
  
$ # Count how many accounts were using a password consisting of exactly one numeric
+
$ # Count how many accounts were using a password consisting of exactly one
$ # character:
+
$ # numeric character:
 
$ counta '^[0-9]$' ry
 
$ counta '^[0-9]$' ry
 
55 (10)
 
55 (10)
Line 198: Line 195:
 
55 (10)
 
55 (10)
  
$ # Above we see the second command at work. You see what it does and what it
+
# Here we see the second command in action. You see what it does and what it
$ # does differently. And here we see clearly the meaning of the first number and
+
# does differently. And here we see clearly the meaning of the first and the
$ # the second number inside parentheses.
+
# second number in parentheses.
  
 
$ # Count how many accounts were using a password containing at least one
 
$ # Count how many accounts were using a password containing at least one
Line 216: Line 213:
 
$ counta '^[0-9]' ry
 
$ counta '^[0-9]' ry
 
6409397 (3283946)
 
6409397 (3283946)
 
$ # Count how many accounts were using a password containing only numeric
 
$ # characters:
 
$ counta '^[0-9]+$' ry
 
5192990 (2346744)
 
  
 
$ # And, last but not least, count how many accounts were using a password
 
$ # And, last but not least, count how many accounts were using a password
Line 229: Line 221:
 
3 (3)
 
3 (3)
  
$ # Yes, there are some. 14 million unique passwords are a lot. Let's see what
+
$ # Yes, there are some. 14 million passwords are a lot. Let's see what exactly
$ # exactly was used:
+
$ # was used:
 
$ countap '[tT]r[o0]ub[a4]d[o0]r' ry
 
$ countap '[tT]r[o0]ub[a4]d[o0]r' ry
 
1 troubador1
 
1 troubador1
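As a possible simplification, the two functions `counta` and `countap` could be merged into one. The sketch below is an assumption-laden variant, not the version used in the transcript: the name `count` and its optional third argument are hypothetical, and the ERE is handed to awk through the environment and used as a dynamic regular expression, so slashes in it need no escaping with `sed`. The sample rows are made up and are not RockYou data.

```shell
# count ERE FILE [print]: hypothetical merge of counta and countap.
# Prints "accounts (unique passwords)" for rows whose password (column 2)
# matches ERE. With any non-empty third argument the matching rows are
# printed first, as countap does.
count() {
  ERE="$1" SHOW="${3:-}" LC_ALL=C awk -F '\t' '
    $2 ~ ENVIRON["ERE"] {
      p++; a += $1
      if (ENVIRON["SHOW"] != "") print
    }
    END { printf "%d (%d)\n", a, p }
  ' "$2"
}

# Toy input (made up, NOT RockYou data); "-" makes awk read standard input.
sample='3\tilove1\n1\tLove/Hate\n2\t7\n'
printf "$sample" | count 'love' -        # prints "3 (1)"
printf "$sample" | count '/' -           # prints "1 (1)"
printf "$sample" | count '^[0-9]$' -     # prints "2 (1)"
printf "$sample" | count '[0-9]' - print # two matching rows, then "5 (2)"
```

Against the real table the calls would look like `count 'love' ry` and `count '[tT]r[o0]ub[a4]d[o0]r' ry print`.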
