DocumentCode :
2554949
Title :
The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords
Author :
Bonneau, J.
Author_Institution :
Comput. Lab., Univ. of Cambridge, Cambridge, UK
fYear :
2012
fDate :
20-23 May 2012
Firstpage :
538
Lastpage :
552
Abstract :
We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics including a new variant of guesswork parameterized by an attacker's desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution which would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty; every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords, and an attacker never gains more than a factor of 2 in efficiency by switching from the globally optimal dictionary to a population-specific list.
Keywords :
Web sites; message authentication; statistical analysis; Shannon entropy; Yahoo! users; anonymized corpus; anonymized password histograms; demographic factors; graphical feedback; guessing difficulty; guessing entropy; optimal offline dictionary attack; partial guessing metrics; password distributions; privacy concerns; secret distribution; security engineering; security motivations; site usage characteristics; statistical treatment; user-chosen passwords; Cryptography; Dictionaries; Entropy; Measurement; Privacy; Semantics; authentication; computer security; data mining; information theory; statistics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Security and Privacy (SP), 2012 IEEE Symposium on
Conference_Location :
San Francisco, CA
ISSN :
1081-6011
Print_ISBN :
978-1-4673-1244-8
Electronic_ISBN :
1081-6011
Type :
conf
DOI :
10.1109/SP.2012.49
Filename :
6234435