Regional Information Center for Science and Technology - Automatic Generation of Authentication Questions from Private Messages

Abstract:

In this paper, we propose a method for automatically generating authentication questions in social network services (SNSs) and mail account services. When a malicious user obtains the password of an SNS or mail account, that user can access the private messages posted or sent to or from the account. If it is an SNS account, the user can also access messages posted in closed SNS groups in which the account participates. To prevent this, many systems pose additional questions when a suspicious user tries to log in to an account or to access messages in a closed group. Our method automatically generates such authentication questions for an account or group from the messages in that account or group. It shows one of the messages with one noun replaced by a blank and asks the accessing user which word was there. To detect fake users, we need to select a noun that is easy enough for the authentic user to remember but sufficiently difficult for fake users to infer from general knowledge and information on the Web. We select a noun based on two factors. First, for each candidate noun, we compute its co-occurrence degrees on the Web with the other words in the same message. If a noun has high co-occurrence degrees with the other words in the message, it is likely to be easy for fake users to infer. Second, our system collects coordinate terms (co-hyponyms) of each candidate noun and calculates the same co-occurrence degrees for them. If some coordinate terms have higher co-occurrence degrees than the candidate noun, we expect the noun to be difficult for fake users to infer, because those coordinate terms will seem to fake users more likely to be the answer. We developed four methods of noun selection based on these two factors.
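The two-factor noun selection described above can be sketched in Python. This is a minimal sketch under several assumptions that the abstract does not state: the co-occurrence degree is taken to be a WebJaccard-style coefficient over page hit counts, averaged across the message's other words; hit counts come from a hypothetical `hits` oracle, here a toy dictionary with made-up numbers instead of a real search engine; nouns and their coordinate terms are supplied by hand rather than extracted and collected automatically; and `select_noun_factor1` and `select_noun_both` are hypothetical, illustrative stand-ins, not the paper's four methods.

```python
import re
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence

# Hypothetical oracle (assumption): number of Web pages containing every term.
HitCount = Callable[[FrozenSet[str]], int]


def web_jaccard(a: str, b: str, hits: HitCount) -> float:
    """Assumed co-occurrence degree of two words: WebJaccard-style coefficient."""
    both = hits(frozenset({a, b}))
    union = hits(frozenset({a})) + hits(frozenset({b})) - both
    return both / union if union > 0 else 0.0


def cooccurrence_degree(term: str, context: Sequence[str], hits: HitCount) -> float:
    """Mean co-occurrence degree of `term` with each context word."""
    if not context:
        return 0.0
    return sum(web_jaccard(term, w, hits) for w in context) / len(context)


def other_words(words: Sequence[str], noun: str) -> List[str]:
    """The message's words with the blanked-out noun removed."""
    return [w for w in words if w != noun]


def select_noun_factor1(candidates: Sequence[str], words: Sequence[str],
                        hits: HitCount) -> str:
    """Factor 1 only: the noun least predictable from the rest of the message."""
    return min(candidates,
               key=lambda n: cooccurrence_degree(n, other_words(words, n), hits))


def select_noun_both(candidates: Sequence[str], words: Sequence[str],
                     coordinates: Dict[str, List[str]],
                     hits: HitCount) -> Optional[str]:
    """Both factors: the noun whose best coordinate term beats it by the widest margin."""
    best: Optional[str] = None
    best_margin = 0.0
    for noun in candidates:
        context = other_words(words, noun)
        own = cooccurrence_degree(noun, context, hits)
        rivals = [cooccurrence_degree(c, context, hits)
                  for c in coordinates.get(noun, [])]
        margin = max(rivals, default=0.0) - own
        if margin > best_margin:  # a rival looks like a more plausible answer
            best, best_margin = noun, margin
    return best  # None if no candidate has a more plausible coordinate term


def make_question(message: str, noun: str) -> str:
    """Replace the first whole-word occurrence of `noun` with a blank."""
    return re.sub(rf"\b{re.escape(noun)}\b", "____", message, count=1)


# Toy hit counts (made up for illustration, not real search-engine data).
TOY_HITS: Dict[FrozenSet[str], int] = {
    frozenset({w}): 100
    for w in ["ramen", "Kyoto", "lunch", "sushi", "udon", "Osaka", "Tokyo"]
}
TOY_HITS.update({
    frozenset({"ramen", "Kyoto"}): 20, frozenset({"ramen", "lunch"}): 50,
    frozenset({"sushi", "Kyoto"}): 40, frozenset({"sushi", "lunch"}): 60,
    frozenset({"udon", "Kyoto"}): 10, frozenset({"udon", "lunch"}): 30,
})


def toy_hits(terms: FrozenSet[str]) -> int:
    return TOY_HITS.get(terms, 0)


message = "Had ramen in Kyoto for lunch"
words = ["ramen", "Kyoto", "lunch"]  # nouns assumed to be extracted already
candidates = ["ramen", "Kyoto"]
coordinates = {"ramen": ["sushi", "udon"], "Kyoto": ["Osaka", "Tokyo"]}

print(make_question(message, select_noun_factor1(candidates, words, toy_hits)))
# -> Had ramen in ____ for lunch
chosen = select_noun_both(candidates, words, coordinates, toy_hits)
if chosen is not None:
    print(make_question(message, chosen))  # -> Had ____ in Kyoto for lunch
```

With these toy counts, the first-factor selector blanks out "Kyoto", whose co-occurrence with the rest of the message is lowest, while the two-factor selector blanks out "ramen", because its coordinate term "sushi" co-occurs more strongly with "Kyoto" and "lunch" than "ramen" itself does.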
Our preliminary experiment shows that selection based only on the first factor produces more difficult questions than selection based on both factors, but it often chooses uncommon words that are hard to remember even for authentic users. On the other hand, selection based on both factors often chooses more common words, but it sometimes chooses words that even fake users can easily guess.