A k-anonymity method based on search engine query statistics for disaster impact statements

Author

Oguri, Hiroki ; Sonehara, Noboru

Author_Institution

Multidisciplinary Sch., Inf. Dept., NIFTY Corp., Grad. University for Adv. Studies, Tokyo, Japan

fYear

2014

fDate

8-12 Sept. 2014

Firstpage

447

Lastpage

454

Abstract

Privacy is a major concern in the management of big data, especially for datasets that contain sensitive personal information. Personal information is frequently used in marketing analyses, and it can also be used to assess damage at the time of a disaster. One model that is widely used to protect privacy is k-anonymity, which can be generally defined as a clustering method in which any record in a dataset is indistinguishable from at least (k-1) other records in the same dataset. Most approaches to k-anonymity suffer from large information loss due to the abstraction of continuous numerical attributes and of categorical attributes that have a hierarchical structure. Conventional k-anonymity is difficult to use with actual Internet services because of its computational complexity and the loss of value that stems from the loss of information. In this paper, we propose an anonymization algorithm that supports both marketing analysis and disaster analysis. In ordinary times, personal data can be analyzed with this algorithm using SEM price; in times of disaster, we ensure information anonymity according to the number of times a searched word appears and distribute only the necessary information. This approach makes it possible to process only the necessary data while maintaining a sufficient k-anonymized level. Applying this method to actual data showed that using an index number of the occurrences of the search term makes it possible to anonymize the information while preferentially partitioning disaster locations.
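
The k-anonymity definition in the abstract (each record indistinguishable from at least k-1 others) can be illustrated with a minimal sketch. This is not the authors' algorithm. The function name `is_k_anonymous`, the field names `area`, `age_band`, and `query`, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records (hypothetical helper)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Illustrative records only; not data from the paper.
records = [
    {"area": "Tokyo", "age_band": "30-39", "query": "shelter"},
    {"area": "Tokyo", "age_band": "30-39", "query": "water supply"},
    {"area": "Osaka", "age_band": "20-29", "query": "train status"},
]

# The single Osaka record forms a group of size 1, so k=2 is not satisfied.
print(is_k_anonymous(records, ["area", "age_band"], 2))
```

In this sketch the check fails for k=2 because one record is unique on its quasi-identifiers; a generalization or partitioning step, such as the one the paper proposes, would be needed before release.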

Keywords

Big Data; Internet; query processing; search engines; statistics; Internet services; SEM price; anonymous algorithm; big data; computational complexity; disaster impact statements; index number; information anonymity; k-anonymized level; k-anonymity method; partitioning disaster locations; personal data; privacy protection; search engine query statistics; search term; sensitive personal information; Availability; Security; Algorithm; Big Data mining; Big Data security; Privacy preserving; k-anonymity;

fLanguage

English

Publisher

ieee

Conference_Titel

Availability, Reliability and Security (ARES), 2014 Ninth International Conference on

Conference_Location

Fribourg

Type

conf

DOI

10.1109/ARES.2014.68

Filename

6980317