Regional Information Center for Science and Technology - Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud

DocumentCode :

88349

Title :

Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud

Author :

Xuyun Zhang ; Wanchun Dou ; Jian Pei ; Nepal, Surya ; Chi Yang ; Chang Liu ; Jinjun Chen

Author_Institution :

Fac. of Eng. & IT, Univ. of Technol., Sydney, NSW, Australia

Volume :

Issue :

fYear :

2015

fDate :

Aug. 1 2015

Firstpage :

2293

Lastpage :

2307

Abstract :

Cloud computing provides a promising, scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets like electronic health records in such applications often contain privacy-sensitive information, which raises privacy concerns if the information is released or shared with third parties in the cloud. A practical and widely adopted technique for data privacy preservation is to anonymize data via generalization to satisfy a given privacy model. However, most existing privacy-preserving approaches are tailored to small-scale data sets and often fall short when encountering big data, due to their insufficiency or poor scalability. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model that allows for semantic proximity of sensitive values and for multiple sensitive attributes, and we model the problem of local recoding as a proximity-aware clustering problem. A scalable two-phase clustering approach consisting of a t-ancestors clustering (similar to k-means) algorithm and a proximity-aware agglomerative clustering algorithm is proposed to address the above problem. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in the cloud. Extensive experiments on real-life data sets demonstrate that our approach significantly improves the capability of defending against proximity privacy breaches, as well as the scalability and time efficiency of local-recoding anonymization, over existing approaches.
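
The abstract describes the first clustering phase only as "similar to k-means" and implemented with MapReduce. As a rough illustration of that general pattern, and not the authors' t-ancestors algorithm, the sketch below runs one plain k-means round in a map/reduce style in ordinary Python. The function names (map_assign, reduce_update, kmeans_round), the squared Euclidean distance, and the sample records are all assumptions made for this example; the paper's proximity-aware distance and generalization steps are not reproduced here.

```python
from collections import defaultdict


def map_assign(record, centers):
    """Map step (hypothetical): emit (index of nearest center, record)."""
    distances = [
        sum((x - c) ** 2 for x, c in zip(record, center))  # squared Euclidean
        for center in centers
    ]
    return distances.index(min(distances)), record


def reduce_update(records):
    """Reduce step (hypothetical): average one cluster's records."""
    n = len(records)
    return tuple(sum(column) / n for column in zip(*records))


def kmeans_round(records, centers):
    """Run one map/shuffle/reduce round and return the updated centers."""
    groups = defaultdict(list)
    for record in records:
        key, value = map_assign(record, centers)
        groups[key].append(value)  # "shuffle": group records by cluster index
    # A cluster that received no records keeps its previous center.
    return [
        reduce_update(groups[i]) if i in groups else centers[i]
        for i in range(len(centers))
    ]


# Illustrative data only; not taken from the paper.
records = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
new_centers = kmeans_round(records, centers)
```

In a real MapReduce deployment the map calls would run in parallel over data partitions and the framework would perform the grouping; the loop here only imitates that data flow on a single machine.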

Keywords :

Big Data; cloud computing; data privacy; distributed processing; pattern clustering; MapReduce; big data anonymization; big data applications; business; data-parallel computation; electronic health records; healthcare; local-recoding anonymization; local-recoding problem; privacy concerns; privacy-sensitive information; proximity privacy breaches; proximity-aware agglomerative clustering algorithm; proximity-aware clustering problem; proximity-aware local-recoding anonymization; scalable IT infrastructure; scalable big data privacy preservation; scalable two-phase clustering approach; small-scale data sets; t-ancestors clustering algorithm; Couplings; Data models; Numerical models; Privacy; Scalability; data anonymization; proximity privacy

fLanguage :

English

Journal_Title :

Computers, IEEE Transactions on

Publisher :

IEEE

ISSN :

0018-9340

Type :

jour

DOI :

10.1109/TC.2014.2360516

Filename :

6911981

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=88349