DocumentCode
755486
Title
Anonymizing Classification Data for Privacy Preservation
Author
Fung, Benjamin C. M. ; Wang, Ke ; Yu, Philip S.
Author_Institution
Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC
Volume
19
Issue
5
fYear
2007
fDate
5/1/2007 12:00:00 AM
Firstpage
711
Lastpage
725
Abstract
Classification is a fundamental problem in data analysis. Training a classifier requires accessing a large collection of data. Releasing person-specific data, such as customer data or patient records, may pose a threat to an individual's privacy. Even after removing explicit identifying information such as Name and SSN, it is still possible to link released records back to their identities by matching some combination of nonidentifying attributes such as {Sex, Zip, Birthdate}. A useful approach to combat such linking attacks, called k-anonymization, is anonymizing the linking attributes so that at least k released records match each value combination of the linking attributes. Previous work attempted to find an optimal k-anonymization that minimizes some data distortion metric. We argue that minimizing the distortion to the training data is not relevant to the classification goal, which requires extracting the structure of prediction on the "future" data. In this paper, we propose a k-anonymization solution for classification. Our goal is to find a k-anonymization, not necessarily optimal in the sense of minimizing data distortion, that preserves the classification structure. We conducted intensive experiments to evaluate the impact of anonymization on the classification of future data. Experiments on real-life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements.
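The k-anonymity condition stated in the abstract (at least k released records match each value combination of the linking attributes) can be illustrated with a minimal Python sketch. This is not the authors' anonymization algorithm; it only checks whether an already-generalized table satisfies the condition. The function name `is_k_anonymous`, the toy records, and the generalized values such as `537**` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def is_k_anonymous(records, linking_attributes, k):
    """Return True if every combination of linking-attribute values that
    occurs in `records` is shared by at least k records."""
    counts = Counter(
        tuple(record[attr] for attr in linking_attributes)
        for record in records
    )
    return all(count >= k for count in counts.values())


# Hypothetical toy table: linking attributes are already generalized,
# and "Class" is the label a classifier would be trained to predict.
released = [
    {"Sex": "F", "Zip": "537**", "Birthdate": "1970s", "Class": "yes"},
    {"Sex": "F", "Zip": "537**", "Birthdate": "1970s", "Class": "no"},
    {"Sex": "M", "Zip": "537**", "Birthdate": "1980s", "Class": "yes"},
    {"Sex": "M", "Zip": "537**", "Birthdate": "1980s", "Class": "yes"},
]
linking = ["Sex", "Zip", "Birthdate"]

print(is_k_anonymous(released, linking, k=2))
print(is_k_anonymous(released, linking, k=3))
```

In this toy table each value combination of {Sex, Zip, Birthdate} is shared by exactly two records, so the check passes for k = 2 and fails for k = 3.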
Keywords
data analysis; data mining; data privacy; pattern classification; anonymous classification data; k-anonymization solution; privacy preservation; Back; Data analysis; Data mining; Data privacy; Data security; Information security; Joining processes; Medical diagnostic imaging; Protection; Training data; Privacy protection; anonymity; classification; data sharing; integrity; security
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2007.1015
Filename
4138206