DocumentCode :
2850304
Title :
Bottom-up generalization: a data mining solution to privacy protection
Author :
Wang, Ke ; Yu, Philip S. ; Chakraborty, Sourav
Author_Institution :
Simon Fraser Univ., Burnaby, BC, Canada
fYear :
2004
fDate :
1-4 Nov. 2004
Firstpage :
249
Lastpage :
256
Abstract :
The well-known privacy-preserving data mining approach modifies existing data mining techniques to work on randomized data. In this paper, we investigate data mining as a technique for masking data, an approach we term data mining based privacy protection. This approach partially incorporates the requirement of a targeted data mining task into the process of masking data so that essential structure is preserved in the masked data. The idea is simple but novel: we explore the data generalization concept from data mining as a way to hide detailed information, rather than to discover trends and patterns. Once the data is masked, standard data mining techniques can be applied without modification. Our work demonstrates another positive use of data mining technology: not only can it discover useful patterns, but it can also mask private information. We consider the following privacy problem: a data holder wants to release a version of data for building classification models, but wants to protect against linking the released data to an external source for inferring sensitive information. We adapt an iterative bottom-up generalization from data mining to generalize the data. The generalized data remains useful for classification but becomes difficult to link to other sources. The generalization space is specified by a hierarchical structure of generalizations. A key step is identifying the best generalization with which to climb up the hierarchy at each iteration. Enumerating all candidate generalizations is impractical. We present a scalable solution that examines at most one generalization in each iteration for each attribute involved in the linking.
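The iterative bottom-up generalization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the taxonomy, the attribute names, the minimum group size `k` used as the stopping rule, and the class-entropy score used to pick a generalization are all assumptions made for illustration. The sketch also enumerates every candidate naively, which is exactly the cost the paper's scalable solution is said to avoid.

```python
from collections import Counter
from math import log2

# Hypothetical taxonomy (child value -> parent value) for one linking attribute.
TAXONOMY = {
    "job": {"engineer": "professional", "lawyer": "professional",
            "welder": "manual", "painter": "manual",
            "professional": "ANY", "manual": "ANY"},
}

def ancestors(attr, value):
    """All values above `value` in the hierarchy, nearest first."""
    out = []
    while value in TAXONOMY[attr]:
        value = TAXONOMY[attr][value]
        out.append(value)
    return out

def min_group_size(rows, attrs):
    """Size of the smallest group of rows sharing the same linking values."""
    return min(Counter(tuple(r[a] for a in attrs) for r in rows).values())

def class_entropy(rows, attrs, label):
    """Weighted class entropy over groups; lower means classes stay separable."""
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[a] for a in attrs), Counter())[r[label]] += 1
    total = len(rows)
    return -sum(v / total * log2(v / sum(c.values()))
                for c in groups.values() for v in c.values())

def is_valid(rows, attr, parent):
    """A parent may replace its children only if no deeper descendant remains."""
    return all(TAXONOMY[attr].get(r[attr]) == parent
               for r in rows if parent in ancestors(attr, r[attr]))

def apply(rows, attr, parent):
    """Replace every direct child of `parent` with `parent` (rows are copied)."""
    children = {c for c, p in TAXONOMY[attr].items() if p == parent}
    return [{**r, attr: parent} if r[attr] in children else r for r in rows]

def generalize(rows, attrs, label, k):
    """Climb the hierarchy one generalization at a time until groups reach size k."""
    while min_group_size(rows, attrs) < k:
        candidates = sorted({(a, TAXONOMY[a][r[a]]) for r in rows for a in attrs
                             if r[a] in TAXONOMY[a]})
        candidates = [(a, p) for a, p in candidates if is_valid(rows, a, p)]
        if not candidates:
            break  # every value is already at the root
        rows = min((apply(rows, a, p) for a, p in candidates),
                   key=lambda new: class_entropy(new, attrs, label))
    return rows
```

Called as `generalize(rows, ["job"], "cls", k=2)` on four rows with jobs engineer, lawyer, welder, and painter, the sketch stops once each job value is shared by at least two rows, preferring the generalizations that keep the class labels easiest to separate.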
Keywords :
data mining; data privacy; generalisation (artificial intelligence); pattern clustering; bottom-up generalization; classification model; data generalization; data holder; data masking; data mining; privacy protection; randomized data; Biomedical engineering; Chemical engineering; Councils; Couplings; Data mining; Data privacy; Iris; Joining processes; Protection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
Print_ISBN :
0-7695-2142-8
Type :
conf
DOI :
10.1109/ICDM.2004.10110
Filename :
1410291