Title :
A decision tree based quasi-identifier perturbation technique for preserving privacy in data mining
Author :
Dai, Bi-Ru ; Lin, Yang-Tze
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ. of Sci. & Technol., Taipei
Abstract :
Classification is an important task in data mining, and the decision tree is one of the most popular techniques for classification analysis. Some data sources contain private personal information that people are unwilling to reveal. Disclosing person-specific data can endanger thousands of people, so a dataset should be protected before it is released for mining. However, techniques for hiding private information usually modify the original dataset without considering the effect on the prediction accuracy of a classification model. In this paper, we propose an algorithm that protects personal privacy for classification models based on decision trees. Our goal is to hide all person-specific information with minimal data perturbation while maintaining the prediction capability of the decision tree classifier. As demonstrated in the experiments, the proposed algorithm successfully hides private information with less disturbance to the classifier.
Keywords :
data mining; data privacy; decision trees; classification analysis; decision tree classifier; quasi-identifier perturbation technique; Classification tree analysis; Clustering algorithms; Computer science; Data engineering; Perturbation methods; Predictive models; Protection; classification; decision tree; preserving privacy; quasi-identifier
Conference_Title :
Research Challenges in Information Science, 2009. RCIS 2009. Third International Conference on
Conference_Location :
Fez
Print_ISBN :
978-1-4244-2864-9
Electronic_ISBN :
978-1-4244-2865-6
DOI :
10.1109/RCIS.2009.5089282