Title :
Exploring new privacy approaches in a scalable classification framework
Author :
Saravanan, M. ; Thoufeeq, A.M. ; Akshaya, S. ; Jayasre Manchari, V.L.
Author_Institution :
Ericsson Res. India, Ericsson India Global Services Pvt. Ltd., Chennai, India
Abstract :
Recent advancements in Information and Communication Technologies (ICT) enable many organizations to collect, store and control massive amounts of varied details about individuals from their regular transactions (credit card, mobile phone, smart meter, etc.). While using this wealth of information for Personalized Recommendations provides enormous opportunities for applying data mining (or machine learning) tasks, there is a need to address the challenge of preserving individuals' privacy while running predictive analytics on Big Data. Privacy Preserving Data Mining (PPDM) in these applications is particularly challenging because it involves processing large volumes of complex, heterogeneous, and dynamic details of individuals. Ensuring that privacy-protected data remains useful in intended applications, such as building accurate data mining models or enabling complex analytic tasks, is essential. Differential Privacy has been tried with a few PPDM methods and is immune to attacks with auxiliary information. In this paper, we propose a distributed implementation of the C4.5 Decision Tree algorithm based on the Map Reduce computing model and run extensive experiments on three different datasets using a Hadoop Cluster. The novelty of this work is to experiment with two different privacy methods: the first applies the decision tree algorithm to perturbed data for prediction in privacy-preserving data sharing, and the second applies a privacy-preserving decision tree algorithm to raw data for private data analysis. In addition, we propose a combination of the two methods as a hybrid technique to maintain accuracy (utility) and privacy at an acceptable level. The proposed privacy approaches have two potential benefits in the context of data mining tasks: they allow service providers to outsource data mining tasks without exposing the raw data, and they allow data providers to share data access with third parties while limiting privacy risks.
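The abstract does not say which perturbation or noise mechanisms the two methods use, so the following is only a minimal sketch under stated assumptions. It stands in k-ary randomized response for the "perturbed data" method and Laplace-noised class counts for the "privacy-preserving decision tree" method; both are standard differential privacy techniques substituted here for illustration, not the authors' implementation. All function names and the `epsilon` parameter are hypothetical.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_labels(labels, classes, epsilon, rng):
    """Assumed method 1: perturb the data before sharing it.

    k-ary randomized response: keep each label with probability
    e^eps / (e^eps + k - 1), otherwise report a different class at random.
    """
    keep = math.exp(epsilon) / (math.exp(epsilon) + len(classes) - 1)
    perturbed = []
    for y in labels:
        if rng.random() < keep:
            perturbed.append(y)
        else:
            perturbed.append(rng.choice([c for c in classes if c != y]))
    return perturbed


def noisy_class_counts(labels, classes, epsilon, rng):
    """Assumed method 2: keep raw data, add noise inside the algorithm.

    Each class count gets Laplace noise of scale 1/epsilon (one record
    changes one count by at most 1); negative results are clamped to 0.
    """
    counts = Counter(labels)
    return {c: max(0.0, counts[c] + laplace_noise(1.0 / epsilon, rng))
            for c in classes}


def entropy(counts):
    """Shannon entropy (bits) of a class-count table, as used when a
    decision tree such as C4.5 scores candidate splits."""
    total = sum(counts.values())
    if total <= 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)


if __name__ == "__main__":
    rng = random.Random(0)
    classes = ["yes", "no"]
    labels = ["yes"] * 60 + ["no"] * 40
    shared = perturb_labels(labels, classes, epsilon=1.0, rng=rng)
    print("entropy on perturbed data:", entropy(Counter(shared)))
    print("entropy from noisy counts:",
          entropy(noisy_class_counts(labels, classes, 1.0, rng)))
```

In this sketch a smaller `epsilon` means more noise in both functions, which is the accuracy (utility) versus privacy trade-off that the hybrid technique in the abstract is meant to balance.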
Keywords :
data mining; data privacy; decision trees; learning (artificial intelligence); C4.5 decision tree algorithm; Hadoop Cluster; ICT; big data; differential privacy; information and communication technologies; machine learning; map reduce computing model; personalized recommendation; privacy preserving data mining; private data analysis; scalable classification; Classification algorithms; Noise; Privacy; Scalability; Hybrid data privacy; Map Reduce Framework; Privacy Approaches
Conference_Title :
Data Science and Advanced Analytics (DSAA), 2014 International Conference on
DOI :
10.1109/DSAA.2014.7058075