DocumentCode
2141415
Title
Probabilistic frameworks for privacy-aware data mining
Author
Ghosh, Joydeep
Author_Institution
Schlumberger Centennial Chair in Eng., Univ. of Texas at Austin, Austin, TX
fYear
2008
fDate
17-20 June 2008
Abstract
Often several cooperating parties would like to have a global view of their joint data for various data mining objectives, but cannot reveal the contents of individual records due to privacy, ownership or competitive considerations. In this talk, we present a probabilistic framework for resolving such seemingly contradictory goals. Rather than sharing parts of the original or perturbed data, the framework shares the parameters of suitable probabilistic models built at each local data site. We mathematically show that the best representative of all the data is a certain “mean” model, and empirically show that this model can be approximated quite well by generating artificial samples from the underlying distributions using Markov chain Monte Carlo techniques, and then fitting a combined global model with a chosen parametric form to these samples. We also propose a new measure that quantifies privacy in such situations based on information theoretic concepts, and show that decreasing privacy leads to a higher quality of the combined model and vice versa. The method can also be applied to situations where different local datasets may not have identical features by using certain maximum likelihood and maximum entropy principles. We provide empirical results on different data types with continuous vector, categorical and directional attributes to highlight the generality of our framework. The results show that high quality distributed clustering or classification can be achieved with little privacy loss and low communication cost.
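The abstract does not specify the local model family, the divergence that defines the “mean” model, or the sampler, so the following is only a minimal sketch of the share-parameters, sample, and refit idea under stated assumptions. It assumes that each site fits a single 1-D Gaussian and shares only (record count, mean, std). It also assumes that the “mean” model is the size-weighted mixture of the local densities, which is the distribution q minimizing the weighted average of KL(p_i || q). Artificial samples are drawn with random-walk Metropolis-Hastings, and the chosen global parametric form is a single Gaussian fitted by maximum likelihood. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

# Hypothetical shared summary per site: (record count, mean, std).
# Raw records never leave the site; only these parameters are shared.
Site = Tuple[float, float, float]


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mean_model_density(x: float, sites: List[Site]) -> float:
    """Size-weighted mixture of local densities (assumed KL 'mean' model)."""
    total = sum(w for w, _, _ in sites)
    return sum(w / total * gaussian_pdf(x, mu, s) for w, mu, s in sites)


def metropolis_samples(sites: List[Site], n: int, step: float = 1.0,
                       burn_in: int = 1000, seed: int = 0) -> List[float]:
    """Random-walk Metropolis-Hastings draws from the mean model."""
    rng = random.Random(seed)
    x = 0.0
    px = mean_model_density(x, sites)
    out: List[float] = []
    for i in range(burn_in + n):
        proposal = x + rng.gauss(0.0, step)
        pp = mean_model_density(proposal, sites)
        # Symmetric proposal: accept with probability min(1, p(x') / p(x)).
        if rng.random() < pp / px:
            x, px = proposal, pp
        if i >= burn_in:
            out.append(x)
    return out


def fit_gaussian(samples: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood Gaussian fit: sample mean and biased std."""
    m = sum(samples) / len(samples)
    var = sum((v - m) ** 2 for v in samples) / len(samples)
    return m, math.sqrt(var)


def moment_matched_gaussian(sites: List[Site]) -> Tuple[float, float]:
    """Exact best single Gaussian under KL(mixture || q), for comparison."""
    total = sum(w for w, _, _ in sites)
    m = sum(w / total * mu for w, mu, _ in sites)
    second = sum(w / total * (s * s + mu * mu) for w, mu, s in sites)
    return m, math.sqrt(second - m * m)


if __name__ == "__main__":
    sites = [(100.0, -2.0, 1.0), (300.0, 3.0, 1.5)]
    samples = metropolis_samples(sites, n=50000, step=2.0)
    print("MCMC refit:     ", fit_gaussian(samples))
    print("moment matching:", moment_matched_gaussian(sites))
```

With two sites holding 100 and 300 records, the refitted Gaussian should land close to the moment-matched one (mean 1.75, std about 2.57), which illustrates why refitting to artificial samples can approximate the combined model without pooling raw records. The single-Gaussian global form is an illustrative choice. A richer parametric family fitted to the same samples would track the bimodal mean model more closely.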
Keywords
Markov processes; Monte Carlo methods; data mining; data privacy; maximum entropy methods; maximum likelihood estimation; pattern classification; pattern clustering; probability; Markov chain Monte Carlo techniques; classification; distributed clustering; maximum entropy principle; maximum likelihood principles; privacy-aware data mining; probabilistic frameworks; Artificial neural networks; Books; Competitive intelligence; Data analysis; Data engineering; Data mining; Mathematical model; Privacy; Societies; Web mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligence and Security Informatics, 2008. ISI 2008. IEEE International Conference on
Conference_Location
Taipei
Print_ISBN
978-1-4244-2414-6
Electronic_ISBN
978-1-4244-2415-3
Type
conf
DOI
10.1109/ISI.2008.4565014
Filename
4565014