Title :
A novel approach of data sanitization by noise addition and knowledge discovery by clustering
Author :
Hadi Abdullah;Ahsan Siddiqi;Fuad Bajaber
Author_Institution :
University of Richmond, Richmond, USA
Abstract :
The security of published data is no less important than the security of data that is not made public. For this reason, organizations that record large volumes of data remove PII (Personally Identifiable Information) and sanitize the data before publishing it. However, this approach to ensuring privacy and security can reduce the utility of the published data for knowledge discovery, so a balance between privacy and utility is required. In this paper we study this delicate balance by evaluating four data mining clustering techniques for knowledge discovery, and we propose two parameters for quantifying privacy and utility. We then perform a number of experiments to statistically identify which clustering technique best achieves a desirable level of privacy/utility as noise is incrementally increased, simultaneously degrading data accuracy, completeness and consistency.
Keywords :
"Data privacy","Privacy","Databases","Knowledge discovery","Data security"
Conference_Title :
Computer Networks and Information Security (WSCNIS), 2015 World Symposium on
Print_ISBN :
978-1-4799-9906-4
DOI :
10.1109/WSCNIS.2015.7368283