DocumentCode :
3095959
Title :
Ontology-Enhanced Interactive Anonymization in Domain-Driven Data Mining Outsourcing
Author :
Loh, Brian C. S. ; Then, Patrick H. H.
Author_Institution :
Sch. of Eng., Swinburne Univ. of Technol., Kuching, Malaysia
fYear :
2010
fDate :
13-14 Sept. 2010
Firstpage :
9
Lastpage :
14
Abstract :
This paper focuses on a domain-driven data mining outsourcing scenario in which a data owner publishes data to an application service provider, which returns mining results. To protect data privacy against an untrusted party, anonymization is required: it is a widely used technique that preserves true attribute values and supports a variety of data mining algorithms. Several issues emerge when anonymization is applied in a real-world outsourcing scenario. Most methods target the traditional data mining paradigm, so they neither incorporate domain knowledge nor optimize data for domain-driven use. Furthermore, existing techniques are mostly non-interactive, giving users little control while assuming they are naturally capable of producing Domain Generalization Hierarchies (DGHs). Moreover, previous utility metrics have not considered attribute correlations during generalization. These concerns must be addressed to achieve optimal data privacy and actionable patterns in a real-world setting. This paper proposes an anonymization framework that aids users in a domain-driven data mining outsourcing scenario. The framework comprises several components designed to anonymize data while preserving meaningful or actionable patterns that can be discovered after mining. Unlike existing work for traditional data mining, this framework integrates domain ontology knowledge during DGH creation so that values retain their meaning after anonymization. In addition, users can impose constraints based on their mining tasks, thereby controlling how data generalization is performed. Finally, attribute correlations are calculated to ensure that important features are preserved. Preliminary experiments show that an ontology-based DGH preserves semantic meaning after attribute generalization, and that using Chi-Square as a correlation measure may improve attribute selection before generalization.
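The abstract names Chi-Square as the correlation measure used to guide attribute selection before generalization but does not give the procedure. The sketch below is an illustrative assumption, not the authors' method: it computes Pearson's chi-square statistic over the contingency table of each candidate attribute against a target (class) attribute and ranks the attributes by that score. The function names `chi_square` and `rank_attributes` and the column-oriented table layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def chi_square(xs: Sequence[str], ys: Sequence[str]) -> float:
    """Pearson's chi-square statistic for two categorical columns.

    Builds the contingency table implicitly from joint counts and sums
    (observed - expected)^2 / expected over every cell.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    x_counts = Counter(xs)           # row marginals
    y_counts = Counter(ys)           # column marginals
    joint = Counter(zip(xs, ys))     # observed cell counts
    stat = 0.0
    for x, cx in x_counts.items():
        for y, cy in y_counts.items():
            expected = cx * cy / n   # always > 0 because both marginals are >= 1
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_attributes(table: Dict[str, List[str]], target: str) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank non-target attributes by chi-square with the target.

    A higher score suggests a stronger association, i.e. an attribute whose
    generalization would more likely destroy patterns relevant to the target.
    """
    scores = [
        (name, chi_square(column, table[target]))
        for name, column in table.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up records for illustration only (not data from the paper).
    toy = {
        "age_band": ["30-39", "30-39", "60-69", "60-69"],
        "gender": ["M", "F", "M", "F"],
        "diagnosis": ["healthy", "healthy", "heart", "heart"],
    }
    print(rank_attributes(toy, "diagnosis"))
```

On the toy table, `age_band` is perfectly associated with `diagnosis` and scores higher than `gender`, which is independent of it. The raw statistic grows with sample size and table dimensions, so a normalized measure such as Cramér's V may be preferable when comparing attributes with different numbers of categories.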
Keywords :
business data processing; data mining; data privacy; ontologies (artificial intelligence); outsourcing; Chi-Square; application service provider; data privacy; domain generalization hierarchies; domain-driven data mining outsourcing; ontology-enhanced interactive anonymization; Correlation; Data privacy; Diseases; Heart; Measurement; Outsourcing; anonymization; data publishing; domain-driven data mining; outsourcing; privacy;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data, Privacy and E-Commerce (ISDPE), 2010 Second International Symposium on
Conference_Location :
Buffalo, NY
Print_ISBN :
978-1-4244-8377-8
Electronic_ISBN :
978-0-7695-4203-4
Type :
conf
DOI :
10.1109/ISDPE.2010.7
Filename :
5636274