DocumentCode :
2209627
Title :
Enforcing Vocabulary k-Anonymity by Semantic Similarity Based Clustering
Author :
Liu, Junqiang ; Wang, Ke
Author_Institution :
Simon Fraser Univ., Burnaby, BC, Canada
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
899
Lastpage :
904
Abstract :
Web query logs provide a wealth of information but also present serious privacy risks. We consider publishing vocabularies, that is, bags of query terms extracted from web query logs, which has a variety of applications. We aim to prevent identity disclosure of such bag-valued data. The key feature of such data is its extreme sparsity, which causes conventional anonymization techniques to retain too little utility. We propose a semantic-similarity-based clustering approach to address the issue. We measure the semantic similarity between two vocabularies by a weighted bipartite matching and present a greedy algorithm that clusters vocabularies by their semantic similarity. Extensive experiments on the AOL query log show that our approach retains more data utility than existing approaches.
Keywords :
Internet; greedy algorithms; pattern clustering; publishing; query processing; vocabulary; AOL query log; Web query log; anonymization technique; bag valued data; cluster vocabulary k-anonymity; data utility; greedy algorithm; publishing vocabulary; query-term extraction; semantic similarity based clustering; weighted bipartite matching; Anonymity; bag-valued data; privacy; web query logs;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.59
Filename :
5694058