Title :
A Disc-based Approach to Data Summarization and Privacy Preservation
Author :
Ge, Rong ; Ester, Martin ; Jin, Wen ; Hu, Zengjian
Author_Institution :
Simon Fraser Univ., Burnaby, BC
Abstract :
Data summarization has been recognized as a fundamental operation in database systems and data mining, with important applications such as data compression and privacy preservation. While existing methods such as CF-values and Data Bubbles may perform reasonably well, they cannot provide any guarantees on the quality of their results. In this paper, we introduce a summarization approach for numerical data based on discs, which formalizes the notion of quality. Our objective is to find a minimal set of discs, i.e., spheres satisfying a radius and a significance constraint, that covers the given dataset. Since the proposed problem is NP-complete, we design two different approximation algorithms. These algorithms have a quality guarantee, but they do not scale well to large databases. However, the machinery from approximation algorithms allows a precise characterization of a further, heuristic algorithm. This efficient heuristic algorithm exploits multi-dimensional index structures and can be well integrated with database systems. The experiments show that our heuristic algorithm generates summaries that outperform the state-of-the-art Data Bubbles in terms of internal measures as well as external measures when the data summaries are used as input for clustering methods.
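Illustrative sketch (not taken from the paper): the covering objective described in the abstract can be pictured with a simple greedy, set-cover-style routine. The abstract does not give the exact constraint definitions, so everything here is an assumption: a disc is taken to be a sphere of radius at most `max_radius` centred on a data point, it counts as significant if it contains at least `min_points` points, and points that no significant disc can reach are returned as outliers. The function name `greedy_disc_cover` and its parameters are hypothetical, and this is neither of the paper's approximation algorithms nor its index-based heuristic.

```python
import math


def greedy_disc_cover(points, max_radius, min_points):
    """Greedy set-cover-style sketch of disc-based summarization.

    Assumptions (not from the paper): candidate discs are centred on data
    points, have radius max_radius, and are significant when they contain
    at least min_points points.
    """
    # For every candidate centre, collect the indices of points inside its disc.
    neighbours = [
        {j for j, q in enumerate(points) if math.dist(p, q) <= max_radius}
        for p in points
    ]
    uncovered = set(range(len(points)))
    discs = []
    while uncovered:
        # Among significant discs, pick the one covering most uncovered points.
        best = max(
            (i for i in range(len(points)) if len(neighbours[i]) >= min_points),
            key=lambda i: len(neighbours[i] & uncovered),
            default=None,
        )
        if best is None or not (neighbours[best] & uncovered):
            break  # the remaining points lie in no significant disc
        # Summarize the disc by its centre and the number of points it contains.
        discs.append((points[best], len(neighbours[best])))
        uncovered -= neighbours[best]
    outliers = [points[i] for i in sorted(uncovered)]
    return discs, outliers


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (9, 9)]
    summary, leftover = greedy_disc_cover(data, max_radius=0.5, min_points=2)
    print(summary, leftover)
```

The pairwise distance computation is quadratic in the number of points, which is in line with the abstract's remark that guaranteed approaches do not scale well and that an efficient heuristic would rely on a multi-dimensional index for neighbourhood queries.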
Keywords :
computational complexity; data compression; data mining; data privacy; database indexing; disc storage; pattern clustering; security of data; NP-complete problem; approximation algorithm; data bubbles; data clustering; data privacy preservation; data summarization; database system; disc-based approach; heuristic algorithm; multidimensional index structure; Algorithm design and analysis; Approximation algorithms; Clustering algorithms; Database systems; Heuristic algorithms; Indexes; Machinery
Conference_Title :
Scientific and Statistical Database Management, 2006. 18th International Conference on
Conference_Location :
Vienna
Print_ISBN :
0-7695-2590-3
DOI :
10.1109/SSDBM.2006.6