Regional Information Center for Science and Technology - Finding representative set from massive data

DocumentCode :

2864934

Title :

Finding representative set from massive data

Author :

Pan, Feng ; Wang, Wei ; Tung, Anthony K H ; Yang, Jiong

Author_Institution :

North Carolina Univ., Chapel Hill, NC, USA

fYear :

2005

fDate :

27-30 Nov. 2005

Abstract :

In the information age, data is pervasive. In some applications, data explosion is a significant phenomenon. The massive data volume poses challenges to both human users and computers. In this project, we propose a new model for identifying a representative set from a large database. A representative set is a special subset of the original dataset that has three main characteristics: (1) it is significantly smaller than the original dataset; (2) it captures more information from the original dataset than other subsets of the same size; and (3) it has low redundancy among the representatives it contains. We use information-theoretic measures such as mutual information and relative entropy to measure the representativeness of the representative set. We first design a greedy algorithm and then present a heuristic algorithm that delivers much better performance. We run experiments on two real datasets and evaluate the effectiveness of our representative set in terms of coverage and accuracy. The experiments show that our representative set attains the expected characteristics and captures information more efficiently.
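
The greedy idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the record does not describe in detail. It assumes each item is given as a probability distribution over the same discrete features, and it uses relative entropy (KL divergence) as the cost of representing one item by another. The function names `kl_divergence` and `greedy_representatives` and the smoothing constant `eps` are hypothetical choices made for illustration only.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """Relative entropy D(p || q) in bits.

    Assumption for this sketch: zero entries of q are floored at eps,
    so the result stays finite instead of diverging.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log2(pi / max(qi, eps))
    return total


def greedy_representatives(items, k):
    """Pick up to k item indices greedily.

    Each round adds the candidate that most reduces the total divergence
    from every item to its closest representative chosen so far.
    """
    n = len(items)
    chosen = []
    # best[i] is the divergence from item i to its closest representative.
    best = [math.inf] * n
    for _ in range(min(k, n)):
        best_cost, best_idx = math.inf, None
        for c in range(n):
            if c in chosen:
                continue
            # Total cost if candidate c were added to the current set.
            cost = sum(
                min(best[i], kl_divergence(items[i], items[c]))
                for i in range(n)
            )
            if cost < best_cost:
                best_cost, best_idx = cost, c
        chosen.append(best_idx)
        best = [
            min(best[i], kl_divergence(items[i], items[best_idx]))
            for i in range(n)
        ]
    return chosen


if __name__ == "__main__":
    # Two loose groups of feature distributions (made-up toy data).
    items = [
        [0.9, 0.1, 0.0],
        [0.8, 0.2, 0.0],
        [0.0, 0.1, 0.9],
        [0.0, 0.2, 0.8],
    ]
    print(greedy_representatives(items, 2))
```

On the toy data, the second pick comes from the group that the first pick represents poorly, which reflects the low-redundancy property listed in the abstract. Each round evaluates every remaining candidate against every item, so the cost grows quadratically per round; this is the kind of expense that would motivate a faster heuristic on massive data.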

Keywords :

entropy; greedy algorithms; very large databases; data explosion; greedy algorithm; heuristic algorithm; information-theoretic measures; large database; massive data volume; mutual information; relative entropy; Algorithm design and analysis; Application software; Clustering algorithms; Databases; Entropy; Explosions; Greedy algorithms; Humans; Interference; Mutual information;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining, Fifth IEEE International Conference on

ISSN :

1550-4786

Print_ISBN :

0-7695-2278-5

Type :

conf

DOI :

10.1109/ICDM.2005.69

Filename :

1565697

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2864934