Finding an λ-representative subset from massive data

Author

Jin Zhang ; Qiang Wei ; Guoqing Chen

Author_Institution

Res. Center for Contemporary Manage., Tsinghua Univ., Beijing, China

fYear

2013

fDate

24-28 June 2013

Firstpage

585

Lastpage

590

Abstract

Retrieving representative information from large-scale data has become an important research issue. This paper focuses on certain aspects of representativeness in database queries and web search, and proposes an approach to extracting a subset of the original search results that achieves high coverage and low redundancy. The notion of λ-Represent is introduced based on similarities and related fuzzy operations, which makes it possible to describe the λ-Represent relationship between sets of data objects. The λ-Representative problem is then formulated as an extension of the classical set covering problem, and a heuristic algorithm (namely, LamRep) is developed to address the problem effectively. In LamRep, a “vote” mechanism is proposed to overcome the limitations of the naive greedy algorithm. Experiments on benchmark data show that LamRep outperforms the compared approaches.
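The abstract does not give the formal λ-Represent definition or the details of the vote mechanism. As a rough illustration only, assume that an object x λ-represents an object y when a similarity sim(x, y) is at least λ (a λ-cut of the similarity relation). Under that assumption, the λ-Representative problem reads as a set cover instance, and a standard naive greedy set-cover baseline can be sketched as below. The threshold rule, the toy similarity `sim_1d`, and the names `lambda_cover_sets` and `greedy_lambda_representative` are hypothetical. This is not LamRep, and the vote mechanism is not reproduced.

```python
from typing import Callable, List, Sequence, Set, TypeVar

T = TypeVar("T")


def lambda_cover_sets(objects: Sequence[T], sim: Callable[[T, T], float], lam: float) -> List[Set[int]]:
    # Hypothetical λ-Represent test (an assumption): x represents y when sim(x, y) >= lam.
    n = len(objects)
    return [{j for j in range(n) if sim(objects[i], objects[j]) >= lam} for i in range(n)]


def greedy_lambda_representative(objects: Sequence[T], sim: Callable[[T, T], float], lam: float) -> List[int]:
    # Naive greedy set cover: repeatedly choose the candidate that represents
    # the largest number of still-unrepresented objects.
    covers = lambda_cover_sets(objects, sim, lam)
    uncovered = set(range(len(objects)))
    chosen: List[int] = []
    while uncovered:
        # Ties go to the lowest index, because max() returns the first maximum it sees.
        best = max(range(len(objects)), key=lambda i: len(covers[i] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            break  # remaining objects cannot be represented at this λ
        chosen.append(best)
        uncovered -= gain
    return chosen


def sim_1d(x: float, y: float) -> float:
    # Toy similarity for 1-D points, clipped to [0, 1]; assumed for illustration only.
    return max(0.0, 1.0 - abs(x - y))


if __name__ == "__main__":
    points = [0.0, 0.1, 0.15, 1.0, 1.05, 3.0]
    print(greedy_lambda_representative(points, sim_1d, 0.8))  # prints [0, 3, 5]
```

With λ = 0.8 the toy points form three well-separated groups, and the greedy pass picks one representative from each. A purely greedy choice like this can be suboptimal for set cover in general. That is the kind of limitation the abstract says LamRep's vote mechanism is designed to address.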

Keywords

greedy algorithms; heuristic programming; query processing; λ-Represent relationship; λ-Representative problem; λ-Representative subset; LamRep; Web search; data objects; database queries; fuzzy operations; greedy algorithm; heuristic algorithm; large-scale data; massive data; representative information retrieval; vote mechanism; Algorithm design and analysis; Benchmark testing; Data mining; Databases; Greedy algorithms; Redundancy; Web search;

fLanguage

English

Publisher

ieee

Conference_Titel

IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint

Conference_Location

Edmonton, AB

Type

conf

DOI

10.1109/IFSA-NAFIPS.2013.6608466

Filename

6608466