Title :
Finding an λ-representative subset from massive data
Author :
Jin Zhang ; Qiang Wei ; Guoqing Chen
Author_Institution :
Res. Center for Contemporary Manage., Tsinghua Univ., Beijing, China
Abstract :
Retrieving representative information from large-scale data has become an important research issue. This paper focuses on certain aspects of representativeness in database queries and web search, and proposes an approach to extracting, from the original search results, a subset with high coverage and low redundancy. The notion of λ-Represent is introduced based on similarities and related fuzzy operations, which makes it possible to describe the λ-Represent relationship between sets of data objects. The λ-Representative problem is then formulated as an extension of the classical set covering problem, and a heuristic algorithm, LamRep, is developed to solve it. In LamRep, a “vote” mechanism is proposed to overcome the limitation of the naive greedy algorithm. Experiments on benchmark data show that LamRep outperforms the other approaches compared.
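For orientation, the naive greedy baseline that the abstract contrasts with LamRep can be sketched roughly as follows. This is not LamRep: the abstract does not describe the "vote" mechanism, so it is not reproduced here. The sketch also assumes a simplified, crisp reading of λ-Represent, in which one object represents another when their similarity is at least λ; the paper's fuzzy definition may differ. The names `greedy_representatives`, `sim`, and `lam` are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_representatives(
    items: Sequence[T],
    sim: Callable[[T, T], float],
    lam: float,
) -> List[int]:
    """Naive greedy set-cover baseline (illustrative only, not LamRep).

    Assumption: item i "represents" item j when sim(items[i], items[j]) >= lam.
    Returns the indices of the chosen representatives in selection order.
    """
    n = len(items)
    # covers[i] holds the indices of all items that item i represents.
    covers = [
        {j for j in range(n) if sim(items[i], items[j]) >= lam}
        for i in range(n)
    ]
    uncovered = set(range(n))
    chosen: List[int] = []
    while uncovered:
        # Pick the item that represents the most still-uncovered items;
        # ties go to the lowest index.
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        gain = covers[best] & uncovered
        if not gain:
            # Nothing left can be covered (e.g. sim(x, x) < lam); stop.
            break
        chosen.append(best)
        uncovered -= gain
    return chosen
```

Each round adds the object that covers the most not-yet-represented objects, which favours coverage, while the shrinking uncovered set discourages picking near-duplicates of earlier choices.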
Keywords :
greedy algorithms; heuristic programming; query processing; λ-Represent relationship; λ-Representative problem; λ-Representative subset; LamRep; Web search; data objects; database queries; fuzzy operations; greedy algorithm; heuristic algorithm; large-scale data; massive data; representative information retrieval; vote mechanism; Algorithm design and analysis; Benchmark testing; Data mining; Databases; Redundancy
Conference_Title :
IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint
Conference_Location :
Edmonton, AB
DOI :
10.1109/IFSA-NAFIPS.2013.6608466