• DocumentCode
    3166252
  • Title

    Finding an λ-representative subset from massive data

  • Author

    Jin Zhang ; Qiang Wei ; Guoqing Chen

  • Author_Institution
    Res. Center for Contemporary Manage., Tsinghua Univ., Beijing, China
  • fYear
    2013
  • fDate
    24-28 June 2013
  • Firstpage
    585
  • Lastpage
    590
  • Abstract
    Retrieving representative information from large-scale data has become an important research issue. This paper focuses on certain aspects of representativeness in database queries and web search, and proposes an approach to extracting a subset of the original search results that achieves high coverage and low redundancy. The notion of λ-Represent is introduced based on similarities and related fuzzy operations, which makes it possible to describe the λ-Represent relationship between sets of data objects. The λ-Representative problem is then formulated as an extension of the typical set covering problem, which leads to a heuristic algorithm (namely, LamRep) to cope with the problem effectively. In LamRep, a “vote” mechanism is proposed to overcome the limitation of the naive greedy algorithm. Experiments on benchmark data show that LamRep outperforms the other approaches.
  • Keywords
    greedy algorithms; heuristic programming; query processing; λ-Represent relationship; λ-Representative problem; λ-Representative subset; LamRep; Web search; data objects; database queries; fuzzy operations; greedy algorithm; heuristic algorithm; large-scale data; massive data; representative information retrieval; vote mechanism; Algorithm design and analysis; Benchmark testing; Data mining; Databases; Greedy algorithms; Redundancy; Web search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint
  • Conference_Location
    Edmonton, AB
  • Type

    conf

  • DOI
    10.1109/IFSA-NAFIPS.2013.6608466
  • Filename
    6608466