• DocumentCode
    66
  • Title
    AML: Efficient Approximate Membership Localization within a Web-Based Join Framework
  • Author
    Li, Zhixu; Sitbon, Laurianne; Wang, Liwei; Zhou, Xiaofang; Du, Xiaoyong
  • Author_Institution
    Sch. of Inf. Technol. & Electr. Eng., Univ. of Queensland, Brisbane, QLD, Australia
  • Volume
    25
  • Issue
    2
  • fYear
    2013
  • fDate
    Feb. 2013
  • Firstpage
    298
  • Lastpage
    310
  • Abstract
    In this paper, we propose a new type of dictionary-based entity recognition problem, named Approximate Membership Localization (AML). The popular Approximate Membership Extraction (AME) provides full coverage of the true matched substrings in a given document, but its many redundancies make the AME process inefficient and degrade the performance of real-world applications that use the extracted substrings. The AML problem targets locating nonoverlapped substrings, which are a better approximation to the true matched substrings, without generating overlapped redundancies. To perform AML efficiently, we propose an optimized algorithm, P-Prune, that prunes a large part of the overlapped redundant matched substrings before generating them. Our study on several real-world data sets demonstrates the efficiency of P-Prune over a baseline method. We also study AML in application to a proposed web-based join framework scenario, a search-based approach that joins two tables using dictionary-based entity recognition from web documents. The results not only show the advantage of AML over AME but also demonstrate the effectiveness of our search-based approach.
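    The abstract contrasts AME, which reports every approximately matching substring (overlaps included), with AML, which reports nonoverlapped substrings. The sketch below illustrates only that difference, under loudly stated assumptions: token-level Jaccard similarity, a fixed threshold, a cap on window length, and a greedy "keep the best-scoring non-overlapping match" rule are all hypothetical choices made for illustration. They are not the paper's definitions, and the greedy filter is not P-Prune, which according to the abstract prunes redundant matches before generating them rather than filtering them afterwards.

```python
from typing import List, Tuple

# (start, end, entity, score); end is exclusive, positions are token indices.
Match = Tuple[int, int, str, float]


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set Jaccard similarity (an assumed similarity measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def ame(doc: List[str], dictionary: List[str], threshold: float, max_len: int) -> List[Match]:
    """AME-style extraction: every token window of length <= max_len whose
    similarity to some dictionary entity reaches the threshold. Overlapping
    windows are all reported, which is the redundancy the abstract describes."""
    entities = [(name, name.split()) for name in dictionary]
    matches: List[Match] = []
    for i in range(len(doc)):
        for j in range(i + 1, min(i + max_len, len(doc)) + 1):
            window = doc[i:j]
            for name, tokens in entities:
                score = jaccard(window, tokens)
                if score >= threshold:
                    matches.append((i, j, name, score))
    return matches


def aml_greedy(matches: List[Match]) -> List[Match]:
    """Hypothetical localization step: visit matches from highest score down
    (shorter, then earlier windows first on ties) and keep a match only if it
    overlaps no match already kept. A greedy stand-in, NOT P-Prune."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: (-m[3], m[1] - m[0], m[0])):
        if all(m[1] <= k[0] or m[0] >= k[1] for k in kept):
            kept.append(m)
    return sorted(kept)


if __name__ == "__main__":
    doc = "the university of queensland is in brisbane australia".split()
    dictionary = ["university of queensland", "brisbane"]
    extracted = ame(doc, dictionary, threshold=0.5, max_len=4)
    localized = aml_greedy(extracted)
    print(len(extracted), "AME matches")
    for start, end, entity, score in localized:
        print(" ".join(doc[start:end]), "->", entity, round(score, 2))
```

    On this toy input, the AME-style pass reports ten overlapping windows (for example "the university of" and "university of queensland is" both pass the threshold), while the greedy localization keeps only "university of queensland" and "brisbane". The post-hoc filter is chosen for readability; it still generates every redundant match first, which is exactly the cost the abstract says P-Prune avoids.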
  • Keywords
    Internet; dictionaries; document handling; string matching; AME process; AML problem; P-prune algorithm; Web documents; Web-based join framework; approximate membership extraction; approximate membership localization; dictionary-based entity recognition problem; extracted substrings; nonoverlapped substring localization; overlapped redundant matched substrings; real-world data sets; search-based approach; true matched substrings; Approximation algorithms; Approximation methods; Correlation; Dictionaries; Pattern matching; Web search; AML; Web-based join; approximate membership location;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2011.178
  • Filename
    5989807