• DocumentCode
    71665
  • Title
    Query Analytics over Probabilistic Databases with Unmerged Duplicates
  • Author
    Ioannou, Ekaterini; Garofalakis, Minos
  • Author_Institution
    Sch. of Electron. & Comput. Eng., Tech. Univ. of Crete, Chania, Greece
  • Volume
    27
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 1 2015
  • Firstpage
    2245
  • Lastpage
    2260
  • Abstract
    Recent entity resolution approaches exhibit benefits when addressing the problem through unmerged duplicates: instances describing real-world objects are not merged based on a priori thresholds or human intervention; instead, relevant resolution information is employed for evaluating resolution decisions during query processing using “possible worlds” semantics. In this paper, we present the first known approach for efficiently handling complex analytical queries over probabilistic databases with unmerged duplicates. We propose the ENTITY-JOIN operator, which allows expressing complex aggregation and iceberg/top-k queries over joins between tables with unmerged duplicates and other database tables. Our technical content includes a novel indexing structure for efficient access to the entity resolution information and novel techniques for the efficient evaluation of complex probabilistic queries that retrieve analytical and summarized information over a (potentially huge) collection of possible resolution worlds. Our extensive experimental evaluation verifies the benefits of our approach.
  • Keywords
    database management systems; indexing; merging; probability; query processing; ENTITY-JOIN operator; complex aggregation; complex analytical query handling; complex probabilistic queries; entity resolution approach; human intervention; iceberg-top-k queries; indexing structure; possible world semantics; probabilistic databases; query analytics; unmerged duplicates; Aggregates; Couplings; Data models; Probabilistic logic; Semantics; Entity resolution
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2015.2405507
  • Filename
    7045501