  • DocumentCode
    140802
  • Title
    In-RDBMS inverted indexes revisited
  • Author
    Rae, Ian; Halverson, Alan; Naughton, J.F.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Wisconsin-Madison, Madison, WI, USA
  • fYear
    2014
  • fDate
    March 31 2014-April 4 2014
  • Firstpage
    352
  • Lastpage
    363
  • Abstract
    Every major open-source and commercial RDBMS offers some form of support for full-text search using inverted indexes. When providing this support, some developers have implemented specialized indexes that adapt techniques from the Information Retrieval (IR) community to work in a database setting, while others have opted to rely on the standard relational query engine to process inverted index lookups. This choice is an important one, since the storage formats and algorithms used can vary greatly between a specialized index and a relational index, but these alternatives have not been thoroughly compared in the same system. Our work explores the differences in implementation and performance of three representative environments for an in-RDBMS inverted index: an in-RDBMS IR engine, a row-oriented relational query engine, and a column-oriented relational query engine. We found that a specialized IR engine integrated into the RDBMS can provide more than an order of magnitude speedup over both the row- and column-oriented relational query engines for conjunctive and phrase queries. For warm queries, this advantage is largely algorithmic, and we show that by using ZigZag merge join to accelerate conjunctive and phrase query processing, relational inverted indexes can provide performance comparable to a specialized in-RDBMS IR engine with no change to the underlying storage format. Compression and index format, in contrast, have more impact on cold queries, where the IR and column-oriented engines are able to outperform the row-oriented engine, even with ZigZag merge join.
  • Keywords
    database indexing; query processing; relational databases; IR community; ZigZag merge join; column-oriented engines; column-oriented relational query engine; commercial RDBMS; conjunctive queries; database setting; full-text search; in-RDBMS IR engine; in-RDBMS inverted indexes; information retrieval; inverted index lookups; open-source RDBMS; phrase queries; relational database management systems; relational index; relational query engine; row-oriented engine; row-oriented relational query engine; specialized index; specialized indexes; storage formats; Algorithm design and analysis; Communities; Encoding; Engines; Indexes; Servers; Standards
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2014 IEEE 30th International Conference on
  • Conference_Location
    Chicago, IL
  • Type
    conf
  • DOI
    10.1109/ICDE.2014.6816664
  • Filename
    6816664
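
The abstract credits ZigZag merge join with closing most of the warm-query gap for conjunctive queries. This record does not include the paper's implementation, so the following is only a minimal sketch of the general idea, under stated assumptions: posting lists are sorted, duplicate-free Python lists of document IDs, and a binary search stands in for an index seek. The names `zigzag_intersect`, `conjunctive_query`, and `postings` are hypothetical and do not come from the paper.

```python
from bisect import bisect_left


def zigzag_intersect(left, right):
    """Intersect two sorted, duplicate-free lists of document IDs.

    Instead of advancing one entry at a time as in a plain merge join,
    the side that is behind seeks directly to the first entry that is
    >= the other side's current value. The binary search here is an
    assumed stand-in for an index seek.
    """
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            # Left is behind: skip ahead to the first entry >= right[j].
            i = bisect_left(left, right[j], i + 1)
        else:
            # Right is behind: skip ahead to the first entry >= left[i].
            j = bisect_left(right, left[i], j + 1)
    return result


def conjunctive_query(postings, terms):
    """Return the document IDs that contain every term in `terms`.

    `postings` maps a term to its sorted list of document IDs.
    Starting from the shortest list keeps intermediate results small.
    """
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = zigzag_intersect(result, other)
    return result
```

The skipping matters most when one posting list is much shorter than the other, since long runs of non-matching entries are jumped over rather than scanned one by one.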