• DocumentCode
    2080969
  • Title

    Fast in-memory XPath search using compressed indexes

  • Author

    Arroyuelo, Diego ; Claude, Francisco ; Maneth, Sebastian ; Maakinen, Veli ; Navarro, Gonzalo ; Nguyen, Kim ; Siren, Jouni ; Valimaki, Niko

  • Author_Institution
    Yahoo! Res. Latin America, Santiago, Chile
  • fYear
    2010
  • fDate
    1-6 March 2010
  • Firstpage
    417
  • Lastpage
    428
  • Abstract
    A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document´s text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced. It stores the tree structure of an XML document using a bit array of opening and closing brackets plus a sequence of labels, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine that is based on tree automata. The engine uses fast counting queries of the text index in order to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search) such as the XPathMark queries, the SXSI system performs on par or better than the fastest known systems MonetDB and Qizx, (2) on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude (depending on the size of the result set), and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.
  • Keywords
    XML; document handling; query languages; search engines; text analysis; tree data structures; SXSI system; SXSI system perform; XML document; XPath query engine; XPath query language; XPathMark queries; compressed indexes; global compressed self index; in-memory XPath search; text indexes; text search; tree automata; tree indexes; Australia; Automata; Computer science; Data structures; Database languages; Engines; Random access memory; Read-write memory; Tree data structures; XML;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2010 IEEE 26th International Conference on
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    978-1-4244-5445-7
  • Electronic_ISBN
    978-1-4244-5444-0
  • Type

    conf

  • DOI
    10.1109/ICDE.2010.5447858
  • Filename
    5447858