• DocumentCode
    2835487
  • Title

    Indexing Structured Documents with Suffix Arrays

  • Author

    Báez, Y.A. ; Jiménez, Rafael C Carrasco

  • Author_Institution
    Dept. de Inf., Univ. Agraria de La Habana, San Jose de las Lajas, Cuba
  • fYear
    2012
  • fDate
    18-21 June 2012
  • Firstpage
    43
  • Lastpage
    48
  • Abstract
    Path indexes based on suffix trees have been shown to be highly efficient structures for digital collections that consist of structured documents, since they provide a fast response to queries that include structural requirements. Nevertheless, when the collection consists of highly heterogeneous documents, suffix trees may be too memory demanding. In such cases, using a suffix array as the underlying data storage permits a considerable reduction in space requirements, partly because suffix arrays are a remarkably light data structure and partly because they do not store redundant information regarding the textual content. We describe how a suffix array can be used as the data structure that stores the structural index in a retrieval system and provides a virtual index of all subpaths in the digital collection. We also show how an auxiliary ternary search tree can accelerate the resolution of structural queries with only a marginal increase in memory usage.
  • Keywords
    SQL; indexing; query processing; text analysis; tree data structures; tree searching; virtual storage; auxiliary ternary search tree; data storage; data structure; digital collection; heterogeneous document; memory usage; path index; retrieval system; structural query processing; structured document indexing; suffix array; suffix tree; textual content; virtual index; Acceleration; Arrays; Indexing; Memory management; XML; ternary search tree
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Its Applications (ICCSA), 2012 12th International Conference on
  • Conference_Location
    Salvador
  • Print_ISBN
    978-1-4673-1691-0
  • Type

    conf

  • DOI
    10.1109/ICCSA.2012.17
  • Filename
    6257608