Title :
Indexing Structured Documents with Suffix Arrays
Author :
Báez, Y.A. ; Jiménez, Rafael C Carrasco
Author_Institution :
Dept. de Inf., Univ. Agraria de La Habana, San Jose de las Lajas, Cuba
Abstract :
Path indexes based on suffix trees have been shown to be highly efficient structures for digital collections that consist of structured documents, since they provide fast responses to queries that include structural requirements. Nevertheless, when the collection consists of highly heterogeneous documents, suffix trees may be too memory-demanding. In such cases, using a suffix array as the underlying data storage permits a considerable reduction in space requirements, partly because suffix arrays are a remarkably light data structure and partly because they do not store redundant information about the textual content. We describe how a suffix array can be used as the data structure that stores the structural index in a retrieval system and provides a virtual index of all subpaths in the digital collection. We also show how an auxiliary ternary search tree can accelerate the resolution of structural queries with only a marginal increase in memory usage.
Keywords :
SQL; indexing; query processing; text analysis; tree data structures; tree searching; virtual storage; auxiliary ternary search tree; data storage; data structure; digital collection; heterogeneous document; memory usage; path index; retrieval system; structural query processing; structured document indexing; suffix array; suffix tree; textual content; virtual index; Acceleration; Arrays; Indexing; Memory management; XML; ternary search tree
Conference_Title :
Computational Science and Its Applications (ICCSA), 2012 12th International Conference on
Conference_Location :
Salvador
Print_ISBN :
978-1-4673-1691-0
DOI :
10.1109/ICCSA.2012.17