DocumentCode
2835487
Title
Indexing Structured Documents with Suffix Arrays
Author
Báez, Y.A.; Jiménez, Rafael C Carrasco
Author_Institution
Dept. de Inf., Univ. Agraria de La Habana, San Jose de las Lajas, Cuba
fYear
2012
fDate
18-21 June 2012
Firstpage
43
Lastpage
48
Abstract
Path indexes based on suffix trees have been shown to be highly efficient structures for digital collections that consist of structured documents, since they provide fast responses to queries that include structural requirements. Nevertheless, when the collection consists of highly heterogeneous documents, suffix trees may be too memory-demanding. In such cases, using a suffix array as the underlying data storage permits a considerable reduction in space requirements, partly because suffix arrays are a remarkably light data structure and partly because they do not store redundant information regarding the textual content. We describe how a suffix array can be used as the data structure that stores the structural index in a retrieval system and provides a virtual index of all subpaths in the digital collection. We also show how an auxiliary ternary search tree can accelerate the resolution of structural queries with only a marginal increase in memory usage.
Keywords
SQL; indexing; query processing; text analysis; tree data structures; tree searching; virtual storage; auxiliary ternary search tree; data storage; data structure; digital collection; heterogeneous document; memory usage; path index; retrieval system; structural query processing; structured document indexing; suffix array; suffix tree; textual content; virtual index; Acceleration; Arrays; Indexing; Memory management; XML; ternary search tree
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Science and Its Applications (ICCSA), 2012 12th International Conference on
Conference_Location
Salvador
Print_ISBN
978-1-4673-1691-0
Type
conf
DOI
10.1109/ICCSA.2012.17
Filename
6257608