DocumentCode :
2048857
Title :
Backwards Search in Context Bound Text Transformations
Author :
Petri, Matthias ; Navarro, Gonzalo ; Culpepper, J. Shane ; Puglisi, Simon J.
fYear :
2011
fDate :
21-24 June 2011
Firstpage :
82
Lastpage :
91
Abstract :
The Burrows-Wheeler Transform (BWT) is the basis for many of the most effective compression and self-indexing methods used today. A key to the versatility of the BWT is the ability to search for patterns directly in the transformed text. A backwards search for a pattern P can be performed on a transformed text by iteratively determining the range of suffixes that match P. The search can be further enhanced by constructing a wavelet tree over the output of the BWT in order to emulate a suffix array. In this paper, we investigate new algorithms for search derived from a variation of the BWT whereby rotations are only sorted to a depth k, commonly referred to as a context bound transform. Interestingly, this BWT variant can be used to mimic a k-gram index, which are used in a variety of applications that need to efficiently return occurrences in text position order. In this paper, we present the first backwards search algorithms on the k-BWT, and show how to construct a self-index containing many of the attractive properties of a k-gram index.
Keywords :
indexing; pattern matching; text analysis; trees (mathematics); wavelet transforms; Burrows-Wheeler transform; backwards search; compression method; context bound text transformations; k-gram index; pattern matching problem; self-indexing method; suffix array; wavelet tree; Arrays; Context; Educational institutions; Indexes; Pattern matching; Sorting; Transforms; backwards search; bwt; context bound transform; k-gram index; text indexing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression, Communications and Processing (CCP), 2011 First International Conference on
Conference_Location :
Palinuro
Print_ISBN :
978-1-4577-1458-0
Electronic_ISBN :
978-0-7695-4528-8
Type :
conf
DOI :
10.1109/CCP.2011.18
Filename :
6061006
Link To Document :
بازگشت