DocumentCode :
2742358
Title :
Compressed index for dynamic text
Author :
Hon, Wing-Kai ; Lam, Tak-Wah ; Sadakane, Kunihiko ; Sung, Wing-Kin ; Yiu, Siu-Ming
Author_Institution :
Dept. of Comput. Sci., Hong Kong Univ., China
fYear :
2004
fDate :
23-25 March 2004
Firstpage :
102
Lastpage :
111
Abstract :
This paper investigates how to index a text which is subject to updates. The best solution in the literature (P.Ferragina, et al., 1998) is based on suffix tree using O(n log n) bits of storage, where n is the length of the text. It supports finding all occurrences of a pattern P in O(|P|+occ) time, where occ is the number of occurrences. Each text update consists of inserting or deleting a substring of length y and can be supported in O(y+√(n)) time. In this paper, we initiate the study of compressed index using only O(n log |Σ|) bits of space, where Σ denotes the alphabet. Our solution supports finding all occurrences of a pattern P in O(|P| Iog2n(logεn+log|Σ|)+occlog1+εn) time, while insertion or deletion of a substring of length y can be done in O((y+√(n)) Iog2+ε n) amortized tune, where 0<ε≤1. The core part of our data structure is based on the recent work on compressed suffix trees (CST) and compressed suffix arrays (CSA).
Keywords :
data compression; indexing; text analysis; tree data structures; compressed index; compressed suffix arrays; compressed suffix trees; data structure; dynamic text; Computer science; DNA; Data compression; Data engineering; Entropy; Error correction; Indexing; Tree data structures;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 2004. Proceedings. DCC 2004
ISSN :
1068-0314
Print_ISBN :
0-7695-2082-0
Type :
conf
DOI :
10.1109/DCC.2004.1281455
Filename :
1281455
Link To Document :
بازگشت