DocumentCode :
3027533
Title :
A fast algorithm for making suffix arrays and for Burrows-Wheeler transformation
Author :
Sadakane, Kunihiko
Author_Institution :
Dept. of Inf. Sci., Tokyo Univ., Japan
fYear :
1998
fDate :
30 Mar-1 Apr 1998
Firstpage :
129
Lastpage :
138
Abstract :
We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called a suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler (see Technical Report 124, Digital SRC Research Report, 1994) transformation in the block sorting text compression, therefore fast sorting algorithms are desired. We compare algorithms for making suffix arrays of Bentley-Sedgewick (see Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, p.360-9, 1997), Andersson-Nilsson (see 35th Symp. on Foundations of Computer Science, p.714-21, 1994) and Karp-Miller-Rosenberg (1972) and making suffix trees of Larsson (see Data Compression Conference, p.190-9, 1996) on the speed and required memory and propose a new algorithm which is fast and memory efficient by combining them. We also define a measure of difficulty of sorting suffixes: average match length. Our algorithm is effective when the average match length of a text is large, especially for large databases
Keywords :
data compression; query processing; sorting; string matching; tree data structures; word processing; Andersson-Nilsson array; Bentley-Sedgewick array; Burrows-Wheeler transformation; Karp-Miller-Rosenberg array; Larsson suffix trees; average match length; block sorting text compression; compact data structure; fast sorting algorithms; large databases; lexicographic order; memory efficient algorithm; on-line string searches; speed; suffix arrays; suffix sorting; suffix tree; Computer science; Data compression; Databases; Length measurement; Sorting; Writing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 1998. DCC '98. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-8186-8406-2
Type :
conf
DOI :
10.1109/DCC.1998.672139
Filename :
672139
Link To Document :
بازگشت