DocumentCode
610382
Title
On shortest unique substring queries
Author
Jian Pei ; Wu, W.C.-H. ; Mi-Yen Yeh
Author_Institution
Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
fYear
2013
fDate
8-12 April 2013
Firstpage
937
Lastpage
948
Abstract
In this paper, we tackle a novel type of interesting queries - shortest unique substring queries. Given a (long) string S and a query point q in the string, can we find a shortest substring containing q that is unique in S? We illustrate that shortest unique substring queries have many potential applications, such as information retrieval, bioinformatics, and event context analysis. We develop efficient algorithms for online query answering. First, we present an algorithm to answer a shortest unique substring query in O(n) time using a suffix tree index, where n is the length of string S. Second, we show that, using O(n·h) time and O(n) space, we can compute a shortest unique substring for every position in a given string, where h is variable theoretically in O(n) but on real data sets often much smaller than n and can be treated as a constant. Once the shortest unique substrings are pre-computed, shortest unique substring queries can be answered online in constant time. In addition to the solid algorithmic results, we empirically demonstrate the effectiveness and efficiency of shortest unique substring queries on real data sets.
Keywords
query processing; bioinformatics; event context analysis; information retrieval; online query answering; shortest unique substring queries; suffix tree index; Algorithm design and analysis; Bioinformatics; Context; Indexes; Organisms; Upper bound;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location
Brisbane, QLD
ISSN
1063-6382
Print_ISBN
978-1-4673-4909-3
Electronic_ISBN
1063-6382
Type
conf
DOI
10.1109/ICDE.2013.6544887
Filename
6544887
Link To Document