• DocumentCode
    610382
  • Title

    On shortest unique substring queries

  • Author

    Jian Pei ; Wu, W.C.-H. ; Mi-Yen Yeh

  • Author_Institution
    Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    937
  • Lastpage
    948
  • Abstract
    In this paper, we tackle a novel type of interesting queries - shortest unique substring queries. Given a (long) string S and a query point q in the string, can we find a shortest substring containing q that is unique in S? We illustrate that shortest unique substring queries have many potential applications, such as information retrieval, bioinformatics, and event context analysis. We develop efficient algorithms for online query answering. First, we present an algorithm to answer a shortest unique substring query in O(n) time using a suffix tree index, where n is the length of string S. Second, we show that, using O(n·h) time and O(n) space, we can compute a shortest unique substring for every position in a given string, where h is variable theoretically in O(n) but on real data sets often much smaller than n and can be treated as a constant. Once the shortest unique substrings are pre-computed, shortest unique substring queries can be answered online in constant time. In addition to the solid algorithmic results, we empirically demonstrate the effectiveness and efficiency of shortest unique substring queries on real data sets.
  • Keywords
    query processing; bioinformatics; event context analysis; information retrieval; online query answering; shortest unique substring queries; suffix tree index; Algorithm design and analysis; Bioinformatics; Context; Indexes; Organisms; Upper bound;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type

    conf

  • DOI
    10.1109/ICDE.2013.6544887
  • Filename
    6544887