• DocumentCode
    1245569
  • Title

    Antisequential suffix sorting for BWT-based data compression

  • Author

    Baron, Dror ; Bresler, Yoram

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Rice Univ., Houston, TX, USA
  • Volume
    54
  • Issue
    4
  • fYear
    2005
  • fDate
    4/1/2005 12:00:00 AM
  • Firstpage
    385
  • Lastpage
    397
  • Abstract
    Suffix sorting requires ordering all suffixes of all symbols in an input sequence and has applications in running queries on large texts and in universal lossless data compression based on the Burrows Wheeler transform (BWT). We propose a new suffix lists data structure that leads to three fast, antisequential, and memory-efficient algorithms for suffix sorting. For a length-N input over a size-|X| alphabet, the worst-case complexities of these algorithms are Θ(N2), O(|X|N log(N/|X|)), and O(N√|X|log(N/|X|)), respectively. Furthermore, simulation results indicate performance that is competitive with other suffix sorting methods. In contrast, the suffix sorting methods that are fastest on standard test corpora have poor worst-case performance. Therefore, in comparison with other suffix sorting methods, suffix lists offer a useful trade off between practical performance and worst-case behavior. Another distinguishing feature of suffix lists is that these algorithms are simple; some of them can be implemented in VLSI. This could accelerate suffix sorting by at least an order of magnitude and enable high-speed BWT-based compression systems.
  • Keywords
    VLSI; computational complexity; data compression; sorting; transforms; BWT-based data compression; Burrows Wheeler transform; VLSI; antisequential suffix sorting; memory-efficient algorithm; source coding; worst-case complexity; Acceleration; Compression algorithms; Data compression; Data structures; Distributed computing; Hardware; Sorting; Source coding; Testing; Very large scale integration; Index Terms- Burrows Wheeler transform; VLSI.; data compression; source coding; suffix sorting;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2005.56
  • Filename
    1401858