DocumentCode
659608
Title
Parallel and memory-efficient Burrows-Wheeler transform
Author
Hayashi, Shin´ichiro ; Taura, Koichi
Author_Institution
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
43
Lastpage
50
Abstract
In large scale applications of genome analysis, compressed indexes are very useful to avoid the memory usage imposed by suffix arrays. FM-Index is one of the most important such compressed indexes. While it is very memory efficient, its construction requires much larger working memory during construction than the final index itself. Specifically, the Burrows-Wheeler transform (BWT), which is computed on its way to constructing FM-Index, requires a suffix array, which is larger than BWT and FM-Index. To address this, there have been many efforts to compute the Burrows-Wheeler in a smaller space, one of the fastest of which achieves a linear time complexity. Since it is unlikely that any sequential algorithm achieves a sublinear time complexity, the next logical step is a parallel algorithm that achieves a sublinear critical path with a space requirement smaller than suffix arrays. In this paper, we propose such an algorithm. It is based on a divide-and-conquer algorithm, which can be efficiently executed with task parallel models. We evaluated our algorithm on human genome data and obtained almost the same speed with BWT-IS, which is the known fastest algorithm.
Keywords
biology computing; computational complexity; genomics; parallel algorithms; transforms; BWT; BWT-IS algorithm; Burrows-Wheeler transform; FM-Index; compressed indexes; divide-and-conquer algorithm; genome analysis; human genome data; linear time complexity; parallel algorithm; sequential algorithm; space requirement; suffix arrays; task parallel models; Arrays; Bioinformatics; Genomics; Indexes; Merging; Time complexity; Transforms; Burrows-Wheeler transform; parallel computing; suffix array;
fLanguage
English
Publisher
ieee
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691757
Filename
6691757
Link To Document