Title :
Parallel and memory-efficient Burrows-Wheeler transform
Author :
Hayashi, Shin´ichiro ; Taura, Koichi
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
Abstract :
In large scale applications of genome analysis, compressed indexes are very useful to avoid the memory usage imposed by suffix arrays. FM-Index is one of the most important such compressed indexes. While it is very memory efficient, its construction requires much larger working memory during construction than the final index itself. Specifically, the Burrows-Wheeler transform (BWT), which is computed on its way to constructing FM-Index, requires a suffix array, which is larger than BWT and FM-Index. To address this, there have been many efforts to compute the Burrows-Wheeler in a smaller space, one of the fastest of which achieves a linear time complexity. Since it is unlikely that any sequential algorithm achieves a sublinear time complexity, the next logical step is a parallel algorithm that achieves a sublinear critical path with a space requirement smaller than suffix arrays. In this paper, we propose such an algorithm. It is based on a divide-and-conquer algorithm, which can be efficiently executed with task parallel models. We evaluated our algorithm on human genome data and obtained almost the same speed with BWT-IS, which is the known fastest algorithm.
Keywords :
biology computing; computational complexity; genomics; parallel algorithms; transforms; BWT; BWT-IS algorithm; Burrows-Wheeler transform; FM-Index; compressed indexes; divide-and-conquer algorithm; genome analysis; human genome data; linear time complexity; parallel algorithm; sequential algorithm; space requirement; suffix arrays; task parallel models; Arrays; Bioinformatics; Genomics; Indexes; Merging; Time complexity; Transforms; Burrows-Wheeler transform; parallel computing; suffix array;
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691757