DocumentCode :
2989935
Title :
High performance implementation of planted motif problem using suffix trees
Author :
Dasari, Naga Shailaja ; Ranjan, Desh ; Zubair, Mohammad
Author_Institution :
Old Dominion Univ., Norfolk, VA, USA
fYear :
2011
fDate :
4-8 July 2011
Firstpage :
200
Lastpage :
206
Abstract :
In this paper we present a high performance implementation of suffix tree based solution to the planted motif problem on two different parallel architectures: NVIDIA GPU and Intel Multicore machines. An (I,d) planted motif problem(PMP) is defined as: Given a sequence of n DNA sequences, each of length L, find M, the set of sequences (or motifs) of length I which have at least one d-neighbor in each of the n sequences. Here, a d-neighbor of a sequence is a sequence of same length that differs in at-most d positions. PMP is a well studied problem in computational biology. It is useful in developing methods for finding transcription factor binding sites, sequence classification and for building phylogenetic trees. The problem is computationally challenging to solve, for example a (19,7) PMP takes 9.9 hours on a sequential machine. Many approaches to solve planted motif problem can be found in literature. One approach is based on use of suffix tree data structure. Though suffix tree based methods are the most efficient ones for solving large planted motif problems on sequential machines, they are quite difficult to parallelize. We present suffix tree based parallel solutions for PMP on NVIDIA GPU and Intel Multicore architectures that are efficient and scalable. The solutions are based on a suffix tree algorithm previously presented but use extensive adaptation to individual architectures to ensure that the implementations work efficiently and scale well.
Keywords :
DNA; biology computing; computer graphic equipment; coprocessors; genetics; multiprocessing systems; parallel architectures; tree data structures; DNA sequences; Intel multicore machines; NVIDIA GPU; computational biology; parallel architecture; phylogenetic trees; planted motif problem; sequence classification; sequential machines; suffix tree data structure; time 9.9 hr; transcription factor binding sites; Algorithm design and analysis; Graphics processing unit; Instruction sets; Multicore processing; Random access memory; BitBased; DNA; PMP; gSPELLER; mSPELLER; multicore; parallel;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
High Performance Computing and Simulation (HPCS), 2011 International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-61284-380-3
Type :
conf
DOI :
10.1109/HPCSim.2011.5999825
Filename :
5999825
Link To Document :
بازگشت