DocumentCode :
167336
Title :
Constructing Similarity Graphs from Large-Scale Biological Sequence Collections
Author :
Zola, Jaroslaw
Author_Institution :
Rutgers Discovery Inf. Inst., Rutgers Univ., Piscataway, NJ, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
500
Lastpage :
507
Abstract :
Detecting similar pairs in large biological sequence collections is one of the most commonly performed tasks in computational biology. With the advent of high-throughput sequencing technologies, the problem has regained significance as data sets with millions of sequences have become ubiquitous. This paper is an initial report on our parallel, distributed-memory, sketching-based approach to constructing large-scale sequence similarity graphs. We develop load balancing techniques, derived from multi-way number partitioning and work stealing, to manage computational imbalance and ensure scalability on thousands of processors. Our experimental results show that the method is efficient and can be used to analyze data sets with millions of DNA sequences within acceptable time limits.
Keywords :
biocomputing; data analysis; graph theory; resource allocation; DNA sequences; computational biology; data set analysis; large-scale biological sequence collections; large-scale sequence similarity graphs; load balancing techniques; multiway number partitioning; parallel distributed memory; processors; sketching-based approach; throughput sequencing technologies; work stealing; Biology; Indexes; Load management; Matrix decomposition; Program processors; Scalability; Silicon; load balancing; min-wise independent permutations; parallel computational biology; sequence similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
Type :
conf
DOI :
10.1109/IPDPSW.2014.63
Filename :
6969429