Title :
Phylogenetic Analysis Using MapReduce Programming Model
Author :
G.M. Siddesh;K.G. Srinivasa;Ishank Mishra;Abhinav Anurag;Eklavya Uppal
Author_Institution :
Dept. of Inf. Sci. &
fDate :
5/1/2015 12:00:00 AM
Abstract :
Phylogenetic analysis has become an essential part of research on the evolutionary tree of life. Distance-matrix methods of phylogenetic analysis rely explicitly on a measure of "genetic distance" between the sequences being classified, and therefore require multiple sequence alignments as input. Distance methods attempt to construct an all-to-all matrix from the sequence query set that describes the distance between each sequence pair. Dynamic programming algorithms such as the Needleman-Wunsch algorithm (NWA) and the Smith-Waterman algorithm (SWA) produce accurate alignments, but they are computationally intensive and limited in the number and size of sequences they can handle. This paper focuses on optimizing phylogenetic analysis of large quantities of data using the Hadoop MapReduce programming model. The proposed approach relies on NWA to produce sequence alignments and on neighbor-joining methods, specifically UPGMA (Unweighted Pair Group Method with Arithmetic Mean), to produce rooted trees. The experimental results demonstrate that the proposed solution achieves significant improvements in performance and throughput. The dynamic programming nature of NWA, coupled with the data and computational parallelism of the Hadoop MapReduce programming model, improves the throughput and accuracy of sequence alignment. The proposed approach thus aims to establish a new methodology for optimizing phylogenetic analysis with a significant performance gain.
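The pipeline outlined in the abstract (pairwise NWA alignment, an all-to-all distance matrix, then UPGMA) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the scoring scheme, the score-to-distance conversion, or the Hadoop job layout, so the constants `MATCH`, `MISMATCH`, `GAP`, the distance formula, and the function names `map_pair`, `reduce_matrix`, and `build_tree` are all assumptions. Plain Python functions stand in for the map and reduce tasks; no Hadoop API is used.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -1  # assumed scoring scheme, not from the paper


def nw_score(a, b):
    """Global alignment score via Needleman-Wunsch, keeping one DP row at a time."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr.append(max(diag, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def map_pair(pair):
    """Map step: one sequence pair in, one (pair key, distance) record out."""
    (name_a, seq_a), (name_b, seq_b) = pair
    longest = max(len(seq_a), len(seq_b))
    # Assumed distance: 0 for identical sequences, larger for poorer alignments.
    return frozenset((name_a, name_b)), 1.0 - nw_score(seq_a, seq_b) / longest


def reduce_matrix(records):
    """Reduce step: gather the per-pair records into one distance lookup."""
    return dict(records)


def upgma(names, dist):
    """Average-linkage clustering; returns the rooted tree as nested tuples."""
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    dist = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        x, y = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = (x, y)
        size_x, size_y = clusters.pop(x), clusters.pop(y)
        for other in clusters:
            dxo = dist[frozenset((x, other))]
            dyo = dist[frozenset((y, other))]
            # Size-weighted mean equals the arithmetic mean over all leaf pairs.
            dist[frozenset((merged, other))] = (
                size_x * dxo + size_y * dyo
            ) / (size_x + size_y)
        clusters[merged] = size_x + size_y
    return next(iter(clusters))


def build_tree(seqs):
    """Run the map step over every pair, reduce to a matrix, then cluster."""
    records = map(map_pair, combinations(seqs.items(), 2))
    return upgma(list(seqs), reduce_matrix(records))
```

Calling `build_tree` on a dictionary of uniquely named sequences returns a nested tuple in which the most similar sequences share the innermost pair. Each pairwise alignment is independent of the others, which is why the map step is the natural unit to distribute across workers; the UPGMA step here runs on a single node over the reduced matrix.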
Keywords :
"Phylogeny","Clustering algorithms","Algorithm design and analysis","Heuristic algorithms","Programming","Computational modeling","Analytical models"
Conference_Title :
Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International
DOI :
10.1109/IPDPSW.2015.57