Title :
Novel Parallelization Schemes for Large-Scale Likelihood-based Phylogenetic Inference
Author :
Stamatakis, Alexandros ; Aberer, Andre J.
Author_Institution :
Exelixis Lab., Heidelberg Inst. for Theor. Studies, Heidelberg, Germany
Abstract :
The molecular data avalanche generated by novel wet-lab sequencing technologies allows for reconstructing phylogenies (evolutionary trees) using hundreds of complete genomes as input data. Scalable codes are therefore required to infer trees on these data under likelihood-based models of molecular evolution. We recently introduced a checkpointable and scalable MPI-based code for this purpose, called RAxML-Light, and are currently using it for several real-world data analysis projects. However, the scalability of RAxML-Light remains limited by the fork-join parallelization approach it deploys. To address this limitation, we introduce a novel, generally applicable approach to computing the phylogenetic likelihood in parallel on whole-genome datasets and implement it in ExaML (Exascale Maximum Likelihood). ExaML executes up to 3.2 times faster than RAxML-Light because of its more efficient parallelization and communication scheme, while implementing exactly the same tree search algorithm. Moreover, the new parallelization approach exhibits lower code complexity and a structure better suited to implementing fault tolerance with respect to hardware failures.
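As a rough illustration of why the phylogenetic likelihood lends itself to parallel computation, the sketch below splits the alignment sites across workers and sums their partial log-likelihoods. It is not the ExaML scheme, whose details the abstract does not give. It assumes the simplest possible case: two sequences, the Jukes-Cantor (JC69) model, and a fixed branch length `t`. All function names are hypothetical, and the final `sum` stands in for what would be a reduction (for example `MPI_Allreduce`) in an MPI code.

```python
import math

def site_log_likelihood(x, y, t):
    """JC69 log-likelihood of one alignment column for two sequences at distance t."""
    e = math.exp(-4.0 * t / 3.0)
    # Probability of observing y given x after time t (same vs. different base).
    p = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e
    # Multiply by the equilibrium frequency of x (1/4 under JC69).
    return math.log(0.25 * p)

def serial_log_likelihood(seq_a, seq_b, t):
    """Reference value: sum over all sites on a single worker."""
    return sum(site_log_likelihood(x, y, t) for x, y in zip(seq_a, seq_b))

def partial_log_likelihood(seq_a, seq_b, t, rank, n_workers):
    """Sum over the sites assigned cyclically to worker `rank` (hypothetical layout)."""
    return sum(
        site_log_likelihood(seq_a[i], seq_b[i], t)
        for i in range(rank, len(seq_a), n_workers)
    )

def parallel_log_likelihood(seq_a, seq_b, t, n_workers):
    """Simulate the workers in one process, then reduce their partial sums."""
    partials = [
        partial_log_likelihood(seq_a, seq_b, t, rank, n_workers)
        for rank in range(n_workers)
    ]
    return sum(partials)  # stand-in for a sum reduction across processes

if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGTTCGAAC"
    print(serial_log_likelihood(a, b, 0.1))
    print(parallel_log_likelihood(a, b, 0.1, 4))
```

Because sites are treated as independent, the total log-likelihood is a plain sum, so the only communication this toy version needs is one reduction per likelihood evaluation.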
Keywords :
application program interfaces; biology computing; data analysis; evolution (biological); fault tolerant computing; genetics; message passing; parallel processing; tree searching; ExaML; RAxML-Light scalability; checkpointable MPI-based code; code complexity; communication scheme; evolutionary trees; exascale maximum likelihood; fault tolerance; fork-join parallelization approach; hardware failures; large-scale likelihood-based phylogenetic inference; likelihood-based models; molecular data avalanche; molecular evolution; parallelization scheme; phylogenies reconstructing; real-world data analysis projects; scalable MPI-based code; scalable codes; tree search algorithm; wet-lab sequencing technologies; whole-genome datasets; Bayes methods; Computational modeling; Phylogeny; Random access memory; Sequential analysis; Shape; Standards; MPI; likelihood; parallelization; phylogenetics;
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4673-6066-1
DOI :
10.1109/IPDPS.2013.70