Title :
Algorithms for Genome-Scale Phylogenetics Using Gene Tree Parsimony
Author :
Bansal, Mukul S. ; Eulenstein, Oliver
Author_Institution :
Comput. Sci. & Artificial Intell. Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
Abstract :
The use of genomic data sets for phylogenetics is complicated by the fact that evolutionary processes such as gene duplication and loss or incomplete lineage sorting (deep coalescence) cause incongruence among gene trees. One well-known approach that deals with this complication is gene tree parsimony, which, given a collection of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, a lack of efficient algorithms has limited the use of this approach. Here, we present efficient algorithms for SPR- and TBR-based local search heuristics for gene tree parsimony under the 1) duplication, 2) loss, 3) duplication-loss, and 4) deep coalescence reconciliation costs. These novel algorithms improve upon the time complexities of previous algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. Our algorithms provide a substantial improvement in runtime and scalability compared to previous implementations and enable large-scale gene tree parsimony analyses using any of the four reconciliation costs. Our algorithms have been implemented in the software packages DupTree and iGTP, and have already been used to perform several compelling phylogenetic studies.
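The gene tree parsimony objective described in the abstract can be illustrated with a minimal sketch for the duplication cost only. This is a naive illustration, not the efficient SPR/TBR algorithms of the paper and not the DupTree or iGTP code. It assumes rooted binary trees written as nested 2-tuples whose leaves are species names, and it uses the standard least-common-ancestor mapping, under which a gene tree node counts as a duplication when it maps to the same species tree node as one of its children. All function names and the toy data are hypothetical.

```python
def clusters(tree):
    """Return (leaf set of tree, list of leaf sets of every node in tree)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    left_set, left_all = clusters(tree[0])
    right_set, right_all = clusters(tree[1])
    here = left_set | right_set
    return here, left_all + right_all + [here]


def lca_cluster(species_clusters, leaves):
    """LCA mapping: the smallest species tree cluster containing `leaves`."""
    return min((c for c in species_clusters if leaves <= c), key=len)


def duplication_cost(gene_tree, species_tree):
    """Count gene tree nodes that map to the same species node as a child."""
    _, species_clusters = clusters(species_tree)
    count = 0

    def visit(node):
        nonlocal count
        if isinstance(node, str):
            leaves = frozenset([node])
            return leaves, lca_cluster(species_clusters, leaves)
        left_leaves, left_map = visit(node[0])
        right_leaves, right_map = visit(node[1])
        leaves = left_leaves | right_leaves
        mapping = lca_cluster(species_clusters, leaves)
        if mapping == left_map or mapping == right_map:
            count += 1  # this gene tree node is inferred to be a duplication
        return leaves, mapping

    visit(gene_tree)
    return count


def total_cost(gene_trees, species_tree):
    """Parsimony score of a candidate species tree over all gene trees."""
    return sum(duplication_cost(g, species_tree) for g in gene_trees)


# Toy input (made up for illustration): three gene trees over species A, B, C.
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]

# Exhaustive choice among the three rooted topologies; local search heuristics
# replace this step when the number of species makes enumeration infeasible.
best = min(candidates, key=lambda s: total_cost(gene_trees, s))
```

With more than a handful of species the candidate set cannot be enumerated, which is why the search proceeds by SPR or TBR rearrangements of a current tree and why the cost of rescoring each neighbor dominates the running time.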
Keywords :
evolution (biological); genetics; genomics; software packages; trees (mathematics); DupTree software packages; SPR-based local search heuristics; TBR-based local search heuristics; deep coalescence reconciliation; gene duplication-loss; gene tree parsimony; genome-scale phylogenetics; genomic data sets; iGTP software packages; Algorithm design and analysis; Bioinformatics; Complexity theory; Phylogeny; Search problems; Vegetation; gene duplication; gene loss; incomplete lineage sorting; minimizing deep coalescences (MDC); phylogenetics; phylogenomics
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2013.103