• DocumentCode
    3001749
  • Title

    Inference of Huge Trees under Maximum Likelihood

  • Author

    Izquierdo-Carrasco, Fernando ; Stamatakis, Alexandros

  • Author_Institution
    Heidelberg Inst. for Theor. Studies, Heidelberg, Germany
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    2490
  • Lastpage
    2493
  • Abstract
    The wide adoption of Next-Generation Sequencing technologies in recent years has generated an avalanche of genetic data, which poses new challenges for large-scale maximum likelihood-based phylogenetic analyses. Improving the scalability of search algorithms and reducing the high memory requirements for computing the likelihood represent major computational challenges in this context. We have introduced methods for solving these key problems and provided corresponding proof-of-concept implementations. Moreover, we have developed a new tree search strategy that can reduce run times by more than 50% while yielding equally good trees (in the statistical sense). To reduce memory requirements, we explored the applicability of external memory (out-of-core) algorithms as well as a concept that trades memory for additional computations in the likelihood function. The latter concept induces only a surprisingly small increase in overall execution times. When trading 50% of the required RAM for additional computations, the average execution time increase caused by these additional computations amounts to only 15%. All concepts presented here are sufficiently generic that they can be applied to all programs that rely on the phylogenetic likelihood function. The approaches we have developed will thereby contribute to enabling large-scale inferences of whole-genome phylogenies.
  • Keywords
    biology computing; maximum likelihood estimation; tree searching; RAM; huge trees; large-scale inference; large-scale maximum likelihood; next-generation sequencing technology; phylogenetic analysis; phylogenetic likelihood function; search algorithm scalability; tree search strategy; whole-genome phylogenies; Algorithm design and analysis; DNA; Memory management; Phylogeny; Random access memory; Vectors; Vegetation; Phylogenetic likelihood function; RAxML; memory requirements; memory vs. runtime trade-offs;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-0974-5
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2012.309
  • Filename
    6270876