DocumentCode
16436
Title
An Algorithm for Constructing Principal Geodesics in Phylogenetic Treespace
Author
Nye, Tom M. W.
Author_Institution
Sch. of Math. & Stat., Newcastle Univ., Newcastle upon Tyne, UK
Volume
11
Issue
2
fYear
2014
fDate
March-April 2014
Firstpage
304
Lastpage
315
Abstract
Most phylogenetic analyses result in a sample of trees, but summarizing and visualizing these samples can be challenging. Consensus trees often provide limited information about a sample, and so methods such as consensus networks, clustering and multidimensional scaling have been developed and applied to tree samples. This paper describes a stochastic algorithm for constructing a principal geodesic, or line through treespace, which is analogous to the first principal component in standard principal components analysis. A principal geodesic summarizes the most variable features of a sample of trees, in terms of both tree topology and branch lengths, and it can be visualized as an animation of smoothly changing trees. The algorithm performs a stochastic search through parameter space for a geodesic that minimizes the sum of squared projected distances of the data points. This procedure aims to identify the globally optimal principal geodesic, though convergence to locally optimal geodesics is possible. The methodology is illustrated by constructing principal geodesics for experimental and simulated data sets, demonstrating the insight into samples of trees that can be gained and how the method improves on a previously published approach. A Java package called GeoPhytter for constructing and visualizing principal geodesics is freely available from www.ncl.ac.uk/~ntmwn/geophytter.
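The search described in the abstract can be illustrated with a minimal Euclidean sketch. This is not the published algorithm and not GeoPhytter code: it replaces treespace with ordinary R^n, replaces geodesics with straight lines, and uses a simple greedy random search with a shrinking step size. The objective is assumed to be the sum of squared distances from each data point to its projection onto the line. All function names and parameters (`stochastic_line_search`, `step`, `decay`, `seed`) are hypothetical.

```python
import math
import random


def unit(vector):
    """Return the vector scaled to length 1, or None if it has zero length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return None
    return [v / norm for v in vector]


def squared_residual(point, anchor, direction):
    """Squared distance from point to the line anchor + t * direction.

    Assumes direction is a unit vector.
    """
    diff = [p - a for p, a in zip(point, anchor)]
    t = sum(x * d for x, d in zip(diff, direction))  # projection coordinate
    return sum((x - t * d) ** 2 for x, d in zip(diff, direction))


def objective(points, anchor, direction):
    """Sum of squared distances from the points to their projections."""
    return sum(squared_residual(p, anchor, direction) for p in points)


def stochastic_line_search(points, iterations=5000, step=0.5, decay=0.999, seed=0):
    """Greedy random search for a line with a small projection error.

    Hypothetical illustration only: a candidate is accepted only when it
    lowers the objective, so the result may be a local optimum.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Start at the centroid with a random unit direction.
    anchor = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    direction = None
    while direction is None:
        direction = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
    best = objective(points, anchor, direction)

    for _ in range(iterations):
        # Propose a random perturbation of both line parameters.
        cand_anchor = [a + rng.gauss(0.0, step) for a in anchor]
        cand_direction = unit([d + rng.gauss(0.0, step) for d in direction])
        if cand_direction is not None:
            score = objective(points, cand_anchor, cand_direction)
            if score < best:
                anchor, direction, best = cand_anchor, cand_direction, score
        step *= decay  # shrink proposals so later moves refine the fit
    return anchor, direction, best


if __name__ == "__main__":
    sample = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    fitted_anchor, fitted_direction, error = stochastic_line_search(sample)
    print(fitted_anchor, fitted_direction, error)
```

Because only improving proposals are accepted, the sketch shares the limitation noted in the abstract: it aims for the global optimum but can settle on a locally optimal line. Running it from several seeds and keeping the lowest score is one simple mitigation.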
Keywords
biology computing; data visualisation; differential geometry; genetics; genomics; pattern clustering; principal component analysis; stochastic processes; GeoPhytter; branch lengths; consensus networks; consensus trees; data points; first principal component; globally optimal principal geodesic; Java package; locally optimal geodesics; multidimensional scaling; parameter space; phylogenetic analyses; principal geodesics visualization; smoothly changing tree animation; squared projected distances; standard principal components analysis; stochastic algorithm; stochastic search; tree samples; tree topology; treespace; Bioinformatics; Computational biology; Measurement; Phylogeny; Topology; Vegetation; principal components analysis
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2014.2309599
Filename
6755452