Title :
Consensus methods using phylogenetic databases
Author :
Kulkarni, Mahesh M. ; Moret, Bernard M E
Author_Institution :
Dept. of Comput. Sci., New Mexico Univ., Albuquerque, NM, USA
Abstract :
With the increasing use and size of phylogenies, the output of reconstruction programs must be stored for future reference, in which case post-tree analyses such as consensus must be run from a database. We set out to determine whether such analyses can be run at a reasonable cost. We chose consensus (which summarizes the information from many trees in a single tree) because of its general applicability and because it places a severe demand on the database by requiring examination of every edge of every tree. We preprocess the data (trees) to create tables that support consensus computations, using our own extensions to the PhyloDB schema of Nakhleh et al. For each of three consensus methods (strict, majority, and greedy), we compare the database computation with the memory-resident computation performed by the Phylip consensus programs. We use a large selection of datasets of varying sizes (up to 1,000 trees of up to 1,500 taxa each) and varying degrees of commonality. The computations from the database are very practical: they often run faster than the memory-resident computations using Phylip, and never run more than 5 times slower. The additional storage costs are easily handled by any database system, and the preprocessing costs remain reasonable. Thus, suitable preprocessing of phylogenetic data allows post-tree analyses to be run directly from the database at much the same cost as current memory-resident analyses.
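The three consensus methods named above can be defined in terms of bipartitions (splits): every internal edge of a tree splits the taxa into two groups, and consensus counts how often each split occurs across the input trees. The following minimal, memory-resident Python sketch illustrates those definitions only. It is not the authors' database method, their PhyloDB extension tables, or Phylip's code. The nested-tuple tree encoding and all function names (`bipartitions`, `split_counts`, `compatible`, `consensus`) are hypothetical. In this sketch, strict keeps splits present in every tree, majority keeps splits present in more than half of the trees, and greedy adds splits in decreasing frequency whenever they are compatible with those already chosen.

```python
from collections import Counter

# Hypothetical encoding: a leaf is a taxon name (str); an internal node is a tuple of children.

def bipartitions(tree):
    """Return the non-trivial splits of `tree`, viewed as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so identical splits from different trees compare equal.
    """
    splits = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(collect(c) for c in node))

    taxa = collect(tree)
    ref = min(taxa)  # reference taxon used to normalise every split

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        side = below if ref not in below else taxa - below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            splits.add(side)
        return below

    walk(tree)
    return splits

def split_counts(trees):
    """Count, for each split, the number of input trees that contain it."""
    counts = Counter()
    for t in trees:
        counts.update(bipartitions(t))
    return counts

def compatible(a, b):
    # Both splits exclude the reference taxon, so they can coexist in one
    # tree iff they are nested or disjoint.
    return a <= b or b <= a or not (a & b)

def consensus(trees, method="majority"):
    """Return the set of splits in the strict, majority, or greedy consensus."""
    n = len(trees)
    counts = split_counts(trees)
    if method == "strict":
        return {s for s, c in counts.items() if c == n}
    if method == "majority":
        return {s for s, c in counts.items() if 2 * c > n}
    if method == "greedy":
        chosen = []
        # Most frequent splits first; ties broken deterministically by taxon names.
        for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
            if all(compatible(s, t) for t in chosen):
                chosen.append(s)
        return set(chosen)
    raise ValueError(f"unknown method: {method}")
```

For example, with the three trees `((("A","B"),"C"),("D","E"))`, `((("A","B"),"D"),("C","E"))` and `((("A","C"),"B"),("D","E"))`, no split occurs in all three trees, so the strict consensus is empty. The splits `{C,D,E}` and `{D,E}` each occur twice, so they form the majority consensus. The `Counter` of splits loosely corresponds to precomputing per-edge information once so that each consensus method reduces to a pass over the counts. How the paper actually lays out its tables is not described here.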
Keywords :
biology computing; evolution (biological); genetics; tree data structures; Phylip consensus program; PhyloDB schema; data preprocessing; database computation; memory-resident computation; phylogenetic database; post-tree analysis; Computer science; Costs; Data analysis; Database systems; History; Information analysis; Maximum likelihood estimation; Organisms; Phylogeny; Relational databases;
Conference_Titel :
Computational Systems Bioinformatics Conference, 2005. Workshops and Poster Abstracts. IEEE
Print_ISBN :
0-7695-2442-7
DOI :
10.1109/CSBW.2005.43