Title :
Issues with the PipeAlign phylogenomics toolkit in identifying protein subfamilies
Author :
Kehyayan, Christine ; Butler, Gregory
Author_Institution :
Dept. of Comput. Sci., Concordia Univ., Montreal, QC, Canada
Abstract :
Automated protein function annotation is important in computational biology because of its low cost. Standard sequence similarity comparison methods for annotation have limited specificity in distinguishing orthologs from paralogs. Phylogenomic methods are gaining popularity because they combine evolutionary information with sequence data to identify orthologs and paralogs. Pipelines have been developed for the phylogenomic classification of proteins; two such pipelines are PhyloFacts and PipeAlign. Given a protein of interest, these pipelines identify functional subfamilies within the protein's superfamily. Subfamilies contain orthologs and paralogs and can later be used to identify orthologous groups. We evaluate the performance of PipeAlign with respect to both the consistency of the generated subfamilies and phylogeny. We use the predefined subfamilies of PhyloFacts as a reference against which to compare the subfamilies that PipeAlign generates for related reference sequences. In the consistency analysis, we compare the compositions of the functional subfamilies generated from different related reference sequences, and use the predefined PhyloFacts subfamilies for the corresponding sequences as a measure of consistency. In the phylogenetic analysis, we compare the evolutionary distances between members of the same generated PipeAlign subfamily with those between members of different generated subfamilies.
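The abstract does not state which consistency or distance measures were used. As a purely illustrative sketch, assuming two hypothetical metrics (best-match Jaccard overlap between each generated subfamily and the PhyloFacts reference subfamilies, and mean pairwise evolutionary distance within versus between generated subfamilies), the two analyses could be expressed as follows. The function names, the toy data, and the assumption that pairwise distances are precomputed (for example from a tree or alignment) are not from the paper.

```python
from itertools import combinations
from statistics import mean


def best_match_jaccard(generated, reference):
    """Hypothetical consistency metric: for each generated subfamily,
    return the Jaccard index of its best-matching reference subfamily."""
    scores = {}
    for g_name, g_members in generated.items():
        g = set(g_members)
        best = 0.0
        for r_members in reference.values():
            r = set(r_members)
            union = g | r
            if union:
                best = max(best, len(g & r) / len(union))
        scores[g_name] = best
    return scores


def within_between_distances(subfamilies, distance):
    """Mean pairwise distance for pairs in the same subfamily vs. pairs in
    different subfamilies. Assumes the subfamilies partition the sequences
    and that `distance` maps frozenset({a, b}) to a precomputed distance.
    Raises statistics.StatisticsError if either group of pairs is empty."""
    label = {seq: name for name, members in subfamilies.items() for seq in members}
    within, between = [], []
    for a, b in combinations(sorted(label), 2):
        d = distance[frozenset((a, b))]
        (within if label[a] == label[b] else between).append(d)
    return mean(within), mean(between)


# Toy, hypothetical data for illustration only.
generated = {"G1": ["a", "b", "c"], "G2": ["d", "e"]}
reference = {"R1": ["a", "b"], "R2": ["c", "d", "e"]}
print(best_match_jaccard(generated, reference))

within_pairs = {("a", "b"): 0.1, ("a", "c"): 0.2, ("b", "c"): 0.3, ("d", "e"): 0.1}
distance = {frozenset(p): d for p, d in within_pairs.items()}
for a in "abc":
    for b in "de":
        distance[frozenset((a, b))] = 0.8
print(within_between_distances(generated, distance))
```

Under this sketch, a well-separated subfamily structure shows a mean within-subfamily distance well below the mean between-subfamily distance, and a best-match Jaccard index near 1 indicates close agreement with the reference subfamilies.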
Keywords :
biology computing; evolution (biological); genetics; genomics; molecular biophysics; proteins; automated protein function; computational biology; ortholog identification; paralog identification; phylogenomic method; pipealign phylogenomics toolkit; protein subfamilies identification; sequence similarity comparison method; Availability; Clustering methods; Computational biology; Cost function; Databases; Hidden Markov models; Phylogeny; Pipelines; Protein engineering; Sequences;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
Conference_Location :
Montreal, QC
Print_ISBN :
978-1-4244-6766-2
DOI :
10.1109/CIBCB.2010.5510344