Issues with the PipeAlign phylogenomics toolkit in identifying protein subfamilies

Author

Kehyayan, Christine; Butler, Gregory

Author_Institution

Dept. of Comput. Sci., Concordia Univ., Montreal, QC, Canada

fYear

2010

fDate

2-5 May 2010

Firstpage

1

Lastpage

6

Abstract

Automated protein function annotation is valuable in computational biology because of its low cost. Annotation methods based on standard sequence similarity comparison have limited specificity in identifying orthologs and paralogs. Phylogenomic methods, which combine evolutionary information with sequence data, are gaining popularity for identifying orthologs and paralogs. Pipelines have been developed for the phylogenomic classification of proteins, two of which are PhyloFacts and PipeAlign. Given a protein of interest, these pipelines identify functional subfamilies within its protein superfamily. Subfamilies contain orthologs and paralogs and can subsequently be used to identify orthologous groups. We evaluate PipeAlign with respect to both the consistency of the subfamilies it generates and their phylogeny. We use the predefined PhyloFacts subfamilies as a reference against which to compare the subfamilies that PipeAlign generates for related reference sequences. In the consistency analysis, we compare the compositions of the functional subfamilies generated from different related reference sequences, using the predefined PhyloFacts subfamilies of the corresponding sequences as the baseline for consistency. In the phylogenetic analysis, we compare the evolutionary distances between members of the same generated subfamily with those between members of different subfamilies produced by PipeAlign.
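The abstract does not state which measures the two analyses use, so the following Python sketch is a hypothetical illustration only, not the authors' method. It assumes subfamily assignments are available as dictionaries mapping sequence IDs to subfamily labels and that pairwise evolutionary distances have already been computed. For the consistency analysis it substitutes the Rand index, a standard partition-agreement score; for the phylogenetic analysis it compares mean distances within and between subfamilies. The function names rand_index and within_vs_between, and the toy data, are invented for this example.

```python
from itertools import combinations
from statistics import mean


def rand_index(generated, reference):
    """Rand index between two subfamily partitions (assumed stand-in measure).

    generated, reference: dict mapping sequence ID -> subfamily label.
    Returns the fraction of sequence pairs on which the two partitions agree,
    i.e. both place the pair in one subfamily or both place it apart.
    """
    ids = sorted(generated.keys() & reference.keys())  # shared sequences only
    pairs = list(combinations(ids, 2))
    if not pairs:
        return 1.0
    agree = sum(
        (generated[a] == generated[b]) == (reference[a] == reference[b])
        for a, b in pairs
    )
    return agree / len(pairs)


def within_vs_between(distances, subfamily):
    """Mean pairwise distance inside subfamilies vs across subfamilies.

    distances: dict mapping frozenset({a, b}) -> evolutionary distance.
    subfamily: dict mapping sequence ID -> subfamily label.
    Returns (mean_within, mean_between); an empty group yields None.
    """
    within, between = [], []
    for a, b in combinations(sorted(subfamily), 2):
        d = distances[frozenset((a, b))]
        (within if subfamily[a] == subfamily[b] else between).append(d)
    return (mean(within) if within else None,
            mean(between) if between else None)


# Hypothetical toy data: four sequences, two generated subfamilies.
generated = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
reference = {"s1": "X", "s2": "X", "s3": "X", "s4": "Y"}
dist = {frozenset(p): 1.0 for p in combinations(sorted(generated), 2)}
dist[frozenset(("s1", "s2"))] = 0.25
dist[frozenset(("s3", "s4"))] = 0.75

print(rand_index(generated, reference))      # 0.5
print(within_vs_between(dist, generated))    # (0.5, 1.0)
```

Under this sketch, a phylogenetically coherent set of subfamilies would show a mean within-subfamily distance clearly below the mean between-subfamily distance, and a consistent pipeline would score close to 1 against the reference partition.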

Keywords

biology computing; evolution (biological); genetics; genomics; molecular biophysics; proteins; automated protein function; computational biology; ortholog identification; paralog identification; phylogenomic method; pipealign phylogenomics toolkit; protein subfamilies identification; sequence similarity comparison method; Availability; Clustering methods; Computational biology; Cost function; Databases; Hidden Markov models; Phylogeny; Pipelines; Protein engineering; Sequences;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on

Conference_Location

Montreal, QC

Print_ISBN

978-1-4244-6766-2

Type

conf

DOI

10.1109/CIBCB.2010.5510344

Filename

5510344