DocumentCode
3032004
Title
Issues with the PipeAlign phylogenomics toolkit in identifying protein subfamilies
Author
Kehyayan, Christine ; Butler, Gregory
Author_Institution
Dept. of Comput. Sci., Concordia Univ., Montreal, QC, Canada
fYear
2010
fDate
2-5 May 2010
Firstpage
1
Lastpage
6
Abstract
Automated protein function annotation is important in computational biology because of its low cost. Standard sequence similarity comparison methods for annotation have limited specificity in identifying orthologs and paralogs. Phylogenomic methods are gaining popularity because they identify orthologs and paralogs using evolutionary information together with sequence data. Pipelines have been developed for the phylogenomic classification of proteins; two such pipelines are PhyloFacts and PipeAlign. Given a protein of interest, these pipelines identify functional subfamilies within the protein's superfamily. Subfamilies contain orthologs and paralogs and can later be used to identify orthologous groups. We evaluate the performance of PipeAlign with respect to both the consistency of the generated subfamilies and phylogeny, using the predefined subfamilies of PhyloFacts as a reference against which to compare the subfamilies that PipeAlign generates for related reference sequences. In the consistency analysis, we compare the compositions of the functional subfamilies generated for different related reference sequences, and use the predefined PhyloFacts subfamilies for the corresponding sequences as a measure of consistency. In the phylogenetic analysis, we compare the evolutionary distances between members of the same and of different subfamilies generated by PipeAlign.
Keywords
biology computing; evolution (biological); genetics; genomics; molecular biophysics; proteins; automated protein function; computational biology; ortholog identification; paralog identification; phylogenomic method; PipeAlign phylogenomics toolkit; protein subfamilies identification; sequence similarity comparison method; Availability; Clustering methods; Computational biology; Cost function; Databases; Hidden Markov models; Phylogeny; Pipelines; Protein engineering; Sequences
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
Conference_Location
Montreal, QC
Print_ISBN
978-1-4244-6766-2
Type
conf
DOI
10.1109/CIBCB.2010.5510344
Filename
5510344