• DocumentCode
    3032004
  • Title
    Issues with the PipeAlign phylogenomics toolkit in identifying protein subfamilies
  • Author
    Kehyayan, Christine ; Butler, Gregory
  • Author_Institution
    Dept. of Comput. Sci., Concordia Univ., Montreal, QC, Canada
  • fYear
    2010
  • fDate
    2-5 May 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Automated protein function annotation is important in computational biology because of its low cost. Standard sequence similarity comparison methods for annotation have limited specificity in distinguishing orthologs from paralogs. Phylogenomic methods are gaining popularity because they identify orthologs and paralogs using evolutionary information together with sequence data. Pipelines have been developed for the phylogenomic classification of proteins; two such pipelines are PhyloFacts and PipeAlign. Given a protein of interest, these pipelines identify functional subfamilies within the protein's superfamily. Subfamilies contain orthologs and paralogs and can later be used to identify orthologous groups. We evaluate the performance of PipeAlign with respect to both the consistency of the generated subfamilies and their phylogeny. We use the predefined PhyloFacts subfamilies as a reference against which to compare the subfamilies that PipeAlign generates for related reference sequences. In the consistency analysis, we compare the compositions of the functional subfamilies generated for different related reference sequences, using the predefined PhyloFacts subfamilies for the corresponding sequences as the measure of consistency. In the phylogenetic analysis, we compare the evolutionary distances between members of the same PipeAlign-generated subfamily with those between members of different subfamilies.
  • Keywords
    biology computing; evolution (biological); genetics; genomics; molecular biophysics; proteins; automated protein function; computational biology; ortholog identification; paralog identification; phylogenomic method; PipeAlign phylogenomics toolkit; protein subfamilies identification; sequence similarity comparison method; Availability; Clustering methods; Computational biology; Cost function; Databases; Hidden Markov models; Phylogeny; Pipelines; Protein engineering; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
  • Conference_Location
    Montreal, QC
  • Print_ISBN
    978-1-4244-6766-2
  • Type
    conf
  • DOI
    10.1109/CIBCB.2010.5510344
  • Filename
    5510344