• DocumentCode
    3601917
  • Title
    An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification
  • Author
    Esfahani, Mohammad Shahrokh ; Dougherty, Edward R.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA
  • Volume
    12
  • Issue
    6
  • fYear
    2015
  • Firstpage
    1304
  • Lastpage
    1321
  • Abstract
    Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete nor regulatory, and have no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways into prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Because the Dirichlet distribution is the conjugate prior for the Multinomial distribution, we propose optimization paradigms to estimate its parameters in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: the mammalian cell cycle and a p53 pathway model.
  • Keywords
    biochemistry; cellular biophysics; genomics; molecular biophysics; optimisation; probability; proteins; Dirichlet distribution; discrete phenotype classification; error estimation; gene-protein signaling pathways; genomic data; genomic setting; mammalian cell cycle; mathematical tools; multinomial distribution; negatively impact classifier design; optimal Bayesian classifier; optimization paradigms; optimization-based framework; p53 pathway model; probabilistic structure; training data; Bayes methods; Bioinformatics; Computational biology; Phenotype classification; biological pathways; prior probability construction; regularized expected mean log-likelihood
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2015.2424407
  • Filename
    7089209