DocumentCode :
3239621
Title :
Bayesian multivariate Poisson model for RNA-seq classification
Author :
Knight, Joseph ; Ivanov, Ivan ; Dougherty, Edward
Author_Institution :
Dept. of Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA
fYear :
2013
fDate :
17-19 Nov. 2013
Firstpage :
96
Lastpage :
97
Abstract :
High-dimensional data and small samples make genomic/proteomic classifier design and error estimation virtually impossible without the use of prior information [1]. Dalton and Dougherty utilize prior biological knowledge via a Bayesian approach that places a prior distribution over an uncertainty class of feature-label distributions [2], [3]. While their general framework is very broad, they focus their attention on multinomial and Gaussian models, for which they derive closed-form solutions for the minimum mean-square error (MMSE) error estimate, the MSE of the error estimate, and the optimal Bayesian classifier (OBC) relative to the prior distribution. Sequencing datasets consist of the number of reads that map to specific regions of a reference genome. As such, they are often modeled with a discrete distribution such as the Poisson, so Gaussian and multinomial models are not ideal for sequence-based datasets. We therefore introduce a multivariate Poisson (MP) model and the associated MP OBC for classifying samples using sequencing data. Because closed-form solutions are lacking, we employ a Markov chain Monte Carlo (MCMC) approach to perform classification. We demonstrate superior classification performance on more complex synthetic datasets and performance comparable to the top classifiers on other, simpler synthetic datasets.
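The MCMC route to an OBC can be illustrated with a minimal sketch; this is not the authors' implementation. The OBC assigns x to the label y that maximizes the class prior c_y times the effective class-conditional density, i.e. the likelihood averaged over the posterior of the model parameters, and MCMC replaces that integral with an average over posterior draws. Assumptions made only for illustration: features are treated as independent given the class (a diagonal simplification, not the correlated multivariate Poisson model of the paper), each feature's log-rate has a Gaussian prior N(m0, s0²), and posterior draws come from random-walk Metropolis. The function names `sample_posterior`, `log_effective_density`, and `obc_predict` and all default parameter values are hypothetical.

```python
import numpy as np
from math import lgamma


def log_post(mu, X, m0, s0):
    # Unnormalized log posterior of each feature's log-rate (hypothetical model):
    # counts ~ Poisson(exp(mu_j)), prior mu_j ~ N(m0, s0^2). Vectorized over features.
    n = X.shape[0]
    return X.sum(axis=0) * mu - n * np.exp(mu) - (mu - m0) ** 2 / (2 * s0 ** 2)


def sample_posterior(X, n_iter=4000, burn=1000, step=0.1, m0=0.0, s0=3.0, seed=0):
    # Random-walk Metropolis. Because the assumed posterior factorizes across
    # features, each coordinate can be accepted or rejected independently.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mu = np.log(X.mean(axis=0) + 0.5)  # start near the data
    lp = log_post(mu, X, m0, s0)
    draws = []
    for t in range(n_iter):
        prop = mu + step * rng.standard_normal(d)
        lp_prop = log_post(prop, X, m0, s0)
        accept = np.log(rng.random(d)) < lp_prop - lp
        mu = np.where(accept, prop, mu)
        lp = np.where(accept, lp_prop, lp)
        if t >= burn:  # discard burn-in draws
            draws.append(mu.copy())
    return np.array(draws)  # shape (n_iter - burn, d)


def log_effective_density(x, draws):
    # log of the Poisson likelihood of x averaged over posterior draws,
    # computed with the log-sum-exp trick for numerical stability.
    x = np.asarray(x, dtype=float)
    const = sum(lgamma(v + 1.0) for v in x)
    ll = (draws * x - np.exp(draws)).sum(axis=1) - const
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))


def obc_predict(x, draws_by_class, priors):
    # Pick the label maximizing log c_y + log effective density.
    scores = [np.log(c) + log_effective_density(x, d)
              for c, d in zip(priors, draws_by_class)]
    return int(np.argmax(scores))


# Usage on synthetic counts (3 features, 10 training samples per class).
rng = np.random.default_rng(1)
X0 = rng.poisson(lam=[5, 20, 50], size=(10, 3))
X1 = rng.poisson(lam=[20, 5, 50], size=(10, 3))
draws = [sample_posterior(X0), sample_posterior(X1)]
print(obc_predict(np.array([6, 18, 49]), draws, [0.5, 0.5]))  # expected: 0
```

Averaging likelihoods over posterior draws, rather than plugging in a single point estimate, is what makes this a Bayesian (OBC-style) rule; a correlated multivariate model would change only the likelihood and the sampler, not the decision rule.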
Keywords :
Bayes methods; Markov processes; Monte Carlo methods; Poisson distribution; RNA; biology computing; genomics; least mean squares methods; pattern classification; proteomics; Bayesian multivariate Poisson model; Gaussian distributions; Gaussian models; MCMC approach; MMSE; Monte Carlo Markov chain approach; RNA-seq classification; associated MP OBC; closed-form solutions; complex synthetic datasets; discrete distribution; error estimation; feature-label distributions; genomic-proteomic classifier design; high dimensional data; minimum mean squared error; multinomial distributions; multinomial models; optimal Bayesian classifier; reference genome; sequencing datasets; Bayes methods; Bioinformatics; Biological system modeling; Data models; Gene expression; Genomics; Sequential analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Genomic Signal Processing and Statistics (GENSIPS), 2013 IEEE International Workshop on
Conference_Location :
Houston, TX
Print_ISBN :
978-1-4799-3461-4
Type :
conf
DOI :
10.1109/GENSIPS.2013.6735946
Filename :
6735946