DocumentCode
2890401
Title
Semi-supervised Learning of Alternatively Spliced Exons Using Co-training
Author
Tangirala, Karthik ; Caragea, Doina
Author_Institution
CIS Dept., Kansas State Univ., Manhattan, KS, USA
fYear
2011
fDate
12-15 Nov. 2011
Firstpage
243
Lastpage
246
Abstract
Alternative splicing is a phenomenon that gives rise to multiple mRNA transcripts from a single gene. It is believed that a large number of genes undergo alternative splicing. Predicting alternative splicing events is a problem of great interest, as it can help explain transcript diversity. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to learn accurate classifiers. While new sequencing technologies produce large amounts of genomic data, labeling these data can be costly and time consuming. Therefore, semi-supervised learning approaches, which can make use of large amounts of unlabeled data in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm uses two views of the data to iteratively learn two classifiers that inform each other, at each step, with their best predictions on the unlabeled data. We consider two sets of features for constructing views for the problem of predicting alternatively spliced exons: exonic splicing enhancers and intronic regulatory sequences. We use the Naive Bayes Multinomial algorithm as the base classifier in our study. Experimental results show that using the unlabeled data can yield better classifiers than those learned from the small amount of labeled data alone.
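The co-training loop described in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration. Each exon is represented as two bags of feature counts, one per view, using hypothetical feature names (`ese_*` for exonic splicing enhancer counts, `irs_*` for intronic regulatory sequence counts). The names `MultinomialNB`, `co_train`, and `predict` are also hypothetical, as are the number of rounds, the number of examples added per class, Laplace smoothing with `alpha=1.0`, and the choice to combine the two views by summing log-probabilities.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over bags of feature counts, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        # X: list of Counter (feature -> count); y: list of class labels
        self.classes = sorted(set(y))
        self.vocab = set()
        for x in X:
            self.vocab.update(x)
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            docs = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(docs) / len(X))
            totals = Counter()
            for x in docs:
                totals.update(x)
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {f: math.log((totals[f] + self.alpha) / denom)
                               for f in self.vocab}
        return self

    def predict_log_proba(self, x):
        # Features never seen during training are ignored.
        scores = {c: self.log_prior[c] + sum(n * self.log_lik[c][f]
                                             for f, n in x.items() if f in self.vocab)
                  for c in self.classes}
        m = max(scores.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        return {c: s - log_z for c, s in scores.items()}


def _fit_views(labeled):
    # Train one classifier per view on the current (shared) labeled set.
    ys = [y for _, _, y in labeled]
    h1 = MultinomialNB().fit([v1 for v1, _, _ in labeled], ys)
    h2 = MultinomialNB().fit([v2 for _, v2, _ in labeled], ys)
    return h1, h2


def co_train(labeled, unlabeled, rounds=10, per_class=1):
    """labeled: list of (view1, view2, label); unlabeled: list of (view1, view2)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        h1, h2 = _fit_views(labeled)
        chosen = {}  # pool index -> label assigned by one of the two classifiers
        for view, h in ((0, h1), (1, h2)):
            for c in h.classes:
                candidates = []
                for i, example in enumerate(pool):
                    if i in chosen:
                        continue
                    lp = h.predict_log_proba(example[view])
                    if max(lp, key=lp.get) == c:
                        candidates.append((lp[c], i))
                # Each classifier contributes its most confident predictions per class.
                for _, i in sorted(candidates, reverse=True)[:per_class]:
                    chosen[i] = c
        if not chosen:
            break
        labeled += [(pool[i][0], pool[i][1], c) for i, c in chosen.items()]
        pool = [ex for i, ex in enumerate(pool) if i not in chosen]
    return _fit_views(labeled)


def predict(h1, h2, x1, x2):
    # Combine the two views by summing their per-class log-probabilities.
    p1, p2 = h1.predict_log_proba(x1), h2.predict_log_proba(x2)
    return max(h1.classes, key=lambda c: p1[c] + p2[c])


# Hypothetical toy data: "AS" = alternatively spliced, "CE" = constitutive exon.
TOY_LABELED = [
    (Counter({"ese_a": 3}), Counter({"irs_a": 2}), "AS"),
    (Counter({"ese_b": 3}), Counter({"irs_b": 2}), "CE"),
]
TOY_UNLABELED = [
    (Counter({"ese_a": 2}), Counter({"irs_a": 1, "irs_c": 2})),
    (Counter({"ese_b": 2}), Counter({"irs_b": 1, "irs_d": 2})),
    (Counter({"ese_a": 1, "ese_c": 2}), Counter({"irs_a": 2})),
    (Counter({"ese_b": 1, "ese_d": 2}), Counter({"irs_b": 2})),
]
```

On the toy data, neither `ese_c` nor `irs_c` appears in the two labeled exons. After `co_train(TOY_LABELED, TOY_UNLABELED)`, both classifiers can use these features. Each view's confident predictions add exons to the shared labeled set, and this exposes the other view to features it could not score on its own.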
Keywords
Bayes methods; bioinformatics; genetics; learning (artificial intelligence); pattern classification; polynomials; alternative splicing event prediction; alternatively spliced exons prediction; classifier; co-training algorithm; data labeling; exonic splicing enhancer; genes; genomic data; intronic regulatory sequence; mRNA transcripts; naive Bayes multinomial algorithm; semisupervised learning; sequencing technology; supervised machine learning approach; Bioinformatics; Genomics; Kernel; Machine learning; Prediction algorithms; Splicing; Training; alternative splicing; alternatively spliced and constitutive exons; co-training; semi-supervised learning;
fLanguage
English
Publisher
ieee
Conference_Titel
Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference on
Conference_Location
Atlanta, GA
Print_ISBN
978-1-4577-1799-4
Type
conf
DOI
10.1109/BIBM.2011.87
Filename
6120443