DocumentCode
2145008
Title
Identification of mRNA poly(A) signal patterns
Author
Wu, Xiaohui ; Liu, Qi ; Tang, Meishuang ; Zhang, Huanghui ; Yao, Junfeng ; Ji, Guoli
Author_Institution
Dept. of Autom., Xiamen Univ., Xiamen, China
fYear
2011
fDate
15-18 June 2011
Firstpage
570
Lastpage
575
Abstract
The poly(A) signal patterns surrounding the poly(A) site in the model plant Arabidopsis thaliana were generated, selected and verified. First, candidate nucleotide patterns for the different signal regions were generated according to their conservation, using the TFxIDF index of the vector space model that is widely used in text categorization. Then, effective features were selected with a genetic-algorithm-based wrapper feature selection method. Finally, a boosting method, Adaboost.M1, was used to verify the feature subset by identifying poly(A) sites. The results showed that the feature selection method significantly reduced the dimension of the feature space and substantially improved classifier performance. Moreover, the selected features could be used to improve the parameters of the poly(A) site recognition model, thereby greatly improving prediction accuracy. This study not only enhances our understanding of poly(A) signals, but also presents a concise poly(A) site recognition model obtained by applying a classifier to the selected feature space.
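As an illustration of the first step in the abstract, the following minimal sketch scores candidate nucleotide patterns (k-mers) with the standard text-categorization form of TFxIDF. It is not taken from the paper: the function names `kmers` and `tfidf_scores`, the choice of k = 3, and the summation of per-sequence scores are all assumptions made for illustration, and the authors' exact weighting may differ.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_scores(sequences, k=3):
    """Sum the TF x IDF weight of each k-mer over a set of sequences.

    Assumed definitions (standard vector space model, not the paper's):
      TF  = count of the k-mer in a sequence / number of k-mers in it
      IDF = log(N / df), N = number of sequences, df = sequences containing it
    """
    n = len(sequences)
    per_seq = [Counter(kmers(s, k)) for s in sequences]

    # Document frequency: in how many sequences each k-mer occurs.
    df = Counter()
    for counts in per_seq:
        df.update(counts.keys())

    scores = Counter()
    for counts in per_seq:
        total = sum(counts.values())
        for kmer, c in counts.items():
            scores[kmer] += (c / total) * math.log(n / df[kmer])
    return dict(scores)


if __name__ == "__main__":
    toy = ["AATAAA", "AATAAT", "GGGGGG"]  # toy sequences, not real data
    for kmer, score in sorted(tfidf_scores(toy).items()):
        print(kmer, round(score, 4))
```

Under this plain definition a k-mer present in every sequence receives a weight of zero, so a study that ranks patterns by conservation would likely adapt the weighting, for example by computing it per signal region against background sequences.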
Keywords
adaptive systems; biology computing; feature extraction; learning (artificial intelligence); pattern recognition; text analysis; Adaboost.M1; TFxIDF index; Arabidopsis thaliana; genetic algorithm based wrapper feature selection method; mRNA poly(A) signal patterns identification; prediction accuracy enhancement; text categorization; vector space model; Accuracy; Classification algorithms; Genetic algorithms; Hidden Markov models; Predictive models; Tin; Training; Adaboost.M1; Poly(A) Signal; TFxIDF; Wrapper;
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on
Conference_Location
Istanbul
Print_ISBN
978-1-61284-919-5
Type
conf
DOI
10.1109/INISTA.2011.5946151
Filename
5946151