Title :
Promoter prediction in eukaryotes using soft computing techniques
Author :
Premalatha, C. ; Aravindan, Chandrabose ; Kannan, K.
Author_Institution :
SASTRA Univ., Thanjavur, India
Abstract :
In molecular biology, in silico identification of eukaryotic promoters is a challenging task. Currently available classifiers suffer from either poor sensitivity or poor specificity. In this paper, we propose a support vector machine classifier, referred to as PSVM, to recognize human Pol II promoters. It uses a Markov model to extract features representing k-mer frequency, along with features representing other transcription signals such as the TATA box and GC box. The classifier is trained on a data set comprising 1862 promoters and 1759 non-promoters from the human genome and takes only 12 parameters to classify a given sequence as promoter or non-promoter. Of the 20 verified promoters on human chromosome 22, PSVM recognizes 18. It also correctly identifies all 14 well-annotated exons of human chromosome 22 as non-promoters. When 90% of the data is used to train PSVM, it yields a sensitivity of 93.55% and a specificity of 98.86%, which are significantly better than previously reported results and than those of online promoter prediction tools such as NNPP, ProScan, and TSSG. Thus, k-mer frequency represented by a Markov model of order k, together with the TATA box, GC box, CAAT box, Init box, and CpG island, can be a valuable combination of features for predicting eukaryotic Pol II promoters.
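The abstract does not spell out how the Markov-model feature is computed, so the following is only a minimal sketch of one common way to turn k-mer statistics into a single numeric feature: train an order-k Markov chain on promoters and another on non-promoters, then score a sequence by its average log-odds under the two models. The function names (`train_markov`, `log_odds`), the add-one smoothing, and the toy sequences are assumptions for illustration, not the authors' implementation. Such a score could then be passed, alongside motif features, to a support vector machine.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def train_markov(seqs, k):
    """Estimate P(base | previous k bases) with add-one smoothing (assumed)."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        s = s.upper()
        for i in range(k, len(s)):
            ctx, base = s[i - k:i], s[i]
            # Skip windows containing ambiguous symbols such as N.
            if base in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[ctx][base] += 1
    model = {}
    for ctx in map("".join, product(ALPHABET, repeat=k)):
        total = sum(counts[ctx].values()) + len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + 1) / total for b in ALPHABET}
    return model

def log_odds(seq, pos_model, neg_model, k):
    """Average per-base log-odds of seq under the promoter vs. non-promoter model."""
    seq = seq.upper()
    score, n = 0.0, 0
    for i in range(k, len(seq)):
        ctx, base = seq[i - k:i], seq[i]
        if ctx in pos_model and base in ALPHABET:
            score += math.log(pos_model[ctx][base] / neg_model[ctx][base])
            n += 1
    return score / n if n else 0.0

# Toy training data (hypothetical, far smaller than the 1862/1759 set in the paper).
promoters = ["GCGCGCGCGC", "CGCGCGCGCG"]
non_promoters = ["ATATATATAT", "TATATATATA"]
k = 1
pos = train_markov(promoters, k)
neg = train_markov(non_promoters, k)

gc_score = log_odds("GCGCGC", pos, neg, k)  # positive: resembles the promoter set
at_score = log_odds("ATATAT", pos, neg, k)  # negative: resembles the non-promoter set
```

A positive score means the sequence looks more like the promoter training set than the non-promoter set; the sign and magnitude, not any fixed threshold, are what a downstream classifier would consume.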
Keywords :
Markov processes; biology computing; molecular biophysics; pattern classification; support vector machines; CAAT box; CpG island; GC box; Init box; Markov model; NNPP; PSVM; ProScan; TATA box; TSSG; eukaryotic promoter prediction; feature extraction; human chromosome; human genome; k-mer frequency; molecular biology; online promoter prediction tools; soft computing techniques; support vector machine classifier; Bioinformatics; Genomics; Humans; Training
Conference_Title :
Recent Advances in Intelligent Computational Systems (RAICS), 2011 IEEE
Conference_Location :
Trivandrum
Print_ISBN :
978-1-4244-9478-1
DOI :
10.1109/RAICS.2011.6069368