Title :
ProSVM and ProK-means: Novel methods for promoter prediction
Author :
Arslan, Huseyin ; Ilguner, Y. ; Can, Tolga
Author_Institution :
Comput. Eng., METU, Ankara, Turkey
Abstract :
Identifying promoter regions is important for improving genome annotation and understanding transcriptional regulation. To identify such regions accurately, transcription start sites (TSS) must be located correctly. Current genome annotation projects do not yet share a common solution to the problem of identifying transcription initiation regions. Existing methods for identifying core promoter regions have two main drawbacks. First, most of them require large amounts of training data. Second, they behave like black boxes, so their predictions are difficult to interpret. In this work, we propose a supervised and an unsupervised method for identifying core promoter regions. We use support vector machines as the supervised method (ProSVM) and k-means as the unsupervised method (ProK-means), using physical properties of DNA sequences. Finally, we evaluate our methods and compare their results with those of ProSOM [1]. We show that ProSVM achieves much higher recall than ProSOM and is therefore more accurate overall.
Keywords :
DNA; bioinformatics; biological techniques; cellular biophysics; genetics; genomics; molecular biophysics; support vector machines; DNA sequences; black box methods; core promoter regions; genome annotation; proSOM; proSVM technique; prok-means technique; promoter prediction; support vector machines; transcription initiation regions; transcription start sites; transcriptional regulation; unsupervised method; DNA; Genomics; Kernel; Polynomials; Proteins; Support vector machines; Training;
Conference_Title :
Health Informatics and Bioinformatics (HIBIT), 2011 6th International Symposium on
Conference_Location :
Izmir
Print_ISBN :
978-2-4673-4394-4
DOI :
10.1109/HIBIT.2011.6450810