DocumentCode :
593660
Title :
The effect of representative training dataset selection on the classification performance of the promoter sequences
Author :
Yaman, A.G. ; Can, Tolga
Author_Institution :
Natural & Appl. Sci./Comput. Eng. Dept., METU, Ankara, Turkey
fYear :
2011
fDate :
2-5 May 2011
Firstpage :
55
Lastpage :
58
Abstract :
Promoter prediction is an important task in genome annotation. The aim of this study is to build a classification method for promoter prediction. Base-stacking energy values of dinucleotides are used for feature extraction, and Support Vector Machines (SVMs) are used for classification. Human genome promoter sequences serve as the positive training data, and three types of negative datasets are prepared, drawing on intergenic and transcribed sequences. The best results are achieved when the negative dataset is built by selecting an equal number of random sequences from the intergenic and the transcribed sequences.
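The negative-set construction and feature mapping described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `balanced_negatives` and `stacking_profile` are hypothetical, the use of overlapping dinucleotides is an assumption, and the base-stacking energy table must be supplied by the caller because the abstract gives no values.

```python
import random


def balanced_negatives(intergenic, transcribed, n_per_class, seed=0):
    """Draw the same number of random sequences from each negative pool.

    Hypothetical helper: the abstract only states that equal numbers of
    intergenic and transcribed sequences gave the best results.
    """
    if n_per_class > min(len(intergenic), len(transcribed)):
        raise ValueError("n_per_class exceeds the size of the smaller pool")
    rng = random.Random(seed)
    # sample() draws without replacement, so no sequence repeats within a pool.
    return rng.sample(intergenic, n_per_class) + rng.sample(transcribed, n_per_class)


def stacking_profile(sequence, energy_table):
    """Map each overlapping dinucleotide to its value in a caller-supplied table.

    Assumption: dinucleotides are read with a sliding window of step 1.
    energy_table maps two-letter strings such as "AC" to a float.
    """
    seq = sequence.upper()
    return [energy_table[seq[i:i + 2]] for i in range(len(seq) - 1)]
```

The resulting numeric profiles for promoter (positive) and sampled negative sequences could then be passed to any SVM implementation for training.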
Keywords :
biology computing; feature extraction; genetics; genomics; macromolecules; molecular biophysics; molecular configurations; pattern classification; support vector machines; SVM; base-stacking energy values; classification performance; dinucleotides; feature extraction; genome annotation; human genome promoter sequences; intergenic sequences; negative datasets; promoter prediction; random sequences; representative training dataset selection effect; support vector machines; transcribed sequences; Bioinformatics; DNA; Feature extraction; Genomics; Sensitivity; Support vector machines; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Health Informatics and Bioinformatics (HIBIT), 2011 6th International Symposium on
Conference_Location :
Izmir
Print_ISBN :
978-2-4673-4394-4
Type :
conf
DOI :
10.1109/HIBIT.2011.6450809
Filename :
6450809