Title :
A novel kernel for sequences classification
Author :
Yan, Chun ; Wang, Zheng-zhi ; Gao, Qing-Bin ; Du, Yao-Hua
Author_Institution :
Inst. of Autom., Nat. Univ. of Defense Technol., Changsha, China
fDate :
30 Oct.-1 Nov. 2005
Abstract :
This paper introduces a novel kernel, the position weight subsequences kernel (PWSK), for identifying gene sequences. The string subsequences kernel (SSK), which is based on string alignment, performs well on text categorization problems. For gene sequence identification, however, both the subsequences a sequence contains and their positions are important. To incorporate position information, PWSK replaces the decay factor applied to match positions in SSK with a position weight, so that the kernel captures both the content and the positions of subsequences. The kernel was applied to splice site identification, and the experimental results demonstrated its effectiveness: the sensitivities for donor sites and acceptor sites are 94% and 95%, respectively, and the specificities are 96% for both. This performance is better than that of SSK, because sequence content alone is not enough to explain splicing and position information must also be included. Compared with existing approaches, PWSK achieves better sensitivity for both donor and acceptor sites, with comparable specificity.
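The idea in the abstract, of swapping the SSK gap decay for a position weight, can be illustrated with a small brute-force sketch. This is not the authors' implementation: the abstract does not give the PWSK formula, so the weighting used in `weighted_features` (a product of per-position weights supplied by the caller) is an assumption made purely for illustration, and all function names are hypothetical. `ssk_features` follows the standard SSK definition, in which each occurrence of a length-n subsequence contributes the decay factor raised to the span it covers. The enumeration is exponential and only suitable for short strings.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def ssk_features(s, n, lam):
    """Standard SSK feature map: each occurrence of a length-n
    subsequence contributes lam ** (span of the occurrence)."""
    feats = defaultdict(float)
    for idx in combinations(range(len(s)), n):
        u = "".join(s[i] for i in idx)
        span = idx[-1] - idx[0] + 1
        feats[u] += lam ** span
    return feats


def weighted_features(s, n, weights):
    """Position-weighted variant (ASSUMED form, not taken from the paper):
    each occurrence contributes the product of the weights of the
    positions it matches, instead of a span-based decay."""
    feats = defaultdict(float)
    for idx in combinations(range(len(s)), n):
        u = "".join(s[i] for i in idx)
        feats[u] += prod(weights[i] for i in idx)
    return feats


def kernel(fa, fb):
    """Inner product of two sparse feature maps."""
    return sum(v * fb[u] for u, v in fa.items() if u in fb)


if __name__ == "__main__":
    # SSK: "cat" and "car" share only the subsequence "ca" (span 2 in each).
    print(kernel(ssk_features("cat", 2, 0.5), ssk_features("car", 2, 0.5)))

    # Weighted variant: hypothetical weights that emphasise positions 0 and 1.
    w = [2.0, 3.0, 1.0]
    print(kernel(weighted_features("cat", 2, w), weighted_features("car", 2, w)))
```

In the weighted variant, a shared subsequence counts for more when it falls on highly weighted positions, which is one way to let position information influence the kernel value alongside content.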
Keywords :
biology computing; genetics; pattern classification; support vector machines; gene sequence identification; position weight subsequences kernel; sequence classification; splice site identification; string alignment; string subsequences kernel; text categorization problems; Bioinformatics; Dynamic programming; Genomics; Kernel; Pulse width modulation; Splicing; Support vector machine classification; Support vector machines; Testing; Text categorization;
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598840