DocumentCode :
83771
Title :
Biological Sequence Classification with Multivariate String Kernels
Author :
Kuksa, Pavel P.
Author_Institution :
Machine Learning Dept., NEC Labs. America, Inc., Princeton, NJ, USA
Volume :
10
Issue :
5
fYear :
2013
fDate :
Sept.-Oct. 2013
Firstpage :
1201
Lastpage :
1210
Abstract :
String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on many practical tasks of sequence analysis such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant 15-20 percent improvements compared to existing state-of-the-art sequence classification methods.
Keywords :
DNA; biochemistry; bioinformatics; classification; data analysis; learning (artificial intelligence); molecular biophysics; molecular configurations; proteins; DNA sequences; amino acid physicochemical descriptors; amino acid sequences; biological sequence classification; biological sequence profiles; discrete 1D string data; fold prediction; multiclass biological sequence classification problems; multivariate string kernels; protein sequence classification tasks; protein superfamily; remote homology detection; sequence analysis; sequential data analysis; string kernel-based machine learning; structured data analysis; Amino acids; Kernel; Machine learning; Protein sequence; Quantization; Sequential analysis; Biological sequence classification; kernel methods;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2013.15
Filename :
6475934