DocumentCode
83771
Title
Biological Sequence Classification with Multivariate String Kernels
Author
Kuksa, Pavel P.
Author_Institution
Machine Learning Dept., NEC Labs. America, Inc., Princeton, NJ, USA
Volume
10
Issue
5
fYear
2013
fDate
Sept.-Oct. 2013
Firstpage
1201
Lastpage
1210
Abstract
String kernel-based machine learning methods have yielded great success in structured/sequential data analysis. They often exhibit state-of-the-art performance on many practical sequence analysis tasks, such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles or sequences of individual amino acid physicochemical descriptors), together with a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20 percent over existing state-of-the-art sequence classification methods.
Keywords
DNA; biochemistry; bioinformatics; classification; data analysis; learning (artificial intelligence); molecular biophysics; molecular configurations; proteins; DNA sequences; amino acid physicochemical descriptors; amino acid sequences; biological sequence classification; biological sequence profiles; discrete 1D string data; fold prediction; multiclass biological sequence classification problems; multivariate string kernels; protein sequence classification tasks; protein superfamily; remote homology detection; sequence analysis; sequential data analysis; string kernel-based machine learning; structured data analysis; Amino acids; Kernel; Machine learning; Protein sequence; Quantization; Sequential analysis; Biological sequence classification; kernel methods;
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2013.15
Filename
6475934