DocumentCode :
2681955
Title :
Partial profile alignment kernels for protein classification
Author :
Ngo, Thanh ; Kuang, Rui
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
fYear :
2009
fDate :
17-21 May 2009
Firstpage :
1
Lastpage :
4
Abstract :
Remote homology detection and fold recognition are central problems in protein classification. In real applications, classifying large databases requires kernel algorithms that are both accurate and efficient. We explore a class of partial profile alignment kernels to be used with support vector machines (SVMs) for remote homology detection and fold recognition. While existing profile-based kernels use the whole profile to determine the similarity between pairs of proteins, the partial profile alignment kernels are derived from only part of the position-specific scoring matrices (PSSMs) in the profiles. Specifically, at each position in the PSSM, only amino acids in the mutation neighborhood of the corresponding amino acid in the original protein sequence are considered for alignment, which removes noise and improves computing efficiency. Our experiments on SCOP benchmark datasets show that the partial profile alignment kernels achieve better overall classification results than profile kernels and profile-alignment kernels for both fold recognition and remote homology detection. In addition, because our algorithm uses only a fraction of each profile, it significantly reduces the cost of computing the kernels compared with full-profile alignment methods.
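The abstract's central step, restricting each PSSM position to the mutation neighborhood of the residue found there, might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the neighborhood is defined, so the threshold rule on substitution scores, the four-letter toy alphabet, the `SUBST` scores, and the function names are all assumptions made for this example.

```python
# Toy alphabet and substitution scores: hypothetical values for illustration,
# not taken from the paper or from any published substitution matrix.
ALPHABET = "ACDE"
SUBST = {
    "A": {"A": 4, "C": 0, "D": -2, "E": -1},
    "C": {"A": 0, "C": 9, "D": -3, "E": -4},
    "D": {"A": -2, "C": -3, "D": 6, "E": 2},
    "E": {"A": -1, "C": -4, "D": 2, "E": 5},
}


def mutation_neighborhood(residue, threshold):
    """Return the amino acids whose substitution score with `residue`
    is at least `threshold` (an assumed definition of the neighborhood)."""
    return {aa for aa, score in SUBST[residue].items() if score >= threshold}


def partial_profile(sequence, pssm, threshold=0):
    """Keep, at each position, only the PSSM entries for amino acids in the
    mutation neighborhood of the residue at that position.

    `pssm` is a list with one dict per position, mapping amino acid -> score.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    partial = []
    for residue, row in zip(sequence, pssm):
        keep = mutation_neighborhood(residue, threshold)
        partial.append({aa: row[aa] for aa in ALPHABET if aa in keep})
    return partial


# Two-residue example with made-up profile values.
sequence = "AD"
pssm = [
    {"A": 0.7, "C": 0.1, "D": 0.1, "E": 0.1},
    {"A": 0.1, "C": 0.1, "D": 0.5, "E": 0.3},
]
reduced = partial_profile(sequence, pssm, threshold=0)
```

With these toy values, each position keeps two of its four entries, so a later alignment step would only need to compare the retained columns. That is the kind of saving the abstract attributes to using a fraction of the profile.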
Keywords :
bioinformatics; matrix algebra; molecular biophysics; pattern classification; proteins; support vector machines; PSSM; SCOP bench datasets; SVM; amino acid mutation neighborhood; fold recognition; kernel algorithms; partial profile alignment; position specific scoring matrices; protein sequence classification; remote homology detection; Amino acids; Computer science; Databases; Frequency; Genetic mutations; Hidden Markov models; Kernel; Matrices; Protein engineering; Protein sequence
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Genomic Signal Processing and Statistics, 2009. GENSIPS 2009. IEEE International Workshop on
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4244-4761-9
Electronic_ISBN :
978-1-4244-4762-6
Type :
conf
DOI :
10.1109/GENSIPS.2009.5174328
Filename :
5174328