Title :
PairProSVM: Protein Subcellular Localization Based on Local Pairwise Profile Alignment and SVM
Author :
Mak, Man-Wai ; Guo, Jian ; Kung, Sun-Yuan
Author_Institution :
Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong
Abstract :
The subcellular locations of proteins are important functional annotations. An effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, to automatically predict the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. It was found that PairProSVM outperforms the methods that are based on sequence alignment and amino acid compositions even if most of the homologous sequences have been removed. PairProSVM was evaluated on Huang and Li's and Gardy et al.'s protein data sets. The overall accuracies on these data sets reach 75.3 percent and 91.9 percent, respectively, which are higher than or comparable to those obtained by sequence alignment and composition-based methods.
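As a rough illustration of how pairwise alignment scores can be turned into feature vectors, the sketch below scores every sequence against every training sequence and collects the scores into one vector per sequence. It is not the paper's implementation: it assumes that a sequence's vector holds its scores against all training sequences, and it substitutes a plain Smith-Waterman local alignment on raw sequences (with made-up match, mismatch, and gap values) for the PSI-BLAST profile alignment. The function names and the toy sequences are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    Stand-in for the profile alignment score: it compares raw residues,
    not PSI-BLAST profiles, and the scoring values are arbitrary.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def score_feature_vector(query, training_seqs):
    """One feature per training sequence: the query's alignment score against it."""
    return [local_alignment_score(query, t) for t in training_seqs]


# Hypothetical toy training set (not real annotated proteins).
training_seqs = ["MKTAYIAK", "MKTAYLAK", "GSSHHHHH"]
features = [score_feature_vector(s, training_seqs) for s in training_seqs]

for seq, vec in zip(training_seqs, features):
    print(seq, vec)
```

Vectors built this way have one dimension per training sequence, so they could be passed, together with location labels, to any standard SVM trainer.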
Keywords :
biochemistry; biology computing; cellular biophysics; genetics; molecular biophysics; pattern classification; proteins; support vector machines; Gardy et al.'s protein data set; Huang-and-Li's protein data sets; PSI-BLAST; PairProSVM; amino acid compositions; functional annotations; homologous sequence alignment; local pairwise profile alignment; pairwise profile alignment scores; protein sequence profiles; protein subcellular localization method; proteomics research; support vector machine classifier; Kernel Methods; Mercer condition; Subcellular localization; Support Vector Machines; profile alignment; Algorithms; Amino Acid Sequence; Artificial Intelligence; Molecular Sequence Data; Pattern Recognition, Automated; Proteins; Sequence Alignment; Sequence Analysis, Protein; Software; Structure-Activity Relationship; Subcellular Fractions; Tissue Distribution;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2007.70256