Title :
Time-series approach to protein classification problem
Author :
Gupta, Ravi ; Mittal, Ankush ; Singh, Kuldip ; Narang, Vipin ; Roy, Sujoy
Author_Institution :
Dept. of Electron. & Comput. Eng., Indian Inst. of Technol. Roorkee, Roorkee, India
Abstract :
This paper presents a wavelet-based time-series approach to the protein classification problem. It proposes a novel feature vector based on the variation of seven physicochemical properties of amino acids (hydrophobicity, electronic property, isoelectric point, polarity, volume, composition, and molecular weight). The feature vector contains the wavelet variance information of these physicochemical properties along a protein sequence. The proposed feature vector has only 35 dimensions, compared with the 400-dimensional feature vector of the G-protein-coupled receptor (GPCR) technique GPCRpred and the 512-dimensional feature vector of fast Fourier transform (FFT)-based approaches. This low dimensionality should facilitate the development of computationally efficient and memory-efficient classifiers for drug discovery applications. Experiments were performed on the complete data set available in the GPCR database (GPCRDB). Tests were also conducted on unseen (independent) data sets to measure the generalization capability of the proposed classification technique. A performance comparison with GPCRpred and FFT-based approaches shows that the proposed approach performs as well as the existing programs. The proposed approach can also be applied to other problems, such as prediction of protein structural classes, identification of membrane protein type, and enzyme family classification.
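Illustrative sketch (not part of the published abstract) :
The feature construction described above can be sketched as follows: map each residue to a physicochemical value, treat the result as a time series, and keep one wavelet variance per decomposition level. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name the wavelet, the number of levels, or the property scales; the Haar wavelet, the choice of 5 levels (so that 7 properties x 5 levels = 35), the mean-square variance estimate, and the function names are all assumptions made here. Only the Kyte-Doolittle hydrophobicity scale is included; the other six scales would have to be supplied by the caller.

```python
import numpy as np

# Kyte-Doolittle hydrophobicity scale (one possible hydrophobicity scale;
# the abstract does not say which scale the paper uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_detail_variances(signal, levels=5):
    """Return one variance estimate per Haar decomposition level.

    Assumption: the wavelet variance at a level is estimated as the mean
    of the squared Haar detail coefficients at that level.
    """
    approx = np.asarray(signal, dtype=float)
    if approx.size < 2 ** levels:
        raise ValueError("signal is too short for the requested levels")
    variances = []
    for _ in range(levels):
        if approx.size % 2:
            approx = approx[:-1]  # drop the last sample so pairs line up
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)  # high-pass (detail) part
        approx = (even + odd) / np.sqrt(2.0)  # low-pass part, reused next level
        variances.append(float(np.mean(detail ** 2)))
    return variances


def wavelet_variance_features(sequence, scales, levels=5):
    """Concatenate per-level variances over all property scales.

    `scales` is a list of dicts mapping one-letter residue codes to values
    (hypothetical interface). Seven scales and five levels give 35 features.
    """
    features = []
    for scale in scales:
        series = [scale[residue] for residue in sequence]
        features.extend(haar_detail_variances(series, levels))
    return np.array(features)


if __name__ == "__main__":
    # Arbitrary 40-residue demo string, not a real protein.
    demo = "ACDEFGHIKLMNPQRSTVWY" * 2
    print(wavelet_variance_features(demo, [KYTE_DOOLITTLE]))
```

With a single scale the function returns 5 values; passing seven property scales yields the 35-dimensional vector size quoted in the abstract, independent of sequence length.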
Keywords :
biochemistry; biological techniques; biomembranes; drugs; fast Fourier transforms; molecular biophysics; molecular weight; proteins; time series; vectors; wavelet transforms; 400-dimensional feature vector; 512-dimensional feature vector; G protein coupled receptors technique; amino acid composition; amino acid volume; computational-efficient classifier; drug discovery applications; electronic property; enzyme family classification; feature vector; hydrophobicity; isoelectric point; membrane protein-type identification; memory-efficient classifier; molecular weight; physicochemical property; polarity; protein classification problem; protein sequences; protein structural classes; wavelet variance information; wavelet-based time-series approach; Amino acids; Bioinformatics; Diseases; Drugs; Encoding; Extracellular; Hidden Markov models; Pharmaceuticals; Proteins; Spatial databases; Algorithms; Amino Acid Sequence; Amino Acids; Databases, Protein; Models, Chemical; Pattern Recognition, Automated; Peptide Mapping; Physicochemical Phenomena; Proteomics; Receptors, G-Protein-Coupled; Reproducibility of Results
Journal_Title :
Engineering in Medicine and Biology Magazine, IEEE
DOI :
10.1109/MEMB.2009.932903