• DocumentCode
    966178
  • Title
    A Novel and Efficient Technique for Identification and Classification of GPCRs
  • Author
    Gupta, Ravi ; Mittal, Ankush ; Singh, Kuldip
  • Author_Institution
    Dept. of Electron. & Comput. Eng., Indian Inst. of Technol.-Roorkee, Roorkee
  • Volume
    12
  • Issue
    4
  • fYear
    2008
  • fDate
    7/1/2008 12:00:00 AM
  • Firstpage
    541
  • Lastpage
    548
  • Abstract
    G-protein coupled receptors (GPCRs) play a vital role in many biological processes, such as the regulation of cell growth, death, and metabolism. GPCRs are the focus of a significant amount of current pharmaceutical research, since they interact with more than 50% of prescription drugs. The dipeptide-based support vector machine (SVM) approach is the most accurate existing technique for identifying and classifying GPCRs. However, this approach has two major disadvantages. First, the dipeptide-based feature vector has 400 dimensions, which makes the classification task inefficient in both computation and memory. Second, it does not consider the biological properties of the protein sequence when identifying and classifying GPCRs. In this paper, we present an SVM classification technique based on novel features. These features are derived by applying a wavelet-based time series analysis approach to protein sequences. The proposed feature space summarizes the variance information of seven important biological properties of the amino acids in a protein sequence. In addition, the proposed feature vector has only 35 dimensions. Experiments were performed on GPCR protein sequences available in the GPCRs Database. Our approach achieves accuracies of 99.9%, 98.06%, 97.78%, and 94.08% for the GPCR superfamily, families, subfamilies, and subsubfamilies (amine group), respectively, when evaluated using fivefold cross-validation. Further, accuracies of 99.8%, 97.26%, and 97.84% were obtained when evaluated on unseen (recall) datasets of the GPCR superfamily, families, and subfamilies, respectively. Comparison with the dipeptide-based SVM technique shows the effectiveness of our approach.
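    The following is a minimal sketch, not the authors' implementation, contrasting the two feature spaces described in the abstract. The 400-dimensional dipeptide composition follows directly from the 20 × 20 ordered pairs of standard amino acids. For the wavelet-based features we assume a Haar maximal overlap discrete wavelet transform (MODWT, which the keywords name) applied to a numeric property profile of the sequence, with each level summarized by a simple biased variance estimate. The Haar filter, the circular boundary handling, the number of levels, and the Kyte-Doolittle hydropathy scale used as the example property are all our assumptions: the abstract names neither the seven properties nor how the 35 dimensions are split. The helper names (`dipeptide_composition`, `haar_modwt_variances`, `wavelet_features`) are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Kyte-Doolittle hydropathy. Illustrative only: an assumed example property,
# not necessarily one of the seven properties used in the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def dipeptide_composition(seq):
    """Fraction of each of the 400 dipeptides among overlapping residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def haar_modwt_variances(x, levels):
    """Per-level wavelet variance from a Haar MODWT.

    Assumes circular boundary conditions and the simple biased estimator
    (mean of all squared coefficients at each level).
    """
    n = len(x)
    if n == 0:
        return [0.0] * levels
    v = list(x)  # scaling coefficients; level 0 is the signal itself
    variances = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # MODWT filters are upsampled, not decimated
        w = [0.5 * v[t] - 0.5 * v[(t - shift) % n] for t in range(n)]
        v = [0.5 * v[t] + 0.5 * v[(t - shift) % n] for t in range(n)]
        variances.append(sum(c * c for c in w) / n)
    return variances


def wavelet_features(seq, scales, levels):
    """Concatenate the wavelet variances of each property profile.

    Returns len(scales) * levels values, independent of sequence length.
    """
    features = []
    for scale in scales:
        profile = [scale[a] for a in seq if a in scale]
        features.extend(haar_modwt_variances(profile, levels))
    return features
```

For example, `wavelet_features(seq, [KYTE_DOOLITTLE], 5)` returns five values for one property scale. Supplying seven scales with five levels would give 35 values, which is one breakdown consistent with the abstract's figure, but the abstract does not confirm it. Either way, the feature length depends only on the number of properties and levels, whereas the dipeptide vector is always 400 long.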
  • Keywords
    biology computing; pattern classification; proteins; support vector machines; time series; wavelet transforms; G-protein coupled receptors; SVM classification technique; amino acids; maximal overlap wavelet transform; pattern recognition framework; protein classification; protein sequences; support vector machine; wavelet-based time series analysis; G-protein coupled receptors (GPCRs); GPCRs; support vector machine (SVM); Algorithms; Amino Acid Sequence; Artificial Intelligence; Molecular Sequence Data; Pattern Recognition, Automated; Receptors, G-Protein-Coupled; Sequence Alignment; Sequence Analysis, Protein;
  • fLanguage
    English
  • Journal_Title
    Information Technology in Biomedicine, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1089-7771
  • Type
    jour
  • DOI
    10.1109/TITB.2007.911308
  • Filename
    4378201