• DocumentCode
    12227
  • Title
    A Tri-Gram Based Feature Extraction Technique Using Linear Probabilities of Position Specific Scoring Matrix for Protein Fold Recognition
  • Author
    Paliwal, Kuldip K. ; Sharma, Ashok ; Lyons, James ; Dehzangi, Abdollah
  • Author_Institution
    Sch. of Eng., Griffith Univ., Brisbane, QLD, Australia
  • Volume
    13
  • Issue
    1
  • fYear
    2014
  • fDate
    March 2014
  • Firstpage
    44
  • Lastpage
    50
  • Abstract
    In the biological sciences, determining the three-dimensional structure of a protein from its sequence is considered an important and challenging task. Identifying the protein fold from the primary protein sequence is an intermediate step in discovering the three-dimensional structure of a protein. This can be done by using a feature extraction technique to extract the relevant information accurately, followed by a suitable classifier to label an unknown protein. Several feature extraction techniques have been developed in the past, but with limited recognition accuracy. In this study, we have developed a feature extraction technique based on tri-grams computed directly from Position Specific Scoring Matrices. The effectiveness of the feature extraction technique is shown on two benchmark datasets. The proposed technique exhibits up to 4.4% improvement in protein fold recognition accuracy compared to state-of-the-art feature extraction techniques.
  • Keywords
    biochemistry; feature extraction; molecular biophysics; probability; proteins; benchmark datasets; limited recognition accuracy; linear probability; position specific scoring matrices; position specific scoring matrix; primary protein sequences; protein fold recognition accuracy; state-of-the-art feature extraction techniques; three dimensional structure; trigram based feature extraction technique; Accuracy; Amino acids; Feature extraction; Protein engineering; Protein sequence; Support vector machines; Feature extraction technique; position specific scoring matrix (PSSM); protein fold recognition; support vector machine (SVM); tri-gram;
  • fLanguage
    English
  • Journal_Title
    NanoBioscience, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1536-1241
  • Type
    jour
  • DOI
    10.1109/TNB.2013.2296050
  • Filename
    6750119
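
The abstract describes features built from tri-grams computed directly from a Position Specific Scoring Matrix. This record does not give the formula, so the following is only a minimal sketch under stated assumptions: the PSSM is assumed to be already converted to linear probabilities (one row per residue position, one column per amino acid), and a tri-gram feature is assumed to be the sum, over all windows of three consecutive positions, of the product of the three corresponding column probabilities. The function name `trigram_features` is hypothetical and is not taken from the paper.

```python
import numpy as np


def trigram_features(pssm):
    """Flattened tri-gram features from a PSSM of linear probabilities.

    Assumption (not stated in this record): feature (m, n, r) is
    sum_i P[i, m] * P[i + 1, n] * P[i + 2, r] over all valid positions i.
    """
    p = np.asarray(pssm, dtype=float)
    if p.ndim != 2:
        raise ValueError("pssm must be a 2-D array (positions x amino acids)")
    # Align the three positions of every window: i, i + 1, i + 2.
    first, second, third = p[:-2], p[1:-1], p[2:]
    # Sum the triple products over the window index i.
    counts = np.einsum("im,in,ir->mnr", first, second, third)
    # One feature per ordered triple of columns.
    return counts.reshape(-1)
```

With the usual 20 amino acid columns this gives 20 × 20 × 20 = 8000 features per protein, independent of sequence length; sequences shorter than three residues yield an all-zero vector. The resulting vector could then be passed to a classifier such as the support vector machine named in the keywords.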