• DocumentCode
    419598
  • Title
    An efficient technique for protein sequence clustering and classification
  • Author
    Vijaya, P.A. ; Murty, M. Narasimha ; Subramanian, D.K.

  • Author_Institution
    Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
  • Volume
    2
  • fYear
    2004
  • fDate
    23-26 Aug. 2004
  • Firstpage
    447
  • Abstract
    A technique for reducing the time and space required for protein sequence clustering and classification is presented. During the training and testing phases, the similarity score between a pair of sequences is computed from a selected portion of the sequence instead of the entire sequence, which is analogous to selecting a subset of features for sequence data sets. Experimental results show that the classification accuracy (CA) obtained with the generated prototypes does not degrade much, while training and testing times are reduced significantly. The results thus indicate that the similarity score need not be calculated over the entire length of the sequence to achieve a good CA. The space requirement during the execution phase is also reduced. The technique was tested with K-medians, supervised K-medians and nearest neighbour classifier (NNC) techniques.
  • Keywords
    biology computing; molecular biophysics; pattern classification; pattern clustering; proteins; sequences; K-medians technique; classification accuracy; nearest neighbour classifier technique; protein sequence classification; protein sequence clustering; sequence data sets; supervised K-medians technique; Pattern recognition; Protein sequence;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2128-2
  • Type
    conf

  • DOI
    10.1109/ICPR.2004.1334254
  • Filename
    1334254
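
The idea summarised in the abstract, scoring only a portion of each sequence and classifying with a nearest neighbour rule, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy match/mismatch/gap scoring, the choice of a prefix as the selected portion, and the hypothetical names `alignment_score`, `partial_score` and `nearest_neighbour`. The record does not say which similarity measure or which portion the authors used.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score with a toy scoring scheme (assumed, not the paper's)."""
    # Only two rows of the dynamic-programming table are kept at a time.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def partial_score(a, b, length):
    """Score only the first `length` residues of each sequence (assumed portion)."""
    return alignment_score(a[:length], b[:length])


def nearest_neighbour(query, prototypes, length):
    """Return the label of the prototype with the highest partial score."""
    best_label, _ = max(
        ((label, partial_score(query, seq, length)) for seq, label in prototypes),
        key=lambda pair: pair[1],
    )
    return best_label


# Made-up sequences and labels, for illustration only.
prototypes = [("MKVLAAGIVGL", "A"), ("GSHMTTPEQLK", "B")]
predicted = nearest_neighbour("MKVLSAGIVAL", prototypes, length=6)
```

Because the alignment cost grows with the product of the two sequence lengths, truncating both inputs shrinks the work per comparison, which is the kind of saving the abstract reports; how much accuracy is retained depends on the data and on which portion is chosen.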