• DocumentCode
    3519474
  • Title

    On the Role of Local Matching for Efficient Semi-supervised Protein Sequence Classification

  • Author

    Kuksa, Pavel ; Huang, Pai-Hsi ; Pavlovic, Vladimir

  • Author_Institution
    Dept. of Comput. Sci., Rutgers Univ., Piscataway, NJ
  • fYear
    2008
  • fDate
    3-5 Nov. 2008
  • Firstpage
    217
  • Lastpage
    222
  • Abstract
    Recent studies in protein sequence analysis have leveraged the power of unlabeled data. For example, the profile and mismatch neighborhood kernels have shown significant improvements over classifiers estimated under the fully supervised setting. In this study, we present a principled and biologically motivated framework that more effectively exploits the unlabeled data by only utilizing regions that are more likely to be biologically relevant for better prediction accuracy. As overly represented sequences in large uncurated databases may bias kernel estimations that rely on unlabeled data, we also propose a method to remove this bias and improve performance of resulting classifiers. Combined with a computationally efficient sparse family of string kernels, our proposed framework achieves state-of-the-art accuracy in semi-supervised protein remote homology detection on three large unlabeled databases.
  • Keywords
    bioinformatics; pattern classification; proteins; proteomics; classifier performance; local matching; overly represented sequences; prediction accuracy; protein sequence analysis; semi-supervised protein remote homology detection; semi-supervised protein sequence classification; sparse string kernel family; uncurated databases; unlabeled databases; Accuracy; Bioinformatics; Biology computing; Computer science; Databases; Information resources; Kernel; Labeling; Performance gain; Protein sequence; inexact matching; protein classification; semi-supervised learning; sequence classification; sparse spatial sample kernels; string kernels;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine, 2008. BIBM '08. IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-0-7695-3452-7
  • Type

    conf

  • DOI
    10.1109/BIBM.2008.52
  • Filename
    4684895