• DocumentCode
    772689
  • Title
    Feature Selection for Self-Supervised Classification With Applications to Microarray and Sequence Data
  • Author
    Kung, Sun-Yuan ; Mak, Man-Wai
  • Author_Institution
    Princeton Univ., Princeton, NJ
  • Volume
    2
  • Issue
    3
  • fYear
    2008
  • fDate
    6/1/2008
  • Firstpage
    297
  • Lastpage
    309
  • Abstract
    Learning strategies are traditionally divided into two categories: unsupervised learning and supervised learning. In contrast, for feature selection, there are four different categories of training scenarios: (1) unsupervised; (2) (regular) supervised; (3) self-supervised (SS); and (4) doubly supervised. Many genomic applications naturally arise in either a (regular) supervised or a self-supervised formulation. The distinction between these two supervised scenarios lies in whether the class labels are assigned to the samples or to the features. This paper explains how to convert an SS formulation into a symmetric doubly supervised (SDS) formulation by a pairwise approach. The SDS formulation offers more explicit information for effective feature selection than the SS formulation. To harness this information, the paper adopts a selection scheme called vector-index-adaptive SVM (VIA-SVM), which is based on the fact that the support vectors can be subdivided into different groups, each offering quite distinct prediction performance. Simulation studies validate that VIA-SVM performs very well for time-course microarray data. This paper further proposes a fusion strategy to integrate the diversified information embedded in the SDS formulation. Simulation studies on protein sequence data for subcellular localization confirm that the prediction can be significantly improved by combining VIA-SVM with relevance scores (e.g., SNR) and redundancy metrics (e.g., Euclidean distance).
  • Keywords
    pattern classification; support vector machines; unsupervised learning; feature selection; genomic applications; learning strategies; microarray-sequence data; pairwise approach; self-supervised classification; subcellular localization; symmetric doubly supervised formulation; time-course microarray data; vector-index-adaptive SVM; Bioinformatics; Euclidean distance; Genomics; Machine learning; Predictive models; Protein sequence; Supervised learning; Training data; Doubly supervised; SVM; self-supervised; symmetric doubly supervised; vectorization
  • fLanguage
    English
  • Journal_Title
    Selected Topics in Signal Processing, IEEE Journal of
  • Publisher
    IEEE
  • ISSN
    1932-4553
  • Type
    jour

  • DOI
    10.1109/JSTSP.2008.923843
  • Filename
    4550556
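
The abstract names SNR as an example relevance score and Euclidean distance as an example redundancy metric. The sketch below only illustrates those two quantities in a simple greedy filter; it is not VIA-SVM and not the paper's fusion strategy. The SNR form used, |mu+ - mu-| / (sigma+ + sigma-), is the one common in microarray work and is assumed here, and the names `snr_score`, `euclidean`, `select_features`, and the threshold `min_dist` are hypothetical.

```python
import math
from statistics import mean, pstdev


def snr_score(pos, neg):
    """Assumed SNR of one feature: |mean(pos) - mean(neg)| / (std(pos) + std(neg))."""
    denom = pstdev(pos) + pstdev(neg)
    gap = abs(mean(pos) - mean(neg))
    if denom == 0:
        # Degenerate case: no spread in either class.
        return math.inf if gap > 0 else 0.0
    return gap / denom


def euclidean(u, v):
    """Euclidean distance between two equal-length feature columns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_features(X, y, k, min_dist):
    """Rank features by SNR, then greedily keep up to k of them,
    skipping any feature whose column lies closer than min_dist
    to a feature that was already kept (treated as redundant)."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = []
    for col in columns:
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label != 1]
        scores.append(snr_score(pos, neg))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    selected = []
    for j in ranked:
        if len(selected) == k:
            break
        if all(euclidean(columns[j], columns[s]) >= min_dist for s in selected):
            selected.append(j)
    return selected


# Toy data (made up for illustration): 4 samples, 3 features, labels 1/0.
X = [
    [5.0, 4.9, 2.0],
    [5.2, 5.3, 3.0],
    [1.0, 1.0, 1.0],
    [1.2, 1.4, 2.0],
]
y = [1, 1, 0, 0]
chosen = select_features(X, y, k=2, min_dist=1.0)  # -> [0, 2]
```

In the toy data, feature 1 has the second-highest SNR but nearly duplicates feature 0, so the distance check drops it in favor of feature 2. Setting `min_dist=0.0` disables the redundancy check and reduces the filter to plain SNR ranking.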