• DocumentCode
    1134677
  • Title

    A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis

  • Author

    Cheng, Qiang

  • Author_Institution
    Comput. Sci. Dept., Southern Illinois Univ. Carbondale, Carbondale, IL, USA
  • Volume
    7
  • Issue
    4
  • fYear
    2010
  • Firstpage
    636
  • Lastpage
    646
  • Abstract
    Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations, and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and must capture essential characteristics of the data. In this paper, we present a way to construct multivariate features and then classify the data into proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting the biases. We use conjugate gradient-based primal-dual interior-point techniques for large-scale problems. We apply our procedure to microarray gene analysis. The effectiveness of our method is confirmed by experimental results.
  • Keywords
    biology computing; feature extraction; genetics; learning (artificial intelligence); Greenshtein persistence; conjugate gradient-based primal-dual interior-point techniques; feature extraction; high-dimensional data; large-scale problems; machine learning; microarray gene analysis; multivariate features; pattern recognition; sparse learning machine; Bioinformatics; Cancer; Data mining; Feature extraction; Large-scale systems; Machine learning; Pattern analysis; Pattern recognition; Support vector machine classification; Support vector machines; High-dimensional data; bias; cancer classification; convex optimization; feature selection; microarray gene analysis; persistence; primal-dual interior-point optimization; Artificial Intelligence; Gene Expression Profiling; Multivariate Analysis; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2009.8
  • Filename
    4770093