  • DocumentCode
    1496146
  • Title
    Cancer Classification from Gene Expression Data by NPPC Ensemble
  • Author
    Ghorai, Santanu; Mukherjee, Anirban; Sengupta, Sanghamitra; Dutta, Pranab K.
  • Author_Institution
    Dept. of Electron. & Commun. Eng., MCKV Inst. of Eng., Howrah, India
  • Volume
    8
  • Issue
    3
  • fYear
    2011
  • Firstpage
    659
  • Lastpage
    671
  • Abstract
    The most important application of microarrays in gene expression analysis is to classify unknown tissue samples according to their gene expression levels, using the expression levels of samples whose classes are known. In this paper, we present a nonparallel plane proximal classifier (NPPC) ensemble that achieves higher classification accuracy on test samples than a single NPPC model in a computer-aided diagnosis (CAD) framework. For each data set, only a few genes are selected using a mutual information criterion. A genetic algorithm-based simultaneous feature and model selection scheme is then used to train a number of NPPC expert models in multiple subspaces by maximizing cross-validation accuracy. The members of the ensemble are selected according to the performance of the trained models on a validation set. Besides the usual majority voting method, we introduce a minimum average proximity-based decision combiner for the NPPC ensemble. The effectiveness of the NPPC ensemble and of the proposed decision-combining approach for cancer diagnosis is studied and compared with a support vector machine (SVM) classifier in a similar framework. Experimental results on cancer data sets show that the NPPC ensemble offers testing accuracy comparable to that of an SVM ensemble with, on average, reduced training time.
  • Keywords
    cancer; cellular biophysics; classification; genetic algorithms; genetics; medical diagnostic computing; molecular biophysics; support vector machines; NPPC ensemble; SVM classifier; cancer classification; computer-aided diagnosis; cross-validation accuracy; decision combiner; feature selection; gene expression; genetic algorithm; majority voting method; microarray; minimum average proximity; model selection; mutual information; nonparallel plane proximal classifier; support vector machine; Application software; Cancer; Diseases; Filters; Gene expression; Genetics; Mutual information; Support vector machine classification; Support vector machines; Testing; Cancer classification; classifier ensemble; combination of multiple classifiers; microarray data analysis; proximal classifier; Algorithms; Artificial Intelligence; Computational Biology; Databases, Genetic; Gene Expression Profiling; Humans; Neoplasms; Oligonucleotide Array Sequence Analysis; Reproducibility of Results
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2010.36
  • Filename
    5467034
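To make the two decision combiners named in the abstract (majority voting and the minimum average proximity-based combiner) more concrete, here is a minimal Python sketch. It assumes, as is typical of proximal-plane classifiers, that each NPPC expert holds one plane per class in its own gene subspace and scores a sample by its perpendicular distance to each class plane. The expert dictionary layout, the function names (`plane_distance`, `expert_distances`, `majority_vote`, `min_average_proximity`), and the toy numbers are hypothetical illustrations, not the authors' implementation. Training (mutual-information gene selection and GA-based feature and model selection) is not shown.

```python
import numpy as np

# Hypothetical expert format (an assumption, not the paper's data structure):
#   "features": indices of the genes this expert was trained on (its subspace)
#   "planes":   one (w, b) pair per class, i.e. the class plane w.x + b = 0


def plane_distance(x, w, b):
    """Perpendicular distance from point x to the plane w.x + b = 0."""
    return abs(float(np.dot(w, x)) + b) / float(np.linalg.norm(w))


def expert_distances(x, expert):
    """Distance of x, restricted to the expert's gene subset, to each class plane."""
    xs = x[expert["features"]]
    return np.array([plane_distance(xs, w, b) for w, b in expert["planes"]])


def majority_vote(x, experts):
    """Each expert votes for its nearest class plane; the most-voted class wins.

    Ties go to the lowest class index, because np.argmax returns the first maximum.
    """
    n_classes = len(experts[0]["planes"])
    votes = [int(np.argmin(expert_distances(x, e))) for e in experts]
    return int(np.argmax(np.bincount(votes, minlength=n_classes)))


def min_average_proximity(x, experts):
    """Average each class's plane distance over all experts; the smallest average wins."""
    avg = np.mean([expert_distances(x, e) for e in experts], axis=0)
    return int(np.argmin(avg))


# Toy ensemble of three experts on a single gene, chosen so the combiners disagree.
one = np.array([1.0])
experts_demo = [
    {"features": [0], "planes": [(one, 0.4), (one, 0.5)]},  # distances 0.4, 0.5 -> class 0
    {"features": [0], "planes": [(one, 0.4), (one, 0.5)]},  # distances 0.4, 0.5 -> class 0
    {"features": [0], "planes": [(one, 3.0), (one, 0.0)]},  # distances 3.0, 0.0 -> class 1
]
x_demo = np.array([0.0])

if __name__ == "__main__":
    print(majority_vote(x_demo, experts_demo))          # 0: two of three experts vote class 0
    print(min_average_proximity(x_demo, experts_demo))  # 1: averages are about 1.27 vs 0.33
```

The toy ensemble illustrates why the two combiners can disagree: majority voting keeps only each expert's winning class, whereas averaging proximities also uses how close the sample lies to the other class planes, so one expert that places the sample far from class 0 can outweigh two experts that only narrowly prefer it.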