Title :
Nonnegative Principal Component Analysis for Cancer Molecular Pattern Discovery
Author_Institution :
Dept. of Math., Eastern Michigan Univ., Ypsilanti, MI, USA
Abstract :
As a well-established feature selection algorithm, principal component analysis (PCA) is often combined with state-of-the-art classification algorithms to identify cancer molecular patterns in microarray data. However, the algorithm's global feature selection mechanism prevents it from effectively capturing the latent data structures in high-dimensional data. In this study, we investigate the benefit of adding nonnegative constraints to PCA and develop a nonnegative principal component analysis (NPCA) algorithm to overcome the global nature of PCA. A novel classification algorithm, NPCA-SVM, is proposed for microarray data pattern discovery. We report strong classification results from the NPCA-SVM algorithm on five benchmark microarray data sets by direct comparison with other related algorithms. We also prove mathematically and interpret biologically that an SVM/PCA-SVM learning machine under a Gaussian kernel will inevitably overfit microarray data. In addition, we demonstrate that nonnegative principal component analysis can be used to capture meaningful biomarkers effectively.
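The idea of constraining principal component loadings to be nonnegative can be illustrated with a minimal sketch. It is not the NPCA algorithm of the paper, whose details are not given in this abstract. It assumes a simple projected gradient ascent for a single component: maximize the variance w^T C w subject to w >= 0 and ||w|| = 1. The function name `nonnegative_pc`, the step size, and the iteration count are illustrative choices.

```python
import numpy as np

def nonnegative_pc(X, n_iter=500, seed=0):
    """Approximate the first nonnegative principal component of X.

    X has shape (n_samples, n_features). Returns a unit-norm loading
    vector with no negative entries. Illustrative heuristic only.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # center each feature
    C = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    w = np.abs(rng.standard_normal(C.shape[0]))
    w /= np.linalg.norm(w)                   # nonnegative unit-norm start
    step = 1.0 / np.linalg.norm(C, 2)        # scale by the spectral norm
    for _ in range(n_iter):
        w = w + step * (C @ w)               # ascent step on w^T C w
        w = np.maximum(w, 0.0)               # project onto w >= 0
        w /= np.linalg.norm(w)               # renormalize to unit length
    return w

# Hypothetical usage on synthetic data (20 samples, 50 features).
X_demo = np.random.default_rng(1).standard_normal((20, 50))
w_demo = nonnegative_pc(X_demo)
```

Because the loadings cannot cancel each other through mixed signs, many entries are typically driven to exactly zero, which is one way to read the abstract's contrast between local and global feature selection.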
Keywords :
cancer; medical computing; molecular biophysics; principal component analysis; Gaussian kernel; NPCA-SVM algorithm; PCA nonnegative constraints; SVM-PCA-SVM learning machine; biomarkers; cancer molecular pattern discovery; microarray data pattern discovery; nonnegative principal component analysis; Bioinformatics; Biomarkers; Cancer; Classification algorithms; Gene expression; Genomics; Independent component analysis; Principal component analysis; Proteins; Robustness; Biomarker discovery; classification; feature selection; overfitting; Algorithms; Animals; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Neoplasms; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Principal Component Analysis;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2009.36