Title :
Laplacian Linear Discriminant Analysis Approach to Unsupervised Feature Selection
Author :
Niijima, Satoshi ; Okuno, Yasushi
Author_Institution :
Dept. of PharmacoInformatics, Kyoto Univ., Kyoto, Japan
Abstract :
In recent years, numerous feature selection techniques have been proposed and have found wide application in genomics and proteomics. For instance, feature/gene selection has proven to be useful for biomarker discovery from microarray and mass spectrometry data. While supervised feature selection has been explored extensively, there are only a few unsupervised methods that can be applied to exploratory data analysis. In this paper, we address the problem of unsupervised feature selection. First, we extend Laplacian linear discriminant analysis (LLDA) to unsupervised cases. Second, we propose a novel algorithm for computing LLDA, which is efficient in the case of high dimensionality and small sample size, as in microarray data. Finally, an unsupervised feature selection method, called LLDA-based recursive feature elimination (LLDA-RFE), is proposed. We apply LLDA-RFE to several public data sets of cancer microarrays and compare its performance with those of Laplacian score and SVD-entropy, two state-of-the-art unsupervised methods, and with that of Fisher score, a supervised filter method. Our results demonstrate that LLDA-RFE outperforms Laplacian score and shows favorable performance against SVD-entropy. It performs even better than Fisher score for some of the data sets, despite the fact that LLDA-RFE is fully unsupervised.
Keywords :
Laplace equations; cancer; cellular biophysics; genetics; genomics; linear algebra; molecular biophysics; proteomics; recursive estimation; singular value decomposition; Fisher score; Laplacian linear discriminant analysis; Laplacian score; SVD-entropy; cancer microarrays; recursive feature elimination; supervised filter method; unsupervised feature selection; Biology and genetics; Feature evaluation and selection; Medicine; Unsupervised feature selection; graph Laplacian; linear discriminant analysis; microarray data analysis; Algorithms; Artificial Intelligence; Computational Biology; Discriminant Analysis; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Models, Statistical; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Programming Languages; Software
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2007.70257