Title :
High dimensional gene expression data dimension reduction
Author :
Chao, Shi ; Lihui, Chen
Author_Institution :
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore
Abstract :
Gene expression data analysis is a new approach to cancer diagnosis. Feature selection is an important preprocessing step in gene expression data clustering. In this paper, we demonstrate the effectiveness of a feature grouping approach for feature dimension reduction. In our proposed framework, a large number of features are grouped into several feature subsets. Using clustering accuracy as the criterion, one feature subset is chosen as the candidate subset for further processing by PCA or entropy ranking, and the final feature subset is formed by selecting the top-ranked features. The advantage of the framework is that it considers the discrimination power of both feature subsets and individual features, and it requires little class-label information. A prototype of the proposed framework has been implemented and tested on the leukemia data set, and the results support the framework.
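The pipeline in the abstract (group features, score each group by clustering accuracy, rank the winning group's features, keep the top ones) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Features are assumed to be grouped into contiguous blocks (the paper's grouping rule is not given here), each group is clustered with plain k-means using a deterministic farthest-first initialisation, accuracy is taken under the best one-to-one cluster-to-class mapping, and the PCA branch ranks features by the absolute loading on the first principal component. The entropy-ranking alternative is not shown. All function names (`group_features`, `kmeans`, `clustering_accuracy`, `pca_rank`, `select_features`) are hypothetical.

```python
import itertools

import numpy as np


def group_features(n_features, n_groups):
    """Split feature indices into contiguous groups (assumed grouping rule)."""
    return np.array_split(np.arange(n_features), n_groups)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far.
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def clustering_accuracy(pred, truth, k):
    """Fraction correct under the best one-to-one cluster-to-class mapping."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array(perm)[pred]
        best = max(best, float(np.mean(mapped == truth)))
    return best


def pca_rank(X_sub):
    """Order columns by |loading| on the first principal component."""
    Xc = X_sub - X_sub.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return np.argsort(-np.abs(vt[0]))


def select_features(X, y, n_groups, top_m):
    """Pick the group with the best clustering accuracy, then its top_m features."""
    k = len(np.unique(y))
    groups = group_features(X.shape[1], n_groups)
    scores = [clustering_accuracy(kmeans(X[:, g], k), y, k) for g in groups]
    best = groups[int(np.argmax(scores))]
    order = pca_rank(X[:, best])
    return best, best[order[:top_m]], scores


if __name__ == "__main__":
    # Synthetic toy data: only features 0-4 carry a class signal.
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 20))
    X[:, :5] += 5.0 * y[:, None]
    best, chosen, scores = select_features(X, y, n_groups=4, top_m=3)
    print("group scores:", scores)
    print("best group:", best, "selected:", chosen)
```

In this sketch the class labels are used only to score whole groups, which loosely mirrors the abstract's point that little label information is needed; the within-group ranking step is unsupervised.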
Keywords :
biology computing; cancer; data analysis; data reduction; feature extraction; pattern clustering; principal component analysis; PCA; cancer diagnosis; data clustering; feature selection; high dimensional gene expression data dimension reduction; leukemia data set; Cancer; Clustering algorithms; DNA; Data analysis; Data engineering; Diseases; Gene expression; Genetics; Neoplasms; Testing;
Conference_Title :
Cybernetics and Intelligent Systems, 2004 IEEE Conference on
Print_ISBN :
0-7803-8643-4
DOI :
10.1109/ICCIS.2004.1460457