DocumentCode :
574988
Title :
Clustering high dimensional gene expression data via two step feature filtering
Author :
Chen, Jianjiao ; Song, Anping ; Zhang, Wu
fYear :
2011
fDate :
Nov. 29 2011-Dec. 1 2011
Firstpage :
299
Lastpage :
303
Abstract :
Due to the importance of gene expression data in cancer diagnosis and treatment, microarray gene expression data have attracted increasing attention from cancer researchers in recent years. However, in real-world computational analysis, such data commonly suffer from the curse of dimensionality, because tens of thousands of gene expression levels are measured on only a small number of samples. Developing an effective clustering method for such high-dimensional datasets is therefore a challenging problem. Here, we use two-step feature filtering and dimensionality reduction methods to reduce the dimension of gene expression data. First, we extract a subset of genes using ReliefF and the Fast Correlation-Based Filter (FCBF). Then, three clustering approaches, namely k-means (KM), KM with principal component analysis (PCA), and KM with random projection (RP), are each applied to the reduced gene dataset to produce clusters of cancer samples. Experimental results on the small round blue-cell tumor (SRBCT) data set demonstrate that two-step feature filtering can significantly improve the performance of the KM clustering algorithm and facilitate the application of PCA and RP in high-dimensional space, and they show the effectiveness and efficiency of our proposed scheme in addressing high-dimensional gene expression data.
Keywords :
biology computing; cancer; pattern clustering; principal component analysis; FCBF; KM; PCA; RP; ReliefF; SRBCT; cancer diagnosis; cancer treatment; dimensional reduction methods; fast correlation-based filter; high dimensional gene expression data clustering; k-means; principal component analysis; random projection; real-world computational analysis; reduced gene dataset; small round blue-cell tumor data set; two step feature filtering; Accuracy; Algorithm design and analysis; Cancer; Clustering algorithms; Filtering; Gene expression; Principal component analysis; Gene Filtering; Kmeans; Random Projection; SRBCT;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Sciences and Convergence Information Technology (ICCIT), 2011 6th International Conference on
Conference_Location :
Seogwipo
Print_ISBN :
978-1-4577-0472-7
Type :
conf
Filename :
6316624