Title :
PCGEN: A Practical Approach to Projected Clustering and its Application to Gene Expression Data
Author :
Bouguessa, Mohamed ; Wang, Shengrui
Author_Institution :
Dept. of Comput. Sci., Sherbrooke Univ., Que.
Date :
March 1 2007-April 5 2007
Abstract :
Clustering samples in gene expression data has always been a major challenge because of the high dimensionality of the input space (typically in the tens of thousands) and the small number of samples (typically fewer than a hundred). Moreover, clusters may be hidden in subspaces of very low dimensionality. Most existing clustering algorithms become substantially inefficient if the required similarity measure is computed between data points in the full-dimensional space. These challenges motivate us to propose a new and efficient partitional, distance-based projected clustering algorithm for clustering samples in gene expression data. Our algorithm is capable of detecting projected clusters of extremely low dimensionality embedded in a high-dimensional space and avoids computing distances in the full-dimensional space. The suitability of our proposal is demonstrated through an empirical study using public microarray datasets.
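To make the idea of a projected (subspace) distance concrete, the following is a minimal sketch and not the PCGEN algorithm itself, whose details are not given in this record. It assumes that each cluster already has a centre and a known set of relevant dimensions. The names projected_distance, assign_samples, centers, and relevant_dims are hypothetical, and the toy values are invented for illustration. The distance is averaged over the subspace size so that clusters with different numbers of relevant dimensions can be compared.

```python
from math import sqrt


def projected_distance(x, y, dims):
    """Euclidean distance restricted to `dims`, averaged over the subspace size.

    Hypothetical helper: only the listed dimensions are read, so the
    full-dimensional distance is never computed.
    """
    if not dims:
        raise ValueError("dims must contain at least one dimension index")
    squared = sum((x[d] - y[d]) ** 2 for d in dims)
    return sqrt(squared / len(dims))


def assign_samples(samples, centers, relevant_dims):
    """Assign each sample to the centre nearest in that centre's own subspace."""
    labels = []
    for s in samples:
        distances = [
            projected_distance(s, c, dims)
            for c, dims in zip(centers, relevant_dims)
        ]
        labels.append(distances.index(min(distances)))
    return labels


# Toy data: 4 samples described by 6 "genes" (invented values).
samples = [
    [1.0, 1.1, 9.0, 3.0, 7.0, 2.0],
    [1.1, 0.9, 2.0, 8.0, 1.0, 6.0],
    [5.0, 7.0, 4.0, 4.1, 0.0, 9.0],
    [9.0, 2.0, 3.9, 4.0, 8.0, 1.0],
]
centers = [samples[0], samples[2]]   # assumed cluster centres
relevant_dims = [[0, 1], [2, 3]]     # assumed relevant dimensions per cluster

labels = assign_samples(samples, centers, relevant_dims)
print(labels)
```

In this toy setting the first two samples agree only on dimensions 0 and 1 and the last two only on dimensions 2 and 3, so a full-dimensional distance would be dominated by the remaining, irrelevant dimensions. How the relevant dimensions are discovered is the hard part of projected clustering and is deliberately left out of this sketch.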
Keywords :
biology computing; genetics; pattern clustering; PCGEN; clustering samples; gene expression data; high dimensionality; partitional distance-based projected clustering; public microarray datasets; Application software; Cancer; Clustering algorithms; Computational intelligence; Computer science; Data mining; Embedded computing; Gene expression; Partitioning algorithms; Testing;
Conference_Title :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
DOI :
10.1109/CIDM.2007.368939