Title :
An improved clustering technique based on statistical model preprocessing for gene expression dataset
Author :
Tajunisha, N. ; Saravanan, V.
Author_Institution :
Dept. of Comput. Sci., Sri Ramakrishna Coll. of Arts & Sci. for Women, Coimbatore, India
Abstract :
Data mining has become an important tool for the effective analysis of gene expression data because of its wide application in the biomedical industry. A gene expression matrix usually covers samples with several distinct macroscopic phenotypes. Selecting the genes that are most relevant and informative for particular phenotypes is an important aspect of gene expression analysis. Most current research focuses on supervised analysis; relatively little attention has been paid to unsupervised approaches, which are important when domain knowledge is incomplete or hard to obtain. The standard k-means clustering algorithm is used in many practical applications, but its output is quite sensitive to the initial positions of the cluster centers. In this paper, we present a new framework for clustering microarray data with informative genes. We propose a statistical method for finding informative genes and a method for finding the initial centroids for k-means clustering. In our work, initial clusters are first formed from fixed initial centroids; the statistical method is then applied to find informative genes, which are used in turn to obtain an improved clustering. Comparing the results of the original and new approaches shows that the new approach is more accurate.
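The two-stage idea in the abstract (cluster from fixed centroids, select informative genes, cluster again) can be illustrated with a minimal sketch. The abstract does not say which centroid rule or which statistical test the authors use, so everything specific here is an assumption made for illustration: seeds are taken as the means of k contiguous groups of samples ordered along the first principal axis, genes are ranked by a one-way ANOVA F statistic across the initial clusters, and the function names (fixed_initial_centroids, kmeans, f_scores, cluster_with_informative_genes) are hypothetical. Rows of X are samples and columns are genes.

```python
import numpy as np

def fixed_initial_centroids(X, k):
    """Deterministic seeds (assumed rule, not the paper's): order samples
    along the first principal axis, cut into k groups, average each group."""
    Xc = X - X.mean(axis=0)
    # The first right singular vector is the first principal axis.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    order = np.argsort(Xc @ vt[0], kind="stable")
    return np.array([X[idx].mean(axis=0) for idx in np.array_split(order, k)])

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean) for every sample."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, centroids, n_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Keep the old centroid if a cluster happens to become empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids

def f_scores(X, labels):
    """One-way ANOVA F statistic per gene, with clusters as groups
    (assumed stand-in for the paper's statistical method; needs >= 2 clusters)."""
    groups = [X[labels == j] for j in np.unique(labels)]
    n, k = X.shape[0], len(groups)
    grand = X.mean(axis=0)
    between = sum(len(g) * (g.mean(axis=0) - grand) ** 2 for g in groups) / (k - 1)
    within = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0) for g in groups) / (n - k)
    return between / (within + 1e-12)  # small constant avoids division by zero

def cluster_with_informative_genes(X, k, n_genes):
    """Stage 1: cluster on all genes. Stage 2: re-cluster on the top genes."""
    labels, _ = kmeans(X, fixed_initial_centroids(X, k))
    top = np.argsort(f_scores(X, labels))[::-1][:n_genes]
    X_sel = X[:, top]
    refined, _ = kmeans(X_sel, fixed_initial_centroids(X_sel, k))
    return refined, top
```

Because the seeds depend only on the data, repeated runs on the same matrix give the same clustering, which is the property the abstract contrasts with randomly initialized k-means. A call such as cluster_with_informative_genes(X, k=2, n_genes=5) returns the refined sample labels and the column indices of the selected genes.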
Keywords :
biology computing; data mining; genetics; pattern clustering; statistical analysis; biomedical industry; domain knowledge; fixed initial centroid; gene expression analysis; k-means clustering algorithm; microarray data clustering; statistical model preprocessing; Accuracy; Algorithm design and analysis; Clustering algorithms; Gene expression; Iris; Partitioning algorithms; Principal component analysis; informative gene; initial centroid; k-means; microarray gene data
Conference_Title :
Trendz in Information Sciences & Computing (TISC), 2010
Conference_Location :
Chennai
Print_ISBN :
978-1-4244-9007-3
DOI :
10.1109/TISC.2010.5714606