DocumentCode
2527773
Title
An improved clustering technique based on statistical model preprocessing for gene expression dataset
Author
Tajunisha, N. ; Saravanan, V.
Author_Institution
Dept. of Comput. Sci., Sri Ramakrishna Coll. of Arts & Sci. for Women, Coimbatore, India
fYear
2010
fDate
17-19 Dec. 2010
Firstpage
46
Lastpage
49
Abstract
Data mining has become an important topic in the effective analysis of gene expression data because of its wide application in the biomedical industry. A gene expression matrix usually contains samples of several distinct macroscopic phenotypes. Selecting the genes that are most relevant and informative for particular phenotypes is an important aspect of gene expression analysis. Most current research focuses on supervised analysis; relatively less attention has been paid to unsupervised approaches, which are important when domain knowledge is incomplete or hard to obtain. The standard k-means clustering algorithm is used in many practical applications, but its output is quite sensitive to the initial positions of the cluster centers. In this paper, we present a new framework for clustering microarray data with informative genes. We propose a statistical method for finding informative genes and a method for finding the initial centroids for k-means clustering. In our approach, initial clusters are formed with fixed initial centroids, and the statistical method is then used to find informative genes, which are in turn used to obtain an improved clustering. A comparison of the original and the new approach shows that the new approach gives more accurate results.
Keywords
biology computing; data mining; genetics; pattern clustering; statistical analysis; biomedical industry; data mining; domain knowledge; fixed initial centroid; gene expression analysis; k-means clustering algorithm; microarray data clustering; statistical model preprocessing; Accuracy; Algorithm design and analysis; Clustering algorithms; Gene expression; Iris; Partitioning algorithms; Principal component analysis; informative gene; initial centroid; k-means; microarray gene data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Trendz in Information Sciences & Computing (TISC), 2010
Conference_Location
Chennai
Print_ISBN
978-1-4244-9007-3
Type
conf
DOI
10.1109/TISC.2010.5714606
Filename
5714606