DocumentCode :
478740
Title :
Validation Measures for Clustering Algorithms Incorporating Biological Information
Author :
Datta, Soupayan ; Datta, Soupayan
Author_Institution :
Sch. of Public Health & Inf. Sci., Louisville Univ., KY
Volume :
1
fYear :
2006
fDate :
20-24 June 2006
Firstpage :
131
Lastpage :
135
Abstract :
Cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. A closely related problem is that of selecting, from the long list of clustering algorithms currently available, one that is optimal in some sense. In this paper, we propose two validation measures, each with two parts: one measuring the statistical consistency (stability) of the clusters produced and the other measuring their biological functional consistency, so that a good clustering algorithm should have small values for these measures. We illustrate our methods using two sets of expression profiles obtained from a breast cancer data set. Six well-known clustering algorithms (UPGMA, k-means, Diana, Fanny, model-based and SOM) were evaluated. Although the exact ordering depends on the particular data set (expression profiles) used and the validation measure employed, overall UPGMA appears to be optimal for the breast cancer data set we considered.
Keywords :
biology computing; cancer; data handling; genetics; pattern clustering; statistical analysis; UPGMA; biological functional consistency; biological information; breast cancer data set; clustering algorithm; gene expression profiles; statistical consistency; Biological system modeling; Biology; Breast cancer; Clustering algorithms; Clustering methods; Gene expression; Information analysis; Performance analysis; Public healthcare; Stability;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computer and Computational Sciences, 2006. IMSCCS '06. First International Multi-Symposiums on
Conference_Location :
Hangzhou, Zhejiang
Print_ISBN :
0-7695-2581-4
Type :
conf
DOI :
10.1109/IMSCCS.2006.139
Filename :
4673536