Title :
On similarity measures for cluster analysis in clinical laboratory examination databases
Author :
Hirano, Shoji ; Sun, Xiaguang ; Tsumoto, Shusaku
Author_Institution :
Dept. of Med. Informatics, Shimane Med. Univ., Izumo, Japan
Abstract :
This paper discusses how a conventional similarity measure works on a practical medical data set. The similarity measure used was a linear combination of the Mahalanobis distance between numerical attributes and the Hamming distance between nominal attributes. We performed clustering experiments on the meningoencephalitis data set using the similarity measure in conjunction with four types of clustering algorithms: single- and complete-linkage agglomerative hierarchical clustering, Ward's method, and rough clustering. The usefulness of the similarity measure was evaluated from the following viewpoints: (1) the quality of the generated clusters; and (2) the clinical reasonableness of the attributes used to generate the high-quality clusters. The results show that the best clusters were obtained using Ward's method, where clinically reasonable attributes were selected. This suggests that the similarity measure would be applicable to medical data sets.
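The measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weights `w_num` and `w_nom`, the use of an unnormalized Hamming count, and the sample covariance estimate are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def covariance(rows):
    """Sample covariance matrix of numerical rows (assumed estimator, n-1 denominator)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance between two numerical attribute vectors."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding error

def hamming(x, y):
    """Hamming distance: number of nominal attributes whose values differ."""
    return sum(a != b for a, b in zip(x, y))

def mixed_distance(num_x, nom_x, num_y, nom_y, inv_cov, w_num=0.5, w_nom=0.5):
    """Linear combination of the two distances; the weights are hypothetical."""
    return w_num * mahalanobis(num_x, num_y, inv_cov) + w_nom * hamming(nom_x, nom_y)
```

A pairwise matrix built with `mixed_distance` could then be passed to a hierarchical clustering routine. Because the two components are on different scales, the choice of weights (or a normalization of each term) would need to be settled before the results are comparable across data sets.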
Keywords :
data mining; database management systems; medical computing; pattern clustering; Hamming distance; Mahalanobis distance; Ward method; agglomerative hierarchical clustering; medical data sets; meningoencephalitis data set; rough clustering; similarity measure; Biomedical informatics; Clustering algorithms; Clustering methods; Databases; Laboratories; Measurement standards; Microorganisms; Performance evaluation; Sun;
Conference_Title :
Computer Software and Applications Conference, 2002. COMPSAC 2002. Proceedings. 26th Annual International
Print_ISBN :
0-7695-1727-7
DOI :
10.1109/CMPSAC.2002.1045170