Title :
Comparison of Distance metrics for hierarchical data in medical databases
Author :
Hassan, Diman ; Aickelin, Uwe ; Wagner, Christoph
Author_Institution :
Sch. of Comput. Sci., Univ. of Nottingham, Nottingham, UK
Abstract :
Distance metrics are widely used in different research areas and applications, such as bioinformatics, data mining and many other fields. Some metrics, such as pq-gram and Edit Distance, are used specifically for data with a hierarchical structure, while others, such as the geometric and Hamming metrics, are used for non-hierarchical data. We have applied these metrics to The Health Improvement Network (THIN) database, which contains some hierarchical data. For the first group of metrics, the THIN data has to be converted into a tree-like structure; for the second group, the data are converted into a frequency table or matrix. For all metrics, all distances are then computed and normalised. Based on this particular data set, our research question is: which of these metrics is useful for THIN data? This paper compares the metrics, particularly the pq-gram metric, on finding similarities in patients' data. It also investigates the similar patients who have equally close distances, as well as the suitability of the metrics for clustering the whole patient population. Our results show that the two groups of metrics perform differently, as they represent different structures of the data. Nevertheless, all the metrics could identify some similar patient data and discriminate sufficiently well when clustering the patient population with the k-means clustering algorithm.
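The frequency-table route described in the abstract for the non-hierarchical metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the table `freq`, the patient labels, the choice of Euclidean distance as the geometric metric, the presence/absence reading of Hamming distance and the divide-by-maximum normalisation are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical frequency table: one row per patient, one column per clinical
# code, each cell counting how often that code occurs in the patient's record.
freq = {
    "p1": [2, 0, 1, 0],
    "p2": [2, 0, 0, 0],
    "p3": [0, 3, 0, 1],
}

def euclidean(a, b):
    """Geometric (Euclidean) distance between two frequency vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Number of codes present in one record but absent from the other."""
    return sum((x > 0) != (y > 0) for x, y in zip(a, b))

def normalised_distances(table, metric):
    """All pairwise distances, scaled into [0, 1] by the largest distance."""
    raw = {
        (i, j): metric(table[i], table[j])
        for i, j in combinations(sorted(table), 2)
    }
    largest = max(raw.values()) or 1  # guard against an all-zero table
    return {pair: d / largest for pair, d in raw.items()}

euclid = normalised_distances(freq, euclidean)
hamm = normalised_distances(freq, hamming)
```

Once normalised, distances from different metrics lie on the same scale, so they can be compared pair by pair or passed to a clustering step.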
Keywords :
data mining; database management systems; medical information systems; pattern clustering; Hamming metrics; THIN database; bioinformatics; data mining; distance metrics comparison; edit distance; frequency table; geometric metrics; hierarchical data; hierarchical structure; k-means clustering algorithm; medical databases; patient population; the health improvement network; tree like structure; Data mining; Databases; Drugs; Equations; Hamming distance; Mathematical model; Measurement;
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
DOI :
10.1109/IJCNN.2014.6889554