Title :
Clustering approaches for data with missing values: Comparison and evaluation
Author :
Himmelspach, Ludmila ; Conrad, Stefan
Author_Institution :
Inst. of Comput. Sci., Heinrich-Heine-Univ. Düsseldorf, Düsseldorf, Germany
Abstract :
Traditional clustering methods were developed to analyse complete data sets. Faults during data collection, data transfer or data cleaning often lead to missing values, so that common clustering methods cannot be used for the data analysis. In these cases, clustering methods that can handle missing values are of great use. In this paper we discuss different approaches proposed in the literature for adapting partitioning clustering algorithms to deal with missing values in data. We analyse them on two appropriate data sets and compare them with each other. We give particular attention to how the accuracy of these methods depends on the missing-data mechanism and the percentage of missing values in the data sets.
Keywords :
data handling; data mining; data cleaning; data clustering; data collection; data missing value determination; data set analysis; data transfer; partitioning clustering algorithms; Accuracy; Clustering algorithms; Clustering methods; Distributed databases; Estimation; Partitioning algorithms; Prototypes;
Conference_Title :
Digital Information Management (ICDIM), 2010 Fifth International Conference on
Conference_Location :
Thunder Bay, ON
Print_ISBN :
978-1-4244-7572-8
DOI :
10.1109/ICDIM.2010.5664691