Title :
Comparison of K-Means clustering and statistical outliers in reducing medical datasets
Author :
Santhanam, T. ; Padmavathi, M.S.
Author_Institution :
Dept. of Comput. Sci., D.G. Vaishnav Coll., Chennai, India
Abstract :
Data reduction is the process of reducing the volume of a dataset, and it is used in almost all real-time applications. Although several techniques are available, many researchers have used K-Means clustering to reduce datasets. In this paper, missing values were replaced using three different methods (the mean, the median, and a predicted score), and the cleaned datasets were then reduced using K-Means clustering and statistical outlier detection. This work compares the data reduction percentage achieved by K-Means and by statistical outlier detection for each of the three imputation methods. The experimental results show that statistical outlier detection yields a lower reduction rate than K-Means clustering.
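The abstract does not specify how K-Means removes records, which statistical outlier test is applied, or how the predicted score is computed. The following is a minimal sketch of the overall pipeline under stated assumptions, not the authors' implementation. It shows mean and median imputation only (predicted-score imputation is omitted). The outlier step assumes Tukey's fences (k = 1.5 times the IQR). The K-Means step assumes one common variant: cluster into as many groups as there are classes, label each cluster by its majority class, and drop records whose class disagrees with their cluster's label. All function names (impute, outlier_reduce, kmeans, kmeans_reduce, reduction_pct) are hypothetical.

```python
import math
import statistics
from collections import Counter

# Hypothetical record format: each row is a list of numeric features (None = missing),
# with class labels kept in a parallel list.

def impute(rows, strategy="mean"):
    """Replace None in each column with the mean or median of that column's observed values."""
    cols = len(rows[0])
    fill = []
    for j in range(cols):
        observed = [r[j] for r in rows if r[j] is not None]
        fill.append(statistics.mean(observed) if strategy == "mean" else statistics.median(observed))
    return [[fill[j] if r[j] is None else r[j] for j in range(cols)] for r in rows]

def outlier_reduce(rows, labels, k=1.5):
    """Assumed outlier test: drop rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR]."""
    bounds = []
    for j in range(len(rows[0])):
        q1, _, q3 = statistics.quantiles([r[j] for r in rows], n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    keep = [i for i, r in enumerate(rows) if all(lo <= v <= hi for v, (lo, hi) in zip(r, bounds))]
    return [rows[i] for i in keep], [labels[i] for i in keep]

def kmeans(rows, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first seeding; returns cluster indices."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in rows]
        if new_assign == assign:  # converged: no row changed cluster
            break
        assign = new_assign
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [statistics.mean(col) for col in zip(*members)]
    return assign

def kmeans_reduce(rows, labels, iters=100):
    """Assumed variant: drop rows whose class differs from their cluster's majority class."""
    assign = kmeans(rows, len(set(labels)), iters)
    majority = {
        c: Counter(l for l, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    keep = [i for i, a in enumerate(assign) if labels[i] == majority[a]]
    return [rows[i] for i in keep], [labels[i] for i in keep]

def reduction_pct(before, after):
    """Percentage of records removed by a reduction step."""
    return 100.0 * (before - after) / before

if __name__ == "__main__":
    # Toy data for illustration only; not from the paper.
    data = [[1.0, 1.0], [1.2, None], [0.9, 1.1], [9.0, 9.0],
            [9.1, 8.9], [None, 9.2], [1.1, 0.9], [40.0, 9.0]]
    classes = [0, 0, 0, 1, 1, 1, 1, 1]
    for strategy in ("mean", "median"):
        clean = impute(data, strategy)
        km_rows, _ = kmeans_reduce(clean, classes)
        out_rows, _ = outlier_reduce(clean, classes)
        print(strategy,
              round(reduction_pct(len(clean), len(km_rows)), 1),
              round(reduction_pct(len(clean), len(out_rows)), 1))
```

Both reducers return the kept rows and labels, so the reduction percentage for each imputation method can be compared the same way the abstract describes. The reduction rates printed for this toy data say nothing about the paper's results.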
Keywords :
data reduction; medical administrative data processing; pattern clustering; statistical analysis; K-means clustering; medical dataset reduction; statistical outlier detection; Cleaning; Clustering algorithms; Computer science; Data mining; Data models; Diabetes; Medical diagnostic imaging; Data Reduction; K-Means clustering; Missing values; Outliers;
Conference_Title :
Science Engineering and Management Research (ICSEMR), 2014 International Conference on
Print_ISBN :
978-1-4799-7614-0
DOI :
10.1109/ICSEMR.2014.7043602