Comparison of K-Means clustering and statistical outliers in reducing medical datasets

Author

Santhanam, T. ; Padmavathi, M.S.

Author_Institution

Dept. of Comput. Sci., D.G. Vaishnav Coll., Chennai, India

fYear

2014

Firstpage

1

Lastpage

6

Abstract

Data reduction is the process of reducing the volume of a dataset and is used in almost all real-time applications. Although several techniques are available, many researchers have used K-Means clustering to reduce datasets. In this paper, missing values were replaced using three different methods: the mean, the median, and a predicted score. The cleaned datasets were then reduced using K-Means clustering and statistical outlier detection. This work compares the data reduction percentage achieved by K-Means and by statistical outlier detection for all three imputation methods. The experimental results show that the reduction rate of statistical outlier detection is lower than that of K-Means clustering.
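
As a rough illustration of the pipeline summarized in the abstract, the following Python sketch imputes missing values with the mean or median, removes statistical outliers, and reports the reduction percentage. It is not the authors' implementation. The z-score rule, the threshold of 2.0, the function names, and the sample values are all assumptions made for illustration. The predicted-score imputation and the K-Means reduction step are omitted because the abstract does not specify how they were carried out.

```python
from statistics import mean, median, pstdev


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def drop_outliers(values, z_threshold=2.0):
    """Keep values whose absolute z-score is within z_threshold.

    The z-score rule and the default threshold are assumptions; the
    abstract does not state which outlier test was used.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mu) / sigma <= z_threshold]


def reduction_percentage(original, reduced):
    """Percentage of records removed by a reduction step."""
    return 100.0 * (len(original) - len(reduced)) / len(original)


# Hypothetical single-attribute sample; None marks a missing value.
glucose = [85, 90, None, 88, 92, 300, 87, None, 91, 89]

for strategy in ("mean", "median"):
    cleaned = impute(glucose, strategy)
    reduced = drop_outliers(cleaned)
    print(strategy, reduction_percentage(cleaned, reduced))
```

Running the same comparison with a K-Means based reduction in place of `drop_outliers` would give the second column of reduction percentages that the paper compares.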

Keywords

data reduction; medical administrative data processing; pattern clustering; statistical analysis; K-means clustering; medical dataset reduction; statistical outlier detection; Cleaning; Clustering algorithms; Computer science; Data mining; Data models; Diabetes; Medical diagnostic imaging; Data Reduction; K-Means clustering; Missing values; Outliers

fLanguage

English

Publisher

IEEE

Conference_Titel

Science Engineering and Management Research (ICSEMR), 2014 International Conference on

Print_ISBN

978-1-4799-7614-0

Type

conf

DOI

10.1109/ICSEMR.2014.7043602

Filename

7043602