DocumentCode :
2259714
Title :
Imbalanced data classifier by using ensemble fuzzy c-means clustering
Author :
Kocyigit, Yucel ; Seker, Huseyin
Author_Institution :
Bio-Health Inf. Res. Group, De Montfort Univ., Leicester, UK
fYear :
2012
fDate :
5-7 Jan. 2012
Firstpage :
952
Lastpage :
955
Abstract :
Pattern classifiers trained on an imbalanced data set tend to assign an object to the class with the largest number of samples, resulting in higher overall accuracy but lower sensitivity. A new approach based on a dynamic under-sampling procedure is therefore proposed to improve the classification of imbalanced data sets, which are common in biomedicine. To overcome the class imbalance, the data set is resampled using the ensemble fuzzy c-means clustering method. The under-sampling procedure is then applied to the majority class to balance the class sizes. Compared with existing classifiers, the proposed method yields not only higher classification accuracy and sensitivity but also more stable performance across different data sets, classifiers and classifier parameters, indicating that it does not depend on a particular clustering or classification method.
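The abstract does not give the algorithmic details, so the following is only a minimal sketch of the general idea of cluster-guided under-sampling, not the authors' procedure. It assumes a single plain fuzzy c-means run (the ensemble step is not modelled), hard assignment of each majority sample to its highest-membership cluster, and a per-cluster quota proportional to cluster size. The function names `fuzzy_c_means` and `undersample_majority` and all parameter defaults are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means. Returns (centers, U) with U of shape (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m                                 # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def undersample_majority(X, y, majority_label, n_clusters=3, seed=0):
    """Assumed scheme: cluster the majority class, then keep a size-proportional
    random subset of each cluster so the majority matches the minority count."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    target = min(len(min_idx), len(maj_idx))

    _, U = fuzzy_c_means(X[maj_idx], n_clusters, seed=seed)
    labels = U.argmax(axis=1)                      # hard assignment per sample

    # Largest-remainder quotas so the kept majority samples total exactly `target`.
    sizes = np.bincount(labels, minlength=n_clusters)
    exact = target * sizes / sizes.sum()
    quotas = np.floor(exact).astype(int)
    remainder = target - quotas.sum()
    order = np.argsort(exact - quotas)[::-1]
    quotas[order[:remainder]] += 1

    keep = [rng.choice(maj_idx[labels == c], size=quotas[c], replace=False)
            for c in range(n_clusters) if quotas[c] > 0]
    keep_idx = np.concatenate(keep + [min_idx])
    return X[keep_idx], y[keep_idx]

# Synthetic example: 90 majority samples (label 0) and 10 minority samples (label 1).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = undersample_majority(X, y, majority_label=0)
```

Under these assumptions the balanced set keeps every minority sample and an equal number of majority samples spread across the clusters. An ensemble variant could repeat the call with different seeds or cluster counts and train one classifier per balanced subset, but how the paper combines the runs is not stated in the abstract.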
Keywords :
data analysis; fuzzy set theory; pattern classification; pattern clustering; biomedicine; dynamic under-sampling procedure; ensemble fuzzy c-means clustering; imbalanced data classifier; imbalanced data set; pattern classifiers; Biological system modeling; Diabetes
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Biomedical and Health Informatics (BHI), 2012 IEEE-EMBS International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4577-2176-2
Electronic_ISBN :
978-1-4577-2175-5
Type :
conf
DOI :
10.1109/BHI.2012.6211746
Filename :
6211746