DocumentCode :
3580097
Title :
Multi-exemplar based clustering for imbalanced data
Author :
Yangtao Wang ; Lihui Chen
Author_Institution :
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2014
Firstpage :
1068
Lastpage :
1073
Abstract :
Clustering is an important unsupervised data analysis technique for finding the underlying information in unlabelled data. Many clustering approaches have been developed and reported in the literature, and some of them, such as k-means and fuzzy k-means, are widely applied to real-world problems. However, when handling imbalanced data, in which the classes have very different sizes, these algorithms may not perform well. They tend to generate clusters of similar sizes, which is called the "uniform effect". To prevent the uniform effect and improve clustering performance, we propose a new approach for imbalanced data called multi-exemplar merging clustering (MEMC). Our approach is composed of two stages: a multiple-exemplar identification stage and an exemplar merging stage. In the first stage, multiple exemplars, which are data objects selected to represent the data set, are identified using MEAP. In the second stage, the exemplars are merged based on the proposed overlapping measure (OM), which reflects the degree of overlap between clusters. Experiments on several synthetic and real-world data sets show the effectiveness of the proposed approach for imbalanced data clustering.
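The "uniform effect" mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the MEMC method and implements neither MEAP nor the overlapping measure. It is plain Lloyd-style k-means on a hypothetical 1-D data set, where the function name `kmeans_1d`, the data values, and the initial centers are all assumptions chosen for illustration. It shows how a 200-point class and a 10-point class end up in two clusters of much more similar sizes.

```python
def kmeans_1d(points, init_centers, iters=100):
    """Plain Lloyd's k-means on 1-D data (illustrative helper, not from the paper)."""
    centers = list(init_centers)
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            for x in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels


# Hypothetical imbalanced data: a large, wide class and a small, tight class.
large = [i * 0.05 for i in range(200)]        # 200 points spread over [0, 9.95]
small = [12.0 + j * 0.02 for j in range(10)]  # 10 points near 12
points = large + small

centers, labels = kmeans_1d(points, [min(points), max(points)])
sizes = [labels.count(0), labels.count(1)]
# Number of large-class points pulled into the cluster that holds the small class.
absorbed = sum(1 for lab in labels[:len(large)] if lab == 1)

print("true class sizes:", [len(large), len(small)])
print("k-means cluster sizes:", sizes)
print("large-class points absorbed by the small-class cluster:", absorbed)
```

In this toy setting the cluster that contains the small class also absorbs part of the large class, which is the behaviour the abstract describes as the motivation for merging multiple exemplars instead of relying on a single center per cluster.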
Keywords :
data analysis; merging; pattern clustering; MEAP; MEMC; OM; data analysis; data objects; exemplars merging stage; fuzzy k-means; imbalanced data handling; multiexemplar merging clustering; multiple exemplars identification stage; overlapping measure; real world data sets; synthetic data sets; unsupervised technique; Breast cancer; Clustering algorithms; Data analysis; Electronic mail; Malware; Merging; Pattern recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Control Automation Robotics & Vision (ICARCV), 2014 13th International Conference on
Type :
conf
DOI :
10.1109/ICARCV.2014.7064454
Filename :
7064454