Title :
A comparison for handling imbalanced datasets
Author :
Syaripudin, Arif ; Khodra, Masayu Leylia
Author_Institution :
Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
Abstract :
In many real-world cases, such as metal detection for security or disease diagnosis, imbalanced datasets are inevitable. Because existing learning algorithms have limitations when faced with imbalanced datasets, prediction errors arise from the dominance of the majority class over the minority class. Various techniques have been developed to address this problem. This paper compares techniques for handling imbalanced datasets based on resampling and ensembles. From a different standpoint, it also examines how much the number of instances, the number of attributes, the attribute data types, the number of target classes, and missing attribute values affect classification results, with performance measured by F-measure. The experiments show that the number of attributes, the attribute data types, and the number of target classes do not affect the classification results, while missing attribute values do. The experiments also show that the highest F-measure is achieved by the combination of SMOTE 500% and AdaBoostM1.
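The two quantities the abstract relies on, SMOTE oversampling and the F-measure, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes numeric attributes only, reads the SMOTE setting named in the abstract as 500% oversampling (five synthetic points per minority instance), and uses hypothetical function names `smote` and `f_measure`. The boosting step is not shown.

```python
import random


def smote(minority, percent=500, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Assumption: percent=500 means five synthetic points are created for
    each original minority point. Needs at least two minority points.
    """
    rng = random.Random(seed)
    per_point = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbours = others[:k]
        for _ in range(per_point):
            n = rng.choice(neighbours)
            gap = rng.random()  # position on the segment from x to n
            synthetic.append([a + gap * (b - a) for a, b in zip(x, n)])
    return synthetic


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, the synthetic points returned by `smote` would be appended to the training set with the minority label before a classifier is trained, and `f_measure` would be computed on held-out predictions.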
Keywords :
data handling; learning (artificial intelligence); pattern classification; AdaBoostM1; SMOTE 500%; attributes data types; f-measure; imbalanced dataset handling; learning algorithms; missing attribute values; performance analysis; prediction error; Conferences; Decision support systems; Error analysis; Informatics; Nickel; Training; ensembles; imbalanced dataset; resamples
Conference_Title :
Advanced Informatics: Concept, Theory and Application (ICAICTA), 2014 International Conference of
Conference_Location :
Bandung
Print_ISBN :
978-1-4799-6984-5
DOI :
10.1109/ICAICTA.2014.7005957