Title :
Evolutionary undersampling for imbalanced big data classification
Author :
Triguero, I. ; Galar, M. ; Vluymans, S. ; Cornelis, C. ; Bustince, H. ; Herrera, F. ; Saeys, Y.
Author_Institution :
Inflammation Research Center of the Flemish Institute of Biotechnology (VIB), 9052 Zwijnaarde, Belgium
Abstract :
Classification techniques in the big data scenario are in high demand in a wide variety of applications. The huge increase in available data may limit the applicability of most standard techniques. This problem becomes even more difficult when the class distribution is skewed, a topic known as imbalanced big data classification. Evolutionary undersampling techniques have been shown to be a very promising solution to the class imbalance problem. However, their practical application is limited to problems with no more than tens of thousands of instances. In this contribution we design a parallel model that enables evolutionary undersampling methods to deal with large-scale problems. To do this, we rely on a MapReduce scheme that distributes the execution of these algorithms across a cluster of computing elements. Moreover, we develop a windowing approach for class-imbalanced data in order to speed up the undersampling process without losing accuracy. In our experiments we test the capabilities of the proposed scheme on several data sets with up to 4 million instances. The results show promising scalability for evolutionary undersampling within the proposed framework.
Keywords :
Big data; Biological cells; Computational modeling; Data mining; Data models; Decision trees; Training
Conference_Title :
Evolutionary Computation (CEC), 2015 IEEE Congress on
Conference_Location :
Sendai, Japan
DOI :
10.1109/CEC.2015.7256961