DocumentCode
736327
Title
Evolutionary undersampling for imbalanced big data classification
Author
Triguero, I. ; Galar, M. ; Vluymans, S. ; Cornelis, C. ; Bustince, H. ; Herrera, F. ; Saeys, Y.
Author_Institution
Inflammation Research Center of the Flemish Institute of Biotechnology (VIB), 9052 Zwijnaarde, Belgium
fYear
2015
fDate
25-28 May 2015
Firstpage
715
Lastpage
722
Abstract
Classification techniques for big data are in high demand across a wide variety of applications. The huge increase in available data may limit the applicability of most standard techniques. The problem becomes even harder when the class distribution is skewed, a setting known as imbalanced big data classification. Evolutionary undersampling techniques have been shown to be a very promising solution to the class imbalance problem. However, their practical application is limited to problems with no more than tens of thousands of instances. In this contribution we design a parallel model that enables evolutionary undersampling methods to deal with large-scale problems. To do this, we rely on a MapReduce scheme that distributes these algorithms across a cluster of computing elements. Moreover, we develop a windowing approach for class-imbalanced data in order to speed up the undersampling process without losing accuracy. In our experiments we test the capabilities of the proposed scheme on several data sets with up to 4 million instances. The results show promising scalability for evolutionary undersampling within the proposed framework.
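The following is a minimal, single-machine sketch of the general idea in the abstract. It is not the authors' implementation, and several parts are assumptions. A simple generational genetic algorithm (not necessarily the evolutionary undersampling algorithm used in the paper) evolves a bit mask over the majority instances of each data chunk, and the minority class is always kept. Fitness is the geometric mean of leave-one-out 1-NN accuracy on each class, minus a penalty for deviating from a balanced subset. The "windowing" here is a hypothetical scheme that splits the majority class into disjoint windows and rotates which window is used for fitness evaluation in each generation. The "reduce" step simply concatenates the per-chunk subsets. All function names and parameters (`mapreduce_eus`, `n_maps`, `n_windows`, `penalty`, and so on) are hypothetical. The map tasks run sequentially in this sketch, whereas a real deployment would run them in parallel on a cluster.

```python
import math
import random


def one_nn(query, reference):
    """Predict query's label with 1-NN (squared Euclidean), skipping query itself."""
    best_dist, best_label = math.inf, None
    for inst in reference:
        if inst is query:  # leave-one-out: an instance must not vote for itself
            continue
        dist = sum((a - b) ** 2 for a, b in zip(query[0], inst[0]))
        if dist < best_dist:
            best_dist, best_label = dist, inst[1]
    return best_label


def g_mean(eval_set, reference):
    """Geometric mean of per-class 1-NN accuracy (label 1 = minority, 0 = majority)."""
    pos = neg = tp = tn = 0
    for inst in eval_set:
        correct = one_nn(inst, reference) == inst[1]
        if inst[1] == 1:
            pos += 1
            tp += correct
        else:
            neg += 1
            tn += correct
    tpr = tp / pos if pos else 0.0
    tnr = tn / neg if neg else 0.0
    return math.sqrt(tpr * tnr)


def fitness(mask, minority, majority, eval_set, penalty=0.2):
    """G-mean of the reduced set minus a (hypothetical) penalty for imbalance."""
    chosen = [m for m, bit in zip(majority, mask) if bit]
    if not chosen:
        return 0.0
    gm = g_mean(eval_set, minority + chosen)
    return gm - penalty * abs(1 - len(chosen) / len(minority))


def evolutionary_undersample(chunk, pop_size=10, generations=20, n_windows=2, seed=0):
    """Simple GA that selects a subset of majority instances; minority is always kept."""
    rng = random.Random(seed)
    minority = [i for i in chunk if i[1] == 1]
    majority = [i for i in chunk if i[1] == 0]
    if not minority or not majority:
        return list(chunk)
    # Assumed windowing: disjoint majority windows, one per generation in rotation.
    windows = [majority[w::n_windows] for w in range(n_windows)]
    p = len(minority) / len(majority)  # start near a balanced selection
    pop = [[rng.random() < p for _ in majority] for _ in range(pop_size)]
    for gen in range(generations):
        eval_set = minority + windows[gen % n_windows]
        scored = sorted(pop, key=lambda m: fitness(m, minority, majority, eval_set),
                        reverse=True)
        next_pop = [scored[0]]  # elitism: carry the best mask forward
        parents = scored[: max(2, pop_size // 2)]  # truncation selection
        while len(next_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [not bit if rng.random() < 0.05 else bit for bit in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    full_eval = minority + majority  # final choice is scored on the whole chunk
    best = max(pop, key=lambda m: fitness(m, minority, majority, full_eval))
    return minority + [m for m, bit in zip(majority, best) if bit]


def mapreduce_eus(data, n_maps=4, seed=0):
    """Split data into chunks, undersample each ("map"), then concatenate ("reduce")."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    chunks = [shuffled[k::n_maps] for k in range(n_maps)]
    # Map: independent per-chunk undersampling (sequential here, parallel on a cluster).
    partial = [evolutionary_undersample(c, seed=seed + k) for k, c in enumerate(chunks)]
    # Reduce (assumed): merge the reduced subsets into one training set.
    return [inst for part in partial for inst in part]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(400)]
    data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(40)]
    reduced = mapreduce_eus(data)
    print(len(reduced), "instances kept;", sum(y for _, y in reduced), "minority")
```

Because each chunk is undersampled independently, the map tasks share no state, which is what makes the scheme easy to distribute. The price is that each GA only sees a fraction of the majority class.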
Keywords
Big data; Biological cells; Computational modeling; Data mining; Data models; Decision trees; Training;
fLanguage
English
Publisher
IEEE
Conference_Title
Evolutionary Computation (CEC), 2015 IEEE Congress on
Conference_Location
Sendai, Japan
Type
conf
DOI
10.1109/CEC.2015.7256961
Filename
7256961