• DocumentCode
    736327
  • Title
    Evolutionary undersampling for imbalanced big data classification
  • Author
    Triguero, I. ; Galar, M. ; Vluymans, S. ; Cornelis, C. ; Bustince, H. ; Herrera, F. ; Saeys, Y.
  • Author_Institution
    Inflammation Research Center of the Flemish Institute of Biotechnology (VIB), 9052 Zwijnaarde, Belgium
  • fYear
    2015
  • fDate
    25-28 May 2015
  • Firstpage
    715
  • Lastpage
    722
  • Abstract
    Classification techniques for big data are in high demand in a wide variety of applications. The huge increase in available data may limit the applicability of most standard techniques. The problem becomes even more difficult when the class distribution is skewed, a topic known as imbalanced big data classification. Evolutionary undersampling techniques have been shown to be a very promising solution to the class imbalance problem. However, their practical application is limited to problems with no more than tens of thousands of instances. In this contribution we design a parallel model that enables evolutionary undersampling methods to deal with large-scale problems. To do this, we rely on a MapReduce scheme that distributes the operation of these algorithms across a cluster of computing elements. Moreover, we develop a windowing approach for class-imbalanced data in order to speed up the undersampling process without losing accuracy. In our experiments we test the capabilities of the proposed scheme on several data sets with up to 4 million instances. The results show promising scalability for evolutionary undersampling within the proposed framework.
  • Keywords
    Big data; Biological cells; Computational modeling; Data mining; Data models; Decision trees; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation (CEC), 2015 IEEE Congress on
  • Conference_Location
    Sendai, Japan
  • Type
    conf
  • DOI
    10.1109/CEC.2015.7256961
  • Filename
    7256961