  • DocumentCode
    2007001
  • Title
    A comparative study on sampling techniques for handling class imbalance in streaming data
  • Author
    Nguyen, Hien M. ; Cooper, Eric W. ; Kamei, Kentaro
  • Author_Institution
    Grad. Sch. of Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
  • fYear
    2012
  • fDate
    20-24 Nov. 2012
  • Firstpage
    1762
  • Lastpage
    1767
  • Abstract
    Sampling is the most popular approach for handling the class imbalance problem in training data. A number of studies have recently adapted sampling techniques for dynamic learning settings in which the training set is not fixed, but gradually grows over time. This paper presents an empirical study to compare over-sampling and under-sampling techniques in the context of data streaming. Experimental results show that under-sampling performs better than over-sampling at smaller training set sizes. All sampling techniques, however, are comparable when the training set becomes larger. This study also suggests that a multiple random under-sampling (MRUS) technique should be a good choice for applications with imbalanced and streaming data, because MRUS is the most effective while still keeping a high speed.
  • Keywords
    data handling; learning (artificial intelligence); MRUS technique; adapted sampling techniques; data streaming; dynamic learning; handling class imbalance; multiple random under sampling; sampling techniques; streaming data; training data; class imbalance; sampling; training set size
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS), 2012 Joint 6th International Conference on
  • Conference_Location
    Kobe
  • Print_ISBN
    978-1-4673-2742-8
  • Type
    conf
  • DOI
    10.1109/SCIS-ISIS.2012.6505291
  • Filename
    6505291