• DocumentCode
    2892363
  • Title
    A Dynamic Sampling Framework for Multi-class Imbalanced Data
  • Author
    Debowski, B.; Areibi, Shawki; Grewal, Gary; Tempelman, J.
  • Author_Institution
    Sch. of Eng., Univ. of Guelph, Guelph, ON, Canada
  • Volume
    2
  • fYear
    2012
  • fDate
    12-15 Dec. 2012
  • Firstpage
    113
  • Lastpage
    118
  • Abstract
    In this paper we present a Dynamic Sampling Framework for multi-class imbalanced data with any number of classes. The framework uses existing sampling techniques such as random undersampling (RUS), random oversampling (ROS), and SMOTE, and ties the classification algorithm into the sampling process in a wrapper-like manner. In doing so, the framework searches for a suitably sampled training set. This removes the need to specify a target distribution and automatically tunes the training set distribution to the classification algorithm's learning preferences. This matters when re-sampling multi-class data, where manually searching for an appropriate target distribution would be a daunting task. We test both our Dynamic Sampling approach and traditional Static Sampling using RUS, ROS, SMOTE, ROS+RUS, and SMOTE+RUS with several classification algorithms on a highly imbalanced four-class data set. Comparing the results, we find that overall both techniques raise Recall for the most underrepresented minority classes, but Dynamic Sampling also maintains or raises Recall for the majority classes. Overall, Dynamic Sampling is more robust and resilient, better sustains classifier Accuracy, and is better able to raise G-Mean and Minimum F-Measures.
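    The abstract describes the wrapper idea only at a high level, so the following Python sketch illustrates it under stated assumptions and is not the authors' implementation. Each class is resized with RUS (shrink) or ROS (grow), a classifier is trained on the resampled set, and the per-class distribution that maximizes validation G-Mean is kept. The greedy single-class search, the step factors, the cap at the original majority size, the k-NN stand-in classifier, and all function names (resample, knn_predict, dynamic_sampling_search, and so on) are hypothetical. SMOTE is omitted. G-Mean is computed as the geometric mean of per-class recalls, and Minimum F-Measure as the smallest per-class F1. These are common multi-class definitions but are assumptions with respect to this paper. Requires Python 3.8+ for math.prod and math.dist.

```python
import math
import random
from collections import Counter


def resample(X, y, target, rng):
    """Resize each class c to target[c] samples: RUS if shrinking, ROS if growing."""
    Xs, ys = [], []
    for c in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == c]
        k = target[c]
        if k <= len(idx):
            chosen = rng.sample(idx, k)  # RUS: random subset without replacement
        else:
            # ROS: keep every original sample, then add random duplicates
            chosen = idx + [rng.choice(idx) for _ in range(k - len(idx))]
        Xs.extend(X[i] for i in chosen)
        ys.extend([c] * len(chosen))
    return Xs, ys


def per_class_scores(y_true, y_pred):
    """Per-class recall and F1 for every class present in y_true."""
    pairs = list(zip(y_true, y_pred))
    recall, f1 = {}, {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        r = tp / (tp + fn) if tp + fn else 0.0
        pr = tp / (tp + fp) if tp + fp else 0.0
        recall[c] = r
        f1[c] = 2 * pr * r / (pr + r) if pr + r else 0.0
    return recall, f1


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (one common multi-class G-Mean)."""
    recall, _ = per_class_scores(y_true, y_pred)
    return math.prod(recall.values()) ** (1 / len(recall))


def min_f_measure(y_true, y_pred):
    """Smallest per-class F1, i.e. the F-Measure of the worst-served class."""
    _, f1 = per_class_scores(y_true, y_pred)
    return min(f1.values())


def knn_predict(X_train, y_train, X_test, k=5):
    """Hypothetical stand-in classifier: majority vote among the k nearest training points."""
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))[:k]
        preds.append(Counter(y_train[i] for i in nearest).most_common(1)[0][0])
    return preds


def dynamic_sampling_search(X_tr, y_tr, X_val, y_val, factors=(0.5, 2.0), max_steps=10, seed=0):
    """Hypothetical greedy wrapper search over per-class sample counts, scored by validation G-Mean."""
    counts = Counter(y_tr)
    cap = max(counts.values())  # never oversample a class beyond the original majority size

    def score(t):
        # Same seed for every candidate so scores differ only by the target distribution.
        Xs, ys = resample(X_tr, y_tr, t, random.Random(seed))
        return g_mean(y_val, knn_predict(Xs, ys, X_val))

    target = dict(counts)  # start from the original, imbalanced distribution
    best = score(target)
    for _ in range(max_steps):
        moves = []
        for c in target:
            for f in factors:
                t = dict(target)
                t[c] = min(cap, max(1, round(target[c] * f)))  # shrink or grow one class
                if t != target:
                    moves.append((score(t), t))
        if not moves:
            break
        s, t = max(moves, key=lambda m: m[0])
        if s <= best:
            break  # no single-class change improves validation G-Mean
        best, target = s, t
    return target, best


if __name__ == "__main__":
    gen = random.Random(42)

    def blob(cx, cy, n, label):
        # Synthetic 2-D Gaussian cluster, for illustration only
        return [((gen.gauss(cx, 1.0), gen.gauss(cy, 1.0)), label) for _ in range(n)]

    data = blob(0, 0, 150, 0) + blob(2, 2, 25, 1) + blob(4, 0, 10, 2)
    gen.shuffle(data)
    X, y = [p for p, _ in data], [c for _, c in data]
    split = len(data) * 2 // 3
    target, best = dynamic_sampling_search(X[:split], y[:split], X[split:], y[split:])
    print("chosen per-class counts:", target, "| validation G-Mean: %.3f" % best)
```

    Swapping knn_predict for the classifier under study, or the greedy loop for another search strategy, keeps the same wrapper structure. The classifier's validation score, rather than a fixed target ratio, decides the final training set distribution.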
  • Keywords
    data mining; pattern classification; sampling methods; statistical distributions; G-Mean; ROS; RUS; SMOTE; classification algorithm; classification algorithm learning preferences; dynamic sampling; dynamic sampling framework; minimum F-measures; multiclass data re-sampling; multiclass imbalanced data; sampled training set; sampling process; sampling techniques; static sampling; target distribution; training set distribution; Accuracy; Algorithm design and analysis; Artificial neural networks; Heuristic algorithms; Training; Dynamic Sampling; Imbalanced Data; Multi-class
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Applications (ICMLA), 2012 11th International Conference on
  • Conference_Location
    Boca Raton, FL
  • Print_ISBN
    978-1-4673-4651-1
  • Type
    conf
  • DOI
    10.1109/ICMLA.2012.144
  • Filename
    6406737