Title :
Improving Learner Performance with Data Sampling and Boosting
Author :
Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Napolitano, Amri
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL
Abstract :
Learning from imbalanced datasets is a well-known problem in the data mining community. Many techniques have been proposed to alleviate the problems associated with class imbalance, including data sampling and boosting. While data sampling has received the bulk of the attention from the research community, our results show that boosting often results in better classification performance than even the best data sampling techniques. In this work, we compare the performance of data sampling and boosting on ten datasets from various application domains using two commonly used learners. In addition, we propose the use of both data sampling and boosting in an attempt to combine the strengths of these techniques and achieve even better classification performance.
Keywords :
data mining; learning (artificial intelligence); boosting; data sampling techniques; learning; Artificial intelligence; Costs; Iterative algorithms; Sampling methods; Training data; USA Councils
Conference_Title :
20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '08), 2008
Conference_Location :
Dayton, OH
Print_ISBN :
978-0-7695-3440-4
DOI :
10.1109/ICTAI.2008.58