Title :
Learning with limited minority class data
Author :
Khoshgoftaar, Taghi M. ; Seiffert, Chris ; Van Hulse, Jason ; Napolitano, Amri ; Folleco, Andres
Author_Institution :
Florida Atlantic Univ., Boca Raton
Abstract :
A practical problem in data mining and machine learning is the limited availability of data. For example, in a binary classification problem it is often the case that examples of one class are abundant, while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited due to the financial cost or time required to collect them. This work presents a comprehensive empirical study of learning when examples from one class are extremely rare, but examples of the other class(es) are plentiful. Specifically, we address the issue of how many examples from the abundant class should be used when training a classifier on data where one class is very rare. Nearly one million classifiers were built and evaluated to generate the results presented in this work. Our results demonstrate that the often-used 'even distribution' is not optimal when dealing with such rare events.
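The question the abstract poses, how many abundant-class examples to keep for training, can be made concrete with a small sketch. The abstract does not say how the training sets were drawn, so random undersampling of the majority class is assumed here purely for illustration; the function name undersample_majority, its parameters, and the toy label list are hypothetical and not taken from the paper.

```python
import random


def undersample_majority(labels, minority_label, minority_fraction, seed=0):
    """Keep every minority example plus a random subset of majority examples
    so the minority class makes up roughly `minority_fraction` of the result.

    Illustrative assumption: plain random undersampling, not the paper's
    documented procedure. Returns sorted indices into `labels`.
    """
    if not 0 < minority_fraction <= 1:
        raise ValueError("minority_fraction must be in (0, 1]")
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Number of majority examples needed to reach the requested distribution.
    wanted = round(len(minority) * (1 - minority_fraction) / minority_fraction)
    wanted = min(wanted, len(majority))  # cannot keep more than are available
    rng = random.Random(seed)
    kept = rng.sample(majority, wanted)
    return sorted(minority + kept)


# Toy data: 10 rare positives among 1000 examples (made-up numbers).
labels = [1] * 10 + [0] * 990
for fraction in (0.5, 0.25, 0.1):
    idx = undersample_majority(labels, minority_label=1, minority_fraction=fraction)
    n_min = sum(labels[i] == 1 for i in idx)
    print(fraction, n_min, len(idx) - n_min)
```

A minority_fraction of 0.5 corresponds to the 'even distribution' mentioned in the abstract; smaller fractions keep more of the abundant class. Which setting works best for a given learner is an empirical matter that this sketch does not answer.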
Keywords :
classification; data handling; data mining; learning (artificial intelligence); binary classification; data classifier; data mining; machine learning; minority class data; Analysis of variance; Costs; Data mining; Decision trees; Machine learning; Measurement; Performance evaluation; Testing; Training data
Conference_Title :
Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
Conference_Location :
Cincinnati, OH
Print_ISBN :
978-0-7695-3069-7
DOI :
10.1109/ICMLA.2007.76