Title :
Skewed Class Distributions and Mislabeled Examples
Author :
Van Hulse, Jason ; Khoshgoftaar, Taghi M. ; Napolitano, Amri
Abstract :
Both imbalanced data and class noise are problems that have received attention in data mining research; however, learning from imbalanced data with labeling errors has not been adequately addressed. We present systematic experimentation on imbalanced datasets with simulated class noise and evaluate the impact on various classification algorithms. Our results show that class noise is a significant detriment to learning from skewed data, but more importantly, we demonstrate that the class in which the noise is located is critical. This has significant repercussions for noise treatment procedures, which often handle noise equally in both classes. In addition, an examination of 11 classifiers demonstrates that the learners react very differently when confronted with class noise.
Keywords :
Area measurement; Classification algorithms; Computational modeling; Computer errors; Computer science; Conferences; Data engineering; Data mining; Labeling; USA Councils
Conference_Title :
Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
Print_ISBN :
978-0-7695-3019-2
Electronic_ISBN :
978-0-7695-3033-8
DOI :
10.1109/ICDMW.2007.34