Title :
Using classifier diversity to handle label noise
Author :
Michael R. Smith;Tony Martinez
Author_Institution :
Computer Science Department, Brigham Young University, Provo, Utah 84601, USA
Date :
7/1/2015
Abstract :
It is widely known in the machine learning community that class noise can be, and often is, detrimental to inducing a model of the data. Many current approaches use a single, often biased, measure to determine whether an instance is noisy. A biased measure may work well on certain data sets but can be less effective across a broader range of data sets. In this paper, we conduct a large empirical evaluation of noise handling techniques, examining 12 such techniques on 54 data sets with 5 learning algorithms. The chosen set of techniques includes both biased and ensemble approaches, as well as the proposed noise identification using classifier diversity (NICD). NICD lessens the bias of the noise measure by selecting a diverse set of classifiers to determine which instances are noisy. We examine NICD as a technique for filtering, for instance weighting, and for selecting the base classifiers of a voting ensemble. We find that lessening the bias of the noise handling techniques significantly improves performance over a broad set of data sets.
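The abstract does not specify the NICD algorithm itself, but the underlying idea of using classifier diversity to identify noisy instances can be sketched as follows: flag an instance as likely mislabeled when a majority of dissimilar classifiers misclassify it under leave-one-out evaluation. This is a minimal illustrative sketch, not the paper's method; the choice of classifiers (nearest centroid, 1-NN, 3-NN), the function names, and the majority threshold are all assumptions made here for demonstration.

```python
import numpy as np

def loo_predictions(X, y, classify):
    """Predict each instance's label using all other instances as training data."""
    preds = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        preds.append(classify(X[mask], y[mask], X[i]))
    return np.array(preds)

def nearest_centroid(Xtr, ytr, x):
    """Assign x to the class whose mean training point is closest."""
    cents = {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}
    return min(cents, key=lambda c: np.linalg.norm(x - cents[c]))

def knn(k):
    """Build a k-nearest-neighbor classifier with majority voting."""
    def classify(Xtr, ytr, x):
        idx = np.argsort(np.linalg.norm(Xtr - x, axis=1), kind="stable")[:k]
        vals, counts = np.unique(ytr[idx], return_counts=True)
        return vals[np.argmax(counts)]
    return classify

def flag_noisy(X, y, classifiers, min_disagree=2):
    """Flag instances misclassified by at least min_disagree diverse classifiers."""
    disagree = sum((loo_predictions(X, y, clf) != y).astype(int)
                   for clf in classifiers)
    return np.where(disagree >= min_disagree)[0]

# Toy data: two well-separated clusters; the last point truly belongs to
# class 1 but carries a flipped (noisy) label of 0.
X = np.array([(0, 0), (0, 1), (1, 0), (1, 1),
              (5, 5), (5, 6), (6, 5), (6, 6)], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 0])

noisy = flag_noisy(X, y, [nearest_centroid, knn(1), knn(3)])
print(noisy)  # only the flipped-label instance (index 7) is flagged
```

The flagged indices could then feed any of the three uses the abstract lists: removing the instances (filtering), down-weighting them during training, or informing the choice of base classifiers for a voting ensemble.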
Keywords :
"Accuracy","Computational modeling","Unified modeling language","Irrigation","Artificial neural networks","Annealing","Colon"
Conference_Titel :
2015 International Joint Conference on Neural Networks (IJCNN)
Electronic_ISSN :
2161-4407
DOI :
10.1109/IJCNN.2015.7280316