Title :
A noise filtering method using neural networks
Author :
Zeng, Xinchuan ; Martinez, Tony
Author_Institution :
Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
fDate :
5/17/2003
Abstract :
During data collection and labeling, noise can be introduced into a data set. As a result, the quality of the data set degrades, and experiments and inferences derived from it become less reliable. In this paper we present an algorithm called ANR (automatic noise reduction), a filtering mechanism that identifies and removes noisy data items whose classes have been mislabeled. ANR is built on a framework of multi-layer artificial neural networks. It assigns each data item a soft class label in the form of a class probability vector, which is initialized to the original class label and can be modified during training. When the noise level is reasonably small (< 30%), the non-noisy data dominates in determining the network architecture and its output, so mislabeled data can be corrected by aligning each class probability vector with the network output. Under a learning procedure that updates the class probability vector according to its difference from the network output, the probability of a mislabeled class gradually decreases while that of the correct class increases, which eventually corrects the mislabeled item after sufficient training. After training, the data items whose classes have been relabeled are treated as noisy data and removed from the data set. We evaluate the performance of ANR on 12 data sets drawn from the UCI data repository. The results show that ANR identifies a significant portion of the noisy data. At a noise level of 25%, using ANR as a training-data filter for a nearest-neighbor classifier yields an average increase in accuracy of 24.5% compared with the same classifier trained without ANR.
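The relabel-then-remove mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several pieces are assumptions: a single softmax layer stands in for the multi-layer network, the soft-label update is taken to be the simple rule p <- p + eta * (o - p) (the abstract only says the update is "based on its difference from the network output"), the model and labels are updated jointly once per epoch, and the names `anr_filter_sketch` and `predict_1nn`, the hyperparameters, and the toy data are all hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax; subtracting the row max avoids overflow in exp.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def anr_filter_sketch(X, y, n_classes, epochs=300, lr=0.5, eta=0.05):
    """Hypothetical ANR-style filter (illustrative assumptions, not the paper's code).

    Assumptions: one softmax layer replaces the multi-layer network, and each
    class probability vector p is pulled toward the network output o with
    p <- p + eta * (o - p). Returns (keep_mask, final_labels).
    """
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])    # append a bias column
    W = np.zeros((Xb.shape[1], n_classes))  # model weights
    P = np.eye(n_classes)[y]                # soft labels start as one-hot originals
    for _ in range(epochs):
        O = softmax(Xb @ W)
        # Gradient step on cross-entropy against the current soft labels.
        W -= lr * Xb.T @ (O - P) / n
        O = softmax(Xb @ W)
        # Move each class probability vector toward the network output.
        # This is a convex combination, so each row stays a probability vector.
        P += eta * (O - P)
    new_labels = P.argmax(axis=1)
    keep = new_labels == y                  # relabeled items are treated as noise
    return keep, new_labels


def predict_1nn(X_train, y_train, X_query):
    # Plain Euclidean 1-nearest-neighbor classifier.
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]


# Toy 1-D data (invented): two clean clusters plus one item at x = -1.95
# that sits in the class-0 cluster but carries class label 1.
X = np.array([[-2.2], [-2.1], [-2.0], [-1.9], [-1.8],
              [1.8], [1.9], [2.0], [2.1], [2.2],
              [-1.95]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

keep, labels = anr_filter_sketch(X, y, n_classes=2)
print(np.flatnonzero(~keep))               # indices flagged as noisy

query = np.array([[-1.96]])
print(predict_1nn(X, y, query))            # 1-NN trained on unfiltered data
print(predict_1nn(X[keep], y[keep], query))  # 1-NN trained on filtered data
```

On this toy set the clean points dominate the fitted model, so the mislabeled item's probability vector drifts toward class 0, its label flips, and it is dropped; the 1-NN query next to it then changes from class 1 to class 0. Updating the labels with a convex combination is a design choice made here so that no renormalization step is needed.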
Keywords :
digital filters; feedforward neural nets; formal verification; learning (artificial intelligence); multilayer perceptrons; pattern classification; probability; ANR algorithm; UCI data repository; artificial neural network; automatic noise reduction; class probability vector; data collection; data labeling; data noise; data set reliability; learning procedure; multilayer neural network; nearest neighbor classifier; network architecture; network output; noise filtering; noise identification; noise level; noise removal; performance evaluation; Artificial neural networks; Degradation; Filtering algorithms; Filters; Inference algorithms; Labeling; Neural networks; Noise level; Noise reduction; Training data;
Conference_Title :
2003 IEEE International Workshop on Soft Computing Techniques in Instrumentation, Measurement and Related Applications (SCIMA 2003)
Print_ISBN :
0-7803-7711-7
DOI :
10.1109/SCIMA.2003.1215926