DocumentCode :
2850434
Title :
Cost-guided class noise handling for effective cost-sensitive learning
Author :
Zhu, Xingquan ; Wu, Xindong
Author_Institution :
Dept. of Comput. Sci., Vermont Univ., Burlington, VT, USA
fYear :
2004
fDate :
1-4 Nov. 2004
Firstpage :
297
Lastpage :
304
Abstract :
Research in machine learning, data mining and related areas has produced a wide variety of algorithms for cost-sensitive (CS) classification, where the objective is to minimize the misclassification cost rather than to maximize the classification accuracy. However, these methods assume that training sets do not contain significant noise, which is rarely the case in real-world environments. In this paper, we systematically study the impact of class noise on CS learning, and propose a cost-guided class noise handling algorithm to identify noise for effective CS learning. We call it the cost-guided iterative classification filter (CICF), because it seamlessly integrates costs with an existing classification filter (C. Brodley and M. Friedl, 1999) for noise identification. Whereas existing efforts give equal weight to noise handling in all classes, CICF puts more emphasis on expensive classes, which makes it especially successful on datasets with a large cost ratio. Experimental results and comparative studies on real-world datasets indicate that noise may seriously degrade the performance of CS classifiers, and that adopting the proposed CICF algorithm can significantly reduce the misclassification cost of a CS classifier in noisy environments.
Keywords :
learning (artificial intelligence); noise; pattern classification; cost-guided class noise handling; cost-guided iterative classification filter; cost-sensitive classification; effective cost-sensitive learning; noise identification; Classification tree analysis; Computer science; Costs; Data mining; Decision trees; Filters; Machine learning; Machine learning algorithms; Noise reduction; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
Print_ISBN :
0-7695-2142-8
Type :
conf
DOI :
10.1109/ICDM.2004.10108
Filename :
1410297