Title :
Chinese Web page classification using noise-tolerant support vector machines
Author :
Zou, Jia-Qi ; Chen, Guo-Long ; Guo, Wen-Zhong
Author_Institution :
Inst. of Math. & Comput. Sci., Fuzhou Univ., China
Date :
30 Oct.-1 Nov. 2005
Abstract :
Real-world applications often require classifying Web documents in the presence of noisy data. Support vector machines (SVMs) work well for classification because of their high generalization ability, but they are very sensitive to noisy training data, which can degrade their classification accuracy. This paper presents a new algorithm for handling noisy training data that combines support vector machines with the K-nearest neighbor (KNN) method. Given a training set, the algorithm first uses the KNN method to remove noisy training examples; the remaining examples are then used to train SVM classifiers for Web categorization. Empirical results show that the new algorithm tolerates noise well and can greatly reduce the influence of noisy data on the SVM classifier.
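The two-stage idea in the abstract (filter with KNN, then train an SVM on what remains) can be illustrated with a small pure-Python sketch. The abstract does not give the filtering rule, the value of K, the feature representation, or the SVM training procedure, so everything below is an assumption made for illustration: a majority-vote rule over the k = 3 nearest neighbours, 2-D toy points in place of Web page features, labels in {+1, -1}, and a linear SVM trained by batch subgradient descent. The names `knn_filter`, `train_linear_svm`, and `predict` are hypothetical and are not taken from the paper.

```python
# Toy sketch: KNN noise filtering followed by a linear SVM (pure Python).
# Assumptions (not from the paper): majority-vote filtering rule, k=3,
# 2-D toy features, labels in {+1, -1}, batch subgradient SVM training.

def knn_filter(examples, k=3):
    """Keep an example only if most of its k nearest neighbours share its label."""
    kept = []
    for i, (xi, yi) in enumerate(examples):
        # Squared Euclidean distance to every other example, ties broken by index.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, (xj, _) in enumerate(examples) if j != i
        )
        vote = sum(examples[j][1] for _, j in dists[:k])  # k is odd, so never 0
        if vote * yi > 0:  # neighbours agree with the given label
            kept.append((xi, yi))
    return kept


def train_linear_svm(examples, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    dim = len(examples[0][0])
    n = len(examples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in examples:
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # example lies inside the margin or is misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the learned hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Two clean clusters plus two deliberately mislabelled points.
data = [
    ((2, 2), 1), ((2, 3), 1), ((3, 2), 1), ((3, 3), 1),
    ((-2, -2), -1), ((-2, -3), -1), ((-3, -2), -1), ((-3, -3), -1),
    ((2.5, 2.5), -1),   # mislabelled: sits inside the +1 cluster
    ((-2.5, -2.5), 1),  # mislabelled: sits inside the -1 cluster
]

kept = knn_filter(data, k=3)
w, b = train_linear_svm(kept)
print(len(data), "->", len(kept))  # prints: 10 -> 8
```

In this toy setup the two mislabelled points are discarded before training, so the SVM sees only the eight consistent examples. With real Web pages, the tuples would be replaced by document feature vectors, and K and the SVM settings would need tuning.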
Keywords :
Internet; classification; learning (artificial intelligence); natural languages; support vector machines; Chinese Web page classification; K-nearest neighbor method; KNN method; SVM classifier training; Web categorization; noise-tolerant support vector machines; Application software; Computer science; Degradation; Feature extraction; Mathematics; Noise reduction; Support vector machine classification; Support vector machines; Training data; Web pages; K-nearest neighbor (KNN); Noise-tolerant; Support vector machines (SVM); Web classification
Conference_Title :
Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE '05)
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598843