Title :
Noise reduction to text categorization based on density for KNN
Author :
Li, Rong-lu ; Hu, Yun-fa
Author_Institution :
Comput. Technol. & Inf. Dept., Fudan Univ., Shanghai, China
Abstract :
With the rapid development of the World Wide Web, text classification has become a key technology for organizing and processing large amounts of document data. As a simple and effective classification approach, the KNN method is widely used in text categorization. However, the KNN classifier not only has large computational demands, but may also lose classification precision when the density of the training data is uneven. In this paper, we present a density-based method for reducing noise in the training data, which addresses these problems. Our experimental results also illustrate this.
Keywords :
classification; information retrieval; learning (artificial intelligence); text analysis; KNN classifier; KNN method; World Wide Web; density based method; document data; k-nearest neighbor classifier; noise reduction; text categorization; text classification; training data; Artificial intelligence; Electronic mail; Machine learning; Natural languages; Noise reduction; Organizing; Runtime; Text categorization; Training data; Web sites;
Conference_Title :
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN :
0-7803-8131-9
DOI :
10.1109/ICMLC.2003.1260115