DocumentCode :
3449508
Title :
The Research on Distributed Adaptive Text Classification
Author :
Yu, Xiao-Gao
Author_Institution :
Dept. of Inf. Manage., Hubei Univ. of Econ., Wuhan
fYear :
2008
fDate :
12-14 Oct. 2008
Firstpage :
1
Lastpage :
4
Abstract :
The automated categorization of documents into predefined labels has received ever-increasing attention in recent years, owing to the exponential growth of documents on the Internet and the emerging need to organize them. K-nearest neighbors is a widely used classifier in the text categorization community because of its simplicity and efficiency. However, K-nearest neighbor classification (KNNC) still suffers from inductive biases or model misfits that result from its assumptions, such as the presumption that training data are evenly distributed among all categories. In this paper, a new refinement strategy (DBKNNC) for the KNN classifier is proposed. It adopts the sum-of-squared-error criterion to adaptively select the contributing subset of the k nearest neighbors, and classifies the input document in terms of the degree of disturbance it brings to the kernel densities of the selected neighbors. According to the experimental results, DBKNNC is not sensitive to the parameter k and achieves a significant improvement in classification performance on imbalanced corpora.
Keywords :
classification; text analysis; Internet; K-nearest neighbor classification; distributed adaptive text classification; document categorization; sum-of-squared-error criterion; text categorization; Benchmark testing; Information management; Internet; Kernel; Nearest neighbor searches; Robustness; Smoothing methods; System testing; Text categorization; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Wireless Communications, Networking and Mobile Computing, 2008. WiCOM '08. 4th International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-2107-7
Electronic_ISBN :
978-1-4244-2108-4
Type :
conf
DOI :
10.1109/WiCom.2008.1350
Filename :
4679258