Title :
An improved KNN text classification algorithm based on density
Author :
Shi, Kansheng ; Li, Lemin ; Liu, Haitao ; He, Jie ; Zhang, Naitong ; Song, Wentao
Author_Institution :
Shanghai Jiaotong Univ., Shanghai, China
Abstract :
Text classification has attracted growing interest over the past few years. As a simple, effective, and nonparametric classification method, KNN is widely used in document classification. However, an uneven distribution of samples in the training set degrades KNN classification results, and unevenly distributed text is very common in documents on the Web. To tackle this problem, this paper proposes an improved, density-based KNN method, denoted DBKNN. Experimental results show that the DBKNN algorithm can better serve classification requests for large sets of unevenly distributed documents.
Keywords :
Internet; learning (artificial intelligence); pattern classification; text analysis; KNN text classification algorithm; Web document classification; density based KNN algorithm; uneven text distribution; Algorithm design and analysis; Classification algorithms; Equations; Mathematical model; Support vector machine classification; Text categorization; Training; KNN; Text classification; VSM; decision function
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-61284-203-5
DOI :
10.1109/CCIS.2011.6045043