DocumentCode :
3108284
Title :
An Improved KNN Text Categorization Method Based on Spanning Tree Documents Clustering
Author :
Zheng, Nan ; Feng Guo-He ; Nan Zheng
Author_Institution :
Coll. of Sci., Hebei North Univ., Zhangjiakou, China
fYear :
2011
fDate :
16-18 Aug. 2011
Firstpage :
1
Lastpage :
5
Abstract :
The K-Nearest Neighbor (KNN) classification method is inefficient, and its optimal parameter value K is difficult to determine. To address these shortcomings, a new KNN classification method based on spanning tree document clustering is presented. The basic idea is to use a spanning-tree-based clustering algorithm to cluster documents automatically. After a few nodes are cut, each generated sub-tree retains a few core document nodes, and the retained core nodes are merged into a new document. During classification, the similarity between the test document and the center document of each sub-tree is computed, and the test document is assigned the category of the sub-tree with which it has the largest similarity. Experiments show that the proposed method is more stable than KNN in classification; it also improves classification speed and avoids having to choose a value for the parameter K.
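The pipeline outlined in the abstract might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the term weighting, the similarity measure, the edge-cutting rule, or how core nodes are selected, so the sketch assumes raw term counts, cosine similarity, a fixed distance threshold (`cut`, a hypothetical parameter), and clustering within each category, and it skips the node-pruning step entirely by merging every node of a sub-tree. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mst_edges(vectors):
    """Prim's algorithm with distance = 1 - cosine; returns (dist, i, j) edges."""
    n = len(vectors)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge from the current tree to a node outside it.
        best = min(
            (1.0 - cosine(vectors[i], vectors[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


def cluster(vectors, cut):
    """Drop spanning-tree edges longer than `cut` (assumed rule);
    the remaining connected components are the sub-trees."""
    parent = list(range(len(vectors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for dist, i, j in mst_edges(vectors):
        if dist <= cut:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def build_centers(docs, labels, cut=0.5):
    """Cluster each category separately (an assumption) and merge every
    sub-tree into a single center document carrying that category."""
    centers = []
    for label in sorted(set(labels)):
        vecs = [Counter(d.split()) for d, l in zip(docs, labels) if l == label]
        for group in cluster(vecs, cut):
            merged = Counter()
            for i in group:
                merged.update(vecs[i])
            centers.append((merged, label))
    return centers


def classify(text, centers):
    """Assign the category of the most similar center document."""
    vec = Counter(text.split())
    return max(centers, key=lambda c: cosine(vec, c[0]))[1]
```

Because a test document is compared only against the merged center documents rather than against every training document, no neighbor count K has to be chosen, which is the property the abstract highlights.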
Keywords :
pattern classification; pattern clustering; text analysis; trees (mathematics); KNN text categorization method; automatic clustering; classification stability; k-nearest neighbor classification method; spanning tree document clustering; subtree; Boosting; Clustering algorithms; Economics; Government; Internet; Software; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Internet Technology and Applications (iTAP), 2011 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-7253-6
Type :
conf
DOI :
10.1109/ITAP.2011.6006411
Filename :
6006411