DocumentCode :
2991578
Title :
TKNN: An Improved KNN Algorithm Based on Tree Structure
Author :
Juan, Li
Author_Institution :
Sch. of Distance Educ., Shaanxi Normal Univ., Xi'an, China
fYear :
2011
fDate :
3-4 Dec. 2011
Firstpage :
1390
Lastpage :
1394
Abstract :
Text classification is the process of assigning documents to a set of previously fixed categories. It is widely used in applications such as web page categorization, email spam filtering, and document indexing. Many popular algorithms for text classification have been proposed, such as Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). However, these approaches do not perform well in multi-class text classification because they rely heavily on linear classifiers. KNN is a simple and mature algorithm, but it cannot effectively solve the problems of overlapping category borders, unbalanced class samples, k value determination, and an overly large search space. In this paper, we propose a new algorithm, TKNN, that incorporates a tree structure and an adaptive k value method into the classical KNN algorithm. TKNN can overcome the shortcomings of KNN and improve the performance of multi-class text classification. Theoretical analysis and experimental results show that TKNN greatly improves classification efficiency compared with KNN.
Keywords :
pattern classification; support vector machines; text analysis; tree data structures; KNN algorithm; TKNN; Web page categorization; document assignment; document indexing; email spam filtering; fixed categories; k-nearest neighbor; linear classifiers; naive Bayes; support vector machine; text classification; tree structure; Accuracy; Algorithm design and analysis; Buildings; Classification algorithms; Complexity theory; Text categorization; Training; KNN; TKNN; penalty parameter; tree structure; unbalanced class samples;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Security (CIS), 2011 Seventh International Conference on
Conference_Location :
Hainan
Print_ISBN :
978-1-4577-2008-6
Type :
conf
DOI :
10.1109/CIS.2011.310
Filename :
6128351