DocumentCode
402857
Title
A simple and efficient classifying algorithm
Author
Wang, Jian-hui ; Zhou, Shui-geng ; Hu, Yun-fa
Author_Institution
Dept. of Comput. & Inf. Technol., Fudan Univ., Shanghai, China
Volume
1
fYear
2003
fDate
2-5 Nov. 2003
Firstpage
51
Abstract
Most current classification methods are based on the vector space model (VSM), the most widely used of which is k-nearest neighbors (kNN). Most of them, however, are computationally expensive, so they cannot be used when a large number of specimens must be classified, and the classifier must be rebuilt whenever the training corpus is extended, which limits scalability. This paper puts forward two new notions, mutual dependence and equivalent radius, and offers SECTILE, a new classification algorithm based on them. SECTILE is then applied to classifying Chinese documents and compared with the kNN and CCC methods. The experimental results suggest that SECTILE outperforms kNN and CCC, can be used online to classify a large number of specimens, and scales well, while the precision and recall of classification are maintained.
Keywords
pattern classification; text analysis; Chinese documents; classifying methods; equivalent radius; k-nearest neighbors; mutual dependence; simple and efficient algorithm to classify texts based on equivalent radius and mutual dependence; Classification algorithms; Concrete; Cybernetics; Electronic mail; Information technology; Machine learning; Mutual information; Natural language processing; Scalability; Space technology;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN
0-7803-8131-9
Type
conf
DOI
10.1109/ICMLC.2003.1264441
Filename
1264441