DocumentCode :
2184596
Title :
Developing an effective Thai Document Categorization Framework base on term relevance frequency weighting
Author :
Chirawichitchai, Nivet ; Sa-nguansat, Parinya ; Meesad, Phayung
Author_Institution :
Dept. of Inf. Technol., King Mongkut's Univ. of Technol., Bangkok, Thailand
fYear :
2010
fDate :
24-25 Nov. 2010
Firstpage :
19
Lastpage :
23
Abstract :
Text categorization is the process of automatically assigning predefined categories to free-text documents. Feature weighting, which calculates feature (term) values in documents, is an important preprocessing technique in text categorization. In this paper, we propose a Thai Document Categorization Framework, focusing on the comparison of various term weighting schemes, including Boolean, tf, tf-idf, tfc, ltc, entropy, and tf-rf weighting. We evaluated these methods on a Thai news article corpus with three supervised learning classifiers. We found tf-rf weighting the most effective in our experiments with the SVM, NB, and DT algorithms. Based on our experiments, tf-rf weighting with the SVM algorithm yielded the best performance, with an F-measure of 95.9%.
Keywords :
learning (artificial intelligence); support vector machines; text analysis; Boolean weighting; SVM algorithm; Thai document categorization framework; ltc entropy weighting; supervised learning classifiers; term relevance frequency weighting; text categorization; tf weighting; tf-idf weighting; tf-rf weighting; tfc weighting; Classification algorithms; Entropy; Machine learning; Niobium; Support vector machines; Text categorization; Training; Supervised Learning; Term weighting; Text Categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Knowledge Engineering, 2010 8th International Conference on ICT and
Conference_Location :
Bangkok
ISSN :
2157-0981
Print_ISBN :
978-1-4244-9874-1
Electronic_ISBN :
2157-0981
Type :
conf
DOI :
10.1109/ICTKE.2010.5692907
Filename :
5692907