Title :
Developing an effective Thai Document Categorization Framework base on term relevance frequency weighting
Author :
Chirawichitchai, Nivet ; Sa-nguansat, Parinya ; Meesad, Phayung
Author_Institution :
Dept. of Inf. Technol., King Mongkut's Univ. of Technol., Bangkok, Thailand
Abstract :
Text categorization is the process of automatically assigning predefined categories to free-text documents. Feature weighting, which calculates feature (term) values in documents, is an important preprocessing technique in text categorization. In this paper, we propose a Thai Document Categorization Framework that focuses on comparing various term weighting schemes, including Boolean, tf, tf-idf, tfc, ltc entropy and tf-rf weighting. We evaluated these methods on a Thai news article corpus with three supervised learning classifiers. We found tf-rf weighting to be the most effective in our experiments with the SVM, NB, and DT algorithms. Based on our experiments, tf-rf weighting with the SVM algorithm yielded the best performance, with an F-measure of 95.9%.
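As an illustration of the tf-rf scheme named in the abstract, the following is a minimal sketch that assumes the commonly cited relevance frequency definition, rf = log2(2 + a / max(1, c)), where a is the number of positive-category documents containing the term and c is the number of negative-category documents containing it. The abstract does not state which variant the authors used, so the formula, the function names, and the toy English tokens are all assumptions; Thai word segmentation is not shown.

```python
import math
from collections import Counter


def relevance_frequency(term, positive_docs, negative_docs):
    """rf = log2(2 + a / max(1, c)) (assumed definition, not taken from the abstract).

    a: number of positive-category documents containing the term
    c: number of negative-category documents containing the term
    """
    a = sum(1 for doc in positive_docs if term in set(doc))
    c = sum(1 for doc in negative_docs if term in set(doc))
    return math.log2(2 + a / max(1, c))


def tf_rf_vector(doc_tokens, positive_docs, negative_docs):
    """Weight each term of one tokenized document by raw tf times rf."""
    counts = Counter(doc_tokens)
    return {
        term: tf * relevance_frequency(term, positive_docs, negative_docs)
        for term, tf in counts.items()
    }


# Hypothetical toy corpus: each document is a list of already-segmented tokens.
positive_docs = [["goal", "match"], ["goal", "team"]]
negative_docs = [["market", "team"], ["market", "stock"]]

weights = tf_rf_vector(["goal", "goal", "team"], positive_docs, negative_docs)
```

In this toy corpus, "goal" appears only in positive documents and so receives a higher rf than "team", which appears in both categories. The resulting weight dictionaries could then be turned into feature vectors for a classifier such as an SVM.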
Keywords :
learning (artificial intelligence); support vector machines; text analysis; Boolean weighting; SVM algorithm; Thai document categorization framework; ltc entropy weighting; supervised learning classifiers; term relevance frequency weighting; text categorization; tf weighting; tf-idf weighting; tf-rf weighting; tfc weighting; Classification algorithms; Entropy; Machine learning; Support vector machines; Text categorization; Training; Supervised Learning; Term weighting; Text Categorization
Conference_Title :
2010 8th International Conference on ICT and Knowledge Engineering
Conference_Location :
Bangkok
Print_ISBN :
978-1-4244-9874-1
Electronic_ISBN :
2157-0981
DOI :
10.1109/ICTKE.2010.5692907