DocumentCode :
2281597
Title :
Implementing News Article Category Browsing Based on Text Categorization Technique
Author :
Haruechaiyasak, Choochart ; Jitkrittum, Wittawat ; Sangkeettrakarn, Chatchawal ; Damrongrat, Chaianun
Author_Institution :
Nat. Electron. & Comput. Technol. Center (NECTEC), Human Language Technol. Lab. (HLT), Pathumthani
Volume :
3
fYear :
2008
fDate :
9-12 Dec. 2008
Firstpage :
143
Lastpage :
146
Abstract :
We propose a feature called category browsing to enhance the full-text search function of a Thai-language news article search engine. Category browsing allows users to browse and filter search results based on predefined categories. To implement the feature, we applied and compared several text categorization algorithms, including decision tree, Naive Bayes (NB), and Support Vector Machines (SVM). To further improve categorization performance, we evaluated several feature selection techniques, including document frequency thresholding (DF), information gain (IG), and χ² (CHI). In our experiments on a large news corpus, the SVM algorithm with IG feature selection yielded the best performance, with an F1 measure of 95.42%.
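As an illustration of two quantities named in the abstract, the following is a minimal sketch of information gain (IG) for a single term and of the F1 measure. It is not the authors' implementation, which this record does not describe. The toy documents, the category labels, and the function names `entropy`, `information_gain`, and `f1_score` are all hypothetical, and each document is assumed to be represented as a set of terms.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of a term: H(C) minus the expected entropy of the categories
    after splitting the documents on presence or absence of the term."""
    with_term = [c for d, c in zip(docs, labels) if term in d]
    without_term = [c for d, c in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def f1_score(tp, fp, fn):
    """F1 measure: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"goal", "match"},
    {"match", "team"},
    {"stock", "market"},
    {"market", "bank"},
]
labels = ["sports", "sports", "economy", "economy"]

# Rank the vocabulary by IG; a feature selector would keep the top-k terms.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy corpus, terms such as "match" and "market" separate the two categories perfectly and therefore rank above terms that occur in only one document. A feature selector of this kind would pass only the top-ranked terms to the classifier.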
Keywords :
decision trees; information filtering; natural language processing; online front-ends; search engines; support vector machines; text analysis; Naive Bayes; SVM algorithm; Thai-language news article search engine; article category browsing; decision tree; document frequency thresholding; feature selection techniques; full-text search function; information gain; support vector machines; text categorization technique; Decision trees; Filters; Frequency; Humans; Intelligent agent; Laboratories; Natural languages; Search engines; Support vector machines; Text categorization; classification algorithm; feature selection; search engine
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-0-7695-3496-1
Type :
conf
DOI :
10.1109/WIIAT.2008.61
Filename :
4740747