DocumentCode :
3319602
Title :
Comparing SVM and naïve Bayes classifiers for text categorization with Wikitology as knowledge enrichment
Author :
Hassan, Sundus ; Rafi, Muhammad ; Shaikh, Muhammad Shahid
Author_Institution :
Comput. Sci. Dept., NUCES-FAST, Karachi, Pakistan
fYear :
2011
fDate :
22-24 Dec. 2011
Firstpage :
31
Lastpage :
34
Abstract :
Text categorization is the task of labeling documents according to their content. Many experiments have sought to enhance text categorization by adding background knowledge to documents from knowledge repositories such as WordNet, the Open Project Directory (OPD), Wikipedia, and Wikitology. In our previous work, we carried out intensive experiments that extracted knowledge from Wikitology and evaluated it with a Support Vector Machine under 10-fold cross-validation. The results clearly indicated that Wikitology performs far better than the other knowledge bases. In this paper we compare Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers under text enrichment through Wikitology. We validated the results with 10-fold cross-validation and showed that NB gives an improvement of +28.78%, whereas SVM gives an improvement of +6.36%, when compared with baseline results. The Naïve Bayes classifier is the better choice when enrichment through an external knowledge base is used.
Keywords :
Bayes methods; Web sites; classification; knowledge acquisition; knowledge based systems; support vector machines; text analysis; NB classifiers; OPD; Open Project Directory; SVM; Wikipedia; Wikitology; Word Net; background knowledge; documents labelling; external knowledge base; knowledge bases; knowledge enrichment; knowledge extraction; knowledge repository; naïve Bayes classifiers; support vector machine; text categorization enhancement; text enrichment; Electronic publishing; Encyclopedias; Internet; Niobium; Support vector machines; 20 News Group; Knowledge base; Machine Learning; Naïve Bayes; Support Vector Machine; Text Categorization; Wikitology
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multitopic Conference (INMIC), 2011 IEEE 14th International
Conference_Location :
Karachi
Print_ISBN :
978-1-4577-0654-7
Type :
conf
DOI :
10.1109/INMIC.2011.6151495
Filename :
6151495