DocumentCode :
3659884
Title :
Evaluation of classification models for language processing
Author :
Zeynep Hilal Kilimci;Murat Can Ganiz
Author_Institution :
Computer Engineering Department, Dogus University, Istanbul, Turkey
fYear :
2015
Firstpage :
1
Lastpage :
8
Abstract :
Naïve Bayes is a commonly used algorithm in text categorization because it is easy to implement and has low complexity. It has two main event models for text categorization: the multivariate Bernoulli model and the multinomial model. A very large number of studies choose the multinomial model with Laplace smoothing based solely on the assumption that it performs better than the multivariate Bernoulli model under almost any conditions. This study aims to shed some light on this widely adopted assumption by analyzing Naïve Bayes event models and smoothing methods from a different perspective. To clarify the difference between the event models of Naïve Bayes, their classification performance is compared on datasets in two languages, English and Turkish. Results of our extensive experiments demonstrate that the superior performance of the multinomial model is not observed all the time. On the other hand, the multivariate Bernoulli model can perform well when combined with an appropriate smoothing method under different training data size conditions.
Keywords :
"Smoothing methods","Niobium","Computational modeling","Text categorization","Vocabulary","Training","Accuracy"
Publisher :
IEEE
Conference_Titel :
Innovations in Intelligent SysTems and Applications (INISTA), 2015 International Symposium on
Type :
conf
DOI :
10.1109/INISTA.2015.7276787
Filename :
7276787