Title :
Some Effective Techniques for Naive Bayes Text Classification
Author :
Kim, Sang-Bum ; Han, Kyoung-Soo ; Rim, Hae-Chang ; Myaeng, Sung Hyon
Author_Institution :
Dept. of Comput. Sci. & Eng., Korea Univ., Seoul
Abstract :
While naive Bayes is quite effective in various data mining tasks, it shows disappointing results in automatic text classification. Based on our observation of naive Bayes applied to natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well on the standard benchmark collections, competing with state-of-the-art text classifiers based on highly complex learning methods such as SVM.
Keywords :
Bayes methods; classification; data mining; learning (artificial intelligence); natural language processing; parameter estimation; text analysis; automatic text classification problem; data mining tasks; feature weighting method; naive Bayes text classification; natural language; parameter estimation process; per-document text normalization; Data mining; Frequency; Learning systems; Natural languages; Parameter estimation; Probability; Statistical learning; Support vector machine classification; Support vector machines; Text categorization; Poisson model; Text classification; feature weighting; naive Bayes classifier
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2006.180