Title :
Building naive Bayes document classifier using word clusters based on bootstrap averaging
Author :
Wang Yuanzhe ; Zhang Qiang ; Bai Liyuan
Author_Institution :
Inst. of Inf. Eng., Wuhan Univ. of Technol., Wuhan, China
Abstract :
To address the low classification accuracy caused by poor distribution estimation when a naive Bayes document classifier is trained on word clusters, we build an ordered word list based on the mutual information between words and their semantic cluster labels. We then construct, through bootstrap sampling, a sample set of the same size as the word list, and use the average of the corresponding parameters estimated from the sample set as the final parameters for classifying unknown documents. Experimental results on benchmark document data sets show that the proposed strategy achieves higher classification accuracy than naive Bayes document classifiers trained on word clusters or on words.
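The bootstrap-averaging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cluster_likelihoods`, `bootstrap_average`, `classify`), the Laplace smoothing constant `alpha`, the number of rounds `n_rounds`, and the choice to resample words uniformly with replacement are all assumptions made for illustration. The mutual-information ordering of the word list is omitted, and the word-to-cluster assignment is taken as given.

```python
import numpy as np

def cluster_likelihoods(word_class_counts, word_cluster, n_clusters, alpha=1.0):
    """Laplace-smoothed P(cluster | class) from per-word class counts.

    word_class_counts: (n_words, n_classes) occurrence counts.
    word_cluster:      (n_words,) cluster index of each word.
    """
    n_classes = word_class_counts.shape[1]
    cluster_counts = np.zeros((n_clusters, n_classes))
    # Sum the counts of all words assigned to each cluster.
    np.add.at(cluster_counts, word_cluster, word_class_counts)
    smoothed = cluster_counts + alpha  # alpha is an assumed smoothing constant
    return smoothed / smoothed.sum(axis=0, keepdims=True)

def bootstrap_average(word_class_counts, word_cluster, n_clusters,
                      n_rounds=50, seed=0):
    """Average P(cluster | class) over bootstrap resamples of the word list."""
    rng = np.random.default_rng(seed)
    n_words, n_classes = word_class_counts.shape
    total = np.zeros((n_clusters, n_classes))
    for _ in range(n_rounds):
        # Draw a sample of the same size as the word list, with replacement.
        idx = rng.integers(0, n_words, size=n_words)
        total += cluster_likelihoods(word_class_counts[idx],
                                     word_cluster[idx], n_clusters)
    return total / n_rounds

def classify(doc_cluster_counts, likelihoods, class_priors):
    """Return the index of the class with the highest log-posterior."""
    log_post = np.log(class_priors) + doc_cluster_counts @ np.log(likelihoods)
    return int(np.argmax(log_post))
```

Each bootstrap round yields a valid conditional distribution per class, so the averaged matrix also has columns that sum to one and can be plugged directly into the usual naive Bayes decision rule.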
Keywords :
Bayes methods; document handling; bootstrap averaging; bootstrap sampling; distribution estimation; naive Bayes document classifier; semantic cluster labels; word clusters; Classification algorithms; Clustering algorithms; Data mining; Machine learning; Mutual information; Parameter estimation; Probability distribution; Sampling methods; Sorting;
Conference_Title :
IT in Medicine & Education, 2009. ITIME '09. IEEE International Symposium on
Conference_Location :
Jinan
Print_ISBN :
978-1-4244-3928-7
Electronic_ISBN :
978-1-4244-3930-0
DOI :
10.1109/ITIME.2009.5236431