DocumentCode :
3339395
Title :
Building naive Bayes document classifier using word clusters based on bootstrap averaging
Author :
Wang Yuanzhe ; Zhang Qiang ; Bai Liyuan
Author_Institution :
Inst. of Inf. Eng., Wuhan Univ. of Technol., Wuhan, China
Volume :
1
fYear :
2009
fDate :
14-16 Aug. 2009
Firstpage :
202
Lastpage :
207
Abstract :
To address the low classification accuracy caused by poor distribution estimation when a naive Bayes document classifier is trained on word clusters, we build an ordered word list based on the mutual information between words and their semantic cluster labels. We then construct, through bootstrap sampling, a sample set of the same size as the word list, and use the average of the corresponding parameters estimated from the sample set as the final parameters for classifying unknown documents. Experimental results on benchmark document data sets show that the proposed strategy achieves higher classification accuracy than a naive Bayes document classifier trained on word clusters or on words.
Keywords :
Bayes methods; document handling; bootstrap averaging; bootstrap sampling; distribution estimation; naive Bayes document classifier; semantic cluster labels; word clusters; Classification algorithms; Clustering algorithms; Data mining; Machine learning; Mutual information; Parameter estimation; Probability distribution; Sampling methods; Sorting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
IT in Medicine & Education, 2009. ITIME '09. IEEE International Symposium on
Conference_Location :
Jinan
Print_ISBN :
978-1-4244-3928-7
Electronic_ISBN :
978-1-4244-3930-0
Type :
conf
DOI :
10.1109/ITIME.2009.5236431
Filename :
5236431
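The abstract describes averaging naive Bayes parameters over bootstrap samples of the word list. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the bootstrap resamples words from the word list with replacement, that the parameters being averaged are Laplace-smoothed cluster probabilities per class, and that the word-to-cluster mapping is already given. The mutual-information ordering of the word list is not modelled here. All function names, the toy data, and the values of `alpha` and `n_rounds` are hypothetical.

```python
import math
import random
from collections import Counter


def estimate_cluster_probs(word_counts, word_to_cluster, words, n_clusters, alpha=1.0):
    """Laplace-smoothed P(cluster | class) using only the given (possibly repeated) words."""
    totals = [0.0] * n_clusters
    for w in words:
        totals[word_to_cluster[w]] += word_counts.get(w, 0)
    denom = sum(totals) + alpha * n_clusters
    return [(t + alpha) / denom for t in totals]


def bootstrap_average_probs(word_counts, word_to_cluster, word_list, n_clusters,
                            n_rounds=50, seed=0):
    """Average the per-round estimates over bootstrap samples of the word list.

    Assumption: each bootstrap sample has the same size as the word list.
    """
    rng = random.Random(seed)
    avg = [0.0] * n_clusters
    for _ in range(n_rounds):
        sample = rng.choices(word_list, k=len(word_list))  # sampling with replacement
        probs = estimate_cluster_probs(word_counts, word_to_cluster, sample, n_clusters)
        for c, p in enumerate(probs):
            avg[c] += p / n_rounds
    return avg


def classify(doc_words, class_priors, class_cluster_probs, word_to_cluster):
    """Pick the class with the highest log posterior; words outside the list are skipped."""
    best_label, best_score = None, -math.inf
    for label, prior in class_priors.items():
        score = math.log(prior)
        for w in doc_words:
            if w in word_to_cluster:
                score += math.log(class_cluster_probs[label][word_to_cluster[w]])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical): two clusters, two classes, per-class word counts.
word_to_cluster = {"goal": 0, "match": 0, "team": 0, "stock": 1, "market": 1, "price": 1}
word_list = list(word_to_cluster)
counts = {
    "sport": Counter({"goal": 8, "match": 6, "team": 7, "price": 1}),
    "finance": Counter({"stock": 9, "market": 7, "price": 6, "team": 1}),
}
priors = {"sport": 0.5, "finance": 0.5}
probs = {label: bootstrap_average_probs(c, word_to_cluster, word_list, 2)
         for label, c in counts.items()}
print(classify(["goal", "team", "match"], priors, probs, word_to_cluster))
```

Each per-round estimate is a probability vector, so the averaged vector also sums to one and can be used directly in the classifier without renormalizing.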