DocumentCode :
1593056
Title :
An Ensemble Text Classification Model Combining Strong Rules and N-Gram
Author :
Liu, Jinhong ; Lu, Yuliang
Author_Institution :
Electron. Eng. Inst., Hefei
Volume :
3
fYear :
2007
Firstpage :
535
Lastpage :
539
Abstract :
Most text classification methods rely on machine learning and ignore traditional rule-based methods. In this paper, we propose an ensemble text classification model that combines classification rules with an N-gram language model. To generate strong classification rules, we propose an exhaustive noun-phrase extraction algorithm and a new optimized rule induction algorithm called SCA (strong covering algorithm). We also introduce an improved Good-Turing (GT) smoothing method for the N-gram model. Experimental results show that our ensemble classifier achieves an improvement of approximately 8% over a word-based bi-gram classifier and 15% over a traditional rule-based classifier. In conclusion, the ensemble classification model is more accurate than any single classification model used alone.
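Illustration (not the authors' implementation, which this record does not include): the Python sketch below shows classic Good-Turing count adjustment for a bigram model and a simple rule-first ensemble decision. The function names, the rules mapping, and the back-off scoring are illustrative assumptions; the paper's "improved" GT variant and SCA rule induction are not reproduced here.

from collections import Counter

def good_turing_counts(bigram_counts):
    """Classic Good-Turing adjustment: c* = (c + 1) * N_{c+1} / N_c,
    where N_c is the number of distinct bigrams seen exactly c times."""
    freq_of_freq = Counter(bigram_counts.values())
    adjusted = {}
    for bigram, c in bigram_counts.items():
        n_c, n_c1 = freq_of_freq[c], freq_of_freq.get(c + 1, 0)
        # Fall back to the raw count when N_{c+1} is 0 (sparse high counts).
        adjusted[bigram] = (c + 1) * n_c1 / n_c if n_c1 else c
    return adjusted

def ensemble_classify(doc_tokens, rules, bigram_scores):
    """Hypothetical rule-first ensemble: if a strong rule fires, trust it;
    otherwise back off to the class with the best summed bigram score."""
    text = " ".join(doc_tokens)
    for phrase, label in rules.items():      # strong classification rules
        if phrase in text:
            return label
    scores = {
        label: sum(counts.get(bg, 0.0)
                   for bg in zip(doc_tokens, doc_tokens[1:]))
        for label, counts in bigram_scores.items()
    }
    return max(scores, key=scores.get)

if __name__ == "__main__":
    counts = Counter({("text", "mining"): 3, ("machine", "learning"): 3,
                      ("rule", "induction"): 1, ("language", "model"): 2})
    rules = {"strong covering": "AI"}        # illustrative rule set
    bigram_scores = {"AI": good_turing_counts(counts), "other": {}}
    print(ensemble_classify("we study machine learning".split(),
                            rules, bigram_scores))

A rule-first cascade is only one plausible way to combine the two component classifiers; the paper's ensemble may instead weight or vote over their outputs.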
Keywords :
Turing machines; learning (artificial intelligence); pattern classification; smoothing methods; text analysis; N-gram language model; classification rules; ensemble text classification model; good-Turing smoothing method; machine learning methods; noun-phrase extraction algorithm; optimized rule induction algorithm; strong covering algorithms; Classification tree analysis; Induction generators; Learning systems; Machine learning; Machine learning algorithms; Ontologies; Optimization methods; Smoothing methods; Statistical learning; Text categorization;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Natural Computation, 2007. ICNC 2007. Third International Conference on
Conference_Location :
Haikou
Print_ISBN :
978-0-7695-2875-5
Type :
conf
DOI :
10.1109/ICNC.2007.198
Filename :
4344570