Title :
Research and realization of naive Bayes English text classification method based on base noun phrase identification
Author :
Lv, Lin ; Liu, Yu-Shu
Author_Institution :
Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol.
Abstract :
To further improve the classification accuracy of English texts, a naive Bayes method based on base noun phrase (BaseNP) identification is presented. The increasingly popular maximum entropy model is applied to the identification. First, a training corpus and user-defined feature templates are used to generate candidate features. Second, a feature selection algorithm that computes feature gains is applied to select features. Finally, at the parameter estimation stage, the improved iterative scaling (IIS) algorithm is adopted. The experimental results show that this technique achieves precision and recall rates of roughly 93% for BaseNP identification, and that classification accuracy is remarkably improved on this basis. This indicates that high-accuracy shallow parsing is very helpful to text classification.
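Illustrative sketch (not from the paper): the abstract does not give the classifier's details, so the following is only a minimal example of how identified BaseNPs might be added as features to a multinomial naive Bayes classifier. The names extract_features and NaiveBayes, the "NP=" feature prefix, the add-one smoothing, and the (start, end) span format are all assumptions made for illustration. The BaseNP spans are taken as given, standing in for the output of a separate identifier such as the maximum entropy model described above.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, base_nps):
    """Return the unigram tokens plus one feature per BaseNP span.

    `base_nps` is a list of (start, end) token spans (end exclusive),
    assumed to come from a separate BaseNP identifier.
    """
    features = list(tokens)
    for start, end in base_nps:
        features.append("NP=" + "_".join(tokens[start:end]))
    return features


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, feats):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)  # class prior
            for f in feats:
                if f in self.vocab:  # skip features unseen in training
                    score += math.log((counts[f] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Toy training data (invented): tokens, BaseNP spans, class label.
train = [
    (["stock", "market", "fell"], [(0, 2)], "finance"),
    (["central", "bank", "raised", "rates"], [(0, 2)], "finance"),
    (["home", "team", "won"], [(0, 2)], "sports"),
    (["head", "coach", "resigned"], [(0, 2)], "sports"),
]
docs = [extract_features(tokens, spans) for tokens, spans, _ in train]
labels = [label for _, _, label in train]
model = NaiveBayes().fit(docs, labels)

query = extract_features(["stock", "market", "rose"], [(0, 2)])
prediction = model.predict(query)
```

In this sketch a phrase such as "stock market" contributes a feature of its own in addition to its component words, which is one simple way a shallow parser's output could feed a bag-of-features classifier.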
Keywords :
Bayes methods; classification; maximum entropy methods; natural languages; text analysis; base noun phrase identification; feature selection algorithm; improved iterative scaling; maximum entropy model; naive Bayes English text classification; Data mining; Decision trees; Entropy; Iterative algorithms; Natural languages; Nearest neighbor searches; Parameter estimation; Support vector machine classification; Support vector machines; Text categorization; Base Noun Phrase; Maximum Entropy Model; Naïve Bayes; Phrase Identification; Text Classification;
Conference_Title :
Information and Communications Technology, 2005. Enabling Technologies for the New Knowledge Society: ITI 3rd International Conference on
Conference_Location :
Cairo
Print_ISBN :
0-7803-9270-1
DOI :
10.1109/ITICT.2005.1609667