Title :
Word Sense Disambiguation method based on probability model improved by information gain
Author :
Fan, Dongmei ; Lu, Zhimao ; Zhang, Rubo ; Li, Xueyao
Author_Institution :
Harbin Engineering University, Harbin
Abstract :
Word sense disambiguation (WSD) has long been a key problem and one of the difficult points in natural language processing. WSD is usually studied as a pattern classification problem, and feature selection is an important step in the WSD process. We examine the naive Bayes model (NBM) closely; the feature selection method adopted in this paper targets the Bayesian assumption in order to improve the NBM. Positional information concealed in the context of the ambiguous word is mined by calculating information gain, which increases the knowledge acquisition efficiency of the Bayesian model and improves word-sense classification. Eight ambiguous words are tested in our experiment, and the accuracy of the improved Bayesian model is 3.5 per cent higher than that of the NBM. This is a considerable gain in accuracy, and the results show that the proposed method is effective.
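The abstract gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the paper's method. It assumes that "positional information" means each context word is tagged with its offset from the ambiguous word, that the information gain of each offset with respect to the sense label is IG(S; X) = H(S) - H(S | X), and that this gain is used as a per-offset weight on the naive Bayes log-likelihood terms. The names `positional_features`, `information_gain`, and `WeightedNaiveBayes`, the window of 2, the add-alpha smoothing, and the toy "bank" data are all hypothetical illustrations.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def positional_features(tokens, target, window=2):
    """Tag each context word with its offset from the ambiguous word,
    e.g. (-1, 'river'). Hypothetical feature template, not the paper's."""
    feats = []
    for off in range(-window, window + 1):
        i = target + off
        if off != 0 and 0 <= i < len(tokens):
            feats.append((off, tokens[i]))
    return feats


def information_gain(samples, offset):
    """IG(S; X_offset) = H(S) - H(S | X_offset), where X_offset is the word
    seen at `offset` (None when the position falls outside the sentence)."""
    h_s = entropy(Counter(sense for _, sense in samples).values())
    by_value = defaultdict(Counter)
    for feats, sense in samples:
        by_value[dict(feats).get(offset)][sense] += 1
    n = len(samples)
    h_cond = sum(sum(c.values()) / n * entropy(c.values()) for c in by_value.values())
    return h_s - h_cond


class WeightedNaiveBayes:
    """Naive Bayes over positional features. With use_ig=True each offset's
    log-likelihood term is scaled by its information gain (an assumption);
    with use_ig=False every weight is 1, i.e. plain naive Bayes."""

    def __init__(self, window=2, alpha=1.0, use_ig=True):
        self.window, self.alpha, self.use_ig = window, alpha, use_ig

    def fit(self, data):
        # data: list of (tokens, index_of_ambiguous_word, sense_label)
        samples = [(positional_features(t, i, self.window), s) for t, i, s in data]
        self.priors = Counter(s for _, s in samples)
        self.counts = defaultdict(Counter)  # sense -> counts of (offset, word)
        self.vocab = set()
        for feats, sense in samples:
            self.counts[sense].update(feats)
            self.vocab.update(feats)
        offsets = [o for o in range(-self.window, self.window + 1) if o != 0]
        self.weights = {o: information_gain(samples, o) if self.use_ig else 1.0
                        for o in offsets}
        return self

    def predict(self, tokens, target):
        feats = positional_features(tokens, target, self.window)
        n = sum(self.priors.values())
        best_sense, best_score = None, -math.inf
        for sense, prior in self.priors.items():
            total = sum(self.counts[sense].values())
            score = math.log(prior / n)
            for f in feats:
                # add-alpha smoothed estimate of P(feature | sense)
                p = (self.counts[sense][f] + self.alpha) / (total + self.alpha * len(self.vocab))
                score += self.weights[f[0]] * math.log(p)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy training data for the ambiguous word "bank".
data = [
    ("he sat on the river bank fishing".split(), 5, "shore"),
    ("the muddy river bank flooded".split(), 3, "shore"),
    ("she opened a bank account today".split(), 3, "finance"),
    ("the bank account was frozen".split(), 1, "finance"),
]
model = WeightedNaiveBayes().fit(data)
print(model.predict("a river bank in spring".split(), 2))  # -> shore
```

On this toy data every offset separates the two senses perfectly, so each gain is 1 bit and the weighted model behaves exactly like plain naive Bayes; on a real corpus the gains would differ by position, which is where such weighting could change a decision. The sketch does not attempt to reproduce the reported 3.5 per cent result.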
Keywords :
Bayes methods; feature extraction; knowledge acquisition; natural language processing; pattern classification; probability; text analysis; Bayesian assumption; ambiguous word; feature selection; information gain; naive Bayes model; positional information; probability model; word-sense classification; Automation; Bayesian methods; Context modeling; Intelligent control; Mutual information; Support vector machines; Testing; Bayesian model; Word sense disambiguation;
Conference_Title :
7th World Congress on Intelligent Control and Automation (WCICA 2008)
Conference_Location :
Chongqing
Print_ISBN :
978-1-4244-2113-8
Electronic_ISBN :
978-1-4244-2114-5
DOI :
10.1109/WCICA.2008.4593315