Title :
One Sense per N-gram
Author :
Pengyuan Liu ; Shui Liu ; Shiqi Li ; Shiwen Yu
Author_Institution :
Inst. of Comput. Linguistics, Peking Univ., Beijing, China
fDate :
Aug. 31 2010-Sept. 3 2010
Abstract :
This paper presents a novel supposition, One Sense Per N-gram (N > 1), which we believe applies to more linguistic phenomena and can serve as a more general version of the celebrated One Sense Per Collocation supposition, at least for the Chinese language. The new supposition is based on our observations while detecting errors in the sense annotations of People's Daily text tagged by an automatic WSD system. Our preliminary experiment on the Chinese Word Sense Tagging Corpus shows that it holds with over 85.9% agreement for both nouns and verbs. Based on the supposition, we build a prototype naïve Bayes WSD system and test it on the Multilingual Chinese-English Lexical Sample task (MCELS) in SemEval-2007. Experimental results show that our prototype system improves the performance of the baseline system by 2.7%.
Keywords :
Bayes methods; natural language processing; Chinese language; Chinese word sense tagging corpus; People's Daily; SemEval-2007; automatic word sense disambiguation system; error detection process; linguistic phenomena; multilingual Chinese-English lexical sample task; naïve Bayes word sense disambiguation system; one sense per N-gram; Conferences; Context; Entropy; Prototypes; Semantics; Tagging; Training; One sense per N-gram; language model; word sense disambiguation; word sense tagging;
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4244-8482-9
Electronic_ISBN :
978-0-7695-4191-4
DOI :
10.1109/WI-IAT.2010.268