Title :
Chinese word sense disambiguation by combining pseudo training data
Author :
Wang, Xiaojie ; Matsumoto, Yuji
Author_Institution :
Graduate Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Japan
Abstract :
Supervised methods of word sense disambiguation need sense-tagged samples for training classifiers. Because sense tagging of ambiguous words is expensive and labor intensive, it is worth looking for reasonable substitutes. We propose a substitute for Chinese sense-tagged data, which we call pseudo training data. The proposal is based on a linguistic phenomenon in Chinese: some multi-character words inherit only one sense, along with some syntactic features, from an ambiguous one-character word. Data derived from these unambiguous multi-character words are used as pseudo training data, which have the advantage that they can be collected automatically. Our experiments show that classifiers trained on a moderate amount of pseudo training data outperform classifiers trained on small quantities of sense-tagged samples for Chinese ambiguous word senses. Further experiments show that combining a small set of tagged data with a large quantity of pseudo training data is a more promising approach to word sense disambiguation.
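The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a classifier, feature set, or lexicon, so the Naive Bayes model, the context window, the toy lexicon for the character 花 ("flower" vs. "to spend"), and all function names below are assumptions made only for illustration. Contexts of unambiguous multi-character words are collected automatically and labeled with the sense they inherit, then used to train a classifier for the ambiguous one-character word.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (an assumption, not taken from the paper):
# unambiguous multi-character words mapped to the sense of the ambiguous
# character "花" that each one inherits.
PSEUDO_LEXICON = {"花朵": "flower", "花费": "spend"}


def collect_pseudo_samples(sentences, lexicon, window=2):
    """Collect (context_tokens, sense) pairs from pre-segmented sentences.

    No manual tagging is needed: the label comes from the lexicon entry.
    """
    samples = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in lexicon:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                samples.append((left + right, lexicon[tok]))
    return samples


def train_naive_bayes(samples):
    """Count senses and context words (Naive Bayes is an assumed choice)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in samples:
        sense_counts[sense] += 1
        for w in context:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab


def classify(context, model):
    """Return the most probable sense using add-one smoothing."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy pre-segmented corpus (illustrative only).
sentences = [
    ["公园", "里", "的", "花朵", "很", "漂亮"],
    ["这", "次", "旅行", "花费", "了", "很多", "钱"],
]
samples = collect_pseudo_samples(sentences, PSEUDO_LEXICON)
model = train_naive_bayes(samples)

# Disambiguate the one-character word "花" from its surrounding tokens.
print(classify(["了", "很多"], model))
print(classify(["很", "漂亮"], model))
```

To combine pseudo training data with a small set of manually tagged samples, as the abstract's final experiment suggests, one simple option under the same assumptions is to append the tagged (context, sense) pairs to `samples` before calling `train_naive_bayes`.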
Keywords :
linguistics; natural languages; word processing; Chinese word sense disambiguation; linguistic phenomenon; pseudo training data; sense tagged samples; training classifiers; Humans; Information retrieval; Information science; Natural language processing; Tagging; Training data
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275884