Title :
Unsupervised word sense disambiguation and rules extraction using non-aligned bilingual corpus
Author :
Oliveira, Francisco ; Wong, Fai ; Li, Yiping ; Zheng, Jie
Author_Institution :
Fac. of Sci. & Technol., Univ. of Macau, China
Date :
30 Oct.-1 Nov. 2005
Abstract :
This paper presents a statistical word sense disambiguation method for Portuguese-Chinese machine translation systems. Because Portuguese-Chinese resources such as digital corpora and annotated treebanks are scarce, the method uses unsupervised learning on a non-aligned bilingual corpus. It first identifies the words related to each ambiguous word based on the surrounding words and their relative distance. A mathematical model then selects the most suitable sense of the ambiguous word in terms of these related words. All discovered senses are converted into a set of rules and stored in a sense knowledge base for later use in the disambiguation and translation process. Preliminary experimental results show a 6% improvement over the baseline method in assigning the correct translation.
Keywords :
data mining; language translation; natural languages; unsupervised learning; Portuguese-Chinese machine translation system; annotated Treebank; digital corpora; mathematical model; natural language processing; nonaligned bilingual corpus; rules extraction; sense knowledge base; unsupervised learning; word sense disambiguation; Automation; Availability; Dictionaries; Humans; Information retrieval; Mathematical model; Natural language processing; Natural languages; Paper technology; Unsupervised learning; Machine Translation; Natural Language Processing; Word Sense Disambiguation;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598702