• DocumentCode
    3316786
  • Title
    Unsupervised word sense disambiguation and rules extraction using non-aligned bilingual corpus
  • Author
    Oliveira, Francisco; Wong, Fai; Li, Yiping; Zheng, Jie
  • Author_Institution
    Fac. of Sci. & Technol., Univ. of Macau, China
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    30
  • Lastpage
    35
  • Abstract
    This paper presents a statistical word sense disambiguation method with application to Portuguese-Chinese machine translation systems. Because Portuguese-Chinese resources such as digital corpora and annotated treebanks are scarce, the method uses unsupervised learning on a non-aligned bilingual corpus. It first identifies the words related to each ambiguous word, based on the surrounding words and their relative distance. A mathematical model then selects the most suitable sense of the ambiguous word in terms of those related words. All discovered senses are converted into a set of rules and stored in a sense knowledge base for later use in disambiguation and translation. Preliminary experimental results show a 6% improvement over the baseline method in correctly assigning the corresponding translation.
  • Keywords
    data mining; language translation; natural languages; unsupervised learning; Portuguese-Chinese machine translation system; annotated Treebank; digital corpora; mathematical model; natural language processing; nonaligned bilingual corpus; rules extraction; sense knowledge base; word sense disambiguation; Automation; Availability; Dictionaries; Humans; Information retrieval; Paper technology; Machine Translation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type
    conf
  • DOI
    10.1109/NLPKE.2005.1598702
  • Filename
    1598702