Title :
Minimum Normalized Google Distance for Unsupervised Multilingual Chinese-English Word Sense Disambiguation
Author :
Liu, Pengyuan ; Xue, Yongzeng ; Li, Shiqi ; Liu, Shui
Author_Institution :
Appl. Linguistics Res. Inst., Beijing Language & Culture Univ., Beijing, China
Abstract :
This paper introduces the normalized Google distance into the study of word sense disambiguation and presents a novel unsupervised disambiguation method. The normalized Google distance is a measure of similarity between words and phrases, grounded in information distance and Kolmogorov complexity; it treats the World Wide Web as a database and uses page counts returned by a search engine such as Google. The method treats word sense disambiguation as a search for the minimum normalized Google distance between an n-gram and the translations or synonyms of the target word, under the assumption of one sense per n-gram. Our system is tested on the Multilingual Chinese-English Lexical Sample task of SemEval-2007. Experimental results show that our method outperforms the best competing system. Our experiment on the nouns of this dataset also gives promising results.
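The selection step described in the abstract can be illustrated with a minimal sketch. It uses the commonly cited definition of the normalized Google distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f is a page count and N is the number of indexed pages. The abstract does not give the authors' exact formulation, so this is an assumption, not their implementation. The function names, the `counts` dictionary layout, and all numbers below are hypothetical; in practice the counts would come from search engine queries.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from page counts (commonly cited form).

    fx, fy: page counts for each term alone; fxy: count for both together;
    n: assumed total number of indexed pages (must exceed fx and fy).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur, or never co-occur, are maximally distant.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def pick_sense(ngram, candidates, counts, n):
    """Return the candidate translation/synonym with minimum NGD to ngram.

    counts maps a term to its page count and a (ngram, candidate) tuple
    to the co-occurrence count (hypothetical layout for this sketch).
    """
    def distance(candidate):
        pair_count = counts.get((ngram, candidate), 0)
        return ngd(counts[ngram], counts[candidate], pair_count, n)

    return min(candidates, key=distance)


# Made-up page counts, for illustration only.
toy_counts = {
    "context n-gram": 1000,
    "sense A translation": 5000,
    "sense B translation": 4000,
    ("context n-gram", "sense A translation"): 800,
    ("context n-gram", "sense B translation"): 10,
}
best = pick_sense(
    "context n-gram",
    ["sense A translation", "sense B translation"],
    toy_counts,
    n=10**6,
)
print(best)
```

With these toy counts the n-gram co-occurs far more often with the first candidate, so that candidate has the smaller distance and is selected. Under the one-sense-per-n-gram assumption, the sense attached to the selected candidate would then be assigned to the target word.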
Keywords :
Internet; language translation; natural language processing; search engines; word processing; Kolmogorov complexity; World-Wide-Web; information distance; minimum normalized Google distance; multilingual Chinese-English lexical sample task; search engine; unsupervised multilingual Chinese-English word sense disambiguation; Conferences; Context; Dictionaries; Google; Pragmatics; Search engines; Semantics; Normalized Google distance; one sense per n-gram; unsupervised word sense disambiguation;
Conference_Title :
Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4244-8891-9
Electronic_ISBN :
978-0-7695-4281-2
DOI :
10.1109/ICGEC.2010.69