Title :
Improved named entity translation and bilingual named entity extraction
Author :
Huang, Fei ; Vogel, Stephan
Author_Institution :
Interactive Syst. Labs., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Translation of named entities (NE), including proper names and temporal and numerical expressions, is very important in multilingual natural language processing tasks such as crosslingual information retrieval and statistical machine translation. We present an integrated approach that extracts a named entity translation dictionary from a bilingual corpus while at the same time improving the named entity annotation quality. Starting from a bilingual corpus in which the named entities are extracted independently for each language, a statistical alignment model is used to align the named entities. An iterative process is then applied to extract the named entity pairs with higher alignment probability. This leads to a smaller but cleaner named entity translation dictionary and also to a significant improvement in the monolingual named entity annotation quality for both languages. Experimental results show that the dictionary size is reduced by 51.8% and that the annotation quality, measured by F-score, improves from 70.03 to 78.15 for Chinese and from 73.38 to 81.46 for the other language.
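The iterative extraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the scoring function (a Model-1-style average of word translation probabilities), the co-occurrence re-estimation, the threshold value, and all names (`pair_score`, `reestimate`, `iterative_filter`, `toy_pairs`) are assumptions chosen only to show the general idea of repeatedly scoring candidate pairs and keeping those with higher alignment probability.

```python
from collections import defaultdict


def reestimate(pairs):
    """Build a toy lexicon p(t | s) from word co-occurrence inside the kept pairs.

    Assumption: plain relative frequencies stand in for a real alignment model.
    """
    counts = defaultdict(float)
    totals = defaultdict(float)
    for src_words, tgt_words in pairs:
        for s in src_words:
            for t in tgt_words:
                counts[(s, t)] += 1.0
                totals[s] += 1.0
    return {(s, t): c / totals[s] for (s, t), c in counts.items()}


def pair_score(src_words, tgt_words, lexicon, floor=1e-6):
    """Length-normalized, Model-1-style score of one candidate NE pair.

    For each target word, average p(t | s) over the source words, then take
    the geometric mean over target words. Both word lists must be non-empty.
    """
    score = 1.0
    for t in tgt_words:
        avg = sum(lexicon.get((s, t), floor) for s in src_words) / len(src_words)
        score *= avg
    return score ** (1.0 / len(tgt_words))


def iterative_filter(pairs, threshold=0.3, max_iters=5):
    """Alternate between re-estimating the lexicon and dropping low-scoring pairs."""
    kept = list(pairs)
    for _ in range(max_iters):
        lexicon = reestimate(kept)
        survivors = [p for p in kept if pair_score(p[0], p[1], lexicon) >= threshold]
        if len(survivors) == len(kept):
            break  # nothing was dropped, so the set is stable
        kept = survivors
    return kept


# Hypothetical candidate pairs; the last one is a deliberately wrong alignment.
toy_pairs = [
    (("北京",), ("Beijing",)),
    (("北京",), ("Beijing",)),
    (("北京", "大学"), ("Beijing", "University")),
    (("北京",), ("Tuesday",)),
]
kept = iterative_filter(toy_pairs)
```

On this toy input the mismatched pair scores below the threshold in the first pass and is dropped, which mirrors the abstract's point that the resulting dictionary is smaller but cleaner. A real system would use a trained alignment model and a tuned threshold rather than the values assumed here.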
Keywords :
computational linguistics; dictionaries; language translation; natural language interfaces; probability; Chinese; F-score; bilingual named entity extraction; crosslingual information retrieval; dictionary; experimental result; multilingual natural language processing; named entity annotation quality; named entity translation; probability; statistical alignment model; statistical machine translation; Costs; Data mining; Dictionaries; Humans; Information retrieval; Interactive systems; Natural language processing; Natural languages; Probability; Surface-mount technology;
Conference_Title :
Multimodal Interfaces, 2002. Proceedings. Fourth IEEE International Conference on
Print_ISBN :
0-7695-1834-6
DOI :
10.1109/ICMI.2002.1167002