DocumentCode :
2041262
Title :
Dealing with unknowns in machine translation
Author :
Sinha, R.M.K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kanpur, India
Volume :
2
fYear :
2001
fDate :
2001
Firstpage :
940
Abstract :
An "unknown" is defined as a word for which there is no entry in the dictionary used by the translation system. In general, a text may contain several unknowns. These words may be names, acronyms, abbreviations, terminology or foreign words. It is common practice in India to mix English words into Hindi and other Indian languages, and vice versa. However, the grammatical rules for the construction of gender, number, verb nominalization or forms conform to those of the language used, irrespective of the words' origin. This gives rise to frequent encounters with unknown words in day-to-day communication. A machine translation system has to provide a mechanism for handling such unknowns. Spelling mistakes are yet another source of these unknowns. In this paper, we describe the strategy being adopted in our system for machine-aided translation from English to Hindi. No attempt has been made to expand the vocabulary by deriving the meaning of the unknown words. Instead, once an unknown is identified, a transliteration in Hindi with appropriate suffixes or appendages is used to substitute for its meaning. We use predictive parsing and a number of heuristics to identify the type of unknown.
Keywords :
grammars; language translation; natural languages; English-Hindi machine translation; Indian languages; abbreviations; acronyms; appendages; dictionary; foreign words; gender; grammatical rules; heuristics; names; natural language processing; number; predictive parsing; spelling mistakes; suffixes; terminology; transliteration; unknown words; verb nominalization; vocabulary; word forms; word meaning; Computer science; Databases; Dictionaries; Engines; Knowledge acquisition; Natural language processing; Natural languages; Terminology; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man, and Cybernetics, 2001 IEEE International Conference on
Conference_Location :
Tucson, AZ
ISSN :
1062-922X
Print_ISBN :
0-7803-7087-2
Type :
conf
DOI :
10.1109/ICSMC.2001.973038
Filename :
973038