Author_Institution :
Internet Commerce Security Lab. (ICSL), Univ. of Ballarat, Ballarat, VIC, Australia
Abstract :
The resolution of lexical ambiguity in machine translation systems often involves the automated, on-line selection of the correct sense of polysemous target words in the context of a clause, phrase or sentence. However, machine translation systems have not yet emulated this aspect of human language processing well enough for the resolution of entities and terms in natural language to be automated for open source intelligence analysis. Whilst some systems confine themselves to processing domain-specific knowledge (e.g., medical terminology), with some success, this study investigates the popular general-purpose direct translation systems now freely available on the World Wide Web (WWW) for characteristic semantic processing errors. A ubiquitous sentence ("The quick brown fox jumps over the lazy dog"), an equative metaphor, and a simile are translated into four Romance languages and one Germanic language, and each translation is then translated back into English using the same translation system. It is found that, in addition to expected differences in mapping shades of meaning (e.g., "quick" is mapped to "fast"), some spatial meanings are incorrectly transformed, especially for verbs (e.g., "jumps over" becomes "branches over" or "jumps on"). The most serious error is the addition of extra semantic features to individual words, particularly features associated with nouns (e.g., the gender-neutral "fox" becomes the female "vixen"). The implications of these types of errors for the automatic translation of human language, with respect to semantic representation in open source intelligence, are discussed.
Keywords :
grammars; language translation; natural language processing; public domain software; semantic Web; ubiquitous computing; Germanic language; WWW; World Wide Web; automated allegory resolution; automatic human language translation; characteristic semantic processing errors; domain-specific knowledge; entities resolution; equative metaphor; human language processing; lexical ambiguity; machine translation systems; natural language; on-line selection; open source intelligence; open source intelligence analysis; polysemous target words sense; popular general-purpose direct translation systems; semantic features; semantic representation; ubiquitous sentence; Open source intelligence; metaphor; polysemy;
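The round-trip comparison described in the abstract (translate out of English, translate back with the same system, then inspect what changed) can be sketched as a word-level diff. This is only an illustrative sketch, not the study's procedure: the function name `word_changes` is hypothetical, no translation service is called, and the round-tripped sentence below is a hypothetical composite assembled from the individual substitutions the abstract reports rather than the actual output of any system.

```python
import difflib


def word_changes(original: str, round_tripped: str):
    """Return (original_words, round_tripped_words) pairs for each changed span.

    Hypothetical helper: compares the two sentences word by word,
    ignoring case and a trailing full stop.
    """
    a = original.lower().rstrip(".").split()
    b = round_tripped.lower().rstrip(".").split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record what the span was before and after the round trip.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


original = "The quick brown fox jumps over the lazy dog"
# ASSUMPTION: hypothetical composite of substitutions named in the abstract,
# not a recorded output of any translation system.
round_tripped = "The fast brown vixen jumps on the lazy dog"

for before, after in word_changes(original, round_tripped):
    print(f"{before!r} -> {after!r}")
```

A diff of this kind only locates the substitutions; deciding whether a change is a harmless shade of meaning, a spatial error, or an added semantic feature such as gender still requires human judgement.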