• DocumentCode
    3228612
  • Title
    Mining the Web for Transliteration Lexicons: Joint-Validation Approach
  • Author
    Oh, Jong-Hoon; Isahara, Hitoshi
  • Author_Institution
    Computational Linguistics Group, National Institute of Information and Communications Technology, Kyoto
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    254
  • Lastpage
    261
  • Abstract
    The Web provides the largest data collection, which reflects language use in daily life. With the advent of new technology and the flood of information on the Web, it has become quite common to create new terms for new concepts and to translate these terms into non-Latin languages by "transliteration", that is, "translation by sound". Cross-language natural language processing applications, such as machine translation and cross-language information retrieval, usually need a translation dictionary, which affects the quality of the applications. However, transliteration lexicons are usually not registered in translation dictionaries. To address this problem, we present a transliteration lexicon acquisition model that mines the Web for transliteration lexicons. In this paper, we describe techniques for comparing phonetic similarity to recognize transliteration pair candidates on the Web and for finding the correct transliteration pairs based on joint validation. The techniques were evaluated against manually constructed transliteration lexicons. Our experiments revealed that the techniques effectively found transliteration lexicons on the Web.
  • Keywords
    Internet; data mining; information retrieval; language translation; natural language processing; Web mining; cross-language information retrieval; cross-language natural language processing; joint-validation approach; machine translation; non-Latin language; translation dictionary; transliteration lexicon acquisition model; Computational linguistics; Costs; Dictionaries; Engines; Floods; Frequency; Information retrieval; Natural language processing; Natural languages; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2747-7
  • Type
    conf
  • DOI
    10.1109/WI.2006.120
  • Filename
    4061374