• DocumentCode
    2910560
  • Title

    Automatic Acquisition of Chinese-Tibetan Multi-word Equivalent Pair from Bilingual Corpora

  • Author

    Nuo, Minghua ; Liu, Huidan ; Ma, Longlong ; Wu, Jian ; Ding, Zhiming

  • Author_Institution
    Inst. of Software, Beijing, China
  • fYear
    2011
  • fDate
    15-17 Nov. 2011
  • Firstpage
    177
  • Lastpage
    180
  • Abstract
    This paper aims to construct a Chinese-Tibetan multi-word equivalent pair dictionary for a Chinese-Tibetan computer-aided translation system. Since Tibetan is a morphologically rich language, we propose a two-phase framework to automatically extract multi-word equivalent pairs. The first phase extracts Chinese Multi-word Units (MWUs): we propose the CBEM model, which partitions a Chinese sentence into MWUs using two measures, collocation and binding degree. The second phase obtains Tibetan translations of the extracted Chinese MWUs: we propose the TSIM model, which focuses on extracting 1-to-n bilingual MWUs. Preliminary experimental results show that the mixed method combining the CBEM model with the TSIM model is effective.
  • Keywords
    computational linguistics; language translation; natural language processing; text analysis; word processing; CBEM model; Chinese multiword units; Chinese-Tibetan computer-aided translation system; Chinese-Tibetan multiword equivalent pair dictionary; TSIM model; automatic Chinese-Tibetan multiword equivalent pair acquisition; bilingual corpora; two-phase framework; Analytical models; Computational modeling; Data mining; Dictionaries; Entropy; Information processing; Morphology; Multi-Word Unit; Tibetan information processing; collocation; machine translation; sequence intersection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Asian Language Processing (IALP), 2011 International Conference on
  • Conference_Location
    Penang
  • Print_ISBN
    978-1-4577-1733-8
  • Type

    conf

  • DOI
    10.1109/IALP.2011.33
  • Filename
    6121497