  • DocumentCode
    18359
  • Title
    Entity Translation Mining from Comparable Corpora: Combining Graph Mapping with Corpus Latent Features
  • Author
    Jinhan Kim; Seung-won Hwang; Long Jiang; Young-In Song; Ming Zhou
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pohang Univ. of Sci. & Technol. (POSTECH), Pohang, South Korea
  • Volume
    25
  • Issue
    8
  • fYear
    2013
  • fDate
    Aug. 2013
  • Firstpage
    1787
  • Lastpage
    1800
  • Abstract
    This paper addresses the problem of mining named entity translations from comparable corpora, specifically English and Chinese named entity translations. We first observe that existing approaches use one or more of the following named entity similarity metrics: entity, entity context, and relationship. Motivated by this observation, we propose a new holistic approach that 1) combines all similarity types used and 2) additionally considers relationship context similarity between pairs of named entities, a missing quadrant in the taxonomy of similarity metrics. We abstract the named entity translation problem as the matching of two named entity graphs extracted from the comparable corpora. Specifically, named entity graphs are first constructed from the comparable corpora to extract relationships between named entities. Entity similarity and entity context similarity are then calculated for every pair of bilingual named entities. A reinforcing method is used to reflect relationship similarity and relationship context similarity between named entities. We also discover "latent" features lost in the graph extraction process and integrate them into our framework. According to our experimental results, our holistic graph-based approach and its enhancement using corpus latent features are highly effective, and our framework significantly outperforms previous approaches.
  • Keywords
    data mining; graph theory; natural language processing; Chinese named entity translation; English named entity translation; bilingual named entities; comparable corpora; corpus latent features; graph extraction process; graph mapping; holistic graph-based approach; named entity graphs; named entity translation mining; reinforcing method; relationship context similarity; Context; Dictionaries; Feature extraction; Measurement; Vectors; Web sites; text mining
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2012.117
  • Filename
    6216378