• DocumentCode
    477923
  • Title
    A Research on Length Based Sentence Alignment for Chinese-English Parallel Corpus
  • Author
    Zan, Hongying ; Zhang, Xia ; Fan, Ming
  • Author_Institution
    Coll. of Inf. & Eng., Zhengzhou Univ., Zhengzhou
  • Volume
    4
  • fYear
    2008
  • fDate
    18-20 Oct. 2008
  • Firstpage
    145
  • Lastpage
    149
  • Abstract
    Many existing length-based Chinese-English sentence alignment methods compute sentence length in terms of the number of bytes. In this paper, we examine the effectiveness of six different ways of computing sentence length, which take, respectively, the number of verbs, nouns, adjectives, content words, bytes, and all words in a sentence as its length. Most previous methods are found to be memory-consuming and inefficient. This paper proposes an alignment method that saves memory and time by grouping sentences for alignment. Our experimental results show that taking all words into account in the sentence length computation can further enhance alignment performance, giving 99.01% precision and 99.5% recall.
  • Keywords
    natural language processing; Chinese-English parallel corpus; length based sentence alignment; Concurrent computing; Dictionaries; Educational institutions; Fuzzy systems; Heuristic algorithms; Knowledge engineering; Large scale integration; Natural languages; Performance analysis; Terminology; NLP; parallel corpus; sentence alignment
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
  • Conference_Location
    Jinan, Shandong
  • Print_ISBN
    978-0-7695-3305-6
  • Type
    conf
  • DOI
    10.1109/FSKD.2008.307
  • Filename
    4666373