• DocumentCode
    3074635
  • Title

    A Pragmatic Approach to Increase Accuracy of Chinese Word-Segmentation

  • Author

    Wenyu, Chen ; Biao, Chen ; Tao, Xiang ; Zhongquan, Zhang

  • Author_Institution
    Sch. of Comput. Sci. & Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
  • Volume
    1
  • fYear
    2010
  • fDate
    16-18 July 2010
  • Firstpage
    389
  • Lastpage
    391
  • Abstract
    Chinese word segmentation is important for understanding and processing Chinese natural language, and it is also an important part of search engines, text retrieval, speech recognition, and automatic translation. Chinese word segmentation is challenging because there are no spaces or other physical means to mark word boundaries, and it is often difficult to define what constitutes a word in Chinese. Currently, no fully mature, practical Chinese word segmentation system is available, especially with respect to segmentation accuracy. This article presents a pragmatic approach to Chinese word segmentation to increase word-segmentation accuracy. We introduce a mechanism for combining a hybrid dictionary with a universal dictionary, design a practical data structure, describe the word segmentation algorithm, and give test results.
  • Keywords
    character recognition; dictionaries; image segmentation; language translation; natural language processing; word processing; Chinese natural language; automatic translation; Chinese word segmentation; data structure; hybrid dictionary; search engine; speech recognition; text retrieval; universal dictionary; word-segmentation accuracy; Accuracy; Arrays; Computational modeling; History; Indexes; Pragmatics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology and Applications (IFITA), 2010 International Forum on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4244-7621-3
  • Electronic_ISBN
    978-1-4244-7622-0
  • Type

    conf

  • DOI
    10.1109/IFITA.2010.262
  • Filename
    5635012