• DocumentCode
    2838673
  • Title

    SIMD k-ary Search Based Chinese Word Segmentation

  • Author

    Jia, Yunjie ; Lei, Yongmei ; Zhang, Zhuo ; Fang, Yun

  • Author_Institution
    Dept. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
  • Volume
    3
  • fYear
    2011
  • fDate
    26-27 Nov. 2011
  • Firstpage
    387
  • Lastpage
    390
  • Abstract
    Chinese word segmentation is widely used in many fields, such as information management and search engines. Modern processors contain multiple cores that provide tremendous compute power, and each core has a SIMD processing unit. This paper reviews traditional dictionary-based Chinese word segmentation and prior research on exploiting the SIMD capacity of processors to improve the performance of linear search. Using SIMD has two performance benefits: it provides high parallelism, and it reduces branch mispredictions. We then propose a new method that uses a SIMD-based k-ary search algorithm to improve the performance of Chinese word segmentation. The SIMD-based approach accelerates the linear search process by performing 4 compare operations at a time. Compared with the traditional algorithm, the new algorithm improves performance on average by 5.4% and 7.0% for two different dictionary sizes.
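    The idea of comparing a key against 4 values in one step can be illustrated with a small model. The sketch below is not the authors' implementation: it emulates a 4-lane SIMD compare in plain Python, and the names `LANES`, `simd_compare_gt`, and `kary_search` are hypothetical. It assumes that 4 lanes imply 4 separators per step, which splits the search window into 5 parts (k = 5), and that the dictionary keys are stored as a sorted list of integers such as character code points.

```python
LANES = 4  # assumed SIMD width: 4 keys compared per instruction


def simd_compare_gt(key, separators):
    """Model one SIMD compare: count the lanes where key > separator.

    Real SIMD hardware would produce a lane mask in a single instruction;
    here the lanes are evaluated in a loop for illustration only.
    """
    return sum(1 for s in separators if key > s)


def kary_search(sorted_keys, key):
    """Return the index of key in sorted_keys, or -1 if it is absent."""
    lo, hi = 0, len(sorted_keys)  # half-open search window [lo, hi)
    k = LANES + 1                 # 4 separators give 5 partitions

    while hi - lo > LANES:
        step = (hi - lo) // k     # at least 1 because hi - lo >= 5
        idx = [lo + (i + 1) * step for i in range(LANES)]
        separators = [sorted_keys[i] for i in idx]

        # Because the keys are sorted, the lanes with key > separator
        # form a prefix, so their count selects the partition directly
        # without a chain of data-dependent branches.
        c = simd_compare_gt(key, separators)
        new_lo = lo if c == 0 else idx[c - 1] + 1
        new_hi = hi if c == LANES else idx[c] + 1
        lo, hi = new_lo, new_hi

    # At most LANES candidates remain: one more equality compare.
    for i in range(lo, hi):
        if sorted_keys[i] == key:
            return i
    return -1
```

    For example, `kary_search([ord(c) for c in "一二人大"], ord("人"))` looks up a character code point in a small sorted list; a real implementation would replace `simd_compare_gt` with a hardware vector compare.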
  • Keywords
    information management; natural language processing; parallel processing; search engines; word processing; Chinese word segmentation; SIMD k-ary search; SIMD processing unit; search engine; Algorithm design and analysis; Arrays; Dictionaries; Encoding; Particle separators; Partitioning algorithms; Program processors; SIMD; k-ary search; second character list
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Management, Innovation Management and Industrial Engineering (ICIII), 2011 International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-61284-450-3
  • Type

    conf

  • DOI
    10.1109/ICIII.2011.375
  • Filename
    6116888