Title :
SIMD k-ary Search Based Chinese Word Segmentation
Author :
Jia, Yunjie ; Lei, Yongmei ; Zhang, Zhuo ; Fang, Yun
Author_Institution :
Dept. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
Abstract :
Chinese word segmentation is widely used in many fields, such as information management and search engines. Modern processors contain multiple cores that provide tremendous compute power, and each core has a SIMD processing unit. This paper reviews traditional dictionary-based Chinese word segmentation and prior research on exploiting the SIMD capability of processors to improve the performance of linear search. Using SIMD has two performance benefits: it provides a high degree of parallelism, and it reduces branch mispredictions. We then propose a new method that uses a SIMD-based k-ary search algorithm to improve the performance of Chinese word segmentation. The SIMD-based approach accelerates the linear search process by performing four comparisons at a time. Compared with the traditional algorithm, the new algorithm improves performance by 5.4% and 7.0% on average for two different dictionary sizes.
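The idea of k-ary search with four comparisons per step can be illustrated with a minimal sketch. This is not the authors' implementation: it is a scalar Python emulation in which the four separator comparisons that a 128-bit SIMD unit could issue as one instruction are written as a list comprehension. The function name `kary_search`, the constant `LANES`, and the use of plain integers as stand-ins for encoded dictionary characters are all assumptions made for illustration.

```python
from typing import Sequence

# Assumption: four comparisons per step, mirroring four 32-bit lanes
# in a 128-bit SIMD register. Four separators give k = 5 partitions.
LANES = 4


def kary_search(keys: Sequence[int], target: int) -> int:
    """Return an index of target in the sorted sequence keys, or -1 if absent."""
    lo, hi = 0, len(keys)  # the current search window is keys[lo:hi]
    while hi - lo > LANES:
        step = (hi - lo) // (LANES + 1)
        # Separator positions that split the window into LANES + 1 partitions.
        seps = [lo + step * (i + 1) for i in range(LANES)]
        # A SIMD unit would evaluate these LANES comparisons at once;
        # here they are emulated one by one.
        mask = [target >= keys[s] for s in seps]
        # Because keys is sorted, the mask is a run of True followed by False,
        # so counting the True entries identifies the partition (no branching
        # per comparison is needed).
        hits = sum(mask)
        new_lo = seps[hits - 1] if hits > 0 else lo
        new_hi = seps[hits] if hits < LANES else hi
        lo, hi = new_lo, new_hi
    # The remaining window holds at most LANES keys: scan it directly.
    for i in range(lo, hi):
        if keys[i] == target:
            return i
    return -1
```

Counting the comparison results instead of branching on each one is what lets a vectorized version avoid most data-dependent branches; a real implementation would typically also rearrange the keys so that the separators of each step are contiguous in memory.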
Keywords :
information management; natural language processing; parallel processing; search engines; word processing; Chinese word segmentation; SIMD k-ary search; SIMD processing unit; Algorithm design and analysis; Arrays; Dictionaries; Encoding; Particle separators; Partitioning algorithms; Program processors; SIMD; k-ary search; second character list
Conference_Title :
Information Management, Innovation Management and Industrial Engineering (ICIII), 2011 International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-61284-450-3
DOI :
10.1109/ICIII.2011.375