DocumentCode :
2838673
Title :
SIMD k-ary Search Based Chinese Word Segmentation
Author :
Jia, Yunjie ; Lei, Yongmei ; Zhang, Zhuo ; Fang, Yun
Author_Institution :
Dept. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
Volume :
3
fYear :
2011
fDate :
26-27 Nov. 2011
Firstpage :
387
Lastpage :
390
Abstract :
Chinese word segmentation is widely used in many fields, such as information management and search engines. Modern processors contain multiple cores that provide tremendous compute power, and each core has a SIMD processing unit. This paper reviews traditional dictionary-based Chinese word segmentation and prior research on exploiting the SIMD capability of processors to improve the performance of linear search. Using SIMD has two performance benefits: it provides a high degree of parallelism, and it reduces branch mispredictions. We then propose a new method that uses a SIMD-based k-ary search algorithm to improve the performance of Chinese word segmentation. The SIMD-based approach accelerates the linear search process by performing four compare operations at a time. Compared with the traditional algorithm, the new algorithm achieves average performance improvements of 5.4% and 7.0% for two different dictionary sizes.
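The idea of performing four compare operations per step can be illustrated with a minimal sketch. It is not the authors' implementation: it emulates a four-lane SIMD compare with ordinary Python comparisons, and the function name `kary_search`, the `lanes` parameter, and the use of integer keys (for example, character codes from a sorted dictionary list) are assumptions made for illustration only.

```python
def kary_search(sorted_keys, target, lanes=4):
    """Return an index of target in sorted_keys, or -1 if it is absent.

    Scalar emulation of a k-ary search step: `lanes` separator keys are
    compared against the target per iteration, which a real SIMD unit
    could do with a single vector compare instruction.
    """
    lo, hi = 0, len(sorted_keys)      # current search window is [lo, hi)
    k = lanes + 1                     # 4 separators split the window into 5 parts

    while hi - lo > lanes:
        step = (hi - lo) // k         # at least 1 because hi - lo >= k
        # Separator positions inside the window (all strictly below hi).
        seps = [lo + step * (i + 1) for i in range(lanes)]
        # Emulated vector compare: one boolean per lane.
        mask = [sorted_keys[s] <= target for s in seps]
        # The keys are sorted, so the mask is a run of True followed by False;
        # counting the True lanes selects the partition without branching
        # on each individual comparison.
        passed = sum(mask)
        new_lo = seps[passed - 1] if passed > 0 else lo
        new_hi = seps[passed] if passed < lanes else hi
        lo, hi = new_lo, new_hi

    # The remaining window holds at most `lanes` keys; scan it directly.
    for i in range(lo, hi):
        if sorted_keys[i] == target:
            return i
    return -1
```

Each iteration shrinks the window to roughly one fifth of its size, compared with one half for binary search, and the partition is chosen from the count of passing lanes rather than from a chain of data-dependent branches.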
Keywords :
information management; natural language processing; parallel processing; search engines; word processing; Chinese word segmentation; SIMD k-ary search; SIMD processing unit; search engine; Algorithm design and analysis; Arrays; Dictionaries; Encoding; Particle separators; Partitioning algorithms; Program processors; SIMD; k-ary search; second character list
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Management, Innovation Management and Industrial Engineering (ICIII), 2011 International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-61284-450-3
Type :
conf
DOI :
10.1109/ICIII.2011.375
Filename :
6116888