DocumentCode
2838673
Title
SIMD k-ary Search Based Chinese Word Segmentation
Author
Jia, Yunjie ; Lei, Yongmei ; Zhang, Zhuo ; Fang, Yun
Author_Institution
Dept. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
Volume
3
fYear
2011
fDate
26-27 Nov. 2011
Firstpage
387
Lastpage
390
Abstract
Chinese word segmentation is widely used in many fields, such as information management and search engines. Modern processors contain multiple cores that provide tremendous compute power, and each core has a SIMD processing unit. This paper reviews traditional Chinese word segmentation based on dictionary queries and the research on exploiting the SIMD capability of processors to improve the performance of linear search. Using SIMD has two performance benefits: it provides high parallelism, and it reduces branch mispredictions. We then propose a new method that uses a SIMD-based k-ary search algorithm to improve the performance of Chinese word segmentation. The SIMD-based approach accelerates the linear search process by performing 4 compare operations at a time. Compared with the traditional algorithm, the new algorithm improves average performance by 5.4% and 7.0% for two different dictionary sizes.
Keywords
information management; natural language processing; parallel processing; search engines; word processing; Chinese word segmentation; SIMD k-ary search; SIMD processing unit; information management; search engine; Algorithm design and analysis; Arrays; Dictionaries; Encoding; Particle separators; Partitioning algorithms; Program processors; Chinese Word Segmentation; SIMD; k-ary search; second character list;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Management, Innovation Management and Industrial Engineering (ICIII), 2011 International Conference on
Conference_Location
Shenzhen
Print_ISBN
978-1-61284-450-3
Type
conf
DOI
10.1109/ICIII.2011.375
Filename
6116888