DocumentCode
3773455
Title
Segmentation of Chinese Web Text Based on Spark
Author
Jiazhen Xu
Author_Institution
Univ. of Electron. Sci. &
Volume
1
fYear
2015
Firstpage
200
Lastpage
203
Abstract
Analysing and processing the massive amounts of data generated by the network on a single computer takes a great deal of time and cannot meet people's needs. To break through the speed bottleneck of segmentation, this paper uses a Spark cluster and applies Spark programming ideas to Chinese word segmentation, so that the segmentation technology is implemented on a distributed platform. The approach preserves the accuracy of the original word segmentation while significantly improving its processing speed, and it is feasible and effective for handling large amounts of Chinese text.
Keywords
"Sparks","Data processing","Dictionaries","Programming","Distributed databases","Internet","Computer architecture"
Publisher
IEEE
Conference_Titel
Computational Intelligence and Design (ISCID), 2015 8th International Symposium on
Print_ISBN
978-1-4673-9586-1
Type
conf
DOI
10.1109/ISCID.2015.250
Filename
7468933