DocumentCode :
2137383
Title :
An evolutionary approach to automatic Chinese text segmentation
Author :
Dong Zhang
Author_Institution :
Fac. of Arts, Comput., Eng. & Sci., Sheffield Hallam Univ., Sheffield, UK
fYear :
2013
fDate :
23-25 July 2013
Firstpage :
771
Lastpage :
776
Abstract :
Textual information written in Chinese now represents a huge knowledge repository. The first step in managing and processing written Chinese text is segmentation. A new method for automatic Chinese text segmentation using evolutionary algorithms and Web search statistical data is outlined. The proposed method treats Web text as a de facto corpus that updates automatically, thus eliminating the need for statistics training. It treats segmentation as a search for the most probable way in which individual characters combine into sentences, paragraphs, and articles, thus producing segmentation results that are tailored to the text in question and are independent of segmentation standards.
Keywords :
Internet; evolutionary computation; information retrieval; natural language processing; probability; text analysis; Web search statistical data; articles; automatic Chinese text segmentation; de facto corpus; evolutionary algorithms; evolutionary approach; knowledge repository; paragraphs; probability; segmentation standards; sentences; textual information; Biological cells; Genetic algorithms; Pragmatics; Probability; Standards; Training data; Web search; Chinese information processing; Chinese text segmentation; genetic algorithm; n-best segmentations; statistical segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Computation (ICNC), 2013 Ninth International Conference on
Conference_Location :
Shenyang
Type :
conf
DOI :
10.1109/ICNC.2013.6818079
Filename :
6818079