DocumentCode :
2539975
Title :
Auto-recognizing Letter-word Phrases in Chinese Texts
Author :
Zezhi, Zheng
Author_Institution :
Dept. of Chinese language & Literature, Xiamen Univ., Xiamen, China
fYear :
2010
fDate :
13-15 Dec. 2010
Firstpage :
371
Lastpage :
374
Abstract :
As a class of unknown words in Chinese information processing, the letter-word phrases (LWPs) used in Chinese texts cannot be identified correctly by existing segmentation software. This paper presents an auto-tagging system for letter-word phrases based on rules and statistical data. The system first scans each sentence to extract letter strings, then takes each letter string as an anchor and scans both sides of it, applying boundary words rules, LWP-component rules, exceptional-correct rules, and a collocation coefficient matrix in turn to judge whether the letter string should be bound to an LWP component. Finally, it tags the letter-word phrases in the sentence. In closed and open tests, the tagging system's precision exceeds 90%.
Keywords :
identification technology; knowledge based systems; matrix algebra; natural language processing; statistical analysis; text analysis; word processing; Chinese information processing; Chinese texts; LWP-component rules; auto-recognizing letter-word phrases; auto-tagging system; boundary words rules; collocation coefficient matrix; exceptional-correct rules; letter-strings; segmentation software; statistical data; tagging system precision; Context; Information processing; Materials; Mutual information; Probability; Statistical analysis; Training; Auto-tagging; Letter-word phrase; collocation coefficient;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4244-8891-9
Electronic_ISBN :
978-0-7695-4281-2
Type :
conf
DOI :
10.1109/ICGEC.2010.98
Filename :
5715446