DocumentCode :
3105054
Title :
A Divide-Conquer Strategy for Both English and Chinese Text Chunking
Author :
Liang, Ying-Hong ; Wang, Ni-Hong ; Qiu, Zhao-wen ; Chen, Yin- ; Zhao, Tie-jun
fYear :
2007
fDate :
22-24 Aug. 2007
Firstpage :
81
Lastpage :
86
Abstract :
The traditional approach to English text chunking identifies all phrases with a single model that applies the same types of features to every phrase. This has two known limitations: the same feature types do not suit all phrases, and data sparseness may result. In this paper, a divide-conquer strategy is proposed and applied to the identification of English phrases, and is then rapidly transplanted to Chinese text chunking. The strategy divides the chunking task into several sub-tasks according to the sensitive features of each phrase and identifies the different phrases in parallel. A two-stage decreasing-conflict strategy is then used to synthesize the answers of the sub-tasks. The main features of the approach are that each phrase uses its own sensitive features and that data sparseness is avoided. Tested on a public corpus (English) and the Chinese Penn Treebank (Chinese), the F score reaches 95.14% for English chunking and 95.23% for Chinese chunking. These results are state of the art, on par with the best results that have been reported.
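The divide-then-synthesize idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the rule-based detectors, the POS tag sets, the scores, and the names find_runs, np_detector, vp_detector, merge and chunk are hypothetical. The score-ordered merge is a simple stand-in, not the paper's two-stage decreasing-conflict strategy, whose details the abstract does not give. The detectors are called one after another here, but they are independent of each other and could run in parallel.

```python
from typing import Callable, List, Set, Tuple

Tagged = List[Tuple[str, str]]        # (word, POS tag)
Span = Tuple[int, int, str, float]    # (start, end_exclusive, phrase_type, score)


def find_runs(sent: Tagged, allowed: Set[str], label: str, score: float) -> List[Span]:
    """Propose every maximal run of tokens whose POS tag is in `allowed`."""
    spans, start = [], None
    for i, (_, pos) in enumerate(sent):
        if pos in allowed:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i, label, score))
            start = None
    if start is not None:
        spans.append((start, len(sent), label, score))
    return spans


# One sub-task per phrase type, each with its own (hypothetical) feature set.
def np_detector(sent: Tagged) -> List[Span]:
    return find_runs(sent, {"DT", "JJ", "VBG", "NN", "NNS"}, "NP", 0.90)


def vp_detector(sent: Tagged) -> List[Span]:
    return find_runs(sent, {"MD", "VB", "VBD", "VBZ", "VBG"}, "VP", 0.95)


def merge(proposals: List[Span], n_tokens: int) -> List[Span]:
    """Resolve overlaps: higher-scoring spans claim tokens first, and
    lower-scoring spans keep only their still-unclaimed sub-runs."""
    claimed = [False] * n_tokens
    accepted: List[Span] = []
    for start, end, label, score in sorted(proposals, key=lambda s: -s[3]):
        run_start = None
        for i in range(start, end + 1):
            free = i < end and not claimed[i]
            if free and run_start is None:
                run_start = i
            elif not free and run_start is not None:
                accepted.append((run_start, i, label, score))
                for j in range(run_start, i):
                    claimed[j] = True
                run_start = None
    return sorted(accepted)


def chunk(sent: Tagged, detectors: List[Callable[[Tagged], List[Span]]]):
    """Run each sub-task independently, then synthesize their answers."""
    proposals = [span for detect in detectors for span in detect(sent)]
    return [(s, e, label) for s, e, label, _ in merge(proposals, len(sent))]


sentence = [("The", "DT"), ("dog", "NN"), ("was", "VBD"),
            ("chasing", "VBG"), ("the", "DT"), ("cat", "NN")]
print(chunk(sentence, [np_detector, vp_detector]))
```

In this toy sentence both detectors claim "chasing" (tagged VBG). The merge step gives the token to the higher-scoring VP proposal and trims the competing NP proposal down to "the cat".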
Keywords :
Data mining; Electronic mail; Forestry; Information technology; Laboratories; Learning systems; Natural language processing; Natural languages; Speech processing; Testing; text chunking; divide-conquer strategy; data sparseness
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
Conference_Location :
Luoyang, Henan, China
Print_ISBN :
978-0-7695-2930-1
Type :
conf
DOI :
10.1109/ALPIT.2007.36
Filename :
4460619