DocumentCode :
1910081
Title :
Classification-based Chinese Collocation Extraction
Author :
Xu, Ruifeng ; Lu, Qin ; Wong, Kam-Fai ; Li, Wenjie
Author_Institution :
Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, csrfxu@comp.polyu.edu.hk
fYear :
2007
fDate :
Aug. 30 2007-Sept. 1 2007
Firstpage :
308
Lastpage :
315
Abstract :
Most collocation extraction algorithms use a single set of criteria and a single threshold, which is not appropriate because different types of collocations behave differently. This paper presents a window-based Chinese collocation extraction system that identifies different types of collocations separately. By taking into consideration the compositional, non-substitutable, and non-modifiable properties as well as statistical significance, Chinese collocations are classified into four types. A multi-stage extraction system is then designed to identify each type of collocation separately by using different combinations of features. Furthermore, heuristic rules based on dependency knowledge are applied to filter out some pseudo collocations. Experiments show that the proposed system achieves better F1 performance than most existing algorithms for Chinese collocation extraction.
Keywords :
classification; knowledge acquisition; natural language processing; classification-based Chinese collocation extraction; collocation identification; dependency knowledge; heuristic rules; multistage extraction system; pseudo collocations; window-based Chinese collocation extraction; Data mining; Extraterrestrial measurements; Filters; Frequency; Natural language processing; Research and development management; Statistics; Sun; Systems engineering and theory; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1610-3
Electronic_ISBN :
978-1-4244-1611-0
Type :
conf
DOI :
10.1109/NLPKE.2007.4368048
Filename :
4368048