Title :
Classification-based Chinese Collocation Extraction
Author :
Xu, Ruifeng ; Lu, Qin ; Wong, Kam-Fai ; Li, Wenjie
Author_Institution :
Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, csrfxu@comp.polyu.edu.hk
fDate :
Aug. 30 2007-Sept. 1 2007
Abstract :
Most collocation extraction algorithms use a single set of criteria and a single threshold, which is not quite appropriate because different types of collocations behave differently. This paper presents a window-based Chinese collocation extraction system that identifies different types of collocations separately. Based on their compositional, non-substitutable, and non-modifiable properties as well as their statistical significance, Chinese collocations are classified into four types. A multi-stage extraction system is then designed to identify each type of collocation separately, using a different combination of features. Furthermore, heuristic rules based on dependency knowledge are applied to filter out some pseudo collocations. Experiments show that the proposed system achieves better F1 performance than most existing algorithms for Chinese collocation extraction.
Keywords :
classification; knowledge acquisition; natural language processing; classification-based Chinese collocation extraction; collocation identification; dependency knowledge; heuristic rules; multistage extraction system; pseudo collocations; window-based Chinese collocation extraction; Data mining; Extraterrestrial measurements; Filters; Frequency; Natural language processing; Research and development management; Statistics; Sun; Systems engineering and theory; Testing;
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1610-3
Electronic_ISBN :
978-1-4244-1611-0
DOI :
10.1109/NLPKE.2007.4368048