Title :
A Hybrid Strategy for Chinese Domain-Specific Terminology Extraction
Author :
Qiang Zhan;Chunhong Wang
Author_Institution :
Sch. of Comput. Sci. &
Abstract :
Automatic term extraction is an important task in natural language processing. This paper presents a new approach to terminology extraction that combines machine learning based on cascaded conditional random fields (CRFs) with a corpus-based statistical model. First, low-layer and high-layer CRFs are used to extract simple and compound terms, respectively. Then, Domain Relevance (DR) and Domain Consensus (DC) degrees are calculated to select the final domain terminology. Experimental results show that precision, recall, and F-score are 83.29%, 80.75%, and 82.01%, respectively. Comparison with CRFs and MI+T-value shows that the proposed method is effective for terminology extraction.
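The abstract does not give the formulas used for the DR and DC filtering step. As a rough illustration only, the sketch below assumes the commonly used definitions: DR as a term's relative frequency in one domain divided by its highest relative frequency across all compared domains, and DC as the entropy of the term's distribution over the documents of a domain. The function names, the input layout (plain term-count mappings), and the choice of base-2 logarithm are assumptions made for this example, not details taken from the paper.

```python
import math


def domain_relevance(term, domain_counts):
    """Assumed DR: P(term | domain) / max over domains of P(term | domain).

    domain_counts maps a domain name to a mapping of term -> frequency
    (hypothetical layout chosen for this sketch).
    """
    probs = {}
    for domain, counts in domain_counts.items():
        total = sum(counts.values())
        probs[domain] = counts.get(term, 0) / total if total else 0.0
    peak = max(probs.values(), default=0.0)
    return {d: (p / peak if peak else 0.0) for d, p in probs.items()}


def domain_consensus(term, doc_counts):
    """Assumed DC: entropy of the term's distribution across documents.

    doc_counts is a list of term -> frequency mappings, one per document
    of a single domain. A term spread evenly over many documents scores
    higher than a term concentrated in one document.
    """
    freqs = [counts.get(term, 0) for counts in doc_counts]
    total = sum(freqs)
    if total == 0:
        return 0.0
    entropy = 0.0
    for f in freqs:
        if f > 0:
            p = f / total
            entropy -= p * math.log2(p)
    return entropy
```

Under these assumptions, a candidate produced by the CRF layers would be kept as a domain term only if both scores exceed thresholds chosen by the user; how the paper combines or thresholds the two values is not stated in the abstract.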
Keywords :
"Terminology","Compounds","Labeling","Feature extraction","Natural language processing","Information entropy","Data mining"
Conference_Title :
Semantics, Knowledge and Grids (SKG), 2015 11th International Conference on
DOI :
10.1109/SKG.2015.39