DocumentCode :
1842931
Title :
Research on Automatic Chinese Multi-word Term Extraction Based on Integration of Web Information and Term Component
Author :
Kang, Wei ; Sui, Zhifang ; Liu, Yao
Volume :
3
fYear :
2009
fDate :
15-18 Sept. 2009
Firstpage :
267
Lastpage :
270
Abstract :
This paper presents an automatic method for extracting Chinese multi-word terms that integrates Web information with term components. We extract candidate terms by identifying delimiters, and filter out invalid candidates by checking the context terms in the result pages that Google returns when the candidate term is submitted as the search query. Term components are taken into account when estimating termhood. Inspired by the economical law of term generation, we propose two measures of how likely a candidate is to be a true term: the first is based on the domain speciality of the term, and the second is based on the similarity between the candidate and a template that contains structured information about terms. Experiments on the IT and medicine domains show that our method is effective and portable across domains.
Keywords :
Computational intelligence; Computational linguistics; Control systems; Data mining; Educational technology; Filters; Intelligent agent; Paper technology; Statistics; Terminology; Chinese terminology; automatic terminology extraction; term component; termhood; web
fLanguage :
English
Publisher :
iet
Conference_Title :
Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Milan, Italy
Print_ISBN :
978-0-7695-3801-3
Electronic_ISBN :
978-1-4244-5331-3
Type :
conf
DOI :
10.1109/WI-IAT.2009.279
Filename :
5285006