A Domain-Specific Chinese Term Extraction Method Based on Prefix and Suffix

Author

Li, Dongmei ; Wang, Qinglin ; Li, Yuan ; Peng, Qian

Author_Institution

Sch. of Autom., Beijing Inst. of Technol., Beijing, China

fYear

2012

fDate

11-13 Aug. 2012

Firstpage

1356

Lastpage

1359

Abstract

Term recognition and extraction are the foundation of text information processing. This paper presents a domain-specific Chinese term extraction method based on prefixes and suffixes. First, commonly used prefixes and suffixes are extracted from a given set of seed terms. Second, the testing corpus is segmented to obtain statistics on the words adjacent to those prefixes and suffixes, and the frequency information of each word is used to judge whether the word combined with a prefix or suffix forms a candidate term. Third, the initial candidate term set is enlarged by frequency judgment. Finally, the candidate terms are filtered by co-occurrence analysis. Experiments show that terms with common prefixes and suffixes can be extracted well.
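
The abstract gives only the outline of the pipeline, so the following Python is a minimal illustrative sketch of that outline and not the authors' implementation. It assumes the seed terms and the corpus are already segmented into words (no segmenter is called), that a "prefix" or "suffix" is the first or last word of a seed term, and that the frequency and co-occurrence judgments are simple thresholds. All function names, parameters (`min_count`, `min_ratio`) and sample data are hypothetical. The step that enlarges the candidate set is omitted because the abstract does not describe it.

```python
from collections import Counter


def common_affixes(seed_terms, min_count=2):
    """Return (prefixes, suffixes) shared by at least min_count seed terms.

    Assumption: each seed term is a tuple of segmented words, and its
    first/last word plays the role of prefix/suffix.
    """
    first = Counter(t[0] for t in seed_terms if len(t) > 1)
    last = Counter(t[-1] for t in seed_terms if len(t) > 1)
    prefixes = {w for w, c in first.items() if c >= min_count}
    suffixes = {w for w, c in last.items() if c >= min_count}
    return prefixes, suffixes


def candidate_terms(sentences, prefixes, suffixes, min_ratio=0.5):
    """Pair each affix with its neighbouring word and apply a frequency test.

    Assumed rule: a pair is kept when it accounts for at least min_ratio
    of all occurrences of the neighbouring (non-affix) word.
    """
    word_freq = Counter(w for s in sentences for w in s)
    pair_freq = Counter()
    for s in sentences:
        for left, right in zip(s, s[1:]):
            if left in prefixes or right in suffixes:
                pair_freq[(left, right)] += 1
    candidates = {}
    for (left, right), n in pair_freq.items():
        word = right if left in prefixes else left
        if n / word_freq[word] >= min_ratio:
            candidates[(left, right)] = n
    return candidates


def filter_by_cooccurrence(candidates, min_count=2):
    """Keep pairs that co-occur at least min_count times (assumed filter)."""
    return {"".join(pair) for pair, n in candidates.items() if n >= min_count}


# Hypothetical, pre-segmented sample data.
seed_terms = [("控制", "系统"), ("操作", "系统"), ("控制", "器"), ("控制", "算法")]
sentences = [
    ["该", "导航", "系统", "稳定"],
    ["导航", "系统", "采用", "控制", "策略"],
    ["控制", "策略", "有效"],
]

prefixes, suffixes = common_affixes(seed_terms)
candidates = candidate_terms(sentences, prefixes, suffixes)
terms = filter_by_cooccurrence(candidates)
print(sorted(terms))
```

On this toy input the only affixes that recur in the seed terms are the prefix 控制 and the suffix 系统, so the sketch proposes 导航系统 and 控制策略, neither of which is a seed term. The thresholds would need tuning on a real corpus.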

Keywords

natural language processing; statistical analysis; text analysis; cooccurrence analysis; domain-specific Chinese term extraction method; prefix; suffix; testing corpus; text information processing; word frequency information; word statistics; Algorithm design and analysis; Data mining; Dictionaries; Feature extraction; Frequency domain analysis; Testing; Text recognition; co-occurrence analysis; domain-specific term; term extraction; term recognition;

fLanguage

English

Publisher

IEEE

Conference_Title

Computer Science & Service System (CSSS), 2012 International Conference on

Conference_Location

Nanjing

Print_ISBN

978-1-4673-0721-5

Type

conf

DOI

10.1109/CSSS.2012.342

Filename

6394580