Title :
Web-Based Chinese Term Extraction in the Field of Study
Author :
Rui Guo;Jing Qiu;Guanghua Zhang
Author_Institution :
Sch. of Inf. Sci. &
Abstract :
In today's era of big data, the Web contains a large amount of important information. Extracting domain-specific terms from the Web is an indispensable part of natural language processing for Web text, and it also plays an important role in the study of domain ontologies. Chinese text has no explicit delimiters between words, so term extraction from Chinese Web text remains difficult. This article proposes a method for extracting terms from Chinese text more accurately. First, stop-word removal, Chinese word segmentation, and lexical analysis are applied to extract nouns and noun phrases as candidate domain terms. Then each candidate term is scored according to its distribution in the subject domain, its distribution across the individual pages of the subject domain, and its distribution in other background domains. Combining the subject domain with the background domains, the method uses both TF-IDF and the DR + DC algorithm to extract terms in the subject field. The extraction is implemented with two platform tools, the Chinese word segmentation system of the Chinese Academy of Sciences (ICTCLAS) and the Language Technology Platform Cloud of Harbin Institute of Technology (LTP) [15], so that more accurate domain terminology is extracted.
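The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that DR and DC stand for domain relevance and domain consensus, which the abstract does not spell out, and it uses one common formulation of each: DR as the term's probability in the subject domain divided by its maximum probability over all domains, and DC as the entropy of the term's distribution over the pages of the subject domain. The function names and the toy token lists are hypothetical, and the pages are assumed to be already segmented and filtered to candidate nouns (for example by ICTCLAS or LTP upstream).

```python
import math


def tf_idf(term, doc, docs):
    """TF-IDF of `term` in `doc` (a list of tokens) relative to `docs`."""
    tf = doc.count(term) / len(doc) if doc else 0.0
    df = sum(1 for d in docs if term in d)  # pages containing the term
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf


def domain_relevance(term, domain_docs, background_domains):
    """Assumed DR: P(term | subject domain) / max over domains of P(term | domain)."""
    def prob(docs):
        total = sum(len(d) for d in docs)
        return sum(d.count(term) for d in docs) / total if total else 0.0

    p_subject = prob(domain_docs)
    p_max = max([p_subject] + [prob(b) for b in background_domains])
    return p_subject / p_max if p_max else 0.0


def domain_consensus(term, domain_docs):
    """Assumed DC: entropy of the term's distribution over subject-domain pages."""
    counts = [d.count(term) for d in domain_docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c)


# Toy data (hypothetical): each page is a list of candidate terms.
subject_pages = [["ontology", "term"], ["ontology", "web"]]
background = [[["weather", "term"], ["term", "news"]]]  # one background domain

for t in ["ontology", "term"]:
    dr = domain_relevance(t, subject_pages, background)
    dc = domain_consensus(t, subject_pages)
    print(t, round(dr, 3), round(dc, 3))
```

In this toy data, "ontology" occurs on every subject page and never in the background domain, so it scores higher on both measures than "term", which is more frequent in the background. How the paper weights TF-IDF against DR + DC is not stated in the abstract, so no combined score is shown.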
Keywords :
"Ontologies","Data mining","Terminology","Web pages","Libraries","Correlation","Speech"
Conference_Title :
Semantics, Knowledge and Grids (SKG), 2015 11th International Conference on
DOI :
10.1109/SKG.2015.45