DocumentCode :
3150048
Title :
A method of Chinese document expression for extracting formal context
Author :
Huang, Yinghui ; Li, Guanyu ; Wang, Dongyan
Author_Institution :
Inf. Sci. & Technol. Coll., Dalian Maritime Univ., Dalian, China
Volume :
7
fYear :
2010
fDate :
16-18 Oct. 2010
Firstpage :
3020
Lastpage :
3023
Abstract :
As a kind of data model, a formal context must be extracted from actual data sources such as documents. For unstructured Chinese documents, the first question is how to express the document. The vector space model (VSM), currently the dominant model of document expression, takes a single word as a feature item and therefore neglects the lexical semantic relationships between words in natural language, so it cannot show the semantic information implied in the documents. This paper discusses an improved method that takes Hownet as the knowledge base and establishes a concept vector space for Chinese documents by replacing each single feature word in VSM with a set of similar words. The method makes it convenient to extract a formal context according to a user-defined threshold, and its application is illustrated by an example.
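The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similar-word sets below are hypothetical stand-ins for groupings that would be derived from Hownet, the documents are assumed to be already segmented into tokens, and the relative-frequency weighting is an assumption chosen only to make the thresholding step concrete.

```python
from collections import Counter

# Hypothetical similar-word sets; in the paper these would come from Hownet.
SIMILAR_SETS = {
    "vehicle": {"car", "automobile", "truck"},
    "doctor": {"doctor", "physician"},
}


def concept_vector(tokens, similar_sets):
    """Weight each concept by the share of tokens that fall in its word set.

    The relative-frequency weight is an illustrative assumption.
    """
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return {
        concept: sum(counts[word] for word in words) / total
        for concept, words in similar_sets.items()
    }


def formal_context(docs, similar_sets, threshold):
    """Map each document (object) to the concepts (attributes) whose
    weight reaches the user-chosen threshold."""
    return {
        doc_id: {
            concept
            for concept, weight in concept_vector(tokens, similar_sets).items()
            if weight >= threshold
        }
        for doc_id, tokens in docs.items()
    }


# Toy, pre-segmented documents (hypothetical data).
docs = {
    "d1": ["car", "truck", "doctor", "road"],
    "d2": ["physician", "doctor", "clinic", "car"],
}
context = formal_context(docs, SIMILAR_SETS, threshold=0.5)
```

With a single-word VSM, "car" and "truck" would be separate features; grouping them under one concept lets both count toward the same attribute before the threshold is applied. Lowering the threshold yields a denser formal context, raising it a sparser one.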
Keywords :
data models; document handling; natural language processing; Chinese document expression; Hownet; actual data sources; data model; formal context extraction; lexical semantic relationship; natural language; unstructured Chinese document; vector space model; Context; Data mining; Driver circuits; Feature extraction; Knowledge based systems; Mathematical model; Semantics; VSM; document expression; formal context; set of similar word set;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on
Conference_Location :
Yantai
Print_ISBN :
978-1-4244-6495-1
Type :
conf
DOI :
10.1109/BMEI.2010.5639893
Filename :
5639893