• DocumentCode
    3150048
  • Title
    A method of Chinese document expression for extracting formal context
  • Author
    Huang, Yinghui ; Li, Guanyu ; Wang, Dongyan
  • Author_Institution
    Inf. Sci. & Technol. Coll., Dalian Maritime Univ., Dalian, China
  • Volume
    7
  • fYear
    2010
  • fDate
    16-18 Oct. 2010
  • Firstpage
    3020
  • Lastpage
    3023
  • Abstract
    As a kind of data model, a formal context must be extracted from actual data sources such as documents. For unstructured Chinese documents, the first question is how to express the document. The vector space model (VSM), currently the dominant model of document expression, takes a single word as a feature item and therefore neglects the lexical semantic relationships between words in natural language, so it cannot capture the semantic information implied in the documents. This paper discusses an improved method that takes Hownet as the knowledge base and establishes a concept vector space for Chinese documents by replacing each single feature word in VSM with a set of similar words. With this method, a formal context can be conveniently extracted according to a user-specified threshold, and its application is illustrated with an example.
  • Keywords
    data models; document handling; natural language processing; Chinese document expression; Hownet; actual data sources; data model; formal context extraction; lexical semantic relationship; natural language; unstructured Chinese document; vector space model; Context; Data mining; Driver circuits; Feature extraction; Knowledge based systems; Mathematical model; Semantics; VSM; document expression; formal context; set of similar word set;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on
  • Conference_Location
    Yantai
  • Print_ISBN
    978-1-4244-6495-1
  • Type
    conf
  • DOI
    10.1109/BMEI.2010.5639893
  • Filename
    5639893
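
The approach outlined in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the similar-word sets are hand-written stand-ins for groups that would be derived from Hownet, the documents are assumed to be already segmented into words, the weighting is plain relative frequency (the paper may use a different scheme), and the names `SIMILAR_WORD_SETS`, `concept_vector`, and `formal_context` are hypothetical.

```python
from collections import Counter

# Hypothetical similar-word sets; in the paper these would come from Hownet.
SIMILAR_WORD_SETS = {
    "computer": {"computer", "pc", "laptop"},
    "vehicle": {"car", "automobile", "truck"},
}


def concept_vector(tokens, word_sets):
    """Weight each concept by the share of tokens that fall in its word set.

    Relative frequency is an assumption made for this sketch.
    """
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return {
        concept: sum(counts[word] for word in words) / total
        for concept, words in word_sets.items()
    }


def formal_context(docs, word_sets, threshold):
    """Build a binary object-attribute relation (documents x concepts).

    A document is related to a concept when the concept's weight reaches
    the user-chosen threshold.
    """
    return {
        name: {
            concept
            for concept, weight in concept_vector(tokens, word_sets).items()
            if weight >= threshold
        }
        for name, tokens in docs.items()
    }


# Toy, pre-segmented documents (English tokens used only for readability).
docs = {
    "d1": ["pc", "laptop", "car", "road"],
    "d2": ["truck", "automobile", "car", "pc"],
}

context = formal_context(docs, SIMILAR_WORD_SETS, threshold=0.5)
print(context)
```

Raising the threshold yields a sparser context with fewer document-concept pairs, while lowering it admits weaker associations; this is the user-controlled trade-off the abstract refers to.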