• DocumentCode
    461677
  • Title

    The Research and Application about the Information Extraction in Chinese Domain

  • Author

    Zhang, Suxiang ; Wen, Juan ; Qin, Ying ; Wang, Xiaojie ; Zhong, Yixin

  • Author_Institution
    Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
  • Volume
    3
  • fYear
    2006
  • fDate
    16-20 2006
  • Abstract
    This paper proposes a prototype information service system that sends information of interest to users from unstructured text by way of database search. To achieve this goal, two fundamental problems, named entity recognition and relation extraction, are studied using the maximum entropy (ME) algorithm. Our named entity recognition approach differs from most previous ME-based models in several ways, one of which is that probabilistic feature functions are used instead of binary feature functions. We also explore several new features in our model, including confidence functions and the position of features. As in some previous work, we use separate sub-models for Chinese person names and foreign names, but we introduce some new techniques into these sub-models. The experimental results are promising. Moreover, this is the first time the ME algorithm has been used to extract relations between entities from Chinese texts. Twelve features are designed, covering morphological, grammatical, and semantic features. The experimental results are satisfactory. Both research results were incorporated into our information extraction system, achieving the goal of an information service built on unstructured text.
  • Keywords
    feature extraction; maximum entropy methods; natural language processing; text analysis; Chinese domain; Chinese person names; entity recognition; foreign names; grammar; information extraction; maximum entropy algorithm; morphology; relation extraction; semantic feature; unstructured text; Data engineering; Data mining; Design engineering; Entropy; Morphology; Natural languages; Power engineering and energy; Prototypes; Spatial databases; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing, 2006 8th International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    0-7803-9736-3
  • Electronic_ISBN
    0-7803-9736-3
  • Type

    conf

  • DOI
    10.1109/ICOSP.2006.345822
  • Filename
    4129211
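
The abstract's contrast between binary and probabilistic feature functions in an ME model can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `SURNAME_CONF` table, the labels `PER` and `O`, the single weight, and both feature functions are hypothetical. The sketch only shows the general log-linear form p(y | x) = exp(sum_i w_i f_i(x, y)) / Z(x), with a 0/1 feature in one case and a real-valued confidence in the other.

```python
import math

# Hypothetical confidence that a leading character is a Chinese surname.
# These numbers are invented for illustration only.
SURNAME_CONF = {"张": 0.9, "王": 0.8}

def binary_features(x, y):
    # Classic ME feature: fires with value 1 or 0.
    return [1.0 if (x[0] in SURNAME_CONF and y == "PER") else 0.0]

def probabilistic_features(x, y):
    # Real-valued feature: returns a confidence in [0, 1] instead of 0/1.
    return [SURNAME_CONF.get(x[0], 0.0) if y == "PER" else 0.0]

def maxent_probs(weights, feature_fn, x, labels):
    """Return p(y | x) for each label under a log-linear (ME) model."""
    scores = {
        y: sum(w * f for w, f in zip(weights, feature_fn(x, y)))
        for y in labels
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())      # normalizer Z(x)
    return {y: e / z for y, e in exps.items()}

LABELS = ["PER", "O"]
WEIGHTS = [2.0]  # assumed weight; in practice it would be learned from data

p_binary = maxent_probs(WEIGHTS, binary_features, "张三", LABELS)
p_prob = maxent_probs(WEIGHTS, probabilistic_features, "张三", LABELS)
```

With the same weight, the probabilistic feature scales its contribution by the stored confidence, so `p_prob["PER"]` is lower than `p_binary["PER"]` whenever that confidence is below 1.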