• DocumentCode
    1910067
  • Title
    Integrating Linguistic Patterns and Term-Entity Associations in Chinese Person Description Extraction
  • Author
    Li, Sujian ; Li, Wenjie ; Lu, Qin
  • Author_Institution
    Peking Univ., Peking
  • fYear
    2007
  • fDate
    Aug. 30 2007-Sept. 1 2007
  • Firstpage
    301
  • Lastpage
    307
  • Abstract
    Person description extraction is an important task in biography generation, question answering, and summarization. Most previous extraction approaches select descriptive passages based on sentence structure and/or word co-occurrence information. In this paper, we focus on verifying extracted Chinese person descriptions by measuring the associations between recognized person entities and their surrounding terms, called Term-Entity associations. The associations are derived from both the semantic knowledge provided by HowNet, a well-known Chinese thesaurus, and the term distributional information gathered from a news corpus. Using Term-Entity associations, ineligible extracted descriptions can be filtered out, which in turn yields higher precision. As far as we know, no prior work on Chinese person description extraction has been reported in the literature.
  • Keywords
    computational linguistics; knowledge acquisition; knowledge verification; natural language processing; pattern recognition; rewriting systems; thesauri; Chinese person description extraction verification; HowNet; biography generation; linguistic patterns; question answering; semantic knowledge; summarization; term distributional information; term-entity associations; thesaurus; Computational linguistics; Data mining; Explosions; Ontologies; Organizing; Pattern matching; Statistics; Sun; Tellurium; Thesauri;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-1611-0
  • Electronic_ISBN
    978-1-4244-1611-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2007.4368047
  • Filename
    4368047