• DocumentCode
    477921
  • Title

    A Novel Chinese Text Summarization Approach Using Sentence Extraction Based on Kernel Words Recognition

  • Author

    Yang, Weijie ; Dai, Ruwei ; Cui, Xia

  • Author_Institution
    Key Lab. of Complex Syst. & Intell. Sci., Chinese Acad. of Sci., Beijing
  • Volume
    4
  • fYear
    2008
  • fDate
    18-20 Oct. 2008
  • Firstpage
    134
  • Lastpage
    139
  • Abstract
    The continuing growth of the World Wide Web and on-line text collections makes a large volume of information available to users. Automatic text summarization helps users understand documents quickly. This paper proposes an automated technique for Chinese document summarization based on kernel words recognition and discourse segment extraction. The method consists of five steps. First, the input articles are annotated by lexical analysis. Second, all focused named entities are recognized using a machine learning method. Third, the input articles are divided into several discourse segments, the kernel words of these segments are extracted by means of rule-based main verbs recognition, and all relations among entities are extracted. Fourth, all important sentence candidates are ranked according to a set of rules, and redundant sentences are removed based on kernel word information. Finally, the most important sentences are extracted to compose the summary according to the expected compression ratio, and these sentences are output using a special document as reference. A series of experiments is performed on two Chinese document collections. The results show the superiority of the proposed technique over reference systems.
  • Keywords
    Internet; feature extraction; learning (artificial intelligence); text analysis; Chinese document summarization; Chinese text summarization; discourse segment extraction; kernel words recognition; lexical analysis; machine learning method; on-line text collections; rule-based main verbs recognition; sentence extraction; world wide Web; Automation; Data mining; Focusing; Fuzzy systems; Intelligent systems; Kernel; Laboratories; Learning systems; Text recognition; Web sites; Social network; Text Summarization; focused named entities; main verb;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
  • Conference_Location
    Jinan, Shandong
  • Print_ISBN
    978-0-7695-3305-6
  • Type

    conf

  • DOI
    10.1109/FSKD.2008.20
  • Filename
    4666371