• DocumentCode
    1804802
  • Title

    Corpus-based web document summarization using statistical and linguistic approach

  • Author

    Shams, Rushdi ; Hashem, M.M.A. ; Hossain, Afrina ; Akter, Suraiya Rumana ; Gope, Monika

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Khulna Univ. of Eng. & Technol. (KUET), Khulna, Bangladesh
  • fYear
    2010
  • fDate
    11-12 May 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Single document summarization generates a summary by extracting representative sentences from the document. In this paper, we present a novel technique for summarizing domain-specific text from a single web document that applies statistical and linguistic analysis to the text in a reference corpus and in the web document. The proposed summarizer uses a combined function of Sentence Weight (SW) and Subject Weight (SuW) to determine the rank of a sentence, where SW is a function of the number of terms (tn) and number of words (wn) in a sentence and the term frequency (tf) in the corpus, and SuW is a function of tn and wn in a subject and tf in the corpus. 30 percent of the ranked sentences are taken as the summary of the web document. We generated three web document summaries using our technique and compared each of them with summaries written manually by 16 different human subjects. Results showed that 68 percent of the summaries produced by our approach satisfy the manual summaries.
  • Keywords
    Internet; knowledge acquisition; statistical analysis; text analysis; corpus-based Web document summarization; domain-specific text; knowledge extraction; linguistic analysis; sentence weight; statistical analysis; subject weight; text summarization; Artificial neural networks; Book reviews; Feature extraction; Humans; Manuals; Pragmatics; Tagging; Knowledge Extraction; POS Tagging; Subject Weight; Text Summarization; Web Document Summarization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Communication Engineering (ICCCE), 2010 International Conference on
  • Conference_Location
    Kuala Lumpur
  • Print_ISBN
    978-1-4244-6233-9
  • Type

    conf

  • DOI
    10.1109/ICCCE.2010.5556854
  • Filename
    5556854