• DocumentCode
    3154482
  • Title

    A novel automatic text summarization system with feature terms identification

  • Author

    Manne, Suneetha ; Pervez, Shaik Mohammed Zaheer ; Fatima, S. Sameen

  • Author_Institution
    Dept. of IT, VRSEC, Vijayawada, India
  • fYear
    2011
  • fDate
    16-18 Dec. 2011
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    With ever-growing content on the World Wide Web, it has become increasingly difficult for users to search for relevant information. A rough estimate by the well-known search engine Google in 2010 put the total size of the Internet at 2 petabytes. Search engines, which are supposed to satisfy a user's information need, offer far more information than is required. This problem is referred to as information overload. The field of Information Extraction (IE) offers considerable scope to condense and compact information, enabling the user to decide by merely checking the snippet of each link. Automatic text summarization, a subset of IE, is an important activity in the analysis of high-volume text documents. In this context, it has become increasingly important to develop information access solutions that give users easy and efficient access. Automatic summarization systems address the information overload problem by producing a summary of related documents that provides an overall understanding of the topic without the user having to go through every document. In this paper, we propose a feature-term-based text summarization technique based on the analysis of Parts-of-Speech tagging. A new approach to generating a summary for a given input document is discussed, based on the identification and extraction of important sentences in the document. The system selects feature terms from the extracted terms and builds a qualitative summary with an appreciable compression ratio.
  • Keywords
    Internet; information retrieval; search engines; text analysis; Google search engine; World Wide Web; automatic text summarization system; feature terms identification; important sentence extraction; important sentence identification; information access; information extraction; parts-of-speech tagging; text document analysis; Feature extraction; Frequency measurement; Hidden Markov models; Natural language processing; Stochastic processes; Tagging; Training; Extractive feature terms; HMM tagger; POS tagging; Term frequency
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    India Conference (INDICON), 2011 Annual IEEE
  • Conference_Location
    Hyderabad
  • Print_ISBN
    978-1-4577-1110-7
  • Type

    conf

  • DOI
    10.1109/INDCON.2011.6139386
  • Filename
    6139386