A novel automatic text summarization system with feature terms identification

Author

Manne, Suneetha ; Pervez, Shaik Mohammed Zaheer ; Fatima, S. Sameen

Author_Institution

Dept. of IT, VRSEC, Vijayawada, India

fYear

2011

fDate

16-18 Dec. 2011

Firstpage

1

Lastpage

6

Abstract

With ever-growing content on the World Wide Web, it has become increasingly difficult for users to find relevant information. A rough estimate from Google, the world's best-known search engine, in 2010 put the total size of the Internet at 2 petabytes. Search engines, which are supposed to satisfy a user's information need, offer far more information than is required. This problem is referred to as information overload. The field of Information Extraction (IE) offers considerable scope for condensing information so that the user can decide by merely checking the snippet of each link. Automatic text summarization, a subset of IE, is an important activity in the analysis of high-volume text documents. In this context, it has become increasingly important to develop information access solutions that give users easy and efficient access. Automatic summarization systems address the information overload problem by producing a summary of related documents that provides an overall understanding of the topic without the user having to go through every document. In this paper, we propose a feature-term-based text summarization technique built on the analysis of Parts of Speech tagging. A new approach to generating a summary for a given input document is discussed, based on the identification and extraction of important sentences in the document. The system obtains the selective terms from the extracted terms and builds a qualitative summary with an appreciable compression ratio.

Keywords

Internet; information retrieval; search engines; text analysis; Google search engine; World Wide Web; automatic text summarization system; feature terms identification; important sentence extraction; important sentence identification; information access; information extraction; parts-of-speech tagging; text document analysis; Feature extraction; Frequency measurement; Hidden Markov models; Natural language processing; Stochastic processes; Tagging; Training; Extractive feature terms; HMM tagger; POS tagging; Term frequency

fLanguage

English

Publisher

IEEE

Conference_Titel

India Conference (INDICON), 2011 Annual IEEE

Conference_Location

Hyderabad

Print_ISBN

978-1-4577-1110-7

Type

conf

DOI

10.1109/INDCON.2011.6139386

Filename

6139386