Title :
A novel automatic text summarization system with feature terms identification
Author :
Manne, Suneetha ; Pervez, Shaik Mohammed Zaheer ; Fatima, S. Sameen
Author_Institution :
Dept. of IT, VRSEC, Vijayawada, India
Abstract :
With ever-growing content on the World Wide Web, it has become increasingly difficult for users to search for relevant information. A rough estimate by Google, the world's best-known search engine, in 2010 put the total size of the Internet at 2 petabytes. Search engines, which are supposed to satisfy the user's information need, offer far more information than is required. This problem is referred to as information overload. The field of Information Extraction (IE) offers considerable scope for condensing information, enabling the user to decide by merely checking the snippet of each link. Automatic text summarization, a subfield of IE, is an important activity in the analysis of high-volume text documents. In this context, it has become increasingly important to develop information access solutions that give users easy and efficient access. Automatic summarization systems address the information overload problem by producing a summary of related documents that provides an overall understanding of the topic without the user having to go through every document. In this paper, we propose a feature-term-based text summarization technique built on the analysis of Parts of Speech (POS) tagging. A new approach to generating a summary for a given input document is discussed, based on the identification and extraction of important sentences in the document. The system selects feature terms from the extracted terms and builds a good-quality summary with an appreciable compression ratio.
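The extractive idea described in the abstract (pick feature terms, score sentences by them, keep the best sentences, report a compression ratio) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states that feature terms come from POS tagging (the keywords mention an HMM tagger), whereas the sketch below substitutes a small hypothetical stopword list for the tagger and ranks terms by plain term frequency. All function names, the `STOPWORDS` set, and the parameters `top_n` and `num_sentences` are assumptions made for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: a tiny stopword list stands in for the POS-based filter
# described in the abstract; a real system would keep terms by POS tag.
STOPWORDS = {"the", "a", "an", "for", "on", "of", "to", "in", "and", "was", "is"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def feature_terms(sentences, top_n=5):
    """Return the top_n most frequent non-stopword terms (hypothetical feature terms)."""
    counts = Counter(
        t for s in sentences for t in tokenize(s) if t not in STOPWORDS
    )
    return {term for term, _ in counts.most_common(top_n)}


def summarize(text, num_sentences=2, top_n=5):
    """Score each sentence by how many feature-term tokens it contains,
    keep the num_sentences best, and return them in document order."""
    sentences = split_sentences(text)
    terms = feature_terms(sentences, top_n)
    scores = [sum(1 for t in tokenize(s) if t in terms) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]


def compression_ratio(text, summary_sentences):
    """Summary length divided by original length, both in tokens."""
    original = len(tokenize(text))
    summary = sum(len(tokenize(s)) for s in summary_sentences)
    return summary / original if original else 0.0
```

Calling `summarize(text)` returns the selected sentences as a list, and `compression_ratio(text, summary)` gives the fraction of the original token count retained. Swapping the stopword filter for a POS tagger that keeps, for example, only nouns would bring the sketch closer to the approach the abstract describes.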
Keywords :
Internet; information retrieval; search engines; text analysis; Google search engine; World Wide Web; automatic text summarization system; feature terms identification; important sentence extraction; important sentence identification; information access; information extraction; parts-of-speech tagging; text document analysis; Feature extraction; Frequency measurement; Hidden Markov models; Natural language processing; Stochastic processes; Tagging; Training; Extractive feature terms; HMM tagger; POS tagging; Term frequency
Conference_Title :
India Conference (INDICON), 2011 Annual IEEE
Conference_Location :
Hyderabad
Print_ISBN :
978-1-4577-1110-7
DOI :
10.1109/INDCON.2011.6139386