DocumentCode :
2910468
Title :
Corpus Based Extractive Document Summarization for Indic Script
Author :
Reddy, P. Vijayapal ; Vardhan, B. Vishnu ; Govardhan, A.
Author_Institution :
Dept. of CSE, Raja Mahendra Eng. Coll., Hyderabad, India
fYear :
2011
fDate :
15-17 Nov. 2011
Firstpage :
154
Lastpage :
157
Abstract :
Summarization is the process of generating a condensed form of a given text document that retains its information and overall meaning. Document summarization approaches are broadly classified into two types: extractive and abstractive. In this paper, we perform single-document summarization to generate summaries of Telugu text documents using the extractive approach. Although many document surface features exist, we consider those that cover the original document extensively and generate summaries with less redundancy: sentence position, sentence similarity with the title, sentence centrality, and word frequency. To strengthen these features, we used a corpus of 3000 documents and applied preprocessing steps such as stop word elimination and stemming to retain the more meaningful words within each sentence. Sentences are ranked by scoring each sentence on all four features simultaneously with optimum weights. The optimum feature weights are learned with the help of human-constructed summaries. The machine-generated summaries are evaluated using the F1 measure, followed by human judgements.
Keywords :
abstracting; text analysis; word processing; F1 measure; Indic script; Telugu text document summarization; corpus based extractive document summarization approach; sentence similarity; word elimination; word frequency; word stemming; Text analysis; summarization and generation; understanding;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2011 International Conference on
Conference_Location :
Penang
Print_ISBN :
978-1-4577-1733-8
Type :
conf
DOI :
10.1109/IALP.2011.66
Filename :
6121492