DocumentCode :
14792
Title :
A Context-Based Word Indexing Model for Document Summarization
Author :
Goyal, Puneet ; Behera, Laxmidhar ; McGinnity, Thomas Martin
Author_Institution :
INRIA Paris-Rocquencourt, Le Chesnay, France
Volume :
25
Issue :
8
fYear :
2013
fDate :
Aug. 2013
Firstpage :
1693
Lastpage :
1705
Abstract :
Existing models for document summarization mostly use the similarity between sentences in the document to extract the most salient sentences. Both the documents and the sentences are indexed using traditional term indexing measures, which do not take the context into consideration. Therefore, the sentence similarity values remain independent of the context. In this paper, we propose a context-sensitive document indexing model based on the Bernoulli model of randomness. The Bernoulli model of randomness is used to find the probability of the co-occurrences of two terms in a large corpus. We propose a new approach that uses the lexical association between terms to give a context-sensitive weight to the document terms. The resulting indexing weights are used to compute the sentence similarity matrix. The proposed sentence similarity measure is used with the baseline graph-based ranking models for sentence extraction. Experiments conducted on the benchmark DUC data sets show that the proposed Bernoulli-based sentence similarity model provides consistent improvements over the baseline IntraLink and UniformLink methods [1].
Keywords :
graph theory; information retrieval; probability; random processes; text analysis; word processing; Bernoulli randomness model; Bernoulli-based sentence similarity model; baseline IntraLink method; baseline UniformLink method; baseline graph-based ranking models; benchmark DUC data sets; context sensitive weight; context-based word indexing model; document indexing weights; document sentence similarity values; document summarization; document terms; lexical association; sentence extraction; sentence indexing weights; sentence similarity matrix; term cooccurrence probability; Computational modeling; Context; Context modeling; Equations; Indexing; Mathematical model; Measurement; Lexical association; document indexing; text summarization;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2012.114
Filename :
6205756