Title :
Topic-Driven Multi-document Summarization
Author :
Wang, Hongling ; Zhou, Guodong
Author_Institution :
Jiangsu Provincial Key Lab. for Comput. Inf. Process. Technol., Soochow Univ., Suzhou, China
Abstract :
This paper presents a topic-driven framework for generating a generic summary from multiple documents. Our approach is based on the intuition that, from a statistical point of view, the summary's probability distribution over the topics should be consistent with the source documents' probability distribution over their inherent topics. Here, topics are defined as weighted “bags of words” and derived by Latent Dirichlet Allocation from a collection of documents, either the given document set or a related large-scale corpus. We can therefore represent various kinds of text units, such as a word, a sentence, a summary, a document, or a whole document set, in a single vector space model via their probability distributions over the derived topics. This lets us extract a sentence or summary by calculating the similarity between that sentence or summary and the given document set via their topic probability distributions. In particular, we propose two methods of similarity measurement: the static method and the dynamic method. While the former detects the salience of information in a static way, the latter further controls redundancy in a dynamic way. In addition, we integrate various popular features to improve performance. Evaluation on the TAC 2008 update summarization task shows encouraging results.
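The idea of comparing topic distributions can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the similarity function or spell out how the static and dynamic methods are computed, so cosine similarity, the averaging in `topic_vector`, the greedy loop in `dynamic_select`, and the toy `WORD_TOPICS` weights (standing in for LDA output) are all assumptions made for illustration.

```python
import math

# Hypothetical per-word topic weights for two topics (flood vs. relief).
# In the paper these would be derived by LDA; here they are made up.
WORD_TOPICS = {
    "flood": [0.9, 0.1], "rain": [0.8, 0.2], "river": [0.7, 0.3],
    "aid": [0.2, 0.8], "relief": [0.1, 0.9], "funds": [0.1, 0.9],
}
NUM_TOPICS = 2

def topic_vector(words):
    """Average the topic weights of known words into one distribution."""
    total, count = [0.0] * NUM_TOPICS, 0
    for w in words:
        if w in WORD_TOPICS:
            for k, p in enumerate(WORD_TOPICS[w]):
                total[k] += p
            count += 1
    if count == 0:  # no known words: fall back to a uniform distribution
        return [1.0 / NUM_TOPICS] * NUM_TOPICS
    return [t / count for t in total]

def cosine(u, v):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def static_select(sentences, n):
    """Score each sentence once against the document set; keep the top n."""
    doc_vec = topic_vector([w for s in sentences for w in s])
    ranked = sorted(range(len(sentences)), reverse=True,
                    key=lambda i: cosine(topic_vector(sentences[i]), doc_vec))
    return sorted(ranked[:n])

def dynamic_select(sentences, n):
    """Greedily add the sentence that makes the growing summary's topic
    distribution most similar to that of the document set."""
    doc_vec = topic_vector([w for s in sentences for w in s])
    chosen = []
    while len(chosen) < min(n, len(sentences)):
        best, best_score = None, -1.0
        for i in range(len(sentences)):
            if i in chosen:
                continue
            words = [w for j in chosen + [i] for w in sentences[j]]
            score = cosine(topic_vector(words), doc_vec)
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

# Toy document set: each sentence is a list of tokens.
SENTENCES = [
    ["flood", "rain"],
    ["flood", "river"],
    ["aid", "relief"],
    ["relief", "funds"],
    ["flood", "river", "aid"],
]

print(static_select(SENTENCES, 2))
print(dynamic_select(SENTENCES, 2))
```

On this toy input the static ranking keeps two flood-heavy sentences, while the greedy variant follows its first pick with a relief sentence, because a second flood sentence would pull the summary's topic distribution away from that of the document set. That is one plausible reading of how scoring the summary as a whole can control redundancy.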
Keywords :
document handling; natural language processing; statistical distributions; bag-of-words; generic summary; large scale corpus; latent Dirichlet allocation; multidocument summarization; sentence extraction; sentence similarity; similarity measurement; single vector space model; summary probability distribution; topic driven framework; Mathematical model; Probability distribution; Redundancy; Resource management; Semantics; Strontium; Dynamic Method; Latent Dirichlet Allocation; Multi-document Summarization; Static Method; Topic Modeling;
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
DOI :
10.1109/IALP.2010.26