DocumentCode :
256703
Title :
Research on Multi-document Summarization Based on LDA Topic Model
Author :
Jinqiang Bian ; Zengru Jiang ; Qian Chen
Author_Institution :
School of Automation, Beijing Institute of Technology, Beijing, China
Volume :
2
fYear :
2014
fDate :
26-27 Aug. 2014
Firstpage :
113
Lastpage :
116
Abstract :
Compared with the VSM (Vector Space Model) and graph-ranking models, the LDA (Latent Dirichlet Allocation) model can discover latent topics in a corpus, and these latent topics can be exploited by sentence-ranking mechanisms to form a good summary. In this paper, a new sentence-ranking method based on the LDA model is proposed. The method combines the topic distribution of each sentence with the topic importance of the corpus to calculate the posterior probability of each sentence, and then selects sentences according to this posterior probability to form a summary. The topic distribution of a sentence represents the likelihood that the sentence belongs to each topic, and topic importance represents the degree to which a topic covers a significant portion of the corpus. The method highlights the latent topics and thereby improves the resulting summaries. Experimental results on the DUC2006 dataset show the advantage of the proposed multi-document summarization algorithm: its ROUGE values are higher than those of methods such as LexRank, LDA-SIBS, and LDA-PGS.
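The ranking idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the exact posterior formula or the definition of topic importance. Here topic importance I(k) is assumed to be the average weight of topic k across documents, a sentence's score is assumed to be the weighted sum over k of P(k|s) * I(k), normalized across sentences, and sentences are picked greedily under a hypothetical word budget. The sentence-topic and document-topic matrices are assumed to come from an already fitted LDA model; the function names are hypothetical.

```python
from typing import List, Sequence


def topic_importance(doc_topic: Sequence[Sequence[float]]) -> List[float]:
    """Assumed topic importance: mean P(k | d) over all documents, renormalized to sum to 1."""
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    imp = [sum(row[k] for row in doc_topic) / n_docs for k in range(n_topics)]
    total = sum(imp)
    return [v / total for v in imp]


def sentence_posteriors(sent_topic: Sequence[Sequence[float]],
                        importance: Sequence[float]) -> List[float]:
    """Assumed sentence score: sum_k P(k | s) * I(k), normalized so scores sum to 1."""
    raw = [sum(p * w for p, w in zip(row, importance)) for row in sent_topic]
    total = sum(raw)
    return [r / total for r in raw]


def summarize(sentences: Sequence[str],
              sent_topic: Sequence[Sequence[float]],
              doc_topic: Sequence[Sequence[float]],
              max_words: int) -> List[str]:
    """Greedy selection by descending score under a word budget (hypothetical parameter)."""
    post = sentence_posteriors(sent_topic, topic_importance(doc_topic))
    ranked = sorted(range(len(sentences)), key=lambda i: post[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n > max_words:
            continue  # skip sentences that would exceed the budget
        chosen.append(i)
        used += n
    # Keep the original sentence order in the output summary.
    return [sentences[i] for i in sorted(chosen)]
```

With two documents whose topic weights are `[0.7, 0.3]` and `[0.5, 0.5]`, topic 0 receives the larger importance, so a sentence concentrated on topic 0 ranks highest. A real system would also need a redundancy check between selected sentences; that step is omitted here.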
Keywords :
information retrieval; probability; text analysis; LDA topic model; ROUGE value; latent Dirichlet allocation; latent topic; multidocument summarization; posterior probability; sentence-ranking mechanism; topic-distribution; topic-importance; Data mining; Information retrieval; Probability distribution; Resource management; Semantics; Vectors; LDA; Multi-document summarization; Topic Model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2014 Sixth International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4799-4956-4
Type :
conf
DOI :
10.1109/IHMSC.2014.130
Filename :
6911461