Title :
Topic-Sensitive Multi-document Summarization Algorithm
Author :
Liu Na ; Tang Xiao-jun ; Lu Ying ; Li Ming-Xia ; Wang Hai-Wen ; Xiao Peng
Author_Institution :
Sch. of Inf. Sci. & Eng., Dalian Polytech. Univ., Dalian, China
Abstract :
Latent Dirichlet Allocation (LDA) has recently been used to automatically derive topics from text corpora and has been applied to sentence-extraction-based multi-document summarization algorithms. However, not all the estimated topics are of equal importance or correspond to genuine themes of the domain. Some topics can be collections of irrelevant or background words, or represent insignificant themes. This paper proposes a topic-sensitive algorithm for multi-document summarization. Our approach is distinguished from existing approaches in that we use the LDA model to identify the significant topics, which are then used in the sentence weight calculation. Moreover, besides topic characteristics, the approach also considers statistical characteristics such as term frequency, sentence position, and sentence length. The approach thus combines the advantages of statistical characteristics with those of the LDA topic model. Experiments show that the proposed algorithm achieves better performance than other state-of-the-art algorithms on the DUC2002 corpus.
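The abstract does not give the scoring formula, so the following is only a minimal sketch of how a sentence weight might combine a topic feature with term frequency, sentence position, and sentence length. Everything in it is an assumption made for illustration: the function names `score_sentences` and `summarize`, the feature definitions, the linear combination, the `weights` and `ideal_len` defaults, and the inputs `topic_word_probs` (per-topic word probabilities, as an LDA model would supply) and `topic_significance` (one weight per topic). It is not the authors' method.

```python
from collections import Counter


def score_sentences(sentences, topic_word_probs, topic_significance,
                    weights=(0.4, 0.2, 0.1, 0.3), ideal_len=20):
    """Score tokenized sentences with four illustrative features.

    sentences: list of token lists, in document order.
    topic_word_probs: list of dicts mapping word -> probability, one per topic
        (assumed to come from a trained LDA model).
    topic_significance: one non-negative weight per topic (assumed given).
    """
    tf = Counter(w for s in sentences for w in s)
    max_tf = max(tf.values()) if tf else 1
    n = len(sentences)
    w_tf, w_pos, w_len, w_topic = weights
    scores = []
    for i, sent in enumerate(sentences):
        if not sent:
            scores.append(0.0)
            continue
        # Term frequency: mean corpus frequency of the words, scaled to [0, 1].
        f_tf = sum(tf[w] for w in sent) / (len(sent) * max_tf)
        # Position: earlier sentences receive a higher value.
        f_pos = 1.0 - i / n
        # Length: ratio to an assumed ideal length, at most 1.
        f_len = min(len(sent), ideal_len) / max(len(sent), ideal_len)
        # Topic: significance-weighted mean word probability over all topics,
        # so topics with low significance contribute little to the score.
        f_topic = sum(
            sig * sum(probs.get(w, 0.0) for w in sent) / len(sent)
            for probs, sig in zip(topic_word_probs, topic_significance)
        )
        scores.append(w_tf * f_tf + w_pos * f_pos
                      + w_len * f_len + w_topic * f_topic)
    return scores


def summarize(sentences, scores, k):
    """Return the k highest-scoring sentences in their original order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In this sketch, setting a topic's significance to zero removes its contribution entirely, which is one simple way to keep background-word topics from inflating sentence weights.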
Keywords :
knowledge acquisition; statistical distributions; text analysis; LDA topic model; latent Dirichlet allocation; multidocument summarization; sentence extraction; text corpora topics; topic-sensitive algorithm; Bayes methods; Computational modeling; Frequency measurement; Length measurement; Probabilistic logic; Probability distribution; Resource management; LDA; multi-document summarization; topic model;
Conference_Title :
Parallel Architectures, Algorithms and Programming (PAAP), 2014 Sixth International Symposium on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-3844-5
DOI :
10.1109/PAAP.2014.22