• DocumentCode
    263625
  • Title

    Topic-Sensitive Multi-document Summarization Algorithm

  • Author

    Liu Na ; Tang Xiao-jun ; Lu Ying ; Li Ming-Xia ; Wang Hai-Wen ; Xiao Peng

  • Author_Institution
    Sch. of Inf. Sci. & Eng., Dalian Polytech. Univ., Dalian, China
  • fYear
    2014
  • fDate
    13-15 July 2014
  • Firstpage
    69
  • Lastpage
    74
  • Abstract
    Latent Dirichlet Allocation (LDA) has recently been used to automatically generate topics from text corpora and has been applied to sentence-extraction-based multi-document summarization algorithms. However, not all of the estimated topics are of equal importance or correspond to genuine themes of the domain. Some topics may be collections of irrelevant or background words, or may represent insignificant themes. This paper proposes a topic-sensitive algorithm for multi-document summarization. Our approach is distinguished from existing approaches in that we use the LDA model to identify and distinguish significant topics, which are then used in the sentence weight calculation. Moreover, besides topic characteristics, this approach also considers statistical characteristics such as term frequency, sentence position, and sentence length. The approach thus not only exploits the advantages of statistical characteristics but also combines them with the LDA topic model. Experiments showed that the proposed algorithm achieved better performance than other state-of-the-art algorithms on the DUC2002 corpus.
  • Keywords
    knowledge acquisition; statistical distributions; text analysis; LDA topic model; latent Dirichlet allocation; multidocument summarization; sentence extraction; text corpora topics; topic-sensitive algorithm; Bayes methods; Computational modeling; Frequency measurement; Length measurement; Probabilistic logic; Probability distribution; Resource management; LDA; multi-document summarization; topic model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Architectures, Algorithms and Programming (PAAP), 2014 Sixth International Symposium on
  • Conference_Location
    Beijing
  • ISSN
    2168-3034
  • Print_ISBN
    978-1-4799-3844-5
  • Type

    conf

  • DOI
    10.1109/PAAP.2014.22
  • Filename
    6916439