• DocumentCode
    2311428
  • Title
    Improved Spoken Document Summarization Using Probabilistic Latent Semantic Analysis (PLSA)
  • Author
    Kong, Sheng-Yi; Lee, Lin-shan
  • Author_Institution
    Coll. of EECS, Nat. Taiwan Univ., Taipei
  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    In this paper we propose a set of new methods that explore the topical information embedded in spoken documents and use it for automatic summarization of those documents. By introducing a set of latent topic variables, probabilistic latent semantic analysis (PLSA) can uncover the underlying probabilistic relationships between documents and terms. Two measures based on PLSA modeling, referred to in this paper as topic significance and term entropy, are proposed to identify the terms, and thus the sentences, that are important to a document; these sentences can then be used to construct the summary. Experimental results from preliminary tests on broadcast news stories in Mandarin Chinese indicate improved performance compared with some existing approaches.
  • Keywords
    document handling; natural languages; probability; Mandarin Chinese; entropy; probabilistic latent semantic analysis; spoken document summarization; Content based retrieval; Educational institutions; Entropy; Humans; Information analysis; Motion pictures; Performance evaluation; Speech analysis; Speech recognition; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006) Proceedings
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type
    conf
  • DOI
    10.1109/ICASSP.2006.1660177
  • Filename
    1660177