Title :
Improved Spoken Document Summarization Using Probabilistic Latent Semantic Analysis (PLSA)
Author :
Kong, Sheng-Yi ; Lee, Lin-shan
Author_Institution :
Coll. of EECS, Nat. Taiwan Univ., Taipei
Abstract :
In this paper, we propose a set of new methods that explore the topical information embedded in spoken documents and use that information for automatic summarization of spoken documents. By introducing a set of latent topic variables, probabilistic latent semantic analysis (PLSA) can uncover the underlying probabilistic relationships between documents and terms. Two measures based on PLSA modeling, referred to in this paper as topic significance and term entropy, are proposed to identify the terms, and thus the sentences, that are important for a document; these sentences can then be used to construct the summary. Experimental results from preliminary tests on broadcast news stories in Mandarin Chinese indicated improved performance compared with some existing approaches.
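The abstract names term entropy but does not give its formula. As a minimal sketch, assuming term entropy is the Shannon entropy of the latent topic distribution given a term, P(T_k | t), the idea can be illustrated as follows. The function names `topic_posterior` and `term_entropy`, the two-topic toy probabilities, and the use of Bayes' rule to obtain P(T_k | t) are all assumptions made for illustration and are not taken from the paper.

```python
import math


def topic_posterior(p_term_given_topic, p_topic):
    """Return P(T_k | t) for one term, via Bayes' rule.

    p_term_given_topic[k] is P(t | T_k) and p_topic[k] is P(T_k).
    Both lists are hypothetical inputs standing in for a trained PLSA model.
    """
    joint = [pt * pk for pt, pk in zip(p_term_given_topic, p_topic)]
    total = sum(joint)
    if total == 0:
        raise ValueError("term has zero probability under every topic")
    return [j / total for j in joint]


def term_entropy(posterior):
    """Shannon entropy (in nats) of a topic distribution for one term.

    Assumed reading: a low value means the term is concentrated on few
    topics, so it is more likely to carry topical information.
    """
    return -sum(p * math.log(p) for p in posterior if p > 0)


# Toy example with two latent topics and a uniform topic prior.
prior = [0.5, 0.5]
focused = topic_posterior([0.9, 0.1], prior)  # term mostly tied to topic 0
spread = topic_posterior([0.5, 0.5], prior)   # term equally likely in both

focused_entropy = term_entropy(focused)
spread_entropy = term_entropy(spread)
```

Under this reading, the topic-focused term receives a lower entropy than the evenly spread one, which is the property a summarizer could use to weight terms when scoring sentences. How the paper combines this with topic significance is not stated in the abstract and is left out of the sketch.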
Keywords :
document handling; natural languages; probability; Mandarin Chinese; entropy; probabilistic latent semantic analysis; spoken document summarization; Content based retrieval; Educational institutions; Entropy; Humans; Information analysis; Motion pictures; Performance evaluation; Speech analysis; Speech recognition; Testing
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
Print_ISBN :
1-4244-0469-X
DOI :
10.1109/ICASSP.2006.1660177