DocumentCode :
2311428
Title :
Improved Spoken Document Summarization Using Probabilistic Latent Semantic Analysis (PLSA)
Author :
Kong, Sheng-Yi ; Lee, Lin-shan
Author_Institution :
Coll. of EECS, Nat. Taiwan Univ., Taipei
Volume :
1
fYear :
2006
fDate :
14-19 May 2006
Abstract :
In this paper we propose a set of new methods that explore the topical information embedded in spoken documents and use it for automatic summarization of those documents. By introducing a set of latent topic variables, probabilistic latent semantic analysis (PLSA) can uncover the underlying probabilistic relationships between documents and terms. Two measures based on PLSA modeling, referred to in this paper as topic significance and term entropy, are proposed to identify the terms, and thus the sentences, that are important for a document; these can then be used to construct the summary. Experimental results from preliminary tests on broadcast news stories in Mandarin Chinese indicate improved performance compared with some existing approaches.
Keywords :
document handling; natural languages; probability; Mandarin Chinese; entropy; probabilistic latent semantic analysis; spoken document summarization; Content based retrieval; Educational institutions; Entropy; Humans; Information analysis; Motion pictures; Performance evaluation; Speech analysis; Speech recognition; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
ISSN :
1520-6149
Print_ISBN :
1-4244-0469-X
Type :
conf
DOI :
10.1109/ICASSP.2006.1660177
Filename :
1660177