DocumentCode
2311428
Title
Improved Spoken Document Summarization Using Probabilistic Latent Semantic Analysis (PLSA)
Author
Kong, Sheng-Yi ; Lee, Lin-shan
Author_Institution
Coll. of EECS, Nat. Taiwan Univ., Taipei
Volume
1
fYear
2006
fDate
14-19 May 2006
Abstract
In this paper we propose a set of new methods that explore the topical information embedded in spoken documents and use this information for the automatic summarization of spoken documents. By introducing a set of latent topic variables, probabilistic latent semantic analysis (PLSA) can find the underlying probabilistic relationships between documents and terms. Two measures, referred to in this paper as topic significance and term entropy, are proposed on the basis of PLSA modeling. They determine which terms, and therefore which sentences, are important to a document, and those sentences can then be used to construct the summary. Preliminary experiments on Mandarin Chinese broadcast news stories showed improved performance compared with some existing approaches.
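As a rough illustration of the term-entropy idea, the sketch below assumes that a term's entropy is the Shannon entropy of its topic posterior P(T_k | t_j), computed by Bayes' rule from trained PLSA parameters P(t_j | T_k) and P(T_k). The abstract does not give the exact definition, so treat this as an assumption. The function names `topic_posteriors` and `term_entropy` are hypothetical, and the topic significance measure is not sketched here.

```python
import math


def topic_posteriors(p_term_given_topic, p_topic):
    """Return P(T_k | t_j) for every term j via Bayes' rule (assumed formulation).

    p_term_given_topic[k][j] = P(t_j | T_k), p_topic[k] = P(T_k).
    """
    num_topics = len(p_topic)
    num_terms = len(p_term_given_topic[0])
    posteriors = []
    for j in range(num_terms):
        # Joint P(t_j, T_k) = P(t_j | T_k) * P(T_k) for each topic k.
        joint = [p_term_given_topic[k][j] * p_topic[k] for k in range(num_topics)]
        total = sum(joint)
        if total == 0.0:
            raise ValueError(f"term {j} has zero probability under every topic")
        posteriors.append([x / total for x in joint])
    return posteriors


def term_entropy(p_term_given_topic, p_topic):
    """Entropy (in nats) of each term's topic posterior.

    Under this assumption, low entropy means the term is concentrated in
    few topics (more topic-specific); high entropy means it is spread out.
    """
    entropies = []
    for post in topic_posteriors(p_term_given_topic, p_topic):
        entropies.append(-sum(p * math.log(p) for p in post if p > 0.0))
    return entropies


if __name__ == "__main__":
    # Toy model: 2 topics, 3 terms (illustrative numbers, not from the paper).
    p_t_given_T = [[0.8, 0.1, 0.1],   # P(t_j | T_0)
                   [0.1, 0.1, 0.8]]   # P(t_j | T_1)
    print(term_entropy(p_t_given_T, [0.5, 0.5]))
```

In this toy model the middle term is equally likely under both topics, so it receives the maximum entropy (ln 2). A summarizer built on this assumption would treat such a term as less indicative of a document's topical content than the other two terms.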
Keywords
document handling; natural languages; probability; Mandarin Chinese; entropy; probabilistic latent semantic analysis; spoken document summarization; Content based retrieval; Educational institutions; Entropy; Humans; Information analysis; Motion pictures; Performance evaluation; Speech analysis; Speech recognition; Testing;
fLanguage
English
Publisher
IEEE
Conference_Title
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location
Toulouse
ISSN
1520-6149
Print_ISBN
1-4244-0469-X
Type
conf
DOI
10.1109/ICASSP.2006.1660177
Filename
1660177