DocumentCode :
3194399
Title :
Word Topical Mixture Models for Extractive Spoken Document Summarization
Author :
Chen, Berlin ; Chen, Yi-Ting
Author_Institution :
National Taiwan Normal University, Taipei
fYear :
2007
fDate :
2-5 July 2007
Firstpage :
52
Lastpage :
55
Abstract :
This paper considers extractive summarization of Chinese spoken documents. In contrast to conventional approaches, we address the extractive summarization problem within a probabilistic generative framework. A word topical mixture model (w-TMM) is proposed to explore the co-occurrence relationships between words of the language. Each sentence of the spoken document to be summarized is treated as a composite word TMM for generating the document, and sentences are ranked and selected according to their likelihoods. Various modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with other conventional summarization approaches. The experiments were performed on Chinese broadcast news collected in Taiwan, and noticeable performance gains were obtained. The proposed summarization technique has also been integrated into our prototype system for voice retrieval of broadcast news via mobile devices.
Keywords :
document handling; speech processing; Chinese broadcast news; Chinese spoken documents; extractive spoken document summarization; mobile devices; probabilistic generative framework; voice retrieval; word topical mixture models; Broadcasting; Computer science; Data mining; Hidden Markov models; Natural languages; Performance gain; Prototypes; Speech; Support vector machine classification; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2007 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
1-4244-1016-9
Electronic_ISBN :
1-4244-1017-7
Type :
conf
DOI :
10.1109/ICME.2007.4284584
Filename :
4284584