DocumentCode :
2700722
Title :
Castsearch - Context Based Spoken Document Retrieval
Author :
Molgaard, L.L. ; Jorgensen, K.W. ; Hansen, Lars Kai
Author_Institution :
Inf. & Math. Modelling, Tech. Univ. Denmark, Lyngby, Denmark
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
The paper describes our work on the development of a system for retrieval of relevant stories from broadcast news. The system utilizes a combination of audio processing and text mining. The audio processing consists of a segmentation step that partitions the audio into speech and music. The speech is further segmented into speaker segments and then transcribed using an automatic speech recognition system, to yield text input for clustering using non-negative matrix factorization (NMF). We find semantic topics that are used to evaluate the performance for topic detection. Based on these topics we show that a novel query expansion can be performed to return more intelligent search results. We also show that the query expansion helps overcome errors of the automatic transcription.
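As an illustration of the clustering and query-expansion steps the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes a small term-by-document count matrix, Lee-Seung multiplicative updates for the NMF, and a simple expansion rule (add the top terms of the topic that best matches the query). The toy vocabulary, the function names `nmf` and `expand_query`, and all parameter values are hypothetical.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (terms x docs) as W @ H
    using multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]  # terms x topics
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]  # topics x docs
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

def expand_query(query, vocab, W, top_n=3):
    """Assumed expansion rule: pick the topic with the largest total weight
    on the known query terms, then append that topic's top_n terms."""
    index = {term: i for i, term in enumerate(vocab)}
    known = [t for t in query if t in index]
    if not known:
        return list(query)  # nothing to anchor the expansion on
    k = len(W[0])
    scores = [sum(W[index[t]][a] for t in known) for a in range(k)]
    best = max(range(k), key=scores.__getitem__)
    ranked = sorted(range(len(vocab)), key=lambda i: W[i][best], reverse=True)
    extra = [vocab[i] for i in ranked[:top_n] if vocab[i] not in query]
    return list(query) + extra

# Hypothetical toy corpus: rows are terms, columns are four transcribed stories.
vocab = ["election", "vote", "senate", "goal", "match", "league"]
V = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
]

W, H = nmf(V, k=2)
print(expand_query(["vote"], vocab, W))
```

Expanding a query with topic terms is one way a search could still match a story whose transcript misrecognized the original query word, which is the effect the abstract reports for transcription errors.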
Keywords :
document handling; matrix decomposition; query formulation; speaker recognition; speech processing; audio processing; automatic speech recognition system; automatic transcription; broadcast news; castsearch; context based spoken document retrieval; non-negative matrix factorization; query expansion; speaker segments; text mining; Automatic speech recognition; Broadcasting; Indexing; Informatics; Mathematical model; Mel frequency cepstral coefficient; Music information retrieval; Speech analysis; Streaming media; Text mining; Audio Retrieval; Document Clustering; Non-negative Matrix Factorization; Text Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.367171
Filename :
4218045