Title :
Castsearch - Context Based Spoken Document Retrieval
Author :
Molgaard, L.L. ; Jorgensen, K.W. ; Hansen, Lars Kai
Author_Institution :
Inf. & Math. Modelling, Tech. Univ. Denmark, Lyngby, Denmark
Abstract :
This paper describes a system for retrieving relevant stories from broadcast news. The system combines audio processing and text mining. The audio processing begins with a segmentation step that partitions the audio into speech and music. The speech is further divided into speaker segments and then transcribed by an automatic speech recognition system, yielding text that is clustered using non-negative matrix factorization (NMF). The clustering produces semantic topics, which we use to evaluate topic detection performance. Based on these topics, we show that a novel query expansion can be performed to return more intelligent search results. We also show that the query expansion helps overcome errors in the automatic transcription.
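The text-mining stage summarized in the abstract (NMF topics followed by topic-based query expansion) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, the term-document counts, the Lee-Seung multiplicative updates, and the helper names nmf and expand_query are all assumptions made for illustration, and the abstract does not specify how the paper's expansion step is actually computed.

```python
import random

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

def nmf(V, k, iters=300, seed=0):
    """Approximate a non-negative terms-by-documents matrix V as W * H
    using Lee-Seung multiplicative updates (Frobenius objective).
    Assumption: the paper may use a different NMF variant."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

def expand_query(query_terms, vocab, W, n_extra=2):
    """Hypothetical expansion rule: pick the topic with the largest summed
    loading for the query terms, then append that topic's top terms."""
    index = {term: i for i, term in enumerate(vocab)}
    k = len(W[0])
    scores = [sum(W[index[t]][a] for t in query_terms if t in index)
              for a in range(k)]
    best = max(range(k), key=scores.__getitem__)
    ranked = sorted(range(len(vocab)), key=lambda i: W[i][best], reverse=True)
    extra = [vocab[i] for i in ranked if vocab[i] not in query_terms][:n_extra]
    return list(query_terms) + extra

# Toy data (made up): rows are terms, columns are transcribed segments.
vocab = ["election", "vote", "senate", "storm", "rain", "flood"]
V = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
]
W, H = nmf(V, k=2)
print(expand_query(["election"], vocab, W))
print(expand_query(["storm"], vocab, W))
```

In this sketch, a search for "election" is widened with other terms from the same topic, so a segment in which the recognizer missed the word "election" but transcribed "vote" or "senate" correctly could still be matched. That is one plausible reading of how topic-based expansion can compensate for transcription errors.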
Keywords :
document handling; matrix decomposition; query formulation; speaker recognition; speech processing; audio processing; automatic speech recognition system; automatic transcription; broadcast news; castsearch; context based spoken document retrieval; non-negative matrix factorization; query expansion; speaker segments; text mining; Automatic speech recognition; Broadcasting; Indexing; Informatics; Mathematical model; Mel frequency cepstral coefficient; Music information retrieval; Speech analysis; Streaming media; Text mining; Audio Retrieval; Document Clustering; Non-negative Matrix Factorization; Text Mining;
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
DOI :
10.1109/ICASSP.2007.367171