Title :
Transcription and indexation of broadcast data
Author :
Gauvain, J.-L. ; Lamel, Lori ; De Kercadio, Yannick ; Adda, Gilles
Author_Institution :
Spoken Language Processing Group, LIMSI-CNRS, Orsay, France
Abstract :
We report on recent research on transcribing and indexing broadcast news data for information retrieval purposes. The system described combines an adapted version of the LIMSI 1998 Hub-4E transcription system for speech recognition with text-based IR methods. Experimental results are reported in terms of recognition word error rate and mean average precision for both the TREC SDR98 (100 h) and SDR99 (600 h) data sets. With query expansion using commercial transcripts, comparable mean average precisions are obtained on manual reference transcriptions and on automatic transcriptions with a word error rate of 21.5%, measured on a 10-hour data subset.
Keywords :
indexing; information retrieval; speech recognition; TREC SDR98 data set; TREC SDR99 data set; adapted LIMSI 1998 Hub-4E transcription system; automatic transcriptions; broadcast news data indexing; broadcast news data transcription; commercial transcripts; manual reference transcriptions; mean average precision; query expansion; recognition word error rate; text-based information retrieval methods; word error rate; Acoustic noise; Broadcasting; Context modeling; Error analysis; Maximum likelihood detection; Natural languages; Streaming media
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
0-7803-6293-4
DOI :
10.1109/ICASSP.2000.862069