DocumentCode
353706
Title
Transcription and indexation of broadcast data
Author
Gauvain, J.-L. ; Lamel, Lori ; De Kercadio, Yannick ; Adda, Gilles
Author_Institution
Spoken Language Process. Group, LIMSI-CNRS, Orsay, France
Volume
3
fYear
2000
fDate
2000
Firstpage
1663
Abstract
We report on recent research on transcribing and indexing broadcast news data for information retrieval purposes. The system described combines an adapted version of the LIMSI 1998 Hub-4E transcription system for speech recognition with text-based IR methods. Experimental results are reported in terms of recognition word error rate and mean average precision for both the TREC SDR98 (100h) and SDR99 (600h) data sets. With query expansion using commercial transcripts, comparable mean average precisions are obtained on manual reference transcriptions and on automatic transcriptions with a word error rate of 21.5%, measured on a 10-hour data subset.
Keywords
indexing; information retrieval; speech recognition; TREC SDR98 data set; TREC SDR99 data set; adapted LIMSI 1998 Hub-4E transcription system; automatic transcriptions; broadcast news data indexing; broadcast news data transcription; commercial transcripts; manual reference transcriptions; mean average precision; query expansion; recognition word error rate; text-based information retrieval methods; word error rate; Acoustic noise; Broadcasting; Context modeling; Error analysis; Indexing; Information retrieval; Maximum likelihood detection; Natural languages; Speech recognition; Streaming media
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location
Istanbul
ISSN
1520-6149
Print_ISBN
0-7803-6293-4
Type
conf
DOI
10.1109/ICASSP.2000.862069
Filename
862069