• DocumentCode
    353706
  • Title

    Transcription and indexation of broadcast data

  • Author

    Gauvain, J.-L. ; Lamel, Lori ; De Kercadio, Yannick ; Adda, Gilles

  • Author_Institution
    Spoken Language Process. Group, LIMSI-CNRS, Orsay, France
  • Volume
    3
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    1663
  • Abstract
    We report on recent research on transcribing and indexing broadcast news data for information retrieval purposes. The system described combines an adapted version of the LIMSI 1998 Hub-4E transcription system for speech recognition with text-based IR methods. Experimental results are reported in terms of recognition word error rate and mean average precision for both the TREC SDR98 (100h) and SDR99 (600h) data sets. With query expansion using commercial transcripts, comparable mean average precisions are obtained on manual reference transcriptions and automatic transcriptions with a word error rate of 21.5% measured on a 10-hour data subset.
  • Keywords
    indexing; information retrieval; speech recognition; TREC SDR98 data set; TREC SDR99 data set; adapted LIMSI 1998 Hub-4E transcription system; automatic transcriptions; broadcast news data indexing; broadcast news data transcription; commercial transcripts; manual reference transcriptions; mean average precision; query expansion; recognition word error rate; text-based information retrieval methods; word error rate; Acoustic noise; Broadcasting; Context modeling; Error analysis; Maximum likelihood detection; Natural languages; Streaming media
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-6293-4
  • Type

    conf

  • DOI
    10.1109/ICASSP.2000.862069
  • Filename
    862069