• DocumentCode
    1161087
  • Title
    Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system
  • Author
    Matsoukas, Spyros ; Gauvain, Jean-Luc ; Adda, Gilles ; Colthurst, Thomas ; Kao, Chia-Lin ; Kimball, Owen ; Lamel, Lori ; Lefevre, Fabrice ; Ma, Jeff Z. ; Makhoul, John ; Nguyen, Long ; Prasad, Rohit ; Schwartz, Richard ; Schwenk, Holger ; Xiang, Bing
  • Author_Institution
    BBN Technol., Cambridge, MA
  • Volume
    14
  • Issue
    5
  • fYear
    2006
  • Firstpage
    1541
  • Lastpage
    1556
  • Abstract
    This paper describes the progress made in the transcription of broadcast news (BN) and conversational telephone speech (CTS) within the combined BBN/LIMSI system from May 2002 to September 2004. During that period, BBN and LIMSI collaborated in an effort to produce significant reductions in the word error rate (WER), as directed by the aggressive goals of the Effective, Affordable, Reusable Speech-to-text [Defense Advanced Research Projects Agency (DARPA) EARS] program. The paper focuses on general modeling techniques that led to recognition accuracy improvements, as well as engineering approaches that enabled efficient use of large amounts of training data and fast decoding architectures. Special attention is given to efforts to integrate components of the BBN and LIMSI systems, including a discussion of the tradeoff between speed and accuracy for various system combination strategies. Results on the EARS progress test sets show that the combined BBN/LIMSI system achieved relative WER reductions of 47% and 51% on the BN and CTS domains, respectively.
  • Keywords
    broadcasting; speech coding; speech recognition; speech synthesis; telephony; Defense Advanced Research Projects Agency EARS program; broadcast news transcription; combined EARS BBN-LIMSI system; conversational telephone speech; effective-affordable-reusable-speech-to-text program; fast decoding architectures; general modeling techniques; recognition accuracy improvements; system combination strategies; word error rate reduction; Broadcasting; Collaboration; Data engineering; Decoding; Ear; Error analysis; Speech; System testing; Telephony; Training data; Hidden Markov models (HMMs); large training corpora; speech recognition; system combination
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2006.878257
  • Filename
    1677975