• DocumentCode
    1076635
  • Title

    Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program

  • Author

    Soltau, Hagen ; Saon, George ; Kingsbury, Brian ; Kuo, Hong-Kwang Jeff ; Mangu, Lidia ; Povey, Daniel ; Emami, Ahmad

  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY
  • Volume
    17
  • Issue
    5
  • fYear
    2009
  • fDate
    7/1/2009 12:00:00 AM
  • Firstpage
    884
  • Lastpage
    894
  • Abstract
    This paper describes the Arabic broadcast transcription system fielded by IBM in the GALE Phase 2.5 machine translation evaluation. Key advances include the use of additional training data from the Linguistic Data Consortium (LDC), use of a very large vocabulary comprising 737K words and 2.5M pronunciation variants, automatic vowelization using flat-start training, cross-adaptation between unvowelized and vowelized acoustic models, and rescoring with a neural-network language model. The resulting system achieves word error rates below 10% on Arabic broadcasts. Very large scale experiments with unsupervised training demonstrate that the utility of unsupervised data depends on the amount of supervised data available. While unsupervised training improves system performance when a limited amount (135 h) of supervised data is available, these gains disappear when a greater amount (848 h) of supervised data is used, even with a very large (7069 h) corpus of unsupervised data. We also describe a method for modeling Arabic dialects that avoids the data sparseness entailed by dialect-specific acoustic models by using non-phonetic dialect questions in the decision trees. We show how this method can be used with a statically compiled decoding graph by partitioning the decision trees into a static component and a dynamic component, with the dynamic component being replaced by a mapping that is evaluated at run-time.
  • Keywords
    decision trees; natural language processing; speech processing; unsupervised learning; DARPA GALE program; automatic vowelization; decoding graph; dialect modelling; flat-start training; pronunciation variants; speech transcription; unsupervised training; Broadcasting; Decision trees; Decoding; Error analysis; Large-scale systems; Performance gain; Speech; System performance; Training data; Vocabulary; Dialect modeling; discriminative training; speech recognition; vowelization
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2009.2022966
  • Filename
    5075768