Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program

Author

Soltau, Hagen ; Saon, George ; Kingsbury, Brian ; Kuo, Hong-Kwang Jeff ; Mangu, Lidia ; Povey, Daniel ; Emami, Ahmad

Author_Institution

IBM T. J. Watson Research Center, Yorktown Heights, NY

Volume

17

Issue

5

fYear

2009

fDate

7/1/2009 12:00:00 AM

Firstpage

884

Lastpage

894

Abstract

This paper describes the Arabic broadcast transcription system fielded by IBM in the GALE Phase 2.5 machine translation evaluation. Key advances include the use of additional training data from the Linguistic Data Consortium (LDC), use of a very large vocabulary comprising 737 K words and 2.5 M pronunciation variants, automatic vowelization using flat-start training, cross-adaptation between unvowelized and vowelized acoustic models, and rescoring with a neural-network language model. The resulting system achieves word error rates below 10% on Arabic broadcasts. Very large scale experiments with unsupervised training demonstrate that the utility of unsupervised data depends on the amount of supervised data available. While unsupervised training improves system performance when a limited amount (135 h) of supervised data is available, these gains disappear when a greater amount (848 h) of supervised data is used, even with a very large (7069 h) corpus of unsupervised data. We also describe a method for modeling Arabic dialects that avoids the problem of data sparseness entailed by dialect-specific acoustic models via the use of non-phonetic, dialect questions in the decision trees. We show how this method can be used with a statically compiled decoding graph by partitioning the decision trees into a static component and a dynamic component, with the dynamic component being replaced by a mapping that is evaluated at run-time.

Keywords

decision trees; natural language processing; speech processing; unsupervised learning; DARPA GALE program; automatic vowelization; decoding graph; dialect modeling; flat-start training; pronunciation variants; speech transcription; unsupervised training; Broadcasting; Decoding; Error analysis; Large-scale systems; Performance gain; Speech; System performance; Training data; Vocabulary; discriminative training; speech recognition; vowelization

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2009.2022966

Filename

5075768