DocumentCode :
2701675
Title :
The IBM 2006 GALE Arabic ASR System
Author :
Soltau, Hagen ; Saon, George ; Kingsbury, Brian ; Kuo, Jay ; Mangu, Lidia ; Povey, Daniel ; Zweig, Geoffrey
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
This paper describes the advances made in IBM's Arabic broadcast news transcription system, which was fielded in the 2006 GALE ASR and machine translation evaluation. These advances were instrumental in lowering the word error rate by 42% relative over the course of one year and include: training on additional LDC data; large-scale discriminative training on 1800 hours of unsupervised data; automatic vowelization using a flat-start approach; use of a large vocabulary with 617K words and 2 million pronunciations; and, lastly, a system architecture based on cross-adaptation between unvowelized and vowelized acoustic models.
Keywords :
language translation; natural language processing; speech processing; speech recognition; Arabic broadcast news transcription system; IBM 2006 GALE Arabic ASR system; automatic vowelization; flat-start approach; large-scale discriminative training; machine translation evaluation; unsupervised data; unvowelized acoustic models; Acoustic testing; Automatic speech recognition; Broadcasting; Error analysis; Instruments; Large-scale systems; Natural languages; Speech recognition; System testing; Vocabulary
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.366921
Filename :
4218109