Title :
Morphological Decomposition for Arabic Broadcast News Transcription
Author :
Xiang, Bing ; Nguyen, Kham ; Nguyen, Long ; Schwartz, Richard ; Makhoul, John
Author_Institution :
BBN Technol., Cambridge, MA
Abstract :
In this paper, we present a novel approach to morphological decomposition for large-vocabulary Arabic speech recognition. It achieves a low out-of-vocabulary (OOV) rate as well as high recognition accuracy in a state-of-the-art Arabic broadcast news transcription system. In this approach, compound words are decomposed into stems and affixes in both the language training and acoustic training data. The decomposed words in the recognition output are re-joined before scoring. Four algorithms are evaluated and compared in this work. The best system achieved a 1.9% absolute (9.8% relative) reduction in word error rate (WER) compared to the 64K-word baseline. Its recognition performance is also comparable to that of a 300K-word recognition system trained on the normal (undecomposed) words. In addition, the decomposed system is much faster and needs less memory than systems with vocabularies larger than 64K words.
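The decompose-then-rejoin pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not one of the four algorithms compared in the paper, which the abstract does not specify. The affix lists, the `+` boundary marker, the minimum stem length, and the function names `decompose` and `rejoin` are all assumptions made for illustration; words are written in Buckwalter-style transliteration.

```python
# Hypothetical affix inventory (Buckwalter-style transliteration).
# These lists are illustrative assumptions, not taken from the paper.
PREFIXES = ("wAl", "bAl", "Al", "w", "b", "l")
SUFFIXES = ("hA", "hm", "h")
MIN_STEM_LEN = 3  # assumed guard against over-splitting short words


def decompose(word):
    """Split one word into [prefix+, stem, +suffix] tokens (longest match first)."""
    tokens = []
    stem = word
    for p in sorted(PREFIXES, key=len, reverse=True):
        if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM_LEN:
            tokens.append(p + "+")      # trailing "+" marks a prefix
            stem = stem[len(p):]
            break
    suffix = None
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM_LEN:
            suffix = "+" + s            # leading "+" marks a suffix
            stem = stem[:-len(s)]
            break
    tokens.append(stem)
    if suffix:
        tokens.append(suffix)
    return tokens


def rejoin(tokens):
    """Glue marked affixes back onto their stems, as would be done before scoring."""
    words = []
    pending = ""                        # prefixes waiting for their stem
    for tok in tokens:
        if tok.endswith("+"):
            pending += tok[:-1]
        elif tok.startswith("+") and words:
            words[-1] += tok[1:]
        else:
            words.append(pending + tok)
            pending = ""
    if pending:                         # dangling prefix with no stem
        words.append(pending)
    return words


sentence = "wAlktAb ktAbhm fy".split()
pieces = [tok for word in sentence for tok in decompose(word)]
print(pieces)
print(rejoin(pieces))
```

In this sketch the training text would be passed through `decompose` so that the recognizer's vocabulary consists of stems and affixes, and the recognizer output would be passed through `rejoin` so that scoring is done on full words. A real system would need a linguistically motivated affix inventory and rules for deciding which words to split.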
Keywords :
acoustics; natural languages; speech recognition; Arabic broadcast news transcription; acoustic training data; language training; large vocabulary Arabic speech recognition; morphological decomposition; out-of-vocabulary; word error rate; word recognition system; Automatic speech recognition; Broadcasting; Data mining; Dictionaries; Error analysis; Frequency; Natural languages; Speech recognition; Training data; Vocabulary;
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
Print_ISBN :
1-4244-0469-X
DOI :
10.1109/ICASSP.2006.1660214