Cross-dialectal acoustic data sharing for Arabic speech recognition

Author

Kirchhoff, Katrin ; Vergyri, Dimitra

Author_Institution

Dept. of Electr. Eng., Univ. of Washington, Seattle, WA, USA

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

The automatic recognition of Arabic dialectal speech is a challenging task since Arabic dialects are essentially spoken varieties, for which only sparse resources (transcriptions and standardized acoustic data) are available to date. In this paper, we describe the use of acoustic data from modern standard Arabic (MSA) to improve the recognition of Egyptian conversational Arabic (ECA). The cross-dialectal use of data is complicated by the fact that MSA is written without short vowels and other diacritics and thus has incomplete phonetic information. This problem is addressed by automatically vowelizing MSA data before combining it with ECA data. We describe the vowelization procedure as well as speech recognition experiments and show that our technique yields improvements over our baseline system.

Keywords

speech processing; speech recognition; Arabic dialectal speech; Arabic speech recognition; Egyptian conversational Arabic; automatic recognition; automatic vowelization; cross-dialectal acoustic data sharing; incomplete phonetic information; modern standard Arabic; Automatic speech recognition; Communication standards; Context modeling; Loudspeakers; Morphology; Natural languages; Radio broadcasting; Speech recognition; TV broadcasting; Writing

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1326098

Filename

1326098