DocumentCode :
1698388
Title :
Multi dialect Arabic speech parallel corpora
Author :
Almeman, K. ; Lee, Minhung ; Almiman, A.A.
Author_Institution :
Sch. of Comput. Sci., Univ. of Birmingham, Birmingham, UK
fYear :
2013
Firstpage :
1
Lastpage :
6
Abstract :
This paper describes the building of a multi-dialect Arabic speech parallel corpus. It is designed to encompass four main dialects: Modern Standard Arabic (MSA), Gulf, Egyptian and Levantine. We chose a specific linguistic domain to work with: travel and tourism. Parallel prompts were written for the four dialects, comprising 1,291 recordings for MSA and 1,069 recordings for the other dialects. The recordings were made with the consent of 52 participants, yielding about 32 hours of speech. After the segmentation stage, we obtained a total of 67,132 speech files. These are the first Arabic parallel text and speech corpora, and they will be an open resource for researchers.
Keywords :
natural language processing; speech processing; Arabic parallel texts; Egypt; Gulf; Levantine dialects; MSA; linguistic domain; modern standard arabic; multidialect arabic speech parallel corpora; Cities and towns; Microphones; NIST; Receivers; Speech; Speech recognition; Training; Arabic Dialects; Multi-Dialect; Parallel; Speech Corpora;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Communications, Signal Processing, and their Applications (ICCSPA), 2013 1st International Conference on
Conference_Location :
Sharjah
Print_ISBN :
978-1-4673-2820-3
Type :
conf
DOI :
10.1109/ICCSPA.2013.6487288
Filename :
6487288