Diacritization, automatic segmentation and labeling for Levantine Arabic speech

Author

Alotaibi, Yousef A. ; Meftah, Ali H. ; Selouani, Sid-Ahmed

Author_Institution

Coll. of Comput. & Inf. Sci., King Saud Univ., Riyadh, Saudi Arabia

fYear

2013

fDate

11-14 Aug. 2013

Firstpage

7

Lastpage

11

Abstract

A reliable speech corpus is generally acknowledged to be necessary for any speech processing application. In this paper, we propose methods to improve the reliability and efficiency of the BBN/AUB DARPA Babylon Levantine Arabic speech corpus. To this end, pronunciation correction, diacritization, and a new transcription are carried out manually, while phoneme segmentation and labeling are performed automatically. A comparison with the original transcription of the corpus shows a clear improvement in the output results.

Keywords

natural language processing; speech processing; BBN-AUB DARPA Babylon Levantine Arabic speech corpus; automatic phoneme labeling; automatic phoneme segmentation; diacritization correction; pronunciation correction; transcription; Educational institutions; Hidden Markov models; Labeling; Reliability; Speech; Speech recognition; BBN/AUB; Levantine; diacritics; dialect

fLanguage

English

Publisher

IEEE

Conference_Title

Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), 2013 IEEE

Conference_Location

Napa, CA

Print_ISBN

978-1-4799-1614-6

Type

conf

DOI

10.1109/DSP-SPE.2013.6642556

Filename

6642556