  • DocumentCode
    1833096
  • Title
    Diacritization, automatic segmentation and labeling for Levantine Arabic speech
  • Author
    Alotaibi, Yousef A.; Meftah, Ali H.; Selouani, Sid-Ahmed
  • Author_Institution
    Coll. of Comput. & Inf. Sci., King Saud Univ., Riyadh, Saudi Arabia
  • fYear
    2013
  • fDate
    11-14 Aug. 2013
  • Firstpage
    7
  • Lastpage
    11
  • Abstract
    It is generally acknowledged that a reliable speech corpus is necessary for any application involving speech processing. In this paper, we propose methods to improve the BBN/AUB DARPA Babylon Levantine Arabic speech corpus to increase its reliability and efficiency. For this purpose, correction of pronunciation, diacritization, and new transcription are performed manually along with automatic phoneme segmentation and labeling. The comparison with the original transcription of the corpus shows a clear improvement in the output results.
  • Keywords
    natural language processing; speech processing; BBN-AUB DARPA Babylon Levantine Arabic speech corpus; automatic phoneme labeling; automatic phoneme segmentation; diacritization correction; pronunciation correction; transcription; Educational institutions; Hidden Markov models; Labeling; Reliability; Speech; Speech recognition; BBN/AUB; Levantine; diacritics; dialect
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), 2013 IEEE
  • Conference_Location
    Napa, CA
  • Print_ISBN
    978-1-4799-1614-6
  • Type
    conf
  • DOI
    10.1109/DSP-SPE.2013.6642556
  • Filename
    6642556