DocumentCode :
2253748
Title :
A model for the acoustic phonetic structure of Arabic language using a single ergodic hidden Markov model
Author :
Mokhtar, M.A. ; El-Abddin, A.Z.
Author_Institution :
Dept. of Electr. Eng., Alexandria Univ., Egypt
Volume :
1
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
330
Abstract :
We propose an acoustic-phonetic structure model of the Arabic language based on a single ergodic hidden Markov model (HMM), since a single HMM with about 40-50 states can represent all acoustic-phonetic effects. We present the techniques and algorithms used to build the model, the problems associated with representing the whole acoustic-phonetic structure, the characteristics of the model, and how it performs as a phonetic decoder for recognition of fluent Arabic speech. The model is trained, segmented (manually and automatically), and labeled using a fixed number of phonemes, each of which has a direct correspondence to the states of the model. The model assumes that the observed spectral vectors are generated by a Gaussian source. The inherent variability of each phoneme is modeled as the observable random process of the Markov chain, while the phonotactic model of the unobservable phonetic sequence is represented by the state transition matrix of the HMM. The model incorporates variable-duration feature densities in each state to account for the fact that vowel-like sounds have vastly different duration characteristics from consonant-like sounds. It is shown that the difficulties in developing an acoustic-phonetic model are not due to inherent deficiencies of the concept presented. Instead, they are due to the choice of the phonemes to be modeled, the selected parametrization of the data, and the choice of the variant of the ergodic HMM. The model used for the recognition experiments is clearly not complete, but it adequately performs phonetic transcription of unknown utterances, thereby serving as an initial step towards continuous speech recognition.
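The phonetic decoding idea in the abstract (one state per phoneme, Gaussian observations, a fully connected transition matrix acting as the phonotactic model) can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the function names, the diagonal-covariance assumption, the one-dimensional toy features, and all parameter values are hypothetical, and the variable-duration densities mentioned in the abstract are deliberately left out.

```python
import numpy as np

def log_gaussian(obs, means, variances):
    """Log density of every frame under every state's diagonal Gaussian.

    obs: (T, D) frames, means/variances: (N, D) per-state parameters.
    Returns an array of shape (T, N).
    """
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :]
        + diff ** 2 / variances[None, :, :],
        axis=2,
    )

def viterbi(obs, log_pi, log_A, means, variances):
    """Most likely state (phoneme) sequence for an ergodic HMM."""
    log_b = log_gaussian(obs, means, variances)
    T, N = log_b.shape
    delta = np.empty((T, N))           # best log score ending in each state
    psi = np.zeros((T, N), dtype=int)  # best predecessor of each state
    delta[0] = log_pi + log_b[0]
    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, then moving to j
        scores = delta[t - 1][:, None] + log_A
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_b[t]
    # Backtrack from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy setup (assumed values): 3 "phoneme" states with 1-D features.
means = np.array([[0.0], [5.0], [10.0]])
variances = np.ones((3, 1))
A = np.full((3, 3), 0.1) + 0.7 * np.eye(3)  # ergodic: every transition allowed
pi = np.full(3, 1.0 / 3.0)
obs = np.array([[0.1], [-0.2], [5.1], [4.9], [10.2], [9.8]])

path = viterbi(obs, np.log(pi), np.log(A), means, variances)
print(path.tolist())  # [0, 0, 1, 1, 2, 2]
```

In a real system the state count would be closer to the 40-50 phoneme states the abstract mentions, and the features would be multi-dimensional spectral or cepstral vectors rather than scalars.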
Keywords :
Gaussian processes; cepstral analysis; hidden Markov models; matrix algebra; random processes; speech coding; speech recognition; Arabic language; Arabic speech recognition; Gaussian source; Markov chain; acoustic phonetic structure; cepstral analysis; consonant-like sounds; observed spectral vectors; phonemes; phonetic decoder; random process; single ergodic hidden Markov model; state transition matrix; variable duration feature densities; vowel-like sounds; Acoustical engineering; Automatic speech recognition; Cepstral analysis; Character recognition; Decoding; Hidden Markov models; OFDM modulation; Speech analysis; Speech recognition; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607120
Filename :
607120