DocumentCode :
3240051
Title :
Spoken Arabic digits recognizer using recurrent neural networks
Author :
Alotaibi, Yousef Ajami
Author_Institution :
Dept. of Comput. Eng., King Saud Univ., Riyadh, Saudi Arabia
fYear :
2004
fDate :
18-21 Dec. 2004
Firstpage :
195
Lastpage :
199
Abstract :
Arabic is a Semitic language that differs in many ways from European languages such as English. One of these differences is the pronunciation of the ten digits, zero through nine. All Arabic digits are polysyllabic words (except zero, which is monosyllabic), and most of them contain phonemes unique to Arabic, namely the pharyngeal and emphatic subsets. In this paper, Arabic digits are investigated from the point of view of speech recognition. A speech recognition system based on recurrent neural networks was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, implemented in both a multispeaker mode (i.e., the same set of speakers was used in both the training and testing phases) and a speaker-independent mode (i.e., the speakers used for training differ from those used for testing). During recognition, the digitized speech is cleaned of noise by band-pass filters and preemphasized; it is then blocked into frames and windowed with a Hamming window; a time alignment algorithm compensates for differences in utterance length and misalignments between phonemes; frame features are extracted as MFCC coefficients to reduce the amount of information in the input signal; and finally the neural network classifies the unknown digit. The system achieved 99.5% correct digit recognition in multispeaker mode and 94.5% in speaker-independent mode.
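The front-end steps named in the abstract (preemphasis, then blocking and Hamming windowing) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the preemphasis coefficient of 0.97, and the frame length and hop are all assumptions chosen for illustration, since the abstract gives no parameter values.

```python
import math

def preemphasize(signal, alpha=0.97):
    """First-order preemphasis: y[n] = x[n] - alpha * x[n-1].
    alpha = 0.97 is a common default, assumed here; the abstract gives no value."""
    if not signal:
        return []
    head = [signal[0]]  # first sample has no predecessor, so pass it through
    tail = [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]
    return head + tail

def hamming(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_signal(signal, frame_len, hop):
    """Block the signal into overlapping frames and apply a Hamming window
    to each. Trailing samples that do not fill a whole frame are dropped."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

# Hypothetical usage on a short synthetic signal (not real speech data).
samples = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
frames = frame_signal(preemphasize(samples), frame_len=20, hop=10)
```

Each windowed frame would then be passed to the feature-extraction stage (MFCC in the paper); band-pass filtering, time alignment, and the recurrent network itself are outside the scope of this sketch.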
Keywords :
band-pass filters; feature extraction; natural languages; recurrent neural nets; signal classification; speaker recognition; Hamming window; MFCC coefficient; automatic Arabic digit recognition; band-pass filter; feature extraction; multispeaker mode; phonemes; recurrent neural network; signal classification; speaker-independent mode; speech digitization; speech recognition system; time alignment algorithm; word recognizer; Automatic speech recognition; Automatic testing; Natural languages; Noise reduction; Recurrent neural networks; Signal processing; Speech enhancement; Speech processing; Speech recognition; System testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Processing and Information Technology, 2004. Proceedings of the Fourth IEEE International Symposium on
Print_ISBN :
0-7803-8689-2
Type :
conf
DOI :
10.1109/ISSPIT.2004.1433720
Filename :
1433720