Title :
Signal modeling for isolated word recognition
Author :
Karnjanadecha, M. ; Zahorian, Stephen A.
Author_Institution :
Dept. of Electr. & Comput. Eng., Old Dominion Univ., Norfolk, VA, USA
Abstract :
This paper presents speech signal modeling techniques that are well suited to high-performance, robust isolated word recognition. Speech is encoded by a discrete cosine transform of its spectra, after several preprocessing steps. Temporal information is then explicitly encoded into the feature set. We present a new technique for incorporating this temporal information as a function of the temporal position within each word. We tested features computed with this method on an alphabet recognition task based on the ISOLET database. The HTK toolkit was used to implement the isolated word recognizer with whole-word HMM models. The best result, obtained with 50 features in speaker-independent alphabet recognition, was 98.0% accuracy. Gaussian noise was added to the original speech to simulate a noisy environment, and we achieved a recognition accuracy of 95.8% at an SNR of 15 dB. We also tested our recognizer on simulated telephone-quality speech, produced by adding noise to and band-limiting the original speech. For this "telephone" speech, our recognizer achieved 89.6% recognition accuracy. The recognizer was also tested in a speaker-dependent mode, resulting in 97.4% accuracy on test data.
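Illustration :
The abstract names two operations that are easy to show in code: adding Gaussian noise at a target SNR, and encoding a spectrum with a discrete cosine transform. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the Hamming window, the log compression, the 512-point FFT, the 25 ms frame, and the 16 kHz synthetic tone are all assumptions chosen for the example; the paper's actual preprocessing steps and its temporal-position features are not described in the abstract and are not reproduced here.

```python
import numpy as np


def add_gaussian_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has roughly the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def dct_of_log_spectrum(frame, n_coeffs=10, n_fft=512):
    """Return the first n_coeffs DCT-II coefficients of a frame's log magnitude spectrum.

    Assumption: Hamming window and natural-log compression stand in for the
    unspecified preprocessing steps mentioned in the abstract.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    log_spec = np.log(magnitude + 1e-10)  # small offset avoids log(0)
    n = len(log_spec)
    k = np.arange(n_coeffs)[:, None]      # coefficient index
    m = np.arange(n)[None, :]             # frequency-bin index
    basis = np.cos(np.pi * k * (m + 0.5) / n)  # unnormalized DCT-II basis
    return basis @ log_spec


# Hypothetical demo signal: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2.0 * np.pi * 440.0 * t)

noisy = add_gaussian_noise(clean, snr_db=15.0, rng=np.random.default_rng(0))
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))

# One 25 ms frame (400 samples at 16 kHz) encoded as 10 DCT coefficients.
features = dct_of_log_spectrum(noisy[:400], n_coeffs=10)
```

Here measured_snr_db should come out close to the requested 15 dB, and features is a vector of 10 coefficients for the single frame. A recognizer of the kind described would compute such vectors frame by frame and pass them, together with temporal features, to the HMM stage.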
Keywords :
Gaussian noise; discrete cosine transforms; hidden Markov models; spectral analysis; speech coding; speech intelligibility; speech recognition; transform coding; 15 dB; HTK toolkit; ISOLET database; SNR; alphabet recognition task; band limiting; discrete cosine transform; high performance; noise; noisy environment simulation; preprocessing steps; recognition accuracy; robust isolated word recognition; simulated telephone quality speech; speaker independent alphabet recognition; spectra; speech signal modeling; temporal information; temporal position; test data; whole word HMM; Robustness; Signal to noise ratio; Spatial databases; Speech enhancement; Testing; Working environment noise
Conference_Title :
Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing
Conference_Location :
Phoenix, AZ
Print_ISBN :
0-7803-5041-3
DOI :
10.1109/ICASSP.1999.758120