Auditory-based acoustic distinctive features and spectral cues for automatic speech recognition using a multi-stream paradigm

Author

Tolba, Hesham ; Selouani, Sid-Ahmed ; O'Shaughnessy, Douglas

Author_Institution

INRS-Télécommunications, Université du Québec, 900 de la Gauchetière Ouest, H5A 1C6, Canada

Volume

1

fYear

2002

fDate

13-17 May 2002

Abstract

In this paper, a multi-stream paradigm is proposed to improve the performance of HMM-based automatic speech recognition (ASR) systems by exploiting features that characterize speech sounds: some based on the auditory system and one based on the Fourier power spectrum. Combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main peaks of the speech signal's spectrum in a multi-stream paradigm was found to improve recognition performance. The Hidden Markov Model Toolkit (HTK) was used throughout the experiments to test the new multi-stream feature vector. A series of speaker-independent continuous-speech recognition experiments was carried out on a subset of the large read-speech corpus TIMIT. Using this multi-stream paradigm, N-mixture mono-/tri-phone models, and a bigram language model, the word error rate decreased by about 4.01%.

Keywords

Analytical models; Cepstrum; Computational modeling; Ear; Hidden Markov models; Markov processes; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on

Conference_Location

Orlando, FL, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.2002.5743869

Filename

5743869