A comparative study of mel cepstra and EIH for phone classification under adverse conditions

Author

Sandhu, Sumeet ; Ghitza, Oded

Author_Institution

AT&T Bell Labs., Murray Hill, NJ, USA

Volume

1

fYear

1995

fDate

9-12 May 1995

Firstpage

409

Abstract

The performance of large-vocabulary automatic speech recognition (ASR) systems deteriorates severely when training and testing conditions are mismatched. Signal processing techniques based on the human auditory system have been proposed to improve ASR performance, especially under adverse acoustic conditions. The paper compares one such scheme, the ensemble interval histogram (EIH), with conventional mel cepstral analysis (MEL). These two spectral feature extraction methods were implemented as front ends to a state-of-the-art continuous speech recognizer and evaluated on the TIMIT database (male). To characterize the influence of signal distortion on the representation of different sounds, phone classification experiments were conducted for three acoustic conditions: clean speech, speech through a telephone channel, and speech under room reverberation (the last two are simulations). Classification was performed for static features alone and for static and dynamic features, to observe the relative contribution of time derivatives. The performance is displayed as the percentage of phones correctly classified. Confusion matrices were also derived from phone classification to provide diagnostic information.
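
The two evaluation measures named in the abstract, percentage of phones correctly classified and the confusion matrix, can be illustrated with a minimal sketch. This is not the authors' code: the function names `confusion_matrix` and `percent_correct`, the three-phone label set, and the toy reference and hypothesis sequences are all hypothetical, chosen only to show how the two quantities relate.

```python
from collections import Counter


def confusion_matrix(reference, hypothesis, labels):
    """Rows are reference phones, columns are classified phones."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    counts = Counter(zip(reference, hypothesis))
    return [[counts[(ref, hyp)] for hyp in labels] for ref in labels]


def percent_correct(matrix):
    """Percentage of tokens on the diagonal (correctly classified phones)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total if total else 0.0


# Hypothetical toy data, not taken from the paper.
labels = ["aa", "iy", "s"]
reference = ["aa", "aa", "iy", "s", "s", "iy"]
hypothesis = ["aa", "iy", "iy", "s", "s", "iy"]

matrix = confusion_matrix(reference, hypothesis, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print(f"{percent_correct(matrix):.1f}% correct")
```

The off-diagonal entries show which phones are mistaken for which others, which is the kind of diagnostic information a single percentage cannot give.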

Keywords

architectural acoustics; cepstral analysis; feature extraction; hidden Markov models; matrix algebra; pattern classification; reverberation; speech recognition; EIH; MEL cepstra; acoustic conditions; adverse conditions; clean speech; confusion matrices; continuous speech recognizer; diagnostic information; dynamic features; ensemble interval histogram; large-vocabulary automatic speech recognition systems; mel cepstral analysis; mismatched training; phone classification; room reverberations; signal distortion; signal processing techniques; spectral feature extraction methods; static feature; telephone channel; Acoustic signal processing; Acoustic testing; Auditory system; Automatic speech recognition; Automatic testing; Cepstral analysis; Feature extraction; Histograms; Humans; System testing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on

Conference_Location

Detroit, MI

ISSN

1520-6149

Print_ISBN

0-7803-2431-5

Type

conf

DOI

10.1109/ICASSP.1995.479608

Filename

479608