A comparative study of traditional and newly proposed features for recognition of speech under stress

Author

Bou-Ghazale, Sahar E. ; Hansen, John H. L.

Author_Institution

Center for Spoken Language Res., Colorado Univ., Boulder, CO, USA

Volume

8

Issue

4

fYear

2000

fDate

7/1/2000 12:00:00 AM

Firstpage

429

Lastpage

442

Abstract

It is well known that the performance of speech recognition algorithms degrades in adverse environments where a speaker is under stress, emotion, or the Lombard (1911) effect. This study evaluates the effectiveness of traditional features in recognition of speech under stress and formulates new features which are shown to improve stressed speech recognition. The focus is on formulating robust features which are less dependent on the speaking conditions rather than applying compensation or adaptation techniques. The stressed speaking styles considered are simulated angry and loud speech, Lombard effect speech, and noisy actual stressed speech from the SUSAS database, which is available on a CD-ROM through the NATO IST/TG-01 research group and LDC. In addition, this study investigates the immunity of the linear prediction power spectrum and fast Fourier transform power spectrum to the presence of stress. Our results show that, unlike the fast Fourier transform's (FFT) immunity to noise, the linear prediction power spectrum is more immune than the FFT to stress as well as to a combination of a noisy and stressful environment. Finally, the effects of various parameter processing techniques, such as fixed versus variable preemphasis, liftering, and fixed versus cepstral mean normalization, are studied. Two alternative frequency partitioning methods are proposed and compared with traditional mel-frequency cepstral coefficient (MFCC) features for stressed speech recognition. It is shown that the alternative filterbank frequency partitions are more effective for recognition of speech under both simulated and actual stressed conditions.
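
As a rough illustration of two of the parameter processing steps named in the abstract, the following minimal sketch shows fixed first-order preemphasis and cepstral mean normalization in plain Python. It is not the authors' implementation: the function names `preemphasize` and `cepstral_mean_normalize` are hypothetical, the default coefficient `alpha=0.97` is an assumed common value that does not come from this record, and the variable preemphasis, liftering, and alternative filterbank partitions studied in the paper are not shown.

```python
def preemphasize(signal, alpha=0.97):
    """Fixed first-order preemphasis: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used value, not one taken from the paper.
    The first sample is passed through unchanged.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean, computed over all frames.

    `frames` is a list of equal-length cepstral vectors, one per analysis frame.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the whole utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]
```

After normalization, each cepstral coefficient has zero mean across the utterance, which is the usual motivation for the technique: a constant offset in the cepstral domain is removed.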

Keywords

cepstral analysis; channel bank filters; fast Fourier transforms; feature extraction; filtering theory; prediction theory; speech recognition; CD-ROM; FFT; LDC; Lombard effect speech; NATO IST/TG-01 research group; adverse environments; cepstral mean normalization; emotion; fast Fourier transform power spectrum; filterbank frequency partitions; fixed mean normalization; fixed preemphasis; frequency partitioning methods; liftering; linear prediction power spectrum; loud speech; mel-frequency cepstral coefficients; noise immunity; noisy stressed speech; parameter processing; performance; robust features; simulated angry speech; speaking conditions; speech recognition algorithms; stressed speech recognition; variable preemphasis; CD-ROMs; Cepstral analysis; Degradation; Mel frequency cepstral coefficient; Robustness; Spatial databases; Speech analysis; Speech recognition; Stress; Working environment noise;

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.848224

Filename

848224