• DocumentCode
    1350262
  • Title

    A comparative study of traditional and newly proposed features for recognition of speech under stress

  • Author

    Bou-Ghazale, Sahar E. ; Hansen, John H L

  • Author_Institution
    Center for Spoken Language Res., Colorado Univ., Boulder, CO, USA
  • Volume
    8
  • Issue
    4
  • fYear
    2000
  • fDate
    7/1/2000 12:00:00 AM
  • Firstpage
    429
  • Lastpage
    442
  • Abstract
    It is well known that the performance of speech recognition algorithms degrades in adverse environments where a speaker is under stress, emotion, or the Lombard (1911) effect. This study evaluates the effectiveness of traditional features in recognition of speech under stress and formulates new features which are shown to improve stressed speech recognition. The focus is on formulating robust features which are less dependent on the speaking conditions, rather than on applying compensation or adaptation techniques. The stressed speaking styles considered are simulated angry and loud, Lombard effect speech, and noisy actual stressed speech from the SUSAS database, which is available on a CD-ROM through the NATO IST/TG-01 research group and LDC. In addition, this study investigates the immunity of the linear prediction power spectrum and the fast Fourier transform (FFT) power spectrum to the presence of stress. Our results show that, unlike the FFT's immunity to noise, the linear prediction power spectrum is more immune than the FFT to stress as well as to a combination of a noisy and stressful environment. Finally, the effects of various parameter processing methods, such as fixed versus variable preemphasis, liftering, and fixed versus cepstral mean normalization, are studied. Two alternative frequency partitioning methods are proposed and compared with traditional mel-frequency cepstral coefficient (MFCC) features for stressed speech recognition. It is shown that the alternate filterbank frequency partitions are more effective for recognition of speech under both simulated and actual stressed conditions.
  • Keywords
    cepstral analysis; channel bank filters; fast Fourier transforms; feature extraction; filtering theory; prediction theory; speech recognition; CD-ROM; FFT; LDC; Lombard effect speech; NATO IST/TG-01 research group; adverse environments; cepstral mean normalization; emotion; fast Fourier transform power spectrum; filterbank frequency partitions; fixed mean normalization; fixed preemphasis; frequency partitioning methods; liftering; linear prediction power spectrum; loud speech; mel-frequency cepstral coefficients; noise immunity; noisy stressed speech; parameter processing; performance; robust features; simulated angry speech; speaking conditions; speech recognition algorithms; stressed speech recognition; variable preemphasis; CD-ROMs; Cepstral analysis; Degradation; Mel frequency cepstral coefficient; Robustness; Spatial databases; Speech analysis; Speech recognition; Stress; Working environment noise;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.848224
  • Filename
    848224