UT-Vocal Effort II: Analysis and constrained-lexicon recognition of whispered speech

Author

Ghaffarzadegan, Shabnam ; Boril, Hynek ; Hansen, John H. L.

Author_Institution

Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX, USA

fYear

2014

fDate

4-9 May 2014

Firstpage

2544

Lastpage

2548

Abstract

This study focuses on acoustic variations in speech introduced by whispering, and proposes several strategies to improve the robustness of automatic speech recognition of whispered speech with neutral-trained acoustic models. In the analysis part, differences between neutral and whispered speech captured in the UT-Vocal Effort II corpus are studied in terms of energy, spectral slope, and formant center frequency and bandwidth distributions in silence, voiced, and unvoiced speech signal segments. In the recognition part, several strategies involving front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion for alternative pronunciations are proposed. The proposed neutral-trained system employing a redistributed filter bank and reduced features provides a 7.7% absolute WER reduction over the baseline system trained on neutral speech, and a 1.3% reduction over a baseline system with whisper-adapted acoustic models.
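
The abstract reports its gains as WER (word error rate) reductions and qualifies the first one as absolute. As a minimal sketch of what that metric measures, and not the authors' scoring tool, the Python below computes WER with a word-level edit distance and contrasts an absolute reduction with a relative one. The function names and the toy numbers are illustrative assumptions; none of the values come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, system_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - system_wer


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    return (baseline_wer - system_wer) / baseline_wer


# Toy values only (hypothetical, not taken from the paper).
toy_wer = word_error_rate("a b c d", "a x c")   # one substitution, one deletion
toy_abs = absolute_reduction(0.50, 0.25)
toy_rel = relative_reduction(0.50, 0.25)
```

With these toy inputs the same improvement reads as 25 points absolute but 50% relative, which is why the abstract's "absolute" qualifier matters when comparing systems.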

Keywords

acoustic signal processing; cepstral analysis; channel bank filters; data reduction; speech recognition; text analysis; UT-Vocal Effort II corpus; WER reduction; acoustic variation; alternative pronunciations; automatic speech recognition; bandwidth distribution; baseline system; cepstral dimensionality reduction; constrained lexicon recognition; formant center frequency; front-end filter bank redistribution; lexicon expansion; neutral speech; neutral trained acoustic model; silence speech signal segment; spectral slope; unvoiced speech signal segment; whisper adapted acoustic model; whispered speech; Adaptation models; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Whisper speech recognition; filter-bank optimization; speech analysis

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854059

Filename

6854059