• DocumentCode
    178408
  • Title
    UT-Vocal Effort II: Analysis and constrained-lexicon recognition of whispered speech
  • Author
    Ghaffarzadegan, Shabnam; Boril, Hynek; Hansen, John H. L.
  • Author_Institution
    Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    2544
  • Lastpage
    2548
  • Abstract
    This study focuses on the acoustic variations that whispering introduces into speech, and proposes several strategies to improve the robustness of automatic speech recognition (ASR) of whispered speech when neutral-trained acoustic models are used. In the analysis part, differences between neutral and whispered speech captured in the UT-Vocal Effort II corpus are studied in terms of energy, spectral slope, and formant center frequency and bandwidth distributions in silence, voiced, and unvoiced speech signal segments. In the speech recognition part, several strategies are proposed: front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion with alternative pronunciations. The proposed neutral-trained system, which employs a redistributed filter bank and reduced features, provides a 7.7 % absolute word error rate (WER) reduction over the baseline system trained on neutral speech, and a 1.3 % reduction over a baseline system with whisper-adapted acoustic models.
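The abstract reports its results as absolute reductions in word error rate. As a reminder of how that metric is conventionally computed, here is a minimal sketch of the standard definition, WER = (S + D + I) / N, obtained from a word-level Levenshtein alignment. It is not code from the paper; the function names `word_error_rate` and `absolute_reduction` are hypothetical, and the example sentences are invented for illustration. An "absolute" reduction is the difference of two WER percentages (percentage points), not a ratio of them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper illustrating the standard WER definition; it is not
    taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (classic Levenshtein DP).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer_pct: float, system_wer_pct: float) -> float:
    """Absolute WER reduction in percentage points (e.g. 30.0 -> 25.0 gives 5.0)."""
    return baseline_wer_pct - system_wer_pct


if __name__ == "__main__":
    ref = "turn the lights off in the kitchen"
    hyp = "turn lights of in the kitchen please"
    # One deletion ("the"), one substitution ("off" -> "of"), one insertion ("please").
    print(word_error_rate(ref, hyp))
```

A dynamic-programming alignment is used because it finds the minimum number of edits; a naive position-by-position comparison would overcount errors whenever a word is dropped or inserted.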
  • Keywords
    acoustic signal processing; cepstral analysis; channel bank filters; data reduction; speech recognition; text analysis; UT-vocal effort II corpus; WER reduction; acoustic variation; alternative pronunciations; automatic speech recognition; bandwidth distribution; baseline system; cepstral dimensionality reduction; constrained lexicon recognition; formant center frequency; front-end filter bank redistribution; lexicon expansion; neutral speech; neutral trained acoustic model; silence speech signal segment; spectral slope; unvoiced speech signal segment; whisper adapted acoustic model; whispered speech; Adaptation models; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Whisper speech recognition; filter-bank optimization; speech analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854059
  • Filename
    6854059