• DocumentCode
    2323960
  • Title
    Ear-model derived features for automatic speech recognition
  • Author
    de Mori, Renato ; Albesano, Dario ; Gemello, Roberto ; Mana, Franco
  • Author_Institution
    LIA CERI-IUP, Univ. of Avignon, France
  • Volume
    3
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    1603
  • Abstract
    The paper provides a theoretical justification that gravity centers (GC) in frequency bands computed from zero-crossing information are far more robust to additive telephone noise than GCs computed from FFT spectra. Experiments on two different corpora confirm the theoretical results when GCs are added to standard mel frequency-scaled cepstral coefficients (MFCC) and their time derivatives. A 20.1% word error reduction is observed on a large telephone corpus of Italian cities, with an average signal-to-noise ratio (SNR) of 15 dB, if GCs are computed from zero-crossings, while performance deteriorates when GCs are computed from FFT spectra.
  • Keywords
    acoustic noise; cepstral analysis; speech recognition; FFT spectra; Italian cities; SNR; additive telephone noise; automatic speech recognition; average signal-to-noise ratio; ear-model derived features; frequency bands; gravity centers; large telephone corpus; performance; standard mel frequency-scaled cepstral coefficients; time derivatives; word error reduction; zero-crossing information; Additive noise; Automatic speech recognition; Cepstral analysis; Frequency; Gravity; Hidden Markov models; Neural networks; Noise robustness; Telecommunication computing; Telephony;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-6293-4
  • Type
    conf
  • DOI
    10.1109/ICASSP.2000.862002
  • Filename
    862002