A CHiME-3 challenge system: Long-term acoustic features for noise robust automatic speech recognition

Author

Niko Moritz;Stephan Gerlach;Kamil Adiloglu;Jörn Anemüller;Birger Kollmeier;Stefan Goetze

Author_Institution

Fraunhofer IDMT, Project Group for Hearing, Speech, and Audio Technology, Oldenburg, Germany

fYear

2015

Firstpage

468

Lastpage

474

Abstract

The paper describes an automatic speech recognition (ASR) system for the 3rd CHiME challenge, which addresses noisy acoustic scenes in public environments. The proposed system comprises a multi-channel speech enhancement front-end with a microphone channel failure detection method that cross-compares the modulation spectra of speech to detect erroneous microphone recordings. The main focus of the submission is the investigation of the amplitude modulation filter bank (AMFB) as a method to extract long-term acoustic cues prior to a Gaussian mixture model (GMM) or deep neural network (DNN) based ASR classifier. It is shown that AMFB features outperform the commonly used frame splicing of filter bank features, even on a performance-optimized ASR challenge system. That is, temporal analysis of speech by hand-crafted, auditory-motivated AMFBs is shown to be more robust than a data-driven method that extracts temporal dynamics with a DNN. Our final ASR system, which additionally adapts the acoustic features to speaker characteristics, achieves an absolute word error rate reduction of approx. 21.53 % compared with the best CHiME-3 baseline system on the "real" test condition.

Keywords

"Feature extraction","Filter banks","Acoustics","Training","Microphones","Speech","Frequency modulation"

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type

conf

DOI

10.1109/ASRU.2015.7404832

Filename

7404832