Title :
On using heterogeneous data for vehicle-based speech recognition: A DNN-based approach
Author :
Feng, Xue ; Richardson, Brigitte ; Amman, Scott ; Glass, James
Author_Institution :
MIT Comput. Sci. & Artificial Intell. Lab., Cambridge, MA, USA
Abstract :
Most automatic speech recognition (ASR) systems incorporate a single source of information about their input, namely, features and transformations derived from the speech signal. However, in many applications, such as vehicle-based speech recognition, sensor data and environmental information are often available to complement the audio. In this paper, we show how these data can be used to improve hybrid DNN-HMM ASR systems for a vehicle-based speech recognition task. Feature fusion is accomplished by augmenting the acoustic features with additional side information before they are presented to the DNN acoustic model. The additional features are extracted from the vehicle speed, HVAC status, windshield wiper status, and vehicle type. This supplementary information improves the DNN's ability to discriminate phonetic events in an environment-aware way without requiring any modification to the DNN training algorithms. Experimental results show that heterogeneous data are effective irrespective of whether cross-entropy (CE) or sequence training is used: with CE training, a word error rate (WER) reduction of 6.3% is obtained, while with sequence training the reduction is 5.5%.
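The feature fusion described in the abstract can be illustrated with a minimal sketch. It assumes that the side information is constant over an utterance and is simply appended to every acoustic frame. The encoding (scaled speed, numeric HVAC level, binary wiper flag, one-hot vehicle type), the feature dimensions, and the names encode_side_info, fuse_features, and VEHICLE_TYPES are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical vehicle categories; the record does not list the actual ones.
VEHICLE_TYPES = ["sedan", "suv", "truck"]


def encode_side_info(speed_kmh, hvac_level, wiper_on, vehicle_type):
    """Encode vehicle state as a fixed-length vector (assumed encoding)."""
    one_hot = np.zeros(len(VEHICLE_TYPES), dtype=np.float32)
    one_hot[VEHICLE_TYPES.index(vehicle_type)] = 1.0
    # Speed is divided by 100 only to keep values in a range similar to
    # the other inputs; this scaling is an assumption.
    scalars = np.array(
        [speed_kmh / 100.0, float(hvac_level), float(wiper_on)],
        dtype=np.float32,
    )
    return np.concatenate([scalars, one_hot])


def fuse_features(acoustic, side_info):
    """Append the same side-information vector to every acoustic frame."""
    num_frames = acoustic.shape[0]
    tiled = np.tile(side_info, (num_frames, 1))
    return np.hstack([acoustic, tiled])


# 200 frames of 40-dimensional acoustic features (random stand-in data).
rng = np.random.default_rng(0)
acoustic = rng.standard_normal((200, 40)).astype(np.float32)

side = encode_side_info(speed_kmh=60.0, hvac_level=2, wiper_on=True,
                        vehicle_type="suv")
fused = fuse_features(acoustic, side)  # input that would be fed to the DNN
```

Because the fusion happens at the input layer, the network and its training procedure stay unchanged; only the input dimension grows, here from 40 to 46.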
Keywords :
entropy; learning (artificial intelligence); neural nets; speech processing; speech recognition; vehicles; DNN acoustic model; DNN training algorithm; DNN-HMM ASR system; HVAC status; WER reduction; acoustic feature augmentation; automatic speech recognition; cross-entropy; deep neural network; feature fusion; heterogeneous data; phonetic event discrimination; sensor data; sequence training; speech signal; vehicle speed; vehicle type; vehicle-based speech recognition; windshield wiper status; Computational modeling; Hidden Markov models; Mel frequency cepstral coefficient; Robustness; Speech; Vehicles; Additional Feature for ASR; Condition-aware DNN; Deep Neural Network; Noise Robustness;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178799