Title :
Building Acoustic Model Ensembles by Data Sampling With Enhanced Trainings and Features
Author :
Xin Chen ; Yunxin Zhao
Author_Institution :
Pearson Knowledge Technol., Menlo Park, CA, USA
Abstract :
We propose a novel approach that uses data sampling based on Cross Validation (CV) and Speaker Clustering (SC) to construct an ensemble of acoustic models for speech recognition. We also investigate how three existing techniques, Cross Validation Expectation Maximization (CVEM), Discriminative Training (DT), and Multiple Layer Perceptron (MLP) features, affect the quality of the proposed ensemble acoustic models (EAMs). We evaluated the proposed methods on the TIMIT phoneme recognition task and on a telemedicine automatic captioning task. The proposed methods yielded significant improvements in recognition accuracy over conventional Hidden Markov Model (HMM) baseline systems. Integrating EAMs with CVEM, DT, and MLP features also significantly improved accuracy over the corresponding single-model systems based on CVEM, DT, and MLP features, and the increased inter-model diversity is shown to have played an important role in this gain.
Keywords :
acoustic signal processing; expectation-maximisation algorithm; hidden Markov models; multilayer perceptrons; pattern clustering; signal sampling; speaker recognition; CVEM; DT; EAM; HMM baseline system; MLP; SC; TIMIT phoneme recognition task; cross validation expectation maximization; data sampling; discriminative training; ensemble acoustic model; hidden Markov model; intermodel diversity; multiple layer perceptron; recognition accuracy; speaker clustering; speech recognition; telemedicine automatic captioning task; Acoustics; Computational modeling; Data models; Diversity reception; Hidden Markov models; Speech recognition; Training; Ensemble acoustic model; MLP feature; cross validation data sampling; discriminative training; speaker clustering data sampling;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2227729