Title :
Submodular data selection with acoustic and phonetic features for automatic speech recognition
Author :
Chongjia Ni ; Lei Wang ; Haibo Liu ; Cheung-Chi Leung ; Li Lu ; Bin Ma
Author_Institution :
Inst. for Infocomm Res. (I2R), A*STAR, Singapore, Singapore
Abstract :
In this paper, we propose using acoustic-feature-based submodular function optimization to select a subset of untranscribed data for manual transcription, and then retraining the initial acoustic model with the additional transcribed data. The acoustic features are obtained from an unsupervised Gaussian mixture model. We also integrate the acoustic features with phonetic features, which are obtained from an initial ASR system, in the submodular function. Submodular function optimization has a theoretically proven near-optimality guarantee. We performed experiments on 1000 hours of Mandarin mobile phone speech, of which 300 hours were used as initial data to train an initial acoustic model. The experimental results show that the acoustic-feature-based approach, which does not rely on an initial ASR system, performs as well as the phonetic-feature-based approach. Moreover, acoustic-feature-based and phonetic-feature-based data selection are complementary. The submodular function with the combined features provides a relative character error rate (CER) reduction of 4.8% over the corresponding ASR system using random selection. We also include the desired feature distribution obtained from a development set in a generalized function, but the improvement is insignificant.
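The selection step described in the abstract can be illustrated with a minimal sketch of greedy submodular maximization. The abstract does not state the objective, so everything here is an assumption made for illustration and not the authors' implementation: the feature-based form f(S) = sum_u w_u * sqrt(sum_{s in S} m_u(s)), the cardinality budget k, and the names `objective`, `greedy_select`, `features`, and `weights`. For a monotone submodular objective under a cardinality constraint, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value, which is presumably the kind of near-optimality guarantee the abstract mentions.

```python
import math


def objective(selected, features, weights):
    """Assumed feature-based objective: f(S) = sum_u w_u * sqrt(sum_{s in S} m_u(s)).

    features maps an utterance id to a list of non-negative feature scores
    (for example, per-Gaussian or per-phone counts); this layout is hypothetical.
    """
    totals = [0.0] * len(weights)
    for utt in selected:
        for u, score in enumerate(features[utt]):
            totals[u] += score
    return sum(w * math.sqrt(t) for w, t in zip(weights, totals))


def greedy_select(features, k, weights=None):
    """Pick up to k utterances, each time taking the largest marginal gain."""
    n_feat = len(next(iter(features.values())))
    if weights is None:
        weights = [1.0] * n_feat  # assumption: all features weighted equally
    totals = [0.0] * n_feat       # running per-feature mass of the selected set
    remaining = set(features)
    selected = []
    while remaining and len(selected) < k:
        best, best_gain = None, -1.0
        for utt in sorted(remaining):  # sorted for deterministic tie-breaking
            # Marginal gain of adding utt; the concave sqrt gives diminishing returns.
            gain = sum(
                w * (math.sqrt(t + m) - math.sqrt(t))
                for w, t, m in zip(weights, totals, features[utt])
            )
            if gain > best_gain:
                best, best_gain = utt, gain
        selected.append(best)
        remaining.remove(best)
        totals = [t + m for t, m in zip(totals, features[best])]
    return selected


# Toy data: "a" and "b" cover the same feature, "c" covers a different one.
toy_features = {"a": [4.0, 0.0], "b": [4.0, 0.0], "c": [0.0, 1.0]}
chosen = greedy_select(toy_features, k=2)
```

In this toy case the second pick favours the utterance that covers a new feature over the one that repeats an already covered feature, which is the diversity effect that diminishing returns is meant to produce.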
Keywords :
Gaussian distribution; Gaussian processes; acoustic signal processing; error statistics; feature selection; mixture models; mobile handsets; speech processing; speech recognition; CER reduction; Mandarin mobile phone speech; acoustic feature based submodular function optimization; automatic speech recognition; character error rate reduction; feature distribution; initial ASR system; manual transcription; phonetic feature based submodular data selection; random selection; unsupervised Gaussian mixture model; Acoustics; Data models; Feature extraction; Hidden Markov models; Speech; Speech recognition; Training data; Active learning; data selection; submodular optimization
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178848