Regional Information Center for Science and Technology - Submodular data selection with acoustic and phonetic features for automatic speech recognition

DocumentCode :

730724

Title :

Submodular data selection with acoustic and phonetic features for automatic speech recognition

Author :

Chongjia Ni ; Lei Wang ; Haibo Liu ; Cheung-Chi Leung ; Li Lu ; Bin Ma

Author_Institution :

Inst. for Infocomm Res. (I2R), A*STAR, Singapore, Singapore

fYear :

2015

fDate :

19-24 April 2015

Firstpage :

4629

Lastpage :

4633

Abstract :

In this paper, we propose using acoustic-feature-based submodular function optimization to select a subset of untranscribed data for manual transcription, and then retraining the initial acoustic model with the additional transcribed data. The acoustic features are obtained from an unsupervised Gaussian mixture model. We also integrate these acoustic features with phonetic features, which are obtained from an initial ASR system, in the submodular function. Submodular function optimization has been shown theoretically to have a near-optimal guarantee. We performed experiments on 1000 hours of Mandarin mobile phone speech, of which 300 hours of initial data were used to train an initial acoustic model. The experimental results show that the acoustic-feature-based approach, which does not rely on an initial ASR system, performs as well as the phonetic-feature-based approach. Moreover, acoustic-feature-based and phonetic-feature-based data selection have a complementary effect. The submodular function with the combined features provides a relative 4.8% character error rate (CER) reduction over the corresponding ASR system using random selection. We also incorporate the desired feature distribution, obtained from a development set, into a generalized function, but the improvement is insignificant.
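The abstract does not state the exact form of the submodular function. A common choice in data-selection work is a feature-based function f(S) = Σ_u w_u · √(m_u(S)), where m_u(S) is the accumulated value of feature u over the selected utterances S. The concave square root gives diminishing returns, so the function is monotone submodular. The minimal sketch below assumes that form and assumes each utterance is summarized by non-negative soft counts (for example, summed GMM component posteriors as acoustic features and phone counts from the initial ASR output as phonetic features), with the two vectors simply concatenated to combine them. All function names, the square-root choice, and the per-utterance costs are hypothetical illustrations, not the authors' implementation. The optional `weights` argument is one hypothetical place where a desired feature distribution from a development set could enter.

```python
import math
from typing import List, Optional, Sequence


def combine_features(acoustic: Sequence[Sequence[float]],
                     phonetic: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical combination: concatenate each utterance's acoustic
    (e.g. GMM posterior counts) and phonetic (e.g. phone counts) vectors."""
    return [list(a) + list(p) for a, p in zip(acoustic, phonetic)]


def marginal_gain(coverage: Sequence[float], feats: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Gain of adding one utterance under f(S) = sum_u w_u * sqrt(m_u(S))."""
    return sum(w * (math.sqrt(c + x) - math.sqrt(c))
               for c, x, w in zip(coverage, feats, weights))


def greedy_select(features: Sequence[Sequence[float]],
                  costs: Sequence[float],
                  budget: float,
                  weights: Optional[Sequence[float]] = None) -> List[int]:
    """Cost-benefit greedy selection of utterance indices within `budget`
    (e.g. total hours to transcribe). Features must be non-negative."""
    if not features:
        return []
    if any(c <= 0 for c in costs):
        raise ValueError("costs must be positive")
    n_dims = len(features[0])
    weights = list(weights) if weights is not None else [1.0] * n_dims
    coverage = [0.0] * n_dims       # m_u(S) for the current selection S
    chosen = set()
    selected: List[int] = []
    spent = 0.0
    while True:
        best, best_ratio = None, 0.0
        for i in range(len(features)):
            if i in chosen or spent + costs[i] > budget:
                continue
            ratio = marginal_gain(coverage, features[i], weights) / costs[i]
            if ratio > best_ratio:  # strict: ties keep the lowest index
                best, best_ratio = i, ratio
        if best is None:            # nothing affordable adds positive gain
            break
        chosen.add(best)
        selected.append(best)
        spent += costs[best]
        coverage = [c + x for c, x in zip(coverage, features[best])]
    return selected


if __name__ == "__main__":
    acoustic = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
    phonetic = [[1.0], [1.0], [0.0]]
    feats = combine_features(acoustic, phonetic)
    print(greedy_select(feats, costs=[1.0, 1.0, 1.0], budget=2.0))
```

With unit costs this reduces to the standard greedy algorithm, for which the classic (1 - 1/e) approximation bound for monotone submodular maximization under a cardinality constraint holds. With unequal costs (such as utterance durations), the plain cost-benefit ratio rule shown here is a common heuristic; constant-factor guarantees in that setting require modifications such as also comparing against the best single affordable item.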

Keywords :

Gaussian distribution; Gaussian processes; acoustic signal processing; error statistics; feature selection; mixture models; mobile handsets; speech processing; speech recognition; CER reduction; Mandarin mobile phone speech; acoustic feature based submodular function optimization; automatic speech recognition; character error rate reduction; feature distribution; initial ASR system; manual transcription; phonetic feature based submodular data selection; random selection; unsupervised Gaussian mixture model; Acoustics; Data models; Feature extraction; Hidden Markov models; Speech; Speech recognition; Training data; Active learning; automatic speech recognition; data selection; submodular optimization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location :

South Brisbane, QLD

Type :

conf

DOI :

10.1109/ICASSP.2015.7178848

Filename :

7178848

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=730724