Submodular subset selection for large-scale speech training data

Author

Wei, Kai; Liu, Yuzong; Kirchhoff, Katrin; Bartels, Christopher; Bilmes, Jeff

Author_Institution

Dept. of Electr. Eng., Univ. of Washington, Seattle, WA, USA

fYear

2014

fDate

4-9 May 2014

Firstpage

3311

Lastpage

3315

Abstract

We address the problem of subselecting a large set of acoustic data to train automatic speech recognition (ASR) systems. To this end, we apply a novel data selection technique based on constrained submodular function maximization. Though NP-hard, the combinatorial optimization problem can be approximately solved by a simple and scalable greedy algorithm with constant-factor guarantees. We evaluate our approach by subselecting data from 1300 hours of conversational English telephone data to train two types of large-vocabulary speech recognizers, one with Gaussian mixture model (GMM) based acoustic models, and another based on deep neural networks (DNNs). We show that training data can be reduced significantly, and that our technique outperforms both random selection and a previously proposed selection method utilizing comparable resources. Notably, using the submodular selection method, the DNN system using only about 5% of the training data is able to achieve performance on par with the GMM system using 100% of the training data. With the baseline subset selection methods, however, the DNN system is unable to match this performance.
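
The greedy procedure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which submodular objective or which constraint the authors use, so everything below is an assumption made for illustration: the objective is a facility-location function, the constraint is a simple cardinality budget, and the names `facility_location`, `greedy_select`, and `SIM` are hypothetical. For a monotone submodular objective under a cardinality constraint, this kind of greedy selection is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is one example of a constant-factor guarantee.

```python
from typing import List, Sequence


def facility_location(sim: Sequence[Sequence[float]], subset: List[int]) -> float:
    """Assumed objective: f(S) = sum_i max_{j in S} sim[i][j], with sim >= 0."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def greedy_select(sim: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Pick up to `budget` items, each time adding the largest marginal gain."""
    selected: List[int] = []
    current = 0.0
    candidates = set(range(len(sim)))
    while candidates and len(selected) < budget:
        best_item, best_gain = None, 0.0
        for j in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = facility_location(sim, selected + [j]) - current
            if gain > best_gain:
                best_item, best_gain = j, gain
        if best_item is None:  # no remaining item improves the objective
            break
        selected.append(best_item)
        candidates.remove(best_item)
        current += best_gain
    return selected


# Toy similarity matrix (hypothetical data): items 0 and 1 are
# near-duplicates, item 2 is distinct from both.
SIM = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = greedy_select(SIM, budget=2)
```

In this toy case the second pick skips the near-duplicate and takes the distinct item, which is the diminishing-returns behavior that makes submodular objectives a natural fit for data subset selection. This naive version re-evaluates the objective for every candidate at every step; a scalable implementation would need something more efficient.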

Keywords

Gaussian processes; combinatorial mathematics; neural nets; optimisation; speech recognition; ASR systems; DNN; GMM based acoustic models; Gaussian mixture model; NP-hard problem; automatic speech recognition; combinatorial optimization problem; constant-factor guarantees; constrained submodular function maximization; data selection technique; deep neural networks; large-scale speech training data; large-vocabulary speech recognizers; submodular subset selection method; Acoustics; Hidden Markov models; Speech; Speech processing; Speech recognition; Training; Training data; automatic speech recognition; large-scale systems; machine learning; speech processing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854213

Filename

6854213