Title : 
Training data selection based on context-dependent state matching

Author_Institution : 
Google Inc., New York, NY, USA

Abstract : 
In this paper we construct a data set for semi-supervised acoustic model training by selecting spoken utterances from a massive collection of anonymized Google Voice Search utterances. Semi-supervised training usually retains high-confidence utterances, which are presumed to have an accurate hypothesized transcript, a necessary condition for successful training. Selecting high-confidence utterances can, however, restrict the diversity of the resulting data set. We propose to add a constraint requiring that the distribution of context-dependent state symbols, obtained by running forced alignment of the hypothesized transcripts, match a reference distribution estimated from a curated development set. The quality of the resulting training set is evaluated in large-scale Voice Search recognition experiments, where it outperforms random selection of high-confidence utterances.
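The abstract does not spell out how the matching constraint is enforced. The following is a minimal sketch under stated assumptions: each candidate utterance is represented by its confidence score and the context-dependent state symbols from forced alignment of its hypothesized transcript, low-confidence utterances are filtered out first, and the remaining ones are chosen greedily so that the KL divergence between the reference state distribution and the selected set's state distribution stays small. The greedy KL criterion, the eps smoothing, and the names kl_divergence, select_matching, min_conf and budget are all hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical utterance record: (utterance_id, confidence, CD state symbols
# obtained by forced alignment of the hypothesized transcript).
Utterance = Tuple[str, float, Sequence[str]]


def kl_divergence(ref: Dict[str, float], counts: Counter, total: int,
                  eps: float = 1e-9) -> float:
    """KL(ref || selected), with additive eps smoothing on the selected side
    so that states not yet covered do not cause a division by zero."""
    kl = 0.0
    for state, p in ref.items():
        if p <= 0.0:
            continue
        q = (counts.get(state, 0) + eps) / (total + eps * len(ref))
        kl += p * math.log(p / q)
    return kl


def select_matching(pool: List[Utterance], ref: Dict[str, float],
                    min_conf: float, budget: int) -> List[str]:
    """Hypothetical greedy selection: keep only high-confidence utterances,
    then repeatedly add the one whose aligned states bring the selected
    state distribution closest (in KL) to the reference distribution."""
    candidates = [u for u in pool if u[1] >= min_conf]
    counts: Counter = Counter()
    total = 0
    chosen: List[str] = []
    while candidates and len(chosen) < budget:
        best_idx, best_kl = -1, math.inf
        for i, (_, _, states) in enumerate(candidates):
            trial = counts + Counter(states)
            kl = kl_divergence(ref, trial, total + len(states))
            if kl < best_kl:
                best_idx, best_kl = i, kl
        utt_id, _, states = candidates.pop(best_idx)
        counts.update(states)
        total += len(states)
        chosen.append(utt_id)
    return chosen


if __name__ == "__main__":
    # Toy reference distribution over two CD states, "a" and "b".
    ref = {"a": 0.5, "b": 0.5}
    pool = [
        ("u1", 0.95, ["a", "a", "a", "a"]),
        ("u2", 0.90, ["a", "b"]),
        ("u3", 0.30, ["b", "b"]),  # dropped by the confidence threshold
        ("u4", 0.85, ["b", "b", "b"]),
    ]
    print(select_matching(pool, ref, min_conf=0.8, budget=2))  # ['u2', 'u4']
```

In the toy run, u3 never enters the candidate set because of its low confidence, and u4 is preferred over the higher-confidence u1 because it moves the selected distribution closer to the balanced reference. This is the diversity effect the constraint is meant to provide. The greedy loop is quadratic in the pool size, so a selection over a massive collection would need a cheaper strategy; the sketch only illustrates the criterion.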

Keywords : 
speech recognition; training; context-dependent state matching; curated development set; google voice search utterances; high-confidence utterance selection; hypothesized transcript matches; large scale voice search recognition; reference distribution estimation; running forced alignment; semi-supervised acoustic model training; spoken utterances selection; training data selection; Acoustics; Google; Hidden Markov models; Mobile communication; Speech; Speech processing; Training; data selection; semi-supervised training

Conference_Title : 
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference_Location : 
Florence

DOI : 
10.1109/ICASSP.2014.6854214