Title :
Acoustic model building based on non-uniform segments and bidirectional recurrent neural networks
Author_Institution :
ATR Interpreting Telephony Res. Labs., Kyoto, Japan
Abstract :
A new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a recurrent neural network that is bidirectional in time. Whereas neural networks in speech recognition systems are usually used to estimate posterior “frame to phoneme” probabilities, here they directly estimate “segment to phoneme” probabilities, which results in an improved duration model. The particular MAP (maximum a posteriori) approach incorporates long-term dependencies not only on the acoustic side but also on the phone (output) side, which automatically yields parameter-efficient, context-dependent models. While using neural networks as frame or phoneme classifiers always results in discriminative training for the acoustic information, the MAP approach presented also incorporates discriminative training for the internally learned phoneme language model. Classification tests on the TIMIT phoneme database gave promising results of 77.75% (82.38%) on the full test data set with all 61 (39) symbols.
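The abstract does not give the network equations, so the following is only a minimal sketch of the general idea of scoring a variable-length segment with a bidirectional recurrent network. Everything in it is an assumption made for illustration: the tanh recurrence, the mean pooling over the segment, the random untrained weights, and the names `rnn_states`, `segment_posterior`, and `make_params`. It is not the authors' model or their MAP formulation.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_states(frames, w_in, w_rec):
    """Run a simple tanh recurrence over the frames; return one state per frame."""
    state = [0.0] * len(w_rec)
    states = []
    for frame in frames:
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, state))]
        state = [math.tanh(p) for p in pre]
        states.append(state)
    return states


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def segment_posterior(frames, params):
    """Return one distribution over phoneme classes for a whole segment.

    Assumption: forward and backward states are concatenated per frame and
    mean-pooled over the segment. The pooling choice is illustrative only.
    """
    forward = rnn_states(frames, params["w_in_f"], params["w_rec_f"])
    # The backward pass reads the segment from its last frame to its first.
    backward = rnn_states(frames[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    joint = [f + b for f, b in zip(forward, backward)]
    pooled = [sum(column) / len(frames) for column in zip(*joint)]
    return softmax(matvec(params["w_out"], pooled))


def make_params(n_feat, n_hidden, n_phones, seed=0):
    """Create small random (untrained) weights, for demonstration only."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_in_f": rand(n_hidden, n_feat),
        "w_rec_f": rand(n_hidden, n_hidden),
        "w_in_b": rand(n_hidden, n_feat),
        "w_rec_b": rand(n_hidden, n_hidden),
        "w_out": rand(n_phones, 2 * n_hidden),
    }
```

Under these assumptions, calling `segment_posterior(frames, make_params(n_feat, n_hidden, 61))` on segments of different lengths returns, in each case, a single 61-way distribution for the whole segment rather than one distribution per frame, which is the contrast between "segment to phoneme" and "frame to phoneme" scoring that the abstract draws.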
Keywords :
acoustic signal processing; feature extraction; learning (artificial intelligence); maximum likelihood estimation; pattern classification; recurrent neural nets; speech processing; speech recognition; TIMIT phoneme database; acoustic model building; bidirectional recurrent neural networks; classification tests; discriminative training; duration model; frame classifiers; long term dependencies; nonuniform segments; parameter efficient context dependent models; phoneme classifiers; phoneme language model; segment to phoneme probabilities; speech recognition systems; test data set; Acoustic testing; Databases; Error analysis; Merging; Neural networks; Pattern recognition; Probability; Recurrent neural networks; Statistical analysis
Conference_Title :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location :
Munich
Print_ISBN :
0-8186-7919-0
DOI :
10.1109/ICASSP.1997.595486