Title :
HMMs and OWE neural network for continuous speech recognition
Author :
Pican, Nicolas ; Fohr, Dominique ; Mari, Jean-François
Author_Institution :
CRIN-CNRS & INRIA Lorraine, Vandoeuvre-les-Nancy, France
Abstract :
Phonetic context has a large effect on stop consonants in a continuous speech signal. Recognition systems that model allophones with context-dependent hidden Markov models (HMMs) have therefore been implemented (Lamel and Gauvain, 1993). HMMs are well suited to segmentation in the temporal domain but have some difficulty with recognition, because maximum likelihood estimation (MLE) training is not discriminative, whereas discrimination is one of the strengths of artificial neural network (ANN) models. Over the last three years we have developed a new ANN model named OWE (Orthogonal Weight Estimator). An OWE network classifies an input pattern according to its contextual environment, and this new architecture tackles the problem of training context-dependent behaviour. Roughly, the principle is based on a main MLP (multilayered perceptron) in which the value of each synaptic weight is estimated by another MLP (an OWE) as a function of the context representation. In this paper, we present a hierarchical system for phoneme recognition: first, the system segments the input signal using 48 context-independent HMMs. Then the stop consonants are reordered by an OWE ANN. Experiments on TIMIT show a 78% correct recognition rate on the 6 stop consonants (/p, t, k, b, d, g/).
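The OWE principle described in the abstract (a main MLP whose every weight is produced by a separate small MLP that reads the context) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class names `WeightEstimator` and `OWENetwork`, the layer sizes, the tanh and softmax nonlinearities, the random untrained weights, and the reading of "reordered" as ranking the six stops by posterior. Training is not shown.

```python
import math
import random

STOPS = ["p", "t", "k", "b", "d", "g"]  # the six stop consonants from the abstract


class WeightEstimator:
    """Hypothetical small MLP: maps a context vector to ONE weight of the main MLP."""

    def __init__(self, n_context, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_context)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def __call__(self, context):
        # No bias terms (an assumption): an all-zero context yields a zero weight.
        hidden = [math.tanh(sum(w * c for w, c in zip(row, context)))
                  for row in self.w1]
        return sum(w * h for w, h in zip(self.w2, hidden))


def make_layer(n_out, n_in, n_context, rng):
    # One estimator per connection; the extra input column acts as a bias weight.
    return [[WeightEstimator(n_context, 3, rng) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def apply_layer(layer, inputs, context):
    extended = list(inputs) + [1.0]  # constant input for the bias weight
    return [sum(est(context) * x for est, x in zip(row, extended))
            for row in layer]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class OWENetwork:
    """Main MLP whose weights are re-estimated from the context on every call."""

    def __init__(self, n_features, n_hidden, n_context, seed=0):
        rng = random.Random(seed)
        self.hidden = make_layer(n_hidden, n_features, n_context, rng)
        self.output = make_layer(len(STOPS), n_hidden, n_context, rng)

    def posteriors(self, features, context):
        h = [math.tanh(a) for a in apply_layer(self.hidden, features, context)]
        return softmax(apply_layer(self.output, h, context))

    def rank(self, features, context):
        # Order the stop consonants from most to least probable.
        probs = self.posteriors(features, context)
        return sorted(zip(STOPS, probs), key=lambda pair: pair[1], reverse=True)


# Usage: the same acoustic features scored under two different contexts.
net = OWENetwork(n_features=4, n_hidden=3, n_context=2)
features = [0.2, -0.1, 0.7, 0.0]          # made-up feature vector
ranking_a = net.rank(features, context=[1.0, 0.0])
ranking_b = net.rank(features, context=[0.0, 1.0])
```

In this sketch the context never enters the main MLP as an input; it only changes the weights, so the same features can be classified differently in different phonetic contexts.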
Keywords :
feedforward neural nets; hidden Markov models; learning (artificial intelligence); maximum likelihood estimation; multilayer perceptrons; neural net architecture; pattern classification; speech recognition; MLE training; OWE neural network; Orthogonal Weight Estimator; TIMIT; allophones; context dependent behaviour training; context representation; context-dependent hidden Markov models; continuous speech recognition; continuous speech signal; discrimination; experiments; multilayered perceptron; neural network architecture; phoneme recognition; phonetic context; segmentation; stop consonants; synaptic weight connection value; temporal domain; Artificial intelligence; Artificial neural networks; Context modeling; Covariance matrix; Databases; Neural networks; State estimation
Conference_Title :
Proceedings of the Fourth International Conference on Spoken Language (ICSLP 96), 1996
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607853