An MRNN-based method for continuous Mandarin speech recognition

Author

Liao, Yuan-Fu ; Chen, Sin-Horng

Author_Institution

Dept. of Commun. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan

Volume

2

fYear

1998

fDate

12-15 May 1998

Firstpage

1121

Abstract

A new modular recurrent neural network (MRNN)-based method for continuous Mandarin speech recognition is proposed. The system uses five RNNs to handle separate subtasks and then combines their outputs to solve the overall recognition problem. The modules are two RNNs that discriminate within the two sub-syllable groups of 100 right-final-dependent (RFD) initials and 39 context-independent (CI) finals, two RNNs that generate dynamic weighting functions for sub-syllable integration, and one RNN for syllable boundary detection. All RNN modules are combined using a delay-decision Viterbi search. The method differs from the ANN/HMM hybrid approach in that it uses ANNs to perform not only sub-syllable discrimination but also temporal structure modeling of the speech signal. The system is trained using a three-stage training method embedded with the MCE/GPD algorithms. A fast recognition method using multi-level pruning is also proposed. Experimental results show that the proposed method outperforms the HMM method in both recognition accuracy and computational complexity.

Keywords

backpropagation; computational complexity; natural languages; neural net architecture; recurrent neural nets; search problems; speech processing; speech recognition; ANN/HMM hybrid approach; CI finals; MCE/GPD algorithms; MRNN-based method; RFD initials; RNN modules; context independent finals; continuous Mandarin speech recognition; delay-decision Viterbi search; dynamic weighting functions; error backpropagation algorithm; experimental results; fast recognition method; modular recurrent neural network; multi-level pruning; neural network architecture; recognition accuracy; right-final-dependent initials; speech signal; sub-syllable groups discrimination; sub-syllable integration; syllable boundary detection; temporal structure modeling; three-stage training method; Artificial neural networks; Computational complexity; Contracts; Councils; Delay; Error analysis; Hidden Markov models; Recurrent neural networks; Speech recognition; Viterbi algorithm

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on

Conference_Location

Seattle, WA

ISSN

1520-6149

Print_ISBN

0-7803-4428-6

Type

conf

DOI

10.1109/ICASSP.1998.675466

Filename

675466