Title :
Speaker normalized acoustic modeling based on 3-D Viterbi decoding
Author :
Fukada, Toshiaki ; Sagisaka, Yoshinori
Author_Institution :
ATR I, Kyoto, Japan
Abstract :
This paper describes a novel speaker normalization method based on frequency warping, which reduces variation due to speaker-induced factors such as vocal tract length. In our approach, a speaker-normalized acoustic model is trained using time-varying (i.e., state-, phoneme- or word-dependent) warping factors, whereas conventional approaches fix the frequency warping factor for each speaker. These time-varying frequency warping factors are determined by a 3-dimensional Viterbi decoding procedure that searches over input frames, HMM states and warping factors. Experimental results on Japanese spontaneous speech recognition show that the proposed method yields a 9.7% improvement in speech recognition accuracy compared to the conventional speaker-independent model.
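The 3-dimensional search named in the abstract can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the function name `viterbi_3d`, the rule that the warping index may move by at most `max_warp_step` between consecutive frames, and the choice to end the path in any (state, warp) cell are all assumptions made for illustration. The input `loglik[t][s][w]` is assumed to hold the log-likelihood of frame `t` under HMM state `s` after warping the frame with factor index `w`.

```python
NEG_INF = float("-inf")


def viterbi_3d(loglik, log_trans, log_init, max_warp_step=1):
    """Best joint (state, warp) path over frames x states x warping factors.

    loglik[t][s][w] : log-likelihood of frame t in state s under warp index w
    log_trans[p][s] : log transition probability from state p to state s
    log_init[s][w]  : log score for starting in state s with warp index w
    max_warp_step   : assumed limit on the warp index change between frames
    """
    T, S, W = len(loglik), len(loglik[0]), len(loglik[0][0])
    delta = [[log_init[s][w] + loglik[0][s][w] for w in range(W)]
             for s in range(S)]
    back = []  # back[t-1][s][w] = best predecessor (state, warp) of cell (t, s, w)

    for t in range(1, T):
        new_delta = [[NEG_INF] * W for _ in range(S)]
        ptr = [[None] * W for _ in range(S)]
        for s in range(S):
            for w in range(W):
                best, arg = NEG_INF, None
                lo = max(0, w - max_warp_step)
                hi = min(W, w + max_warp_step + 1)
                for p in range(S):
                    if log_trans[p][s] == NEG_INF:
                        continue  # forbidden state transition
                    for v in range(lo, hi):
                        score = delta[p][v] + log_trans[p][s]
                        if score > best:
                            best, arg = score, (p, v)
                new_delta[s][w] = best + loglik[t][s][w]
                ptr[s][w] = arg
        back.append(ptr)
        delta = new_delta

    # Assumption: the path may end in any state with any warp index.
    score, s, w = max((delta[s][w], s, w) for s in range(S) for w in range(W))
    if score == NEG_INF:
        raise ValueError("no valid path through the trellis")

    # Trace back from the last frame to the first.
    states, warps = [s], [w]
    for ptr in reversed(back):
        s, w = ptr[s][w]
        states.append(s)
        warps.append(w)
    states.reverse()
    warps.reverse()
    return states, warps, score
```

The returned `warps` sequence gives one warping index per frame, which is what makes the factor time-varying. Forcing `max_warp_step=0` together with a single shared starting warp would reduce the search to the conventional one-factor-per-speaker case.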
Keywords :
Viterbi decoding; acoustic signal processing; cepstral analysis; speech coding; speech recognition; 3D Viterbi decoding; HMM states; Japanese spontaneous speech recognition; experimental results; input frames; mel-cepstral analysis; speaker normalized acoustic model; speaker normalized acoustic modeling; speaker-independent model; speaker-induced factors; speech recognition accuracy; time-varying frequency warping factors; vocal tract length; Frequency estimation; Hidden Markov models; Loudspeakers; Maximum likelihood decoding; Maximum likelihood linear regression; Mel frequency cepstral coefficient; Robustness; Viterbi algorithm
Conference_Title :
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7803-4428-6
DOI :
10.1109/ICASSP.1998.674461