Title :
Speaker adaptation by modeling the speaker variation in a continuous speech recognition system
Author_Institution :
Dept. of Speech, Music & Hearing, KTH, Sweden
Abstract :
A method for unsupervised instantaneous speaker adaptation is presented and evaluated on a continuous speech recognition task in a man-machine dialogue system. The method is based on modeling of the systematic speaker variation. The variation is modeled by a low-dimensional speaker space and the classification of speech segments is conditioned on the position in the speaker space. Because the effect of the speaker space position on the classification is determined in an off-line training procedure using the speakers in a training database, complex systematic speaker variation can be modeled. Speaker adaptation is achieved only by the constraint that the position in the speaker space is constant over each utterance. Therefore, no separate adaptation session is needed and the adaptation is present from the first utterance. Consequently, for a user there is no noticeable difference between this system and a speaker-independent system. The speaker model and the phonetic classification are implemented in the ANN part of a hybrid ANN/HMM system. In experiments with a pilot system, word accuracy is improved for utterances longer than three words, and utterance-level results are improved for utterances of all lengths.
Keywords :
hidden Markov models; learning (artificial intelligence); natural language interfaces; neural nets; pattern classification; speech recognition; conditioning; continuous speech recognition system; hybrid artificial neural net/hidden Markov model system; low-dimensional speaker space position; man-machine dialogue system; off-line training procedure; phonetic classification; speech segment classification; systematic speaker variation modelling; training database; unsupervised instantaneous speaker adaptation; utterances; word accuracy; Adaptation model; Auditory system; Automatic speech recognition; Calibration; Hidden Markov models; Loudspeakers; Man machine systems; Parameter estimation; Speech analysis; Speech recognition;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607769