Title :
Speaker adaptation applied to HMM and neural networks
Author :
Nakamura, Satoshi ; Shikano, Kiyohiro
Author_Institution :
ATR Interpreting Telephony Res. Lab., Kyoto, Japan
Abstract :
The authors propose a speaker adaptation algorithm that is independent of the speech recognition algorithm. The proposed spectral mapping algorithm is based on three ideas: (1) accurate representation of the input vector by separate vector quantization and fuzzy vector quantization, (2) continuous spectral mapping from one speaker to another by fuzzy mapping, and (3) accurate establishment of spectral correspondence based on the fuzzy relationship of the membership function obtained from supervised training. Spectral dynamic features are also used. The algorithm is applied to hidden Markov models (HMMs) and neural networks and evaluated on a database of 216 phonetically balanced words and 5240 important Japanese words uttered by three speakers. The speaker-adapted HMM recognition rate for /b,d,g/ is 79.5%. The average recognition rate for the top-three choices is about 91%. Applied to neural networks, the algorithm gave almost the same performance. The algorithm was also applied to voice conversion, where a preference score of 65.6% was obtained.
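The abstract names fuzzy vector quantization and fuzzy mapping but gives no formulas, so the following is only a minimal sketch of how such a mapping could look, not the authors' implementation. It assumes the standard fuzzy c-means membership function with a fuzziness exponent m, and it assumes that the spectral correspondence is available as a table of non-negative weights between source and target codewords. All function names, the parameter m, and the weight table `hist` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means style membership of x in each codeword (assumed form).

    m > 1 is the fuzziness exponent; the memberships sum to 1.
    """
    d = [sq_dist(x, c) for c in codebook]
    # If x coincides with a codeword, put all membership on that codeword.
    for i, di in enumerate(d):
        if di == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(codebook))]
    inv = [di ** (-1.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [v / total for v in inv]


def mapped_codebook(target_codebook, hist):
    """Express each source codeword as a weighted mean of target codewords.

    hist[i][j] is a hypothetical correspondence weight between source
    codeword i and target codeword j (for example, accumulated from
    time-aligned training utterances).
    """
    dim = len(target_codebook[0])
    out = []
    for row in hist:
        total = sum(row)
        if total <= 0:
            raise ValueError("each source codeword needs a positive weight")
        out.append([
            sum(w * c[k] for w, c in zip(row, target_codebook)) / total
            for k in range(dim)
        ])
    return out


def fuzzy_map(x, source_codebook, mapped, m=2.0):
    """Map x to the target speaker as a membership-weighted sum of mapped codewords."""
    u = fuzzy_memberships(x, source_codebook, m)
    dim = len(mapped[0])
    return [sum(ui * c[k] for ui, c in zip(u, mapped)) for k in range(dim)]
```

Because the output is a weighted sum rather than a single nearest codeword, it varies continuously with the input vector, which is one way to read the "continuous spectral mapping" of idea (2).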
Keywords :
Markov processes; neural nets; speech recognition; HMM; Japanese words; continuous spectral mapping; fuzzy mapping; fuzzy vector quantization; hidden Markov models; membership function; neural networks; phonetically balanced words; separate vector quantization; speaker adaptation algorithm; spectral correspondence; spectral mapping algorithm; supervised training; voice conversion; Auditory system; Hidden Markov models; Histograms; Laboratories; Neural networks; Spatial databases; Speech recognition; System testing; Telephony; Vector quantization
Conference_Title :
Acoustics, Speech, and Signal Processing, 1989. ICASSP-89., 1989 International Conference on
Conference_Location :
Glasgow
DOI :
10.1109/ICASSP.1989.266370