An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition

Author

Zhao, Yunxin

Author_Institution

Speech Technol. Lab., Panasonic Technol. Inc., Santa Barbara, CA, USA

Volume

2

Issue

3

fYear

1994

fDate

7/1/1994 12:00:00 AM

Firstpage

380

Lastpage

394

Abstract

A new speaker adaptation technique is proposed for improving speaker-independent continuous speech recognition, based on a decomposition of spectral variation sources. The spectral variations are separated into two categories, one acoustic and the other phone-specific, and each variation source is modeled by a linear transformation system. The technique consists of two sequential steps: first, acoustic normalization is performed; second, phone model parameters are adapted. Speaker adaptation experiments on the TIMIT database using short calibration speech (5 s per speaker) show significant performance improvement over the baseline speaker-independent continuous speech recognition system, which uses Gaussian mixture density based hidden Markov models of phone units. For a vocabulary size of 853 and a test set perplexity of 104, the recognition word accuracy improved from 86.9% for the baseline system to 90.5% after adaptation, corresponding to an error reduction of 27.5%. On a more difficult test set that contains an additional variation source due to recording channel mismatch, a larger improvement was obtained: for the same vocabulary and a test set perplexity of 101, the recognition word accuracy improved from 65.4% for the baseline to 86.0% after adaptation, corresponding to an error reduction of 59.5%.
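
The error-reduction figures quoted in the abstract can be checked from the reported word accuracies. The following is a minimal sketch, assuming that word error is taken as 100% minus word accuracy and that the reduction is expressed relative to the baseline error; the function name relative_error_reduction is illustrative and does not come from the paper.

```python
def relative_error_reduction(baseline_acc, adapted_acc):
    """Relative drop in word error (percent), given word accuracies in percent.

    Assumption: word error = 100 - word accuracy.
    """
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return 100.0 * (baseline_err - adapted_err) / baseline_err


# Accuracies reported in the abstract for the two test sets.
matched = relative_error_reduction(86.9, 90.5)     # perplexity 104
mismatched = relative_error_reduction(65.4, 86.0)  # perplexity 101, channel mismatch

print(f"{matched:.1f}")     # 27.5
print(f"{mismatched:.1f}")  # 59.5
```

Under these assumptions the baseline error falls from 13.1% to 9.5% on the first test set and from 34.6% to 14.0% on the second, which matches the 27.5% and 59.5% reductions stated above.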

Keywords

acoustic signal processing; speech recognition; Gaussian mixture density based hidden Markov models; TIMIT database; acoustic normalization; acoustic-phonetic-based speaker adaptation technique; decomposition; linear transformation system; performance; phone model parameters; recognition word accuracy; speaker-independent continuous speech recognition; spectral variation sources; test set perplexity; vocabulary size; Calibration; Character recognition; Databases; Decoding; Hidden Markov models; Loudspeakers; Speech recognition; System performance; System testing; Vocabulary

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.294352

Filename

294352