Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems

Author

Siniscalchi, Sabato Marco ; Jinyu Li ; Chin-Hui Lee

Author_Institution

Dept. of Comput. Eng., Kore Univ. of Enna, Enna, Italy

Volume

21

Issue

10

fYear

2013

fDate

Oct. 2013

Firstpage

2152

Lastpage

2161

Abstract

Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test conditions of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN systems can therefore have a great impact on the connectionist paradigm, which is expected to play a major role in the design of next-generation LVCSR systems, considering the great success reported on many speech tasks by deep neural networks (ANNs with many hidden layers that adopt the pre-training technique). Current adaptation techniques for ANNs, which inject an adaptable linear transformation network connected to either the input or the output layer, are not effective, especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome these limitations and make adaptation robust to scarce adaptation data. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach.
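
The key idea stated in the abstract, adapting the hidden activation functions while leaving the network weights untouched, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the activation is taken to be a linear combination of orthonormal Hermite functions, a single hidden unit with a frozen weight `W` and bias `B` stands in for a full network, and a `tanh` curve stands in for the speaker-specific target. The names `hermite_basis`, `hermite_activation`, `mean_sq_error`, and `adapt_coeffs` are hypothetical and do not come from the paper.

```python
import math

def hermite_basis(x, order):
    """Orthonormal Hermite functions phi_0 .. phi_{order-1} at x,
    computed with the standard three-term recurrence."""
    phi = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if order > 1:
        phi.append(math.sqrt(2.0) * x * phi[0])
    for n in range(1, order - 1):
        phi.append(math.sqrt(2.0 / (n + 1)) * x * phi[n]
                   - math.sqrt(n / (n + 1)) * phi[n - 1])
    return phi[:order]

def hermite_activation(z, coeffs):
    """Assumed activation form: sum_k coeffs[k] * phi_k(z)."""
    basis = hermite_basis(z, len(coeffs))
    return sum(c * p for c, p in zip(coeffs, basis))

def mean_sq_error(inputs, targets, w, b, coeffs):
    """Mean of 0.5 * (activation - target)^2 over the data."""
    total = 0.0
    for x, t in zip(inputs, targets):
        a = hermite_activation(w * x + b, coeffs)
        total += 0.5 * (a - t) ** 2
    return total / len(inputs)

def adapt_coeffs(inputs, targets, w, b, coeffs, lr=0.1, steps=200):
    """Gradient descent on the activation coefficients only.
    The weight w and bias b are read but never updated."""
    coeffs = list(coeffs)
    for _ in range(steps):
        grad = [0.0] * len(coeffs)
        for x, t in zip(inputs, targets):
            phi = hermite_basis(w * x + b, len(coeffs))
            a = sum(c * p for c, p in zip(coeffs, phi))
            for k, p in enumerate(phi):
                grad[k] += (a - t) * p / len(inputs)
        coeffs = [c - lr * g for c, g in zip(coeffs, grad)]
    return coeffs

# Toy setup (all values are illustrative assumptions).
W, B = 0.8, -0.1                       # frozen "speaker-independent" weights
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
ts = [math.tanh(1.3 * (W * x + B)) for x in xs]   # stand-in adaptation targets
c_si = [0.0, 1.0, 0.0, 0.0]            # initial activation coefficients
c_sa = adapt_coeffs(xs, ts, W, B, c_si)

loss_before = mean_sq_error(xs, ts, W, B, c_si)
loss_after = mean_sq_error(xs, ts, W, B, c_sa)
print(loss_before, loss_after)
```

In this sketch only four coefficients change during adaptation, which suggests why such a scheme could suit very small amounts of adaptation data; whether the paper's method has the same parameter count or update rule is not stated in the abstract.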

Keywords

hidden Markov models; next generation networks; polynomials; speech recognition; ANN; ASR; Hermitian activation functions; Hermitian polynomial; adaptable linear transformation network; adaptation techniques; artificial neural network; automatic speech recognition; connectionist speech recognition systems; hidden Markov model; hybrid HMM-ANN systems; large vocabulary continuous speech recognition; model adaptation techniques; neural networks; next-generation LVCSR; operational ASR systems; speaker adaptation; speaker-dependent conditions; speaker-independent conditions; Artificial neural networks; model adaptation; speech processing;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2013.2270370

Filename

6544616