VTLN Based Approaches for Speech Recognition with Very Limited Training Speakers

Author

Sung Min Ban;Bo Kyung Choi;Young Ho Choi;Hyung Soon Kim

Author_Institution

Dept. of Electron. Eng., Pusan Nat. Univ., Busan, South Korea

fYear

2014

Firstpage

285

Lastpage

288

Abstract

This paper examines two approaches based on vocal tract length normalization (VTLN) for handling the acoustic mismatch between speakers in automatic speech recognition, in the special case where training data are available from only a small number of speakers. The first is the conventional VTLN approach, in which both training and test utterances are frequency-warped using a warping factor estimated by maximum likelihood (ML), in order to normalize speaker characteristics. The second builds a virtually speaker-independent (SI) acoustic model from artificially generated multi-speaker data, obtained by applying VTLN-based frequency warping to the training utterances of the limited speakers. To compare the two approaches, Korean isolated word recognition experiments are performed with a small amount of training data from a limited number of speakers. The experimental results show that, with very limited training speakers, the virtually SI acoustic model approach outperforms both the conventional VTLN approach and the baseline system.
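
The two uses of frequency warping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the piecewise-linear warping function, the cutoff ratio of 0.85, the warping-factor grid, and all function names (`warp_frequency`, `estimate_warp_factor`, `warp_spectrum`, `make_virtual_speakers`) are assumptions chosen for illustration, and `score_fn` stands in for whatever acoustic-model log-likelihood a real system would compute.

```python
import numpy as np

def warp_frequency(freqs, alpha, f_max, f_cut_ratio=0.85):
    """Piecewise-linear warp (an assumed form): scale by alpha below a cutoff,
    then join linearly so that f_max still maps to f_max."""
    freqs = np.asarray(freqs, dtype=float)
    f_cut = f_cut_ratio * f_max * min(1.0, 1.0 / alpha)
    low = alpha * freqs
    # Upper segment maps [f_cut, f_max] onto [alpha * f_cut, f_max].
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    high = alpha * f_cut + slope * (freqs - f_cut)
    return np.where(freqs <= f_cut, low, high)

def estimate_warp_factor(score_fn, alphas):
    """Grid-search ML estimate: return the alpha with the highest score.
    score_fn(alpha) is a hypothetical callable returning the log-likelihood
    of the utterance warped by alpha under the acoustic model."""
    scores = [score_fn(a) for a in alphas]
    return float(alphas[int(np.argmax(scores))])

def warp_spectrum(mag, alpha, f_max):
    """Resample a magnitude spectrum along the warped frequency axis."""
    freqs = np.linspace(0.0, f_max, len(mag))
    return np.interp(warp_frequency(freqs, alpha, f_max), freqs, mag)

def make_virtual_speakers(mag, alphas, f_max):
    """Virtual-SI idea: one warped copy of the spectrum per warping factor,
    each copy acting as data from an artificial speaker."""
    return {float(a): warp_spectrum(mag, a, f_max) for a in alphas}

# Assumed grid of warping factors; the record does not state the range used.
ALPHAS = np.linspace(0.88, 1.12, 13)
```

In the conventional approach, `estimate_warp_factor` would be run per speaker or utterance at both training and test time. In the virtually SI approach, `make_virtual_speakers` would be applied to the training utterances only, and the test data would be left unwarped.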

Keywords

"Acoustics","Silicon","Speech recognition","Mathematical model","Hidden Markov models","Speech","Training"

Publisher

IEEE

Conference_Titel

Intelligent Systems, Modelling and Simulation (ISMS), 2014 5th International Conference on

ISSN

2166-0662

Type

conf

DOI

10.1109/ISMS.2014.55

Filename

7280922