Regional Information Center for Science and Technology - A hybrid approach to adapting acoustic and pronunciation models for non-native speech recognition

DocumentCode :

2429288

Title :

A hybrid approach to adapting acoustic and pronunciation models for non-native speech recognition

Author :

Oh, Yoo Rhee ; Kim, Hong Kook

Author_Institution :

Dept. of Inf. & Commun., Gwangju Inst. of Sci. & Technol. (GIST), Buk-gu, South Korea

fYear :

2009

fDate :

1-4 Nov. 2009

Firstpage :

1757

Lastpage :

1761

Abstract :

In this paper, we propose a hybrid model adaptation approach that combines pronunciation and acoustic model adaptation methods in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the hybrid model adaptation can be performed in two ways: at a state-tying level or at a triphone-modeling level. In both methods, we first analyze the pronunciation variant rules of non-native speech and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level method then adapts the pronunciation models by adding variant pronunciations from the non-native speech, and adapts the acoustic models by tying the states of the triphone acoustic models using the acoustic variants. In contrast, the triphone-modeling level method adapts the pronunciation models in the same way as the state-tying level method, re-estimates the triphone acoustic models using the adapted pronunciation models, and clusters the states of the triphone acoustic models using the acoustic variants. Recognition experiments on English spoken by Korean speakers show that the proposed hybrid acoustic and pronunciation model adaptation approach reduces the average word error rate (WER) by a relative 16.07% with the state-tying level method and by a relative 20.94% with the triphone-modeling level method, compared to a baseline ASR system.
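
The reductions quoted in the abstract are relative, not absolute, percentage-point figures. As a minimal sketch of how such a figure is usually computed, the snippet below assumes the standard definition (baseline WER minus adapted WER, divided by baseline WER). The function name and the example WER values are hypothetical illustrations and are not taken from the paper, which reports only the relative reductions here.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Assumes the common definition:
        100 * (baseline_wer - adapted_wer) / baseline_wer
    Both inputs are word error rates in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper):
# a baseline WER of 20.0% that drops to 16.0% after adaptation.
example = relative_wer_reduction(20.0, 16.0)
print(f"relative WER reduction: {example:.2f}%")
```

Under this definition, a relative reduction of 16.07% means the adapted system's WER is about 0.84 times the baseline WER, whatever the baseline value is.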

Keywords :

speech recognition; acoustic model adaptation; automatic speech recognition; average word error rates; hybrid model adaptation approach; nonnative speech recognition; Adaptation model; Automatic speech recognition; Databases; Decision trees; Degradation; Dictionaries; Loudspeakers; Natural languages; Speech analysis; Speech recognition; Non-native speech recognition; acoustic model adaptation; pronunciation model adaptation; pronunciation variability; state-tying level adaptation; triphone-modeling level adaptation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on

Conference_Location :

Pacific Grove, CA

ISSN :

1058-6393

Print_ISBN :

978-1-4244-5825-7

Type :

conf

DOI :

10.1109/ACSSC.2009.5469755

Filename :

5469755

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2429288