Regional Information Center for Science and Technology - Capturing Local Variability for Speaker Normalization in Speech Recognition

DocumentCode :

1036518

Title :

Capturing Local Variability for Speaker Normalization in Speech Recognition

Author :

Miguel, Antonio ; Lleida, Eduardo ; Rose, Richard ; Buera, Luis ; Saz, Óscar ; Ortega, Alfonso

Author_Institution :

Univ. of Zaragoza, Zaragoza

Volume :

Issue :

fYear :

2008

fDate :

3/1/2008

Firstpage :

578

Lastpage :

593

Abstract :

The new model reduces the impact of local spectral and temporal variability by estimating a finite set of spectral and temporal warping factors, which are applied to speech at the frame level. Optimum warping factors are obtained during decoding in a locally constrained search. The model augments the states of a standard hidden Markov model (HMM), providing an additional degree of freedom. The paper argues that this augmentation is an efficient and effective way to compensate for local variability in speech, and that it may apply to a broader range of speech transformations. The technique is presented in the context of existing frequency warping-based speaker normalization methods for ASR. The new model is evaluated in clean and noisy task domains using subsets of the Aurora 2, Spanish Speech-Dat-Car, and TIDIGITS corpora. In addition, some experiments are performed on a Spanish-language corpus collected from speakers with a range of speech disorders. Under clean or not severely degraded conditions, the new model improves on the standard HMM baseline. The authors conclude that local warping is an effective general framework for building more flexible models of speaker variability.
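The abstract describes the locally constrained search only at a high level. Below is a minimal Python sketch, written under our own assumptions and not taken from the paper, of Viterbi decoding over HMM states augmented with a warp-factor index. The names `local_warp_viterbi`, `loglik`, `log_trans`, `log_init` and `max_jump` are hypothetical. The sketch assumes the per-frame log-likelihoods have already been computed by scoring each frame's features warped with each candidate factor, and it models the "local constraint" as a bound on how far the warp index may move between consecutive frames.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def local_warp_viterbi(
    loglik: List[List[List[float]]],  # hypothetical: loglik[t][s][a] = log p(frame t warped by factor a | state s)
    log_trans: List[List[float]],     # log_trans[s][s2]: HMM state transition log-probabilities
    log_init: List[float],            # log_init[s]: initial state log-probabilities
    max_jump: int = 1,                # assumed local constraint: max change of warp index per frame
) -> Tuple[float, List[Tuple[int, int]]]:
    """Viterbi search over augmented (state, warp index) pairs.

    Returns the best path log score and the (state, warp index) pair
    chosen at every frame.
    """
    T = len(loglik)
    S = len(log_init)
    A = len(loglik[0][0])

    # delta[s][a]: best log score of any partial path ending in (s, a) at the current frame
    delta = [[log_init[s] + loglik[0][s][a] for a in range(A)] for s in range(S)]
    backptr: List[List[List[Tuple[int, int]]]] = []

    for t in range(1, T):
        new_delta = [[NEG_INF] * A for _ in range(S)]
        bp = [[(-1, -1)] * A for _ in range(S)]
        for s2 in range(S):
            for a2 in range(A):
                best, arg = NEG_INF, (-1, -1)
                for s in range(S):
                    if log_trans[s][s2] == NEG_INF:
                        continue  # transition not allowed by the HMM topology
                    # Only warp indices within max_jump of a2 may precede it.
                    lo, hi = max(0, a2 - max_jump), min(A - 1, a2 + max_jump)
                    for a in range(lo, hi + 1):
                        score = delta[s][a] + log_trans[s][s2]
                        if score > best:
                            best, arg = score, (s, a)
                new_delta[s2][a2] = best + loglik[t][s2][a2]
                bp[s2][a2] = arg
        delta = new_delta
        backptr.append(bp)

    # Pick the best final augmented state, then trace back.
    best_score, cur = NEG_INF, (0, 0)
    for s in range(S):
        for a in range(A):
            if delta[s][a] > best_score:
                best_score, cur = delta[s][a], (s, a)
    path = [cur]
    for bp in reversed(backptr):
        cur = bp[cur[0]][cur[1]]
        path.append(cur)
    path.reverse()
    return best_score, path


if __name__ == "__main__":
    # Toy example: one HMM state, three candidate warp factors, two frames.
    loglik = [[[0.0, -4.0, -10.0]], [[-10.0, -5.0, 0.0]]]
    print(local_warp_viterbi(loglik, [[0.0]], [0.0], max_jump=1))  # (-4.0, [(0, 1), (0, 2)])
```

In the toy example, the second frame strongly prefers warp index 2, but with `max_jump=1` the path cannot jump there directly from index 0, so the search settles on indices 1 then 2. Enlarging the augmented state space this way multiplies decoding cost by roughly the number of warp factors times the width of the allowed jump window, which is the trade-off a local constraint is meant to keep small.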

Keywords :

decoding; hidden Markov models; speech coding; speech recognition; Spanish Speech-Dat-Car; Spanish language corpus; finite set estimation; frequency warping-based speaker normalization; hidden Markov model; local spectral-temporal variability; optimum warping factors; spectral-temporal warping factors; speech transformations; Automatic speech recognition (ASR); local warping; maximum likelihood; speaker normalization; vocal tract normalization

fLanguage :

English

Journal_Title :

IEEE Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

1558-7916

Type :

Journal

DOI :

10.1109/TASL.2007.914114

Filename :

4432279

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1036518