DocumentCode :
1239167
Title :
Unsupervised Adaptation With Discriminative Mapping Transforms
Author :
Yu, Kai ; Gales, Mark ; Woodland, Philip C.
Author_Institution :
Eng. Dept., Cambridge Univ., Cambridge
Volume :
17
Issue :
4
fYear :
2009
fDate :
5/1/2009
Firstpage :
714
Lastpage :
723
Abstract :
The most commonly used approaches to speaker adaptation are based on linear transforms, as these can be robustly estimated using limited adaptation data. Although significant gains can be obtained using discriminative criteria for training acoustic models, maximum-likelihood (ML) estimated transforms are still used for unsupervised adaptation. This is because discriminatively trained transforms are highly sensitive to errors in the adaptation supervision hypothesis. This paper describes a new framework for estimating transforms that are discriminative in nature but less sensitive to errors in the supervision hypothesis. A speaker-independent discriminative mapping transformation (DMT) is estimated during training, after a speaker-specific ML-estimated transform has been applied to each training speaker. During recognition, an ML speaker-specific transform is found for each test-set speaker and the speaker-independent DMT is then applied. This allows a discriminative transform to be estimated indirectly, while only an ML speaker-specific transform needs to be found during recognition. The DMT technique is evaluated on an English conversational telephone speech task. Experiments showed that using DMT in unsupervised adaptation led to significant gains over both standard ML and discriminatively trained transforms.
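The recognition-time chaining described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes that both the speaker-specific ML transform and the DMT act as affine (MLLR-style) transforms on Gaussian mean vectors, and the names `apply_affine`, `compose`, `adapt_mean` and all numeric values are hypothetical. Estimation itself (ML on the test hypothesis, a discriminative criterion for the DMT on training data) is out of scope; the sketch only shows the order in which the transforms are applied.

```python
# Minimal sketch (not the authors' code): chaining a speaker-specific
# ML-estimated transform with a speaker-independent DMT at recognition time.
# Assumption: both act as affine transforms on Gaussian mean vectors,
# mu -> A mu + b. All names and numbers here are hypothetical.
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Affine = Tuple[Matrix, Vector]  # (A, b)


def apply_affine(t: Affine, x: Vector) -> Vector:
    """Return A x + b."""
    A, b = t
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def compose(outer: Affine, inner: Affine) -> Affine:
    """Single affine transform equivalent to outer(inner(x))."""
    A2, b2 = outer
    A1, b1 = inner
    cols = range(len(A1[0]))
    A = [[sum(A2[i][m] * A1[m][j] for m in range(len(A1))) for j in cols]
         for i in range(len(A2))]
    b = apply_affine((A2, b2), b1)  # A2 b1 + b2
    return A, b


def adapt_mean(mu: Vector, speaker_ml: Affine, dmt: Affine) -> Vector:
    """Apply the per-speaker ML transform first, then the fixed DMT."""
    return apply_affine(dmt, apply_affine(speaker_ml, mu))


# Hypothetical values: speaker_ml would be estimated per test speaker from the
# (possibly erroneous) first-pass hypothesis; dmt was estimated once in training.
speaker_ml: Affine = ([[1.0, 0.0], [0.0, 2.0]], [0.5, -1.0])
dmt: Affine = ([[0.5, 0.25], [0.0, 1.0]], [0.0, 0.25])
mu: Vector = [1.0, 1.0]

adapted = adapt_mean(mu, speaker_ml, dmt)
folded = apply_affine(compose(dmt, speaker_ml), mu)
print(adapted, folded)  # both [1.0, 1.25]
```

Because the composition of two affine maps is itself affine, the chained adaptation can be folded into a single transform per speaker under this assumption, and only the ML part has to be re-estimated for each new test speaker.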
Keywords :
maximum likelihood estimation; speaker recognition; transforms; English conversational telephone speech task; acoustic model training; adaptation supervision hypothesis; linear transforms; maximum-likelihood estimation transform; speaker adaptation; speaker-independent discriminative mapping transformation; unsupervised adaptation; Digital audio broadcasting; Loudspeakers; OFDM modulation; Robustness; Speech analysis; Speech recognition; Target recognition; Telephony; Testing; Criterion mapping function; discriminative mapping transform; discriminative training;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2008.2011535
Filename :
4814782