  • DocumentCode
    1239167
  • Title
    Unsupervised Adaptation With Discriminative Mapping Transforms
  • Author
    Yu, Kai; Gales, Mark; Woodland, Philip C.
  • Author_Institution
    Eng. Dept., Cambridge Univ., Cambridge
  • Volume
    17
  • Issue
    4
  • fYear
    2009
  • fDate
    5/1/2009 12:00:00 AM
  • Firstpage
    714
  • Lastpage
    723
  • Abstract
    The most commonly used approaches to speaker adaptation are based on linear transforms, as these can be robustly estimated from limited adaptation data. Although discriminative criteria yield significant gains when training acoustic models, maximum-likelihood (ML) estimated transforms are still used for unsupervised adaptation, because discriminatively trained transforms are highly sensitive to errors in the adaptation supervision hypothesis. This paper describes a new framework for estimating transforms that are discriminative in nature but less sensitive to errors in that hypothesis. A speaker-independent discriminative mapping transformation (DMT) is estimated during training, after a speaker-specific ML-estimated transform has been applied for each training speaker. During recognition, an ML speaker-specific transform is found for each test-set speaker, and the speaker-independent DMT is then applied. This allows a transform that is discriminative in nature to be estimated indirectly, while requiring only an ML speaker-specific transform to be found during recognition. The DMT technique is evaluated on an English conversational telephone speech task. Experiments showed that using DMT in unsupervised adaptation led to significant gains over both standard ML and discriminatively trained transforms.
  • Keywords
    maximum likelihood estimation; speaker recognition; transforms; English conversational telephone speech task; acoustic model training; adaptation supervision hypothesis; linear transforms; maximum-likelihood estimation transform; speaker adaptation; speaker-independent discriminative mapping transformation; unsupervised adaptation; Digital audio broadcasting; Loudspeakers; OFDM modulation; Robustness; Speech analysis; Speech recognition; Target recognition; Telephony; Testing; Criterion mapping function; discriminative mapping transform; discriminative training
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2008.2011535
  • Filename
    4814782