  • DocumentCode
    865613
  • Title
    Speech Recognition Using Linear Dynamic Models
  • Author
    Frankel, Joe; King, Simon
  • Author_Institution
    Centre for Speech Technology Research, University of Edinburgh
  • Volume
    15
  • Issue
    1
  • fYear
    2007
  • Firstpage
    246
  • Lastpage
    256
  • Abstract
    The majority of automatic speech recognition systems rely on hidden Markov models (HMMs), in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models consecutive feature vectors (augmented to include derivative information) as statistically independent given the state sequence. Furthermore, spatial correlations present in speech parameters are frequently ignored through the use of diagonal covariance matrices. This paper continues the work of Digalakis and others, who proposed instead a first-order linear state-space model that can model underlying dynamics and also provides a model of spatial correlations. This paper examines the assumptions made in applying such a model and shows that adding a hidden dynamic state increases accuracy over otherwise equivalent static models. We also propose a time-asynchronous decoding strategy suited to recognition with segment models. We describe the implementation of decoding for linear dynamic models and present TIMIT phone recognition results.
  • Keywords
    Gaussian processes; covariance matrices; decoding; hidden Markov models; speech coding; speech recognition; Gaussian mixtures model; automatic speech recognition; diagonal covariance matrices; equivalent static models; linear dynamic models; spatial correlations; time-asynchronous decoding strategy; Covariance matrix; Feature extraction; Gaussian noise; Lattices; Vectors; Automatic speech recognition (ASR); linear dynamic models (LDMs); stack decoding
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2006.876766
  • Filename
    4032771