• DocumentCode
    1161046
  • Title
    Structured speech modeling
  • Author
    Deng, Li; Yu, Dong; Acero, Alex
  • Author_Institution
    Microsoft Res., Redmond, WA
  • Volume
    14
  • Issue
    5
  • fYear
    2006
  • Firstpage
    1492
  • Lastpage
    1504
  • Abstract
    Modeling dynamic structure of speech is a novel paradigm in speech recognition research within the generative modeling framework, and it offers a potential to overcome limitations of the current hidden Markov modeling approach. Analogous to structured language models where syntactic structure is exploited to represent long-distance relationships among words, the structured speech model described in this paper makes use of the dynamic structure in the hidden vocal tract resonance space to characterize long-span contextual influence among phonetic units. A general overview is provided first on hierarchically classified types of dynamic speech models in the literature. A detailed account is then given for a specific model type called the hidden trajectory model, and we describe detailed steps of model construction and the parameter estimation algorithms. We show how the use of resonance target parameters and their temporal filtering enables joint modeling of long-span coarticulation and phonetic reduction effects. Experiments on phonetic recognition evaluation demonstrate superior recognizer performance over a modern hidden Markov model-based system. Error analysis shows that the greatest performance gain occurs within the sonorant speech class.
  • Keywords
    error analysis; filtering theory; hidden Markov models; parameter estimation; speech recognition; generative modeling framework; hidden Markov modeling approach; hidden trajectory model; hidden vocal tract resonance space; hierarchically classified types; joint modeling; long-span coarticulation; long-span contextual influence; model construction; parameter estimation algorithms; phonetic recognition evaluation; phonetic reduction effects; phonetic units; resonance target parameters; sonorant speech class; structured dynamic speech modeling; structured language models; syntactic structure; temporal filtering; Context modeling; Error analysis; Filtering; Hidden Markov models; Natural languages; Parameter estimation; Performance gain; Resonance; Speech analysis; Speech recognition; Hidden dynamics; hidden trajectory; long span modeling; maximum-likelihood; nonlinear prediction; parameter learning; structured modeling; vocal tract resonance
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2006.878265
  • Filename
    1677971