• DocumentCode
    323758
  • Title
    Dynamically configurable acoustic models for speech recognition
  • Author
    Hwang, Mei-Yuh ; Huang, Xuedong
  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • Volume
    2
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    669
  • Abstract
    Senones were introduced to share hidden Markov model (HMM) parameters at a sub-phonetic level, as proposed by Hwang and Huang (1992), and decision trees were incorporated to predict unseen phonetic contexts, as suggested by Hwang, Huang and Alleva (1993). We describe two applications of the senonic decision tree: (1) dynamically downsizing a speech recognition system for small platforms, and (2) sharing the Gaussian covariances of continuous density HMMs (CHMMs). We experimented with how to balance different parameters to obtain the best trade-off between recognition accuracy and system size. The dynamically downsized system, without retraining, performed even better than the regular Baum-Welch (1972) trained system. The shared covariance model performed as well as the unshared full model and thus gave us the freedom to increase the number of Gaussian means to improve the accuracy of the model. Combining the downsizing and covariance sharing algorithms, a total error reduction of 8% was achieved over the Baum-Welch trained system with approximately the same parameter size.
  • Keywords
    Gaussian processes; acoustic signal processing; covariance analysis; decision theory; error statistics; hidden Markov models; speech processing; speech recognition; trees (mathematics); Baum-Welch trained system; CHMM; Gaussian covariances; Gaussian means; HMM parameters sharing; Hidden Markov model; continuous density HMM; covariance sharing algorithm; downsizing algorithm; dynamically configurable acoustic models; error reduction; model accuracy; parameter size; performance; phonetic contexts prediction; recognition accuracy; senones; senonic decision tree; shared covariance model; speech recognition; speech recognition system downsizing; sub-phonetic level; system size; unshared full model; Acoustic testing; Decision trees; Density functional theory; Error analysis; Hidden Markov models; Histograms; Resource management; Speech recognition; Statistics; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type
    conf
  • DOI
    10.1109/ICASSP.1998.675353
  • Filename
    675353