• DocumentCode
    3424359
  • Title

    Acoustic modeling with contextual additive structure for HMM-based speech recognition

  • Author

    Nankaku, Yoshihiko ; Nakamura, Kazuhiro ; Zen, Heiga ; Tokuda, Keiichi

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Nagoya Inst. of Technol., Nagoya
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    4469
  • Lastpage
    4472
  • Abstract
    This paper proposes an acoustic modeling technique based on an additive structure of context dependencies for HMM-based speech recognition. Typical context-dependent models, e.g., triphone HMMs, depend directly on the phonetic context, i.e., once a phonetic context is given, the Gaussian distribution is specified immediately. This paper assumes a more complex structure: an additive structure of acoustic feature components that have different context dependencies. Since the output probability distribution is composed of additive component distributions, a large number of different distributions can be represented efficiently by combining fewer distributions. To extract the additive components automatically, this paper presents a context clustering algorithm for the additive structure model in which multiple decision trees are constructed simultaneously. Experimental results show that the proposed technique improves phoneme recognition accuracy with fewer distributions than conventional triphone HMMs.
  • Keywords
    Gaussian distribution; acoustic signal processing; decision trees; hidden Markov models; speech processing; speech recognition; HMM-based speech recognition; acoustic feature components; acoustic modeling; additive component distributions; complex structure; context clustering algorithm; contextual additive structure; multiple decision trees; phoneme recognition accuracy; phonetic contexts; probability distribution; Additives; Clustering algorithms; Computer science; Context modeling; Linear regression; Training data; Additive structure; Context clustering; Distribution convolution
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2008.4518648
  • Filename
    4518648
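
  • Note
    The abstract's claim that many output distributions can be represented by combining fewer component distributions can be illustrated with a minimal sketch. It assumes, purely for illustration, a one-dimensional feature with two independent Gaussian components, one selected by the left phonetic context and one by the right; the sum of independent Gaussians is again Gaussian, with means and variances added (the "distribution convolution" named in the keywords). The component tables, their values, and the function names are hypothetical and are not taken from the paper, which derives its components by decision-tree clustering.

```python
from itertools import product

# Hypothetical component distributions as (mean, variance) for a 1-D feature.
# Assumption: one component depends only on the left context, the other only
# on the right context. The paper learns such components with decision trees.
left_components = {"a": (0.5, 0.10), "k": (-0.3, 0.20), "s": (0.1, 0.15)}
right_components = {"i": (1.0, 0.05), "o": (-0.8, 0.10), "u": (0.2, 0.30)}


def combine(left, right):
    """Sum of two independent Gaussians: means add and variances add."""
    (mean_l, var_l), (mean_r, var_r) = left, right
    return (mean_l + mean_r, var_l + var_r)


def output_distribution(left_ctx, right_ctx):
    """Output Gaussian for a full context, built from its two components."""
    return combine(left_components[left_ctx], right_components[right_ctx])


# Enumerate every (left, right) context pair and collect the resulting
# output distributions.
all_contexts = list(product(left_components, right_components))
distinct = {output_distribution(l, r) for l, r in all_contexts}

n_stored = len(left_components) + len(right_components)
print(f"{n_stored} stored components -> {len(distinct)} output distributions")
```

    In this toy setting the number of stored components grows with the sum of the context inventories, while the number of representable output distributions grows with their product. A direct context-dependent model would instead need one stored distribution per context pair.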