• DocumentCode
    2973011
  • Title
    Investigations on features for log-linear acoustic models in continuous speech recognition
  • Author
    Wiesler, S. ; Nussbaum-Thom, M. ; Heigold, G. ; Schluter, R. ; Ney, H.
  • Author_Institution
    Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
  • fYear
    2009
  • fDate
    Nov. 13 2009-Dec. 17 2009
  • Firstpage
    52
  • Lastpage
    57
  • Abstract
    Hidden Markov Models with Gaussian Mixture Models as emission probabilities (GHMMs) are the underlying structure of all state-of-the-art speech recognition systems. Gaussian mixture distributions follow the generative approach, in which the class-conditional probability is modeled, although classification requires only the posterior probability. Direct modeling of posterior probabilities with log-linear models has been very successful in related fields such as Natural Language Processing (NLP), but it has rarely been used in speech recognition and has not been applied successfully to continuous speech recognition. In this paper, we report competitive results for a speech recognizer with a log-linear acoustic model on the Wall Street Journal corpus, a Large Vocabulary Continuous Speech Recognition (LVCSR) task. We trained this model from scratch, i.e., without relying on an existing GHMM system. The use of data-dependent sparse features for log-linear models has been proposed previously. We compare these features with polynomial features and show that combining polynomial and data-dependent sparse features leads to better results.
  • Keywords
    polynomials; speech recognition; statistical distributions; Gaussian mixture models; Wall Street Journal corpus; class-conditional probability; continuous speech recognition system; data dependent sparse features; emission probabilities; hidden Markov models; log-linear acoustic models; posterior probability; Acoustic distortion; Acoustic emission; Computer science; Hidden Markov models; Maximum likelihood estimation; Natural language processing; Polynomials; Probability; Speech recognition; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
  • Conference_Location
    Merano
  • Print_ISBN
    978-1-4244-5478-5
  • Electronic_ISBN
    978-1-4244-5479-2
  • Type
    conf
  • DOI
    10.1109/ASRU.2009.5373362
  • Filename
    5373362