• DocumentCode
    2793443
  • Title
    Discriminatively estimated joint acoustic, duration, and language model for speech recognition
  • Author
    Lehr, Maider ; Shafran, Izhak
  • Author_Institution
    Center for Spoken Language Understanding (CSLU), Oregon Health & Sci. Univ., Portland, OR, USA
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    5542
  • Lastpage
    5545
  • Abstract
    We introduce a discriminative model for speech recognition that integrates acoustic, duration, and language components. In the framework of finite state machines, a general model for speech recognition G is a finite state transduction from acoustic state sequences to word sequences (e.g., the search graph in many speech recognizers). The lattices from a baseline recognizer can be viewed as an a posteriori version of G after having observed an utterance. So far, discriminative language models have been proposed to correct the output side of G and are applied to the lattices. The acoustic state sequences on the input side of these lattices can also be exploited to improve the choice of the best hypothesis through the lattice. Taking this view, the model proposed in this paper jointly estimates the parameters for the acoustic and language components in a discriminative setting. The resulting model can be factored as corrections for the input and the output sides of the general model G. This formulation allows us to incorporate duration cues seamlessly. Empirical results on a large vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.6% absolute. Through a series of experiments, we analyze the contributions from and interactions between the acoustic, duration, and language components, and find that duration cues play an important role in the Arabic task.
  • Keywords
    linguistics; speech recognition; acoustic modeling; acoustic state sequences; discriminative language model; duration cues; duration modeling; finite state transduction; language modeling; large vocabulary Arabic GALE task; Automata; Decoding; Error analysis; Lattices; Natural languages; Parameter estimation; Performance gain; Vectors; Vocabulary; discriminative modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495227
  • Filename
    5495227