• DocumentCode
    310559
  • Title

    Using word temporal structure in HMM speech recognition

  • Author

    Fissore, L. ; Laface, P. ; Ravera, F.

  • Author_Institution
    CSELT, Torino, Italy
  • Volume
    2
  • fYear
    1997
  • fDate
    21-24 Apr 1997
  • Firstpage
    975
  • Abstract
    Isolated-word speech recognizers with fixed vocabularies are often used to provide vocal services over the telephone line. This paper presents a simple postprocessing approach that rescores the hypotheses produced by a hidden Markov model (HMM) recognizer, taking into account the global temporal structure of the pronounced words. The approach does not rely directly on state/word duration modeling. Instead, it models the global time variations of the spectral features of each word and their correlation in time: two important perceptual cues that standard HMMs exploit only partially. The method has been evaluated on three isolated-word, speaker-independent systems with vocabularies of different size and complexity. We show that, with minimal overhead, recognition performance improves not only for small-vocabulary systems, such as isolated digit recognition or the recognition of 26 Italian spelling names, but also for a system with a vocabulary of 475 city names that is part of a vocal service providing information about the main railway connections.
  • Keywords
    correlation methods; hidden Markov models; spectral analysis; speech processing; speech recognition; telephone lines; voice communication; HMM speech recognition; Italian spelling names; city name vocabulary; correlation; fixed vocabularies; global temporal structure; global time variations; hidden Markov model recognizer; isolated word speaker independent systems; isolated word speech recognizers; main railway connections; perceptual cues; postprocessing approach; pronounced words; recognition performance; small vocabulary recognition systems; spectral features; telephone line; vocabulary complexity; vocabulary size; vocal services; word temporal structure; Cepstral analysis; Computational complexity; Decoding; Hidden Markov models; Laboratories; Predictive models; Robustness; Speech recognition; Vectors; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
  • Conference_Location
    Munich
  • ISSN
    1520-6149
  • Print_ISBN
    0-8186-7919-0
  • Type

    conf

  • DOI
    10.1109/ICASSP.1997.596101
  • Filename
    596101