• DocumentCode
    1109722
  • Title
    An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition
  • Author
    Zhao, Yunxin
  • Author_Institution
    Speech Technol. Lab., Panasonic Technol. Inc., Santa Barbara, CA, USA
  • Volume
    2
  • Issue
    3
  • fYear
    1994
  • fDate
    7/1/1994
  • Firstpage
    380
  • Lastpage
    394
  • Abstract
    A new speaker adaptation technique is proposed for improving speaker-independent continuous speech recognition based on a decomposition of spectral variation sources. In this technique, the spectral variations are separated into two categories, one acoustic and the other phone-specific, where each variation source is modeled by a linear transformation system. The technique consists of two sequential steps: first, acoustic normalization is performed, and second, phone model parameters are adapted. Speaker adaptation experiments on the TIMIT database using short calibration speech (5 s per speaker) have shown significant performance improvement over the baseline speaker-independent continuous speech recognition system, which uses Gaussian mixture density based hidden Markov models of phone units. For a vocabulary size of 853 and a test set perplexity of 104, the recognition word accuracy improved from 86.9% for the baseline system to 90.5% after adaptation, corresponding to an error reduction of 27.5%. On a more difficult test set that contains an additional variation source due to recording channel mismatch, a larger improvement was obtained: for the same vocabulary and a test set perplexity of 101, the recognition word accuracy improved from 65.4% for the baseline to 86.0% after adaptation, corresponding to an error reduction of 59.5%.
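    The error reduction figures quoted in the abstract can be checked with a short calculation. The sketch below assumes that word error rate is simply 100% minus the reported word accuracy, and that "error reduction" means the relative drop in that error rate. The abstract does not state either definition, and the function name is hypothetical.

```python
def relative_error_reduction(baseline_acc: float, adapted_acc: float) -> float:
    """Relative reduction in word error rate, in percent.

    Assumption (not stated in the abstract): error rate = 100 - word accuracy,
    with both accuracies given in percent.
    """
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return 100.0 * (baseline_err - adapted_err) / baseline_err


# Word accuracies reported in the abstract.
matched = relative_error_reduction(86.9, 90.5)      # test set with perplexity 104
mismatched = relative_error_reduction(65.4, 86.0)   # channel-mismatch test set

print(f"{matched:.1f}%")     # 27.5%
print(f"{mismatched:.1f}%")  # 59.5%
```

    Under these assumptions, both results agree with the 27.5% and 59.5% reductions reported above.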
  • Keywords
    acoustic signal processing; speech recognition; Gaussian mixture density based hidden Markov models; TIMIT database; acoustic normalization; acoustic-phonetic-based speaker adaptation technique; decomposition; linear transformation system; performance; phone model parameters; recognition word accuracy; speaker-independent continuous speech recognition; spectral variation sources; test set perplexity; vocabulary size; Calibration; Character recognition; Databases; Decoding; Hidden Markov models; Loudspeakers; Speech recognition; System performance; System testing; Vocabulary;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/89.294352
  • Filename
    294352