• DocumentCode
    310570
  • Title

    Improved estimation of supervision in unsupervised speaker adaptation

  • Author

    Homma, Shigeru ; Aikawa, Kiyoaki ; Sagayama, Shigeki

  • Author_Institution
    NTT Human Interface Labs., Kanagawa, Japan
  • Volume
    2
  • fYear
    1997
  • fDate
    21-24 Apr 1997
  • Firstpage
    1023
  • Abstract
    Unsupervised speaker adaptation plays an important role in “batch dictation”, whose aim is to automatically transcribe large amounts of recorded dictation using speech recognition. When unsupervised speaker adaptation uses the recognition results of the target speech as supervision, erroneous recognition results degrade the quality of the adapted acoustic models. This paper presents a new supervision selection method, in which the correctness of the first candidate is judged from the likelihood ratio of the first and second candidates. The method eliminates erroneous recognition results and the corresponding speech data from the adaptive training data. We implemented the method in an iterative unsupervised speaker adaptation procedure. In a practical application of batch-style speech-to-text conversion of recorded dictation of Japanese medical diagnoses, recognition errors are reduced by 50% compared with speaker-independent recognition.
  • Keywords
    adaptive estimation; dictation; iterative methods; speech recognition; unsupervised learning; Japanese medical diagnoses; adapted acoustic models; adaptive training; batch dictation; batch-style speech-to-text conversion; erroneous recognition results; iterative unsupervised speaker adaptation procedure; likelihood ratio; recorded dictation; supervision; target speech; transcription; unsupervised speaker adaptation; Automatic speech recognition; Degradation; Hidden Markov models; Humans; Iterative methods; Laboratories; Loudspeakers; Speech recognition; Target recognition; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
  • Conference_Location
    Munich
  • ISSN
    1520-6149
  • Print_ISBN
    0-8186-7919-0
  • Type

    conf

  • DOI
    10.1109/ICASSP.1997.596114
  • Filename
    596114
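
  • Illustration
    The selection step described in the abstract can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the `Hypothesis` structure, the per-utterance granularity, the use of a log-likelihood difference as the likelihood ratio, and the `threshold` parameter are all assumptions made for illustration. The abstract states only that the first candidate is judged by the likelihood ratio of the first and second candidates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """Top-two recognition result for one utterance (hypothetical structure)."""
    utterance_id: str
    first_text: str        # transcription of the first (best) candidate
    first_loglik: float    # log-likelihood of the first candidate
    second_loglik: float   # log-likelihood of the second candidate


def select_supervision(hyps: List[Hypothesis], threshold: float) -> List[Hypothesis]:
    """Keep only utterances whose first candidate is judged reliable.

    The log of the likelihood ratio equals the difference of the two
    log-likelihoods. An utterance is kept when that difference is at
    least `threshold` (an assumed tuning parameter); the rest are
    dropped from the adaptation data.
    """
    return [h for h in hyps if h.first_loglik - h.second_loglik >= threshold]


hyps = [
    Hypothesis("u1", "text a", first_loglik=-100.0, second_loglik=-130.0),
    Hypothesis("u2", "text b", first_loglik=-100.0, second_loglik=-101.0),
]
kept = select_supervision(hyps, threshold=10.0)
kept_ids = [h.utterance_id for h in kept]
```

    In an iterative procedure, the kept utterances and their first-candidate transcriptions would serve as the supervision for one adaptation pass, after which recognition and selection would be repeated with the adapted models.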