  • DocumentCode
    590865
  • Title
    Speaker adaptation intensively weighted on mis-recognized speech segments
  • Author
    Oku, Takanori ; Fujita, Yoshikazu ; Kobayashi, Akihiro ; Imai, Tetsuro
  • Author_Institution
    Sci. & Technol. Res. Labs., NHK (Nippon Hoso Kyokai), Tokyo, Japan
  • fYear
    2012
  • fDate
    3-6 Dec. 2012
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The “re-speak method” is an effective speech recognition approach for simultaneous closed-captioning of live broadcast programs picked up in noisy environments featuring spontaneous or emotional commentary. The re-speaker's acoustic model needs to be adapted constantly to the re-speaker's daily health condition and level of fatigue. In this paper, we propose an efficient speaker adaptation technique for the re-speak method. Conventional speaker adaptation is performed uniformly over all speech segments. In contrast, the proposed method identifies intensive adaptation segments, which correspond to recognition errors, by comparing the speech recognition results with the manually corrected results; both are available in real time from the simultaneous closed-captioning process. The frame-level statistics for speaker adaptation are then weighted more heavily over the intensive adaptation segments than over the other segments, with weights proportional to the degree of the recognition errors. In an experiment on an information variety program from Japanese broadcasting, the proposed method achieved a 3.4% relative reduction in word error rate compared with the conventional uniform adaptation method.
  • Keywords
    speaker recognition; Japanese broadcasting; acoustic model; emotional commentary; fatigue level; frame level statistics; intensive adaptation segment; live broadcasting program; misrecognized speech segments; noisy environment; recognition error; respeak method; respeaker daily health condition; simultaneous closed captioning process; speaker adaptation; speech recognition; uniform adaptation method; word error rate; Acoustics; Adaptation models; Hidden Markov models; Real-time systems; Speech; Speech recognition; TV;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
  • Conference_Location
    Hollywood, CA
  • Print_ISBN
    978-1-4673-4863-8
  • Type
    conf
  • Filename
    6412012
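
The weighting idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `alpha` boost parameter, the binary per-word error flag (standing in for the "degree of the recognition errors"), and the scalar features are all assumptions made for illustration. The sketch compares the recognized words with the corrected words, marks unmatched recognized words as intensive adaptation segments, and gives their frames a larger weight when accumulating a simple statistic (a weighted mean).

```python
from difflib import SequenceMatcher


def error_flags(hyp, ref):
    """Return one bool per recognized word: True if it has no match in the corrected text."""
    flags = [True] * len(hyp)
    matcher = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            flags[i] = False  # word agrees with the manual correction
    return flags


def frame_weights(word_spans, flags, n_frames, alpha=2.0):
    """Build per-frame weights.

    word_spans: (start, end) frame indices per recognized word, end exclusive.
    alpha: hypothetical extra weight for frames inside mis-recognized words.
    """
    weights = [1.0] * n_frames
    for (start, end), is_error in zip(word_spans, flags):
        if is_error:
            for t in range(start, end):
                weights[t] = 1.0 + alpha
    return weights


def weighted_mean(frames, weights):
    """Accumulate a weighted first-order statistic over scalar frame features."""
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, frames)) / total
```

With `alpha = 0` the weights are all 1.0 and the statistic reduces to the uniform case that the abstract calls conventional adaptation; a real system would accumulate per-Gaussian occupancy statistics rather than a single scalar mean.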