Speaker adaptation intensively weighted on mis-recognized speech segments

Author

Oku, Takanori ; Fujita, Yoshikazu ; Kobayashi, Akihiro ; Imai, Tetsuro

Author_Institution

Sci. & Technol. Res. Labs., NHK (Nippon Hoso Kyokai), Tokyo, Japan

fYear

2012

fDate

3-6 Dec. 2012

Firstpage

1

Lastpage

4

Abstract

The “re-speak method” is an effective speech recognition method for simultaneous closed-captioning of live broadcast programs that are picked up in noisy environments and feature spontaneous or emotional commentary. The acoustic model of the re-speaker needs to be constantly adapted to the re-speaker's daily health condition and level of fatigue. In this paper, we propose an efficient speaker adaptation for the re-speak method. Conventional speaker adaptation is performed uniformly over entire speech segments. In contrast, our proposed speaker adaptation determines intensive adaptation segments, which correspond to recognition errors, by comparing the speech recognition results with the manually error-corrected results. Both sets of results are provided in real time by the simultaneous closed-captioning process. The frame-level statistics for speaker adaptation are then weighted more heavily over the intensive adaptation segments than over the other segments, with weights proportional to the degree of the recognition errors. In an experiment on an information variety program in Japanese broadcasting, our speaker adaptation method achieved a 3.4% relative reduction in word error rate compared with the conventional uniform adaptation method.
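
The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the word-level alignment via `difflib`, the per-word frame spans, and the constant `boost` factor are all assumptions made for illustration (the abstract says weights scale with the degree of the recognition errors but does not define that measure here). The sketch flags recognized words that differ from the corrected transcript, gives their frames a larger weight, and accumulates a weighted mean as a stand-in for the frame-level adaptation statistics.

```python
from difflib import SequenceMatcher


def error_flags(hyp_words, ref_words):
    """Flag each recognized word that differs from the corrected transcript."""
    flags = [False] * len(hyp_words)
    matcher = SequenceMatcher(a=hyp_words, b=ref_words, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" cover recognized words that were corrected.
        # Words present only in the corrected text have no frames in the
        # hypothesis, so this simplified sketch ignores them.
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                flags[i] = True
    return flags


def frame_weights(word_spans, flags, num_frames, boost=2.0):
    """Assign weight `boost` to frames of mis-recognized words, 1.0 elsewhere.

    word_spans holds one half-open (start_frame, end_frame) pair per
    recognized word. The constant boost is a hypothetical simplification.
    """
    weights = [1.0] * num_frames
    for (start, end), is_error in zip(word_spans, flags):
        if is_error:
            for t in range(start, end):
                weights[t] = boost
    return weights


def weighted_mean(frames, weights):
    """Weighted first-order statistic over feature frames."""
    total = sum(weights)
    dim = len(frames[0])
    return [
        sum(w * x[d] for x, w in zip(frames, weights)) / total
        for d in range(dim)
    ]


# Toy example: one substituted word spanning frames 2..4.
hyp = ["the", "whether", "is", "fine"]
ref = ["the", "weather", "is", "fine"]
spans = [(0, 2), (2, 5), (5, 6), (6, 8)]
frames = [[0.0]] * 2 + [[1.0]] * 3 + [[0.0]] * 3

flags = error_flags(hyp, ref)
weights = frame_weights(spans, flags, num_frames=len(frames))
mean = weighted_mean(frames, weights)
```

In this toy case the frames of the substituted word pull the weighted mean toward their feature values more strongly than a uniform average would, which is the effect the abstract describes for the intensive adaptation segments.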

Keywords

speaker recognition; Japanese broadcasting; acoustic model; emotional commentary; fatigue level; frame level statistics; intensive adaptation segment; live broadcasting program; misrecognized speech segments; noisy environment; recognition error; respeak method; respeaker daily health condition; simultaneous closed captioning process; speaker adaptation; speech recognition; uniform adaptation method; word error rate; Acoustics; Adaptation models; Hidden Markov models; Real-time systems; Speech; Speech recognition; TV;

fLanguage

English

Publisher

IEEE

Conference_Title

Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific

Conference_Location

Hollywood, CA

Print_ISBN

978-1-4673-4863-8

Type

conf

Filename

6412012