Progress in the CU-HTK broadcast news transcription system

Author

Gales, Mark J F ; Kim, Do Yeong ; Woodland, Philip C. ; Chan, Ho Yin ; Mrva, David ; Sinha, Rohit ; Tranter, Sue E.

Author_Institution

Eng. Dept., Cambridge Univ.

Volume

14

Issue

5

fYear

2006

Firstpage

1513

Lastpage

1525

Abstract

Broadcast news (BN) transcription has been a challenging research area for many years. In the last couple of years, the availability of large amounts of roughly transcribed acoustic training data and advanced model training techniques has offered the opportunity to greatly reduce the error rate on this task. This paper describes the design and performance of BN transcription systems which make use of these developments. First, the effects of using lightly supervised training data and advanced acoustic modeling techniques are discussed. The design of a real-time broadcast news recognition system using these new models is then detailed. As system combination has been found to yield large gains in performance, a range of frameworks that allow multiple recognition outputs to be combined are described next. These include the use of multiple types of acoustic models and multiple segmentations. As a contrast, the "SuperEARS" system, which was developed by multiple sites and allows cross-site combination, is also described. The various models and recognition configurations are evaluated using several recent BN development and evaluation test sets. These new BN transcription systems can give gains of over 25% relative to the CU-HTK 2003 BN system.

Keywords

broadcasting; speech recognition; CU-HTK broadcast news transcription system; SuperEARS system; advanced acoustic model training techniques; cross-site combination; error rate reduction; lightly supervised training data; multiple recognition outputs; multiple segmentations; real-time broadcast news recognition system; recognition configurations; roughly transcribed acoustic training data; Acoustic testing; Availability; Broadcasting; Ear; Error analysis; Loudspeakers; Performance gain; Real time systems; Speech recognition; Training data; Automatic speech recognition; broadcast news (BN) transcription; diarization

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2006.878264

Filename

1677973