Using a large vocabulary continuous speech recognizer for a constrained domain with limited training

Author

Siu, Manhung ; Jonas, Michael ; Gish, Herbert

Author_Institution

BBN Technol./GTE Internetworking, Cambridge, MA, USA

Volume

1

fYear

1999

fDate

15-19 Mar 1999

Firstpage

105

Abstract

How to train a speech recognizer with a limited amount of training data is of interest to many researchers. We describe how we use BBN's Byblos large vocabulary continuous speech recognition (LVCSR) system for the military air-traffic-control domain, where we have less than an hour of training data. We investigate three ways to deal with the limited training data: (1) reconfigure the LVCSR system to use fewer parameters, (2) incorporate out-of-domain data, and (3) use pragmatic information, such as speaker identity and controller function, to improve recognition performance. We compare the LVCSR performance to that of the tied-mixture recognizer that is designed for a limited vocabulary. We show that the reconfigured LVCSR system outperforms the tied-mixture system by 10% in absolute word error rate. When enough data is available per speaker, vocal tract length normalization and supervised adaptation techniques can further improve performance by 6%, even for this domain with limited training. We also show that the use of out-of-domain data and pragmatic information, if available, can each further improve performance by 1-3%.

Keywords

air traffic control; learning (artificial intelligence); military communication; military computing; speech recognition; BBN; Byblos LVCSR system; absolute word error rate; constrained domain; controller function; large vocabulary continuous speech recognizer; limited training; military air-traffic-control domain; out-of-domain data; parameters; pragmatic information; recognition performance; speaker identity; supervised adaptation techniques; tied-mixture recognizer; training data; vocal tract length normalization; Air traffic control; Control systems; Engines; Error analysis; Interleaved codes; Loudspeakers; Speech recognition; Training data; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on

Conference_Location

Phoenix, AZ

ISSN

1520-6149

Print_ISBN

0-7803-5041-3

Type

conf

DOI

10.1109/ICASSP.1999.758073

Filename

758073