Discriminative model combination

Author

Beyerlein, Peter

Author_Institution

Philips Res. Lab., Aachen, Germany

Volume

1

fYear

1998

fDate

12-15 May 1998

Firstpage

481

Abstract

Discriminative model combination is a new approach in the field of automatic speech recognition, which aims at an optimal integration of all given (acoustic and language) models into one log-linear posterior probability distribution. As opposed to the maximum entropy approach, the coefficients of the log-linear combination are optimized on training samples using discriminative methods to obtain an optimal classifier. Three methods are discussed to find coefficients which minimize the empirical word error rate on given training data: the well-known generalised probabilistic descent (GPD) based minimum error rate training, leading to an iterative optimization scheme; a minimization of the mean distance between the discriminant function of the log-linear posterior probability distribution and an “ideal” discriminant function; and a minimization of a smoothed error count measure, where the smoothing function is a parabola. The latter two methods lead to closed-form solutions for the coefficients of the model combination. Experimental results show that the accuracy of a large vocabulary continuous speech recognition system can be increased by a discriminative model combination, due to a better exploitation of the given acoustic and language models.
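
The log-linear combination named in the abstract can be illustrated with a minimal sketch. It assumes each model j supplies a log score for every competing hypothesis k, and that the combined posterior is proportional to exp(sum_j lambda_j * log p_j(k | x)). The function name, the scores, and the coefficient values are hypothetical and chosen only for illustration; the sketch covers the combination step alone, not any of the three coefficient optimization methods discussed in the paper.

```python
import math


def log_linear_posterior(log_scores, lambdas):
    """Combine per-model log scores into one posterior over hypotheses.

    log_scores[k][j] is the log score of model j for hypothesis k.
    lambdas[j] is the (hypothetical) combination coefficient of model j.
    """
    # Weighted sum of log scores for each hypothesis.
    combined = [sum(l * s for l, s in zip(lambdas, row)) for row in log_scores]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(combined)
    exps = [math.exp(c - top) for c in combined]
    norm = sum(exps)
    return [e / norm for e in exps]


def best_hypothesis(log_scores, lambdas):
    """Index of the hypothesis with the highest combined posterior."""
    posterior = log_linear_posterior(log_scores, lambdas)
    return max(range(len(posterior)), key=posterior.__getitem__)


# Made-up log scores: column 0 = acoustic model, column 1 = language model.
example_scores = [
    [-10.0, -4.0],
    [-9.0, -7.0],
    [-12.0, -3.0],
]

# Different coefficients can change which hypothesis wins.
choice_both = best_hypothesis(example_scores, [1.0, 0.5])
choice_acoustic_only = best_hypothesis(example_scores, [1.0, 0.0])
```

With these made-up numbers, the two coefficient settings select different hypotheses, which is why the choice of coefficients affects the word error rate of the combined system.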

Keywords

acoustic signal processing; error statistics; iterative methods; natural languages; optimisation; probability; smoothing methods; speech recognition; GPD-based minimum error rate training; acoustic models; automatic speech recognition; closed-form solution; discriminant function; discriminative model combination; experimental results; generalised probabilistic descent; iterative optimization; language models; log-linear combination coefficients; log-linear posterior probability distribution; maximum entropy; mean distance minimization; optimal classifier; parabola; smoothed error count measure; smoothing function; training data; training samples; word error rate; Automatic speech recognition; Entropy; Error analysis; Iterative methods; Minimization methods; Natural languages; Optimization methods; Probability distribution; Smoothing methods; Training data

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on

Conference_Location

Seattle, WA

ISSN

1520-6149

Print_ISBN

0-7803-4428-6

Type

conf

DOI

10.1109/ICASSP.1998.674472

Filename

674472