DocumentCode
323581
Title
Improved neural network training of inter-word context units for connected digit recognition
Author
Wei, Wei ; Van Vuuren, Sarel
Author_Institution
Center for Spoken Language Understanding, Oregon Graduate Inst. of Sci. & Technol., Portland, OR, USA
Volume
1
fYear
1998
fDate
12-15 May 1998
Firstpage
497
Abstract
For connected digit recognition, the relative frequency of occurrence of context-dependent phonetic units at inter-word boundaries depends on the ordering of the spoken digits and on whether or not silence or pause is included. If these units represent classes in a model, this means that the distribution of samples between classes (the class prior) may be extremely nonuniform, and that the distribution over many utterances in a training set may differ greatly from the rather flat distribution over any single test utterance. Using a neural network to model context-dependent phonetic units, we show how to compensate for this problem. We do this by roughly flattening the class prior for infrequently occurring context units through a suitable weighting of the neural network cost function, based entirely on training set statistics. We show that this improves classification of infrequent classes and translates into improved overall recognition performance. We give results for telephone speech on the OGI Numbers Corpus. Flattening the prior for infrequently occurring context units resulted in a 12.37% reduction of the sentence-level error rate (from 16.17% to 14.76%) and a 9.93% reduction of the word-level error rate (from 4.23% to 3.81%) compared to not applying any compensation.
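The abstract's core idea (flattening a nonuniform class prior by weighting the cost function with inverse class frequencies from the training set) can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: the weight cap, the uniform-prior target, and the function names are assumptions introduced here.

```python
import numpy as np

def prior_flattening_weights(class_counts, max_weight=10.0):
    """Per-class weights proportional to (uniform prior / empirical prior),
    capped to avoid over-boosting extremely rare context units.
    The cap value is an illustrative assumption."""
    counts = np.asarray(class_counts, dtype=float)
    freqs = counts / counts.sum()            # empirical class prior from training set
    weights = (1.0 / len(counts)) / freqs    # >1 for rare classes, <1 for frequent ones
    return np.minimum(weights, max_weight)

def weighted_cross_entropy(probs, labels, weights):
    """Mean weighted cross-entropy cost. `probs` holds network output
    distributions (one row per frame, rows sum to 1); `labels` holds
    integer class indices; `weights` is the per-class weight vector."""
    p = probs[np.arange(len(labels)), labels]            # probability of the true class
    return float(np.mean(weights[labels] * -np.log(p + 1e-12)))
```

In this sketch, frames belonging to infrequent inter-word context units contribute more to the cost, so the trained network behaves as if the class prior were (roughly) flat.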
Keywords
entropy; error statistics; learning (artificial intelligence); multilayer perceptrons; speech recognition; statistical analysis; OGI Numbers Corpus; connected digit recognition; context-dependent phonetic units; cross-entropy cost function; frequent classes; infrequent classes; inter-word boundaries; inter-word context units; multilayer perceptron; neural network cost function weighting; neural network training; nonuniform distribution; pause; recognition performance; samples distribution; sentence-level error rate; silence; spoken digits ordering; telephone speech; test utterance; training set; training set statistics; word-level error rate; Context modeling; Cost function; Error analysis; Frequency; Natural languages; Neural networks; Speech recognition; Statistics; Telephony; Testing;
fLanguage
English
Publisher
ieee
Conference_Titel
Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing
Conference_Location
Seattle, WA
ISSN
1520-6149
Print_ISBN
0-7803-4428-6
Type
conf
DOI
10.1109/ICASSP.1998.674476
Filename
674476