Title :
Improved neural network training of inter-word context units for connected digit recognition
Author :
Wei, Wei; Van Vuuren, Sarel
Author_Institution :
Center for Spoken Language Understanding, Oregon Graduate Inst. of Sci. & Technol., Portland, OR, USA
Abstract :
For connected digit recognition, the relative frequency of context-dependent phonetic units at inter-word boundaries depends on the ordering of the spoken digits, and these units may or may not include silence or a pause. If the units are treated as classes in a model, the distribution of samples among classes (the class prior) may be extremely nonuniform, and the distribution accumulated over the many utterances of a training set may differ greatly from the rather flat distribution within any single test utterance. Using a neural network to model context-dependent phonetic units, we show how to compensate for this mismatch. We do so by roughly flattening the class prior for infrequently occurring context units through a suitable weighting of the neural network cost function, derived entirely from training set statistics. We show that this improves classification of the infrequent classes and translates into better overall recognition performance. We give results for telephone speech on the OGI Numbers Corpus. Flattening the prior for infrequently occurring context units resulted in a 12.37% reduction of the sentence-level error rate (from 16.17% to 14.76%) and a 9.93% reduction of the word-level error rate (from 4.23% to 3.81%) compared with applying no compensation.
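The abstract does not give the exact weighting formula, so the following is only a minimal sketch under a stated assumption. It illustrates one hypothetical way to flatten the prior of infrequent classes by weighting a cross-entropy cost with per-class factors computed from training set counts. Every name here (`flattening_weights`, `weighted_cross_entropy`, the `floor` parameter) is hypothetical. The assumed scheme gives each class whose training prior p_c lies below `floor` the weight floor / p_c, so that its weighted prior w_c * p_c equals `floor`. Frequent classes keep weight 1.

```python
import math
from collections import Counter


def flattening_weights(labels, floor=None):
    """Per-class cost weights that lift rare classes to a common prior floor.

    ASSUMPTION: hypothetical scheme, not taken from the paper. A class whose
    training-set prior p_c is below `floor` gets weight floor / p_c, so its
    weighted prior w_c * p_c equals `floor`; other classes get weight 1.0.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if floor is None:
        # Default floor: the uniform prior over the classes seen in training.
        floor = 1.0 / len(counts)
    weights = {}
    for cls, n in counts.items():
        prior = n / total
        weights[cls] = floor / prior if prior < floor else 1.0
    return weights


def log_softmax(logits):
    """Numerically stable log-softmax of a list of network outputs."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Cross-entropy where each frame's term is scaled by its target class weight.

    The sum is normalized by the total weight, so uniform weights reduce
    this to the ordinary mean cross-entropy.
    """
    num = 0.0
    den = 0.0
    for logits, t in zip(batch_logits, targets):
        w = weights.get(t, 1.0)
        num -= w * log_softmax(logits)[t]
        den += w
    return num / den
```

For example, with training labels split 90/8/2 over three classes, the default floor is 1/3. Class 0 keeps weight 1.0, while classes 1 and 2 receive weights of about 4.17 and 16.67. Both rare classes then carry the same weighted prior of 1/3, which roughly flattens the prior among the infrequent classes without changing the frequent one.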
Keywords :
entropy; error statistics; learning (artificial intelligence); multilayer perceptrons; speech recognition; statistical analysis; OGI Numbers Corpus; connected digit recognition; context-dependent phonetic units; cross-entropy cost function; frequent classes; infrequent classes; inter-word boundaries; inter-word context units; multilayer perceptron; neural network cost function weighting; neural network training; nonuniform distribution; pause; recognition performance; samples distribution; sentence-level error rate; silence; spoken digits ordering; telephone speech; test utterance; training set; training set statistics; word-level error rate; Context modeling; Cost function; Error analysis; Frequency; Natural languages; Neural networks; Speech recognition; Statistics; Telephony; Testing;
Conference_Title :
Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 1998)
Conference_Location :
Seattle, WA
Print_ISBN :
0-7803-4428-6
DOI :
10.1109/ICASSP.1998.674476