• DocumentCode
    323581
  • Title
    Improved neural network training of inter-word context units for connected digit recognition
  • Author
    Wei, Wei; Van Vuuren, Sarel

  • Author_Institution
    Center for Spoken Language Understanding, Oregon Graduate Institute of Science & Technology, Portland, OR, USA
  • Volume
    1
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    497
  • Abstract
    For connected digit recognition, the relative frequency of occurrence of context-dependent phonetic units at inter-word boundaries depends on the ordering of the spoken digits and may or may not include silence or pause. If these units represent classes in a model, this means that the distribution of samples between classes (the class prior) may be extremely nonuniform, and that the distribution over many utterances in a training set may be very different from the rather flat distribution over any single test utterance. Using a neural network to model context-dependent phonetic units, we show how to compensate for this problem. We do this by roughly flattening the class prior for infrequently occurring context units through a suitable weighting of the neural network cost function, based entirely on training set statistics. We show that this leads to improved classification of infrequent classes and translates into improved overall recognition performance. We give results for telephone speech on the OGI Numbers Corpus. Flattening the prior for infrequently occurring context units resulted in a 12.37% reduction of the sentence-level error rate (from 16.17% to 14.17%) and a 9.93% reduction of the word-level error rate (from 4.23% to 3.81%) compared to not doing any compensation.
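    A minimal sketch of the compensation idea described in the abstract, assuming an inverse-prior weighting of the cross-entropy cost computed from training set counts. The paper's exact weighting scheme is not given in the abstract, and every function and variable name below is a hypothetical illustration, not the authors' code.

```python
import numpy as np

def class_weights(train_labels, n_classes, floor=1e-6):
    """Per-class weights that roughly flatten a nonuniform class prior.

    Hypothetical sketch: weights are inversely proportional to the class
    prior estimated from training set counts, so infrequently occurring
    inter-word context units are up-weighted in the cost function.
    """
    counts = np.bincount(train_labels, minlength=n_classes).astype(float)
    prior = counts / counts.sum()           # empirical class prior
    w = 1.0 / np.maximum(prior, floor)      # floor avoids division by zero
    return w / w.mean()                     # normalize: average weight is 1

def weighted_cross_entropy(probs, labels, w):
    """Cross-entropy cost with per-class weights applied per sample."""
    eps = 1e-12                             # numerical safety for log(0)
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(w[labels] * per_sample))

# Toy usage: class 0 dominates the training labels, so classes 1 and 2
# receive larger weights and contribute more to the weighted cost.
labels = np.array([0, 0, 0, 0, 1, 2])
w = class_weights(labels, n_classes=3)
probs = np.full((6, 3), 1.0 / 3.0)          # uniform network outputs
print(w, weighted_cross_entropy(probs, labels, w))
```

    Since the abstract speaks of "roughly flattening" the prior only for infrequent context units, a faithful variant would presumably interpolate these weights toward 1 for frequent classes rather than apply full inverse-prior weighting everywhere.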
  • Keywords
    entropy; error statistics; learning (artificial intelligence); multilayer perceptrons; speech recognition; statistical analysis; OGI Numbers Corpus; connected digit recognition; context-dependent phonetic units; cross-entropy cost function; frequent classes; infrequent classes; inter-word boundaries; inter-word context units; multilayer perceptron; neural network cost function weighting; neural network training; nonuniform distribution; pause; recognition performance; samples distribution; sentence-level error rate; silence; spoken digits ordering; telephone speech; test utterance; training set; training set statistics; word-level error rate; Context modeling; Cost function; Error analysis; Frequency; Natural languages; Neural networks; Speech recognition; Statistics; Telephony; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98)
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type
    conf
  • DOI
    10.1109/ICASSP.1998.674476
  • Filename
    674476