• DocumentCode
    312343
  • Title
    Predicting the out-of-vocabulary rate and the required vocabulary size for speech processing applications
  • Author
    Muller, Johannes; Stahl, Holger; Lang, Manfred
  • Author_Institution
    Inst. for Human-Machine-Commun., Munich Univ. of Technol., Germany
  • Volume
    3
  • fYear
    1996
  • fDate
    3-6 Oct 1996
  • Firstpage
    1922
  • Abstract
    The paper describes an approach for predicting both the vocabulary size and the resulting out-of-vocabulary rate (OOV rate) for a hypothetical extension of an existing text corpus. By splitting the original corpus into two different sub corpora, the vocabulary and the OOV rate can be determined for that particular constellation. Average values are calculated over all combinations of sub corpora and can be approximated by analytic function terms. These functions enable easy prediction of vocabulary size and OOV rate. The relative prediction error is below 4.6%.
  • Keywords
    natural languages; speech processing; vocabulary; word processing; analytic function terms; hypothetical extension; out-of-vocabulary rate; prediction accuracy; relative error; required vocabulary size; special constellation; speech processing applications; sub corpora; text corpus; vocabulary size; Accuracy; Speech processing; Testing; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    0-7803-3555-4
  • Type
    conf
  • DOI
    10.1109/ICSLP.1996.608010
  • Filename
    608010
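
The split-and-average step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sentence-level splitting, and the use of random splits (instead of all combinations of sub corpora, as the abstract states) are assumptions made for illustration. The fitting of analytic functions to the averages is not shown, because the abstract does not give their form.

```python
import random


def vocab_and_oov(train_sents, test_sents):
    """Return (vocabulary size of the first sub corpus,
    OOV rate of the second sub corpus against that vocabulary)."""
    vocab = {word for sent in train_sents for word in sent}
    test_tokens = [word for sent in test_sents for word in sent]
    if not test_tokens:
        raise ValueError("second sub corpus contains no tokens")
    oov_count = sum(1 for word in test_tokens if word not in vocab)
    return len(vocab), oov_count / len(test_tokens)


def average_over_splits(sentences, train_fraction, n_splits=100, seed=0):
    """Average vocabulary size and OOV rate over random two-way splits.

    Assumption: random sampling of splits stands in for the exhaustive
    averaging over all sub corpus combinations mentioned in the abstract.
    """
    n_train = int(len(sentences) * train_fraction)
    if not 0 < n_train < len(sentences):
        raise ValueError("train_fraction must leave both sub corpora non-empty")
    rng = random.Random(seed)
    sizes, rates = [], []
    for _ in range(n_splits):
        shuffled = list(sentences)
        rng.shuffle(shuffled)
        size, rate = vocab_and_oov(shuffled[:n_train], shuffled[n_train:])
        sizes.append(size)
        rates.append(rate)
    return sum(sizes) / n_splits, sum(rates) / n_splits
```

Calling `average_over_splits` for several values of `train_fraction` yields averaged points of vocabulary size and OOV rate against sub corpus size, which is the kind of data to which approximating functions could then be fitted.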