• DocumentCode
    1064989
  • Title
    An extended clustering algorithm for statistical language models
  • Author
    Ueberla, J.P.
  • Author_Institution
    DRA Malvern
  • Volume
    4
  • Issue
    4
  • fYear
    1996
  • fDate
    7/1/1996 12:00:00 AM
  • Firstpage
    313
  • Lastpage
    316
  • Abstract
    An existing clustering algorithm is extended to deal with higher order N-grams, and a faster heuristic version is developed. Even though the results are not comparable to those of back-off trigram models, they outperform back-off bigram models when many millions of words of training data are not available.
  • Keywords
    grammars; natural languages; speech processing; statistical analysis; back-off bigram models; extended clustering algorithm; heuristic algorithm; higher order N-grams; statistical language models; training data; Clustering algorithms; Convergence; Probability distribution; Standards publication; Training data; Vocabulary
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/89.506936
  • Filename
    506936