DocumentCode
1064989
Title
An extended clustering algorithm for statistical language models
Author
Ueberla, J.P.
Author_Institution
DRA Malvern
Volume
4
Issue
4
fYear
1996
fDate
7/1/1996 12:00:00 AM
Firstpage
313
Lastpage
316
Abstract
An existing clustering algorithm is extended to deal with higher-order N-grams, and a faster heuristic version is developed. Although the results are not comparable to those of back-off trigram models, they outperform back-off bigram models when many millions of words of training data are not available.
Keywords
grammars; natural languages; speech processing; statistical analysis; back-off bigram models; extended clustering algorithm; heuristic algorithm; higher order N-grams; statistical language models; training data; Clustering algorithms; Convergence; Probability distribution; Standards publication; Training data; Vocabulary;
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/89.506936
Filename
506936