Title :
An extended clustering algorithm for statistical language models
Author_Institution :
DRA Malvern
Date :
7/1/1996
Abstract :
An existing clustering algorithm is extended to deal with higher order N-grams, and a faster heuristic version is developed. Although the results do not match those of back-off trigram models, they outperform back-off bigram models when many millions of words of training data are not available.
Keywords :
grammars; natural languages; speech processing; statistical analysis; back-off bigram models; extended clustering algorithm; heuristic algorithm; higher order N-grams; statistical language models; training data; Clustering algorithms; Convergence; Probability distribution; Standards publication; Vocabulary
Journal_Title :
Speech and Audio Processing, IEEE Transactions on