Title : 
Combination of words and word categories in varigram histories

Author :
Blasig, Reinhard

Author_Institution :
Philips Res. Lab., Aachen, Germany

Abstract :
This paper presents a new kind of language model: category/word varigrams. This model type permits a tight integration of word-based and category-based modeling of word sequences: any sequence of words and word categories may be used to describe a given word history. This provides much greater flexibility than previous combinations of word-based and category-based language models. Experiments on the WSJ0 corpus and the 1994 ARPA evaluation data indicate that the category/word varigram yields a perplexity reduction of up to 10 percent compared to a word varigram of the same size, and improves the word error rate (WER) by 7 percent. Compared to a linear interpolation of a word-based and a category-based n-gram, the WER improvement is about 4 percent.
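The central idea of the abstract, that a word history may be described by any mix of words and word categories, can be illustrated with a small sketch. It assumes each word belongs to exactly one category; the category map and the function name `history_variants` are hypothetical and not taken from the paper. Under that assumption, every position in the history is kept either as the word or replaced by its category, so an n-word history has 2^n candidate descriptions. How the category/word varigram selects and stores such contexts is not modeled here.

```python
from itertools import product

# Hypothetical word-to-category map (illustration only, not from the paper).
CATEGORY = {
    "in": "PREP",
    "new": "ADJ",
    "york": "CITY",
    "boston": "CITY",
}


def history_variants(history):
    """Return every description of `history` as a mix of words and categories.

    Assumes each word has exactly one category, so each position offers two
    choices (the word or its category) and an n-word history yields 2**n variants.
    """
    options = [(word, CATEGORY[word]) for word in history]
    return [tuple(choice) for choice in product(*options)]


if __name__ == "__main__":
    for variant in history_variants(["in", "new"]):
        print(variant)
```

For the history "in new", this enumerates ("in", "new"), ("in", "ADJ"), ("PREP", "new") and ("PREP", "ADJ"): the pure word history, two mixed histories, and the pure category history.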

Keywords :
computational linguistics; natural languages; 1994 ARPA evaluation data; WER; WSJ0 corpus; category-based modeling; category/word varigrams; language model; perplexity reduction; varigram histories; word categories; word error rate; word history; word sequences; word-based modeling; words; Educational technology; Error analysis; History; Interpolation; Laboratories; Natural languages; Predictive models; Probability;

Conference_Title :
Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on

Conference_Location :
Phoenix, AZ

Print_ISBN :
0-7803-5041-3

DOI :
10.1109/ICASSP.1999.758179