Title : 
The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression
Author : 
Witten, Ian H.; Bell, Timothy C.
Author_Institution : 
Department of Computer Science, University of Calgary, Alberta, Canada
fDate : 
7/1/1991
Abstract : 
Approaches to the zero-frequency problem in adaptive text compression are discussed. This problem concerns estimating the likelihood that a novel event will occur. Although several methods have been used, their suitability has rested on empirical evaluation rather than on a well-founded model. The authors propose applying a Poisson process model of novelty. Its ability to predict novel tokens is evaluated, and it consistently outperforms existing methods. It is then applied to a practical statistical coding scheme, where a slight modification is required to avoid divergence. The result is a well-founded zero-frequency model that explains observed differences in the performance of existing methods and offers a small improvement in the coding efficiency of text compression over the best method previously known.
Keywords : 
data compression; encoding; probability; Poisson process model; adaptive text compression; novel events; statistical coding scheme; zero-frequency problem; Arithmetic; Computer errors; Computer science; Context modeling; Councils; Decoding; Drives
Journal_Title : 
Information Theory, IEEE Transactions on