Title : 
Bounds of compression of unknown alphabets
         
        
            Author : 
Orlitsky, A. ; Santhanam, N.P. ; Zhang, J.
         
        
            Author_Institution : 
Dept. of Electr. Eng., California Univ., San Diego, CA, USA
         
        
        
            fDate : 
29 June-4 July 2003
         
        
        
            Abstract : 
It is known that the redundancy of universally compressing i.i.d. strings increases to infinity as the alphabet size grows. It is also apparent that any string can be described by separately conveying its symbols, and their pattern-the order in which they appear. Concentrating on the latter, we show that the patterns of iid strings drawn from any, possibly infinite or even unknown, alphabet, can be universally compressed with diminishing worst-case redundancy, both in block, and sequentially.
         
        
            Keywords : 
block codes; data compression; redundancy; sequential codes; alphabet size; block code compression; sequential code compression; string compression; string order; string pattern; string symbol; symbol redundancy; unknown alphabet compression; Costs; Dictionaries; H infinity control; Personal communication networks; Upper bound;
         
        
        
        
            Conference_Titel : 
Information Theory, 2003. Proceedings. IEEE International Symposium on
         
        
            Print_ISBN : 
0-7803-7728-1
         
        
        
            DOI : 
10.1109/ISIT.2003.1228125