Title :
Learning to compress ergodic sources
Author :
Baxter, Jonathan ; Shawe-Taylor, John
Author_Institution :
Dept. of Comput. Sci., London Univ., UK
Abstract :
Abstract only given. We present an adaptive coding technique that is shown to achieve optimal coding in the limit as the size of the text grows, while the data structures associated with the code grow only linearly with the text. The approach relies on Huffman codes generated relative to the context in which a particular character occurs; the Huffman codes themselves are inferred from the data already seen. A key part of the paper shows that the loss per character incurred by the learning process tends to zero as the size of the text tends to infinity. This involves an analysis in an on-line learning framework bounding the cumulative loss, where loss is defined to be the excess code length. By using the Bayes prediction distribution and code, the expected loss per character converges to zero at the best possible rate of O(log n/n). By allowing the length of contexts to grow in response to commonly occurring subsequences, the coding is efficient precisely where it needs to be, achieving a high compression rate at a relatively low overhead in data structure storage.
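To make the idea of context-conditioned adaptive Huffman coding concrete, the following is a minimal illustrative sketch, not the authors' algorithm: per-context symbol counts are maintained, a Huffman code is rebuilt from the counts seen so far before each character is coded, and the counts are updated afterwards. The `order` parameter, the Laplace-style initial count of 1 per symbol (so every symbol always has a codeword), and all function names are assumptions of this sketch.

```python
import heapq
from collections import Counter, defaultdict
from itertools import count

def huffman_code(freqs):
    """Build a prefix-free Huffman code (symbol -> bitstring) from a frequency table."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # avoids comparing tree nodes when frequencies tie
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse into children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the codeword
            code[node] = prefix
    walk(heap[0][2], "")
    return code

def adaptive_encode(text, alphabet, order=1):
    """Encode text with per-context Huffman codes inferred from the data seen so far."""
    # Every symbol starts with count 1 in every context (an assumption of this
    # sketch), so each symbol always has a codeword before it is first seen.
    contexts = defaultdict(lambda: Counter({s: 1 for s in alphabet}))
    bits = []
    ctx = ""
    for ch in text:
        code = huffman_code(contexts[ctx])  # code built only from past data
        bits.append(code[ch])
        contexts[ctx][ch] += 1              # learn from the character just coded
        ctx = (ctx + ch)[-order:]           # slide the context window
    return "".join(bits)
```

Because the code for each context is derived only from characters already seen, a decoder maintaining the same counts can rebuild the identical code at each step and invert the encoding without side information; the excess code length incurred while the counts are still poor estimates is the "loss" that the paper's analysis bounds.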
Keywords :
Bayes methods; Huffman codes; adaptive codes; data structures; source coding; Bayes prediction distribution; adaptive coding; compression rate; cumulative loss; ergodic sources; excess code length; loss per character; on-line learning framework; optimal coding; subsequences; text; Adaptive coding; Character generation; Computer science; Data structures; Economic forecasting; H infinity control; Mathematics
Conference_Titel :
Data Compression Conference, 1996. DCC '96. Proceedings
Conference_Location :
Snowbird, UT
Print_ISBN :
0-8186-7358-3
DOI :
10.1109/DCC.1996.488351