Title :
Context coding of parse trees
Author_Institution :
Dept. of Comput. Sci., Helsinki Univ., Finland
Abstract :
Summary form only given. General-purpose text compression works normally at the lexical level assuming that symbols to be encoded are independent or they depend on preceding symbols within a fixed distance. Traditionally such syntactical models have been focused on compression of source programs, but also other areas are feasible. The compression of a parse tree is an important and challenging part of syntactical modeling. A parse tree can be represented by a left parse which is a sequence of productions applied in preorder. A left parse can be encoded efficiently with arithmetic coding using counts of production alternatives of each nonterminal. We introduce a more refined method which reduces the size of a compressed tree. A blending scheme, PPM (prediction by partial matching) produces very good compression on text files. In PPM, adaptive models of several context lengths are maintained and they are blended during processing. The k preceding symbols of the symbol to be encoded form the context of order k. We apply the PPM technique to a left parse so that we use contexts of nodes instead of contexts consisting of preceding symbols in the sequence. We tested our approach with parse trees of Pascal programs. Our method gave on the average 20 percent better compression than the standard method based on counts of production alternatives of nonterminals. In our model, an item of the context is a pair (production, branch). The form of the item seems to be crucial. We tested three other variations for an item: production, nonterminal, and (nonterminal, branch), but all these three approaches produced clearly worse results
Keywords :
adaptive signal processing; arithmetic codes; context-sensitive grammars; data compression; prediction theory; trees (mathematics); word processing; Pascal programs; adaptive models; arithmetic coding; blending scheme; branch; context coding; context lengths; nonterminal; parse tree compression; parse trees; prediction by partial matching; production; source programs; syntactical modeling; syntactical models; text compression; text files; Arithmetic; Computer science; Context modeling; Production; Testing;
Conference_Titel :
Data Compression Conference, 1995. DCC '95. Proceedings
Conference_Location :
Snowbird, UT
Print_ISBN :
0-8186-7012-6
DOI :
10.1109/DCC.1995.515552