Title :
Off-line compression by greedy textual substitution
Author :
Apostolico, Alberto ; Lonardi, Stefano
Author_Institution :
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
Abstract :
Greedy off-line textual substitution refers to the following approach to compression or structural inference. Given a long text string x, a substring W is identified such that replacing all instances of W in X except one by a suitable pair of pointers yields the highest possible contraction of X; the process is then repeated on the contracted text string until substrings capable of producing contractions can no longer be found. This paper examines computational issues arising in the implementation of this paradigm and describes some applications and experiments.
Keywords :
computational complexity; data compression; grammars; inference mechanisms; text analysis; computational issues; contraction; greedy textual substitution; off-line compression; structural inference; substring; textstring; Biological information theory; Biology computing; CD-ROMs; Computational efficiency; Councils; Data compression; Dictionaries; Encoding; Production; Statistics;
Journal_Title :
Proceedings of the IEEE