DocumentCode :
3063159
Title :
LIPT: a lossless text transform to improve compression
Author :
Awan, Fauzia S. ; Mukherjee, Amar
Author_Institution :
Sch. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
fYear :
2001
fDate :
April 2001
Firstpage :
452
Lastpage :
460
Abstract :
We propose an approach to develop a dictionary-based, reversible, lossless text transformation called LIPT (length index preserving transform), which can be applied to a source text to improve an existing algorithm's ability to compress it. In LIPT, the length of the input word and the offset of the word in the dictionary are denoted with letters of the alphabet. Our encoding scheme makes use of the recurrence of same-length words in the English language to create context in the transformed text that entropy coders can exploit. LIPT also achieves some compression at the preprocessing stage and retains enough context and redundancy for the compression algorithms to give better results. For our test corpus, Bzip2 with LIPT gives a 5.24% improvement in average BPC over Bzip2 without LIPT, and PPMD with LIPT gives a 4.46% improvement in average BPC over PPMD without LIPT.
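The idea in the abstract, replacing each dictionary word by letters that denote its length and its offset among words of that length, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the `*` marker, the 52-letter alphabet `a-zA-Z`, the base-52 offset encoding, whitespace tokenization, and the function names are all hypothetical. Words not in the dictionary pass through unchanged.

```python
import string

# Assumed 52-symbol alphabet used for both length and offset letters.
LETTERS = string.ascii_lowercase + string.ascii_uppercase


def build_dictionary(words):
    """Group words by length; a word's index in its group is its offset."""
    blocks = {}
    for w in words:
        group = blocks.setdefault(len(w), [])
        if w not in group:
            group.append(w)
    return blocks


def offset_to_letters(n):
    """Write a non-negative offset in base 52 using LETTERS as digits."""
    digits = []
    while True:
        n, r = divmod(n, len(LETTERS))
        digits.append(LETTERS[r])
        if n == 0:
            break
    return "".join(reversed(digits))


def letters_to_offset(s):
    """Inverse of offset_to_letters."""
    n = 0
    for c in s:
        n = n * len(LETTERS) + LETTERS.index(c)
    return n


def encode_word(word, blocks):
    """Return '*' + length letter + offset letters, or the word itself."""
    group = blocks.get(len(word), [])
    if word in group and len(word) <= len(LETTERS):
        return "*" + LETTERS[len(word) - 1] + offset_to_letters(group.index(word))
    return word  # not in the dictionary: leave untouched


def decode_token(token, blocks):
    """Invert encode_word; assumes source words never start with '*'."""
    if token.startswith("*") and len(token) >= 3:
        length = LETTERS.index(token[1]) + 1
        return blocks[length][letters_to_offset(token[2:])]
    return token


def transform(text, blocks):
    return " ".join(encode_word(w, blocks) for w in text.split())


def inverse_transform(text, blocks):
    return " ".join(decode_token(t, blocks) for t in text.split())
```

In this sketch, all words of the same length share the same second character in the transformed text, which is one way to picture the "context" that a back-end compressor such as Bzip2 or PPMD could exploit.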
Keywords :
data compression; dictionaries; encoding; redundancy; English language; LIPT; alphabets; compression; context; dictionary based reversible lossless text transformation; entropy coders; input word length; length index preserving transform; lossless text transform; redundancy; word offset; Compression algorithms; Computer science; Dictionaries; Encoding; Entropy; Explosions; Frequency; Internet; Natural languages; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology: Coding and Computing, 2001. Proceedings. International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-1062-0
Type :
conf
DOI :
10.1109/ITCC.2001.918838
Filename :
918838