Title :
Compression of Unicode files
Author :
Fenwick, Peter ; Brierley, Simon
Author_Institution :
Dept. of Comput. Sci., Auckland Univ., New Zealand
Date :
30 Mar-1 Apr 1998
Abstract :
Summary form only given. The increasing importance of Unicode for text files, for example with Java and in some modern operating systems, implies a possible increase in data storage space and data transmission time, with a corresponding need for data compression. However, data compressors designed for traditional 8-bit byte data are not necessarily well matched to the peculiarities of Unicode data. Different “standard” text compression methods respond differently to Unicode data, compared with their known performance on ASCII or other 8-bit data. A small corpus of Unicode files has been compressed with several widely available text compressors of various types, confirming that Unicode files have different compression characteristics from those known for 8-bit data. Tests with a simple LZ-77 compressor designed to operate in both 8-bit and 16-bit modes indicate that it may be useful to design compressors specifically for Unicode data.
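As a rough illustration of the storage effect the abstract describes, the following Python sketch encodes the same text once at one byte per character and once at two bytes per character, then runs both through a general-purpose byte-oriented compressor. This is not the authors' corpus, compressor, or test procedure: the sample string, the function name `compressed_sizes`, and the use of zlib (DEFLATE) as a stand-in for an "8-bit" compressor are all assumptions made for illustration only.

```python
import zlib


def compressed_sizes(text: str) -> dict:
    """Return raw and zlib-compressed byte counts for two encodings of text.

    Hypothetical helper for illustration; not taken from the paper.
    """
    eight_bit = text.encode("latin-1")      # one byte per character
    sixteen_bit = text.encode("utf-16-le")  # two bytes per character here, no BOM
    return {
        "raw_8bit": len(eight_bit),
        "raw_16bit": len(sixteen_bit),
        "zlib_8bit": len(zlib.compress(eight_bit, 9)),
        "zlib_16bit": len(zlib.compress(sixteen_bit, 9)),
    }


# Assumed sample text: plain ASCII, so every 16-bit code unit has a zero high byte.
sample = "The quick brown fox jumps over the lazy dog. " * 200
sizes = compressed_sizes(sample)

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The raw 16-bit form is exactly twice the size of the 8-bit form for this sample. How the two compressed sizes compare depends on the compressor and on the text, which is the kind of question the abstract says the authors examined with several compressor types.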
Keywords :
data compression; document image processing; image coding; word processing; 16 bit; 8 bit; ASCII; Java; LZ-77 compressor; compression characteristics; data storage space; data transmission time; operating systems; performance; text compression methods; text compressors; text files; Unicode data; Unicode files compression; Compressors; Computer science; Data communication; Degradation; Dictionaries; Encoding; Memory; Operating systems
Conference_Title :
Data Compression Conference, 1998. DCC '98. Proceedings
Conference_Location :
Snowbird, UT
Print_ISBN :
0-8186-8406-2
DOI :
10.1109/DCC.1998.672274