DocumentCode :
3430242
Title :
A corpus for the evaluation of lossless compression algorithms
Author :
Arnold, Ross ; Bell, Tim
Author_Institution :
Dept. of Comput. Sci., Canterbury Univ., Christchurch, New Zealand
fYear :
1997
fDate :
25-27 Mar 1997
Firstpage :
201
Lastpage :
210
Abstract :
A number of authors have used the Calgary corpus of texts to provide empirical results for lossless compression algorithms. This corpus was collected in 1987, although it was not published until 1990. Advances in compression algorithms have been achieving relatively small improvements in compression, as measured using the Calgary corpus. There is a concern that algorithms are being fine-tuned to this corpus, and that small improvements measured in this way may not apply to other files. Furthermore, the corpus is almost ten years old, and over this period the kinds of files that are compressed have changed, particularly with the development of the Internet and the rapid growth of high-capacity secondary storage for personal computers. We explore the issues raised above, and develop a principled technique for collecting a corpus of test data for compression methods. A corpus, called the Canterbury corpus, is developed using this technique, and we report the performance of a collection of compression methods using the new corpus.
Keywords :
data compression; decoding; digital storage; encoding; Calgary corpus; Canterbury corpus; Internet; compression methods performance; compression methods testing; high capacity secondary storage; lossless compression algorithms; personal computers; Compression algorithms; Computer science; Convergence; Entropy; Microcomputers; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 1997. DCC '97. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-8186-7761-9
Type :
conf
DOI :
10.1109/DCC.1997.582019
Filename :
582019