DocumentCode :
2465061
Title :
Searching for optimal alphabet for data compression using simulated annealing
Author :
Platos, Jan ; Kromer, Pavel
Author_Institution :
Dept. of Comput. Sci., VSB-Tech. Univ. of Ostrava, Ostrava, Czech Republic
fYear :
2012
fDate :
14-17 Oct. 2012
Firstpage :
468
Lastpage :
473
Abstract :
Data compression is very important today and will be even more important in the future. Textual data use only a limited alphabet, i.e. a limited number of distinct symbols (letters, numbers, diacritics, dots, spaces, etc.). In most languages, letters are joined into syllables and words. All three units (characters, syllables and words) are useful in text compression, but none of them is the best for every file. This paper describes a variant of an algorithm for evolving an alphabet from characters, 2-grams and 3-grams that is optimal for the compression of text files. Simulated annealing was used to evolve the alphabet. The efficiency of the new variant was tested on four compression algorithms. The achieved results are very promising.
Keywords :
data compression; simulated annealing; text analysis; 2-grams; 3-grams; characters; compression algorithm; evolving alphabet; optimal alphabet; syllables; text file compression; textual data; words; Compression algorithms; Cooling; Encoding; Genetic algorithms; Burrows Wheeler transformation; Huffman encoding; LZ77; LZW; alphabet optimization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4673-1713-9
Electronic_ISBN :
978-1-4673-1712-2
Type :
conf
DOI :
10.1109/ICSMC.2012.6377768
Filename :
6377768