Regional Information Center for Science and Technology (RICEST) - Universal Compression of Memoryless Sources over Large Alphabets via Independent Component Analysis

DocumentCode :

3036114

Title :

Universal Compression of Memoryless Sources over Large Alphabets via Independent Component Analysis

Author :

Painsky, Amichai ; Rosset, Saharon ; Feder, Meir

Author_Institution :

Stat. Dept., Tel Aviv Univ., Tel Aviv, Israel

fYear :

2015

fDate :

7-9 April 2015

Firstpage :

213

Lastpage :

222

Abstract :

Many applications of universal compression involve sources such as text, speech, and images, whose alphabets are extremely large. In this work we propose a conceptual framework in which a large-alphabet memoryless source is decomposed into multiple "as independent as possible" sources whose alphabets are much smaller. This slightly increases the average codeword length, since the compressed symbols are no longer perfectly independent, but at the same time significantly reduces the overhead redundancy caused by the large alphabet of the observed source. Our proposed algorithm, based on a generalization of Binary Independent Component Analysis, is shown to efficiently find the ideal trade-off so that the overall compressed size is minimal. We demonstrate our framework on memoryless draws from a variety of natural languages and show that the redundancy we achieve is remarkably smaller than that of most commonly used methods.
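The trade-off described in the abstract can be illustrated with a rough numerical sketch. It is not the authors' algorithm: it does not implement the generalized Binary Independent Component Analysis step, and simply splits each symbol index into its binary digits. It also assumes the standard asymptotic approximation that universally coding n memoryless draws from an alphabet of size m costs about n·H + ((m − 1)/2)·log2(n) bits. The function names `joint_cost` and `split_cost` are hypothetical, chosen for this illustration only.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def bit_marginals(p: Sequence[float], d: int) -> List[float]:
    """P(bit j of the symbol index equals 1), for j = 0..d-1."""
    return [sum(q for x, q in enumerate(p) if (x >> j) & 1) for j in range(d)]


def joint_cost(p: Sequence[float], n: int) -> float:
    """Approximate bits to code n draws over the full alphabet.

    Assumption: redundancy is about ((m - 1) / 2) * log2(n).
    """
    m = len(p)
    return n * entropy(p) + 0.5 * (m - 1) * math.log2(n)


def split_cost(p: Sequence[float], n: int, d: int) -> float:
    """Approximate bits when each of the d bits is coded as its own binary source.

    Assumption: each binary component adds about (1 / 2) * log2(n) redundancy.
    The sum of marginal entropies is at least the joint entropy, so the first
    term can only grow; the redundancy term shrinks from (m - 1) to d units.
    """
    h = sum(entropy([1.0 - b, b]) for b in bit_marginals(p, d))
    return n * h + 0.5 * d * math.log2(n)


if __name__ == "__main__":
    n = 1000
    uniform16 = [1 / 16] * 16          # bits are independent: splitting helps
    dependent4 = [0.5, 0.0, 0.0, 0.5]  # bits are fully dependent: splitting hurts
    print(joint_cost(uniform16, n), split_cost(uniform16, n, 4))
    print(joint_cost(dependent4, n), split_cost(dependent4, n, 2))
```

In the first case the components are already independent, so splitting only removes redundancy. In the second, the components are perfectly dependent and the naive split roughly doubles the cost, which is the situation a transformation toward "as independent as possible" components is meant to address.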

Keywords :

formal languages; independent component analysis; natural language processing; source coding; average codeword length; binary independent component analysis; compressed symbols; large alphabet memoryless source universal compression; natural languages; overhead redundancy; Complexity theory; Dictionaries; Encoding; Entropy; Image coding; Independent component analysis; Redundancy; ICA; Large Alphabet Source coding; Universal Compression;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Compression Conference (DCC), 2015

Conference_Location :

Snowbird, UT

ISSN :

1068-0314

Type :

conf

DOI :

10.1109/DCC.2015.48

Filename :

7149278

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3036114