DocumentCode :
1439213
Title :
Compression of Graphical Structures: Fundamental Limits, Algorithms, and Experiments
Author :
Choi, Yongwook ; Szpankowski, Wojciech
Author_Institution :
J. Craig Venter Inst., Rockville, MD, USA
Volume :
58
Issue :
2
fYear :
2012
Firstpage :
620
Lastpage :
638
Abstract :
Information theory traditionally deals with “conventional data,” be it textual data, image, or video data. However, databases of various sorts have come into existence in recent years for storing “unconventional data” including biological data, social data, web data, topographical maps, and medical data. In compressing such data, one must consider two types of information: the information conveyed by the structure itself, and the information conveyed by the data labels implanted in the structure. In this paper, we attempt to address the former problem by studying information of graphical structures (i.e., unlabeled graphs). As the first step, we consider the Erdös-Rényi graphs G(n,p) over n vertices in which edges are added independently and randomly with probability p. We prove that the structural entropy of G(n,p) is (n;2)h(p)-logn!+o(1)=(n;2)h(p)-nlog+O(n), where h(p)=-plogp-(1-p)log(1-p) is the entropy rate of a conventional memoryless binary source. Then, we propose a two-stage compression algorithm that asymptotically achieves the structural entropy up to the nlog term (i.e., the first two leading terms) of the structural entropy. Our algorithm runs either in time O(n2) in the worst case for any graph or in time O(n+e) on average for graphs generated by G(n,p), where e is the average number of edges. To the best of our knowledge, this is the first provable (asymptotically) optimal graph compressor for Erdös-Rényi graph models. We use combinatorial and analytic techniques such as generating functions, Mellin transform, and poissonization to establish these findings. Our experiments confirm the theoretical results and also show the usefulness of our algorithm for some real-world graphs such as the Internet, biological networks, and social networks.
Keywords :
computational complexity; data compression; entropy; graph theory; probability; transforms; Erdös-Rényi graphs; Mellin transform; World Wide Web data; analytic techniques; asymptotically optimal graph compressor; biological data; combinatorial techniques; conventional memoryless binary source; data compression; data labels; entropy rate; fundamental limits; generating functions; graphical structures compression; image data; information theory; medical data; poissonization; probability; real-world graphs; social data; structural entropy; textual data; topographical maps; two-stage compression algorithm; unconventional data storage; unlabeled graphs; video data; Biological information theory; Compression algorithms; Data structures; Encoding; Entropy; Orbits; Partitioning algorithms; Analytic information theory; Erdös–Rényi graphs; Mellin transform; arithmetic encoder; digital trees; graph automorphism; poissonization; structural entropy; unlabeled graphs;
fLanguage :
English
Journal_Title :
Information Theory, IEEE Transactions on
Publisher :
ieee
ISSN :
0018-9448
Type :
jour
DOI :
10.1109/TIT.2011.2173710
Filename :
6145505
Link To Document :
بازگشت