Author_Institution :
Dipt. di Inf., Univ. degli Studi di Milano, Milan, Italy
Abstract :
Given a social network, which of its nodes are more central? This question has been asked many times in sociology, psychology and computer science, and a whole plethora of centrality measures (a.k.a. centrality indices, or rankings) were proposed to account for the importance of the nodes of a network. In this paper, we approach the problem of computing geometric centralities, such as closeness [1] and harmonic centrality [2], on very large graphs; traditionally this task requires an all-pairs shortest-path computation in the exact case, or a number of breadth-first traversals for approximated computations, but these techniques yield very weak statistical guarantees on highly disconnected graphs. We rather assume that the graph is accessed in a semi-streaming fashion, that is, that adjacency lists are scanned almost sequentially, and that a very small amount of memory (in the order of a dozen bytes) per node is available in core memory. We leverage the newly discovered algorithms based on HyperLogLog counters [3], making it possible to approximate a number of geometric centralities at a very high speed and with high accuracy. While the application of similar algorithms for the approximation of closeness was attempted in the MapReduce [4] framework [5], our exploitation of HyperLogLog counters reduces exponentially the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using "just" 2TiB of RAM. Moreover, the computations we describe are inherently parallelizable, and scale linearly with the number of available cores.
Keywords :
distributed processing; graph theory; mathematics computing; HyperBall; HyperLogLog counters; MapReduce framework; RAM; adjacency lists; breadth-first traversals; centrality measures; computer science; disconnected graphs; geometric centralities; harmonic centrality; in-core computation; psychology; semistreaming fashion; shortest-path computation; social network; sociology; statistical guarantees; Approximation algorithms; Approximation methods; Harmonic analysis; Memory management; Radiation detectors; Registers; Standards; Centrality; Distance distribution; Graph algorithms; Probabilistic counters;