• DocumentCode
    1762038
  • Title

    All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis

  • Author

    Cohen, Edith

  • Author_Institution
    Dept. of Comput. Sci., Tel Aviv Univ., Tel Aviv, Israel
  • Volume
    27
  • Issue
    9
  • fYear
    2015
  • fDate
    Sept. 1 2015
  • Firstpage
    2320
  • Lastpage
    2334
  • Abstract
    Graph datasets with billions of edges, such as social and web graphs, are prevalent, and scalability is critical. All-distances sketches (ADS) [Cohen 1997], are a powerful tool for scalable approximation of statistics. The sketch is a small size sample of the distance relation of a node which emphasizes closer nodes. Sketches for all nodes are computed using a nearly linear computation and estimators are applied to sketches of nodes to estimate their properties. We provide, for the first time, a unified exposition of ADS algorithms and applications. We present the historic inverse probability (HIP) estimators which are applied to the ADS of a node to estimate a large natural class of statistics. For the important special cases of neighborhood cardinalities (the number of nodes within some query distance) and closeness centralities, HIP estimators have at most half the variance of previous estimators and we show that this is essentially optimal. Moreover, HIP obtains a polynomial improvement for more general statistics and the estimators are simple, flexible, unbiased, and elegant. For approximate distinct counting on data streams, HIP outperforms the original estimators for the HyperLogLog MinHash sketches (Flajolet et al. 2007), obtaining significantly improved estimation quality for this state-of-the-art practical algorithm.
  • Keywords
    Internet; file organisation; graph theory; social networking (online); statistical analysis; ADS algorithms; HIP estimators; HyperLogLog MinHash sketches; Web graphs; all-distances sketches; graph datasets; historic inverse probability estimators; massive graphs analysis; scalable approximation; social graphs; Approximation algorithms; Approximation methods; Estimation; Heuristic algorithms; Hip; Polynomials; Radiation detectors; All-distances sketches; HIP estimators; approximate counting; approximate distinct counting; closeness centrality; distance smoothing kernel;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2015.2411606
  • Filename
    7058441