Title :
Towards Unbiased BFS Sampling
Author :
Kurant, Maciej ; Markopoulou, Athina ; Thiran, Patrick
Author_Institution :
California Inst. for Telecommun. & Inf. Technol. (CalIT2), Univ. of California, Irvine, CA, USA
fDate :
10/1/2011 12:00:00 AM
Abstract :
Breadth First Search (BFS) is a widely used approach for sampling large graphs. However, it has been empirically observed that BFS sampling is biased toward high-degree nodes, which may strongly affect the measurement results. In this paper, we quantify and correct the degree bias of BFS. First, we consider a random graph RG(pk) with an arbitrary degree distribution pk. For this model, we calculate the node degree distribution expected to be observed by BFS as a function of the fraction f of covered nodes. We also show that, for RG(pk), all commonly used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS) have exactly the same bias. Next, we propose a practical BFS-bias correction procedure that takes as input a collected BFS sample together with the fraction f. Our correction technique is exact (i.e., leads to unbiased estimation) for RG(pk). Furthermore, it performs well when applied to a broad range of Internet topologies and to two large BFS samples of Facebook and Orkut networks.
Keywords :
graph theory; graphs; sampling methods; social networking (online); telecommunication network topology; tree searching; Facebook; Orkut networks; breadth first search; graphs; high-degree nodes; node degree distribution; random graph; unbiased BFS sampling; Equations; Facebook; Fires; Indexes; Mathematical model; Peer to peer computing; Scattering; Breadth-First-Search (BFS); bias; estimation; network topology; online social networks; sampling methods;
Journal_Title :
Selected Areas in Communications, IEEE Journal on
DOI :
10.1109/JSAC.2011.111005