DocumentCode :
1330980
Title :
Towards Unbiased BFS Sampling
Author :
Kurant, Maciej ; Markopoulou, Athina ; Thiran, Patrick
Author_Institution :
California Inst. for Telecommun. & Inf. Technol. (CalIT2), Univ. of California, Irvine, CA, USA
Volume :
29
Issue :
9
fYear :
2011
fDate :
10/1/2011 12:00:00 AM
Firstpage :
1799
Lastpage :
1809
Abstract :
Breadth First Search (BFS) is a widely used approach for sampling large graphs. However, it has been empirically observed that BFS sampling is biased toward high-degree nodes, which may strongly affect the measurement results. In this paper, we quantify and correct the degree bias of BFS. First, we consider a random graph RG(pk) with an arbitrary degree distribution pk. For this model, we calculate the node degree distribution expected to be observed by BFS as a function of the fraction f of covered nodes. We also show that, for RG(pk), all commonly used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS) have exactly the same bias. Next, we propose a practical BFS-bias correction procedure that takes as input a collected BFS sample together with the fraction f. Our correction technique is exact (i.e., leads to unbiased estimation) for RG(pk). Furthermore, it performs well when applied to a broad range of Internet topologies and to two large BFS samples of Facebook and Orkut networks.
Keywords :
graph theory; graphs; sampling methods; social networking (online); telecommunication network topology; tree searching; Facebook; Orkut networks; breadth first search; graphs; high-degree nodes; node degree distribution; random graph; unbiased BFS sampling; Equations; Facebook; Fires; Indexes; Mathematical model; Peer to peer computing; Scattering; Breadth-First-Search (BFS); bias; estimation; network topology; online social networks; sampling methods;
fLanguage :
English
Journal_Title :
Selected Areas in Communications, IEEE Journal on
Publisher :
ieee
ISSN :
0733-8716
Type :
jour
DOI :
10.1109/JSAC.2011.111005
Filename :
6027862
Link To Document :
بازگشت