DocumentCode
246179
Title
Sampling from Large Graphs with a Reservoir
Author
Kai Cheng
Author_Institution
Fac. of Inf. Sci., Kyushu Sangyo Univ., Fukuoka, Japan
fYear
2014
fDate
10-12 Sept. 2014
Firstpage
347
Lastpage
354
Abstract
Sampling is a process of choosing a suitable representative subset from a population and uniformity is a basic requirement of representative ness. A sampling process produces a uniform random sample when all possible samples of the same size have the same probability. Uniform sampling from large graphs is a challenging task due to the lack of a sample frame, i.e., a list of all elements to be sampled from. As a result, graph sampling via crawling, in particular, random walk based graph sampling methods (or Markov chain samplers) become the feasible approach to this task. The drawback of this approach however, is that it may produces biased and/or correlated samples. The goal of this paper is to provide a framework for obtaining an asymptotically uniform independence sample of large graphs by crawling. Unlike the existing graph sampling techniques, we propose a reservoir-based graph sampling algorithm, called RWW (Random Walk with a Weighted Reservoir) to cope with the bias and correlation issues. First, a random walk based crawler is used to explore the underlying graph. Then we employ a weighted reservoir to maintain sample nodes to eliminate the bias as well as the correlation caused by random walk. We perform simulation-driven experiments to evaluate the proposed algorithms and verified that it can achieve uniform sampling efficiently.
Keywords
Markov processes; graph theory; probability; random processes; sampling methods; social networking (online); Markov chain sampler; RWW; crawling; graph sampling technique; large graph; probability; random walk based crawler; random walk based graph sampling method; random walk with a weighted reservoir; reservoir-based graph sampling algorithm; sampling process; uniform sampling; Algorithm design and analysis; Correlation; Maintenance engineering; Peer-to-peer computing; Reservoirs; Sociology; Markov chian; graph crawling; graph mining; graph sampling; reservoir sampling; social networks;
fLanguage
English
Publisher
ieee
Conference_Titel
Network-Based Information Systems (NBiS), 2014 17th International Conference on
Conference_Location
Salerno
Print_ISBN
978-1-4799-4226-8
Type
conf
DOI
10.1109/NBiS.2014.25
Filename
7023975
Link To Document