Title :
Sampling from Large Graphs with a Reservoir
Author_Institution :
Fac. of Inf. Sci., Kyushu Sangyo Univ., Fukuoka, Japan
Abstract :
Sampling is a process of choosing a suitable representative subset from a population and uniformity is a basic requirement of representative ness. A sampling process produces a uniform random sample when all possible samples of the same size have the same probability. Uniform sampling from large graphs is a challenging task due to the lack of a sample frame, i.e., a list of all elements to be sampled from. As a result, graph sampling via crawling, in particular, random walk based graph sampling methods (or Markov chain samplers) become the feasible approach to this task. The drawback of this approach however, is that it may produces biased and/or correlated samples. The goal of this paper is to provide a framework for obtaining an asymptotically uniform independence sample of large graphs by crawling. Unlike the existing graph sampling techniques, we propose a reservoir-based graph sampling algorithm, called RWW (Random Walk with a Weighted Reservoir) to cope with the bias and correlation issues. First, a random walk based crawler is used to explore the underlying graph. Then we employ a weighted reservoir to maintain sample nodes to eliminate the bias as well as the correlation caused by random walk. We perform simulation-driven experiments to evaluate the proposed algorithms and verified that it can achieve uniform sampling efficiently.
Keywords :
Markov processes; graph theory; probability; random processes; sampling methods; social networking (online); Markov chain sampler; RWW; crawling; graph sampling technique; large graph; probability; random walk based crawler; random walk based graph sampling method; random walk with a weighted reservoir; reservoir-based graph sampling algorithm; sampling process; uniform sampling; Algorithm design and analysis; Correlation; Maintenance engineering; Peer-to-peer computing; Reservoirs; Sociology; Markov chian; graph crawling; graph mining; graph sampling; reservoir sampling; social networks;
Conference_Titel :
Network-Based Information Systems (NBiS), 2014 17th International Conference on
Conference_Location :
Salerno
Print_ISBN :
978-1-4799-4226-8
DOI :
10.1109/NBiS.2014.25