Title :
On The Even-Out Effect of Probabilistic Sampling
Author :
Liu, Ziqian ; Chen, Changjia
Author_Institution :
Sch. of Electron. & Inf. Eng., Beijing Jiaotong Univ.
Abstract :
Sampling is widely used in social investigations and network measurements because it can significantly reduce the cost of data storage and processing. However, sampling inevitably loses, or even distorts, some characteristics of the original data. This paper studies the effect of probabilistic sampling on a set of data with an unbalanced size distribution. We introduce the Lorenz curve, widely used in economics, together with the crossover split, a recently proposed quantifier, to measure how the size distribution deviates before and after sampling. Using both simulation and real Internet data, we observe that as the sampling probability decreases, the size distribution becomes less unbalanced. We call this phenomenon the even-out effect. The relations among probabilistic sampling, the crossover split, and the Pareto distribution are also revealed.
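The even-out effect described in the abstract can be reproduced with a small simulation. The sketch below is illustrative and does not reproduce the authors' code. It makes the following assumptions. Item sizes (for example, packets per flow) follow a Pareto law with alpha = 1.2, floored to integers. Each unit is kept independently with probability p, and an item disappears from the sampled data if none of its units is kept. The Lorenz curve is taken in its top-down form: the largest x fraction of items holds share y of the total. The crossover split is taken to be the point x where this curve meets y = 1 - x, so a perfectly even set gives 0.5 and an "80/20" set gives about 0.2. All function names are hypothetical.

```python
import random


def pareto_sizes(n_items, alpha, rng):
    """Draw integer item sizes >= 1 from a Pareto(alpha) law with x_m = 1 (assumed model)."""
    return [int(rng.paretovariate(alpha)) for _ in range(n_items)]


def sample_items(sizes, p, rng):
    """Probabilistic sampling: keep each unit of every item independently with probability p.

    An item whose units are all missed vanishes from the sampled data set.
    """
    sampled = []
    for s in sizes:
        kept = sum(1 for _ in range(s) if rng.random() < p)
        if kept > 0:
            sampled.append(kept)
    return sampled


def lorenz_curve(sizes):
    """Top-down Lorenz points (x, y): the largest x fraction of items holds share y."""
    ordered = sorted(sizes, reverse=True)
    total, n = sum(ordered), len(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one item with positive size")
    points, cum = [(0.0, 0.0)], 0
    for k, s in enumerate(ordered, start=1):
        cum += s
        points.append((k / n, cum / total))
    return points


def crossover_split(sizes):
    """Smallest x such that the largest x fraction of items holds at least 1 - x of the total.

    Assumed definition; 0.5 means perfectly even, smaller values mean more unbalanced.
    """
    for x, y in lorenz_curve(sizes)[1:]:
        if y >= 1 - x:
            return x
    return 1.0  # not reached: the last point is always (1.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(2006)
    sizes = pareto_sizes(5000, alpha=1.2, rng=rng)
    for p in (1.0, 0.1, 0.01):
        sampled = sample_items(sizes, p, rng)
        print(f"p={p:<5} items={len(sampled):5d} crossover={crossover_split(sampled):.3f}")
```

As p shrinks, most small items vanish and the survivors tend toward one or a few sampled units each. The crossover split therefore drifts toward 0.5, which is the less unbalanced distribution the abstract calls the even-out effect. The exact numbers depend on the seed and on the assumed size model.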
Keywords :
Internet; Pareto distribution; data analysis; sampling methods; Lorenz curve; data storage processing; probabilistic sampling; real Internet data; Area measurement; Data engineering; Distortion measurement; Memory; Size measurement; Statistical distributions; Telecommunication traffic; Volume measurement
Conference_Title :
Computer and Computational Sciences, 2006. IMSCCS '06. First International Multi-Symposiums on
Conference_Location :
Hangzhou, Zhejiang
Print_ISBN :
0-7695-2581-4
DOI :
10.1109/IMSCCS.2006.245