DocumentCode :
3374629
Title :
On The Even-Out Effect of Probabilistic Sampling
Author :
Liu, Ziqian ; Chen, Changjia
Author_Institution :
Sch. of Electron. & Inf. Eng., Beijing Jiaotong Univ.
Volume :
2
fYear :
2006
fDate :
20-24 June 2006
Firstpage :
692
Lastpage :
698
Abstract :
Sampling is widely used in social investigations and network measurements since it can significantly reduce the expense of data storage and processing. However, sampling will inevitably miss or even distort the original data characteristics to some extent. This paper studies the effect of probabilistic sampling on a set of data with an unbalanced size distribution. We introduce the Lorenz curve, widely used in economics, together with the crossover split, a recently proposed quantifier, to measure the deviation of the size distribution before and after sampling. Using simulation and real Internet data, we observe that as the sampling probability decreases, the size distribution becomes less unbalanced. We call this phenomenon the even-out effect. The relations among probabilistic sampling, the crossover split, and the Pareto distribution are also revealed.
Keywords :
Internet; Pareto distribution; data analysis; sampling methods; Lorenz curve; data storage processing; probabilistic sampling; real Internet data; Area measurement; Data engineering; Distortion measurement; Memory; Size measurement; Statistical distributions; Telecommunication traffic; Volume measurement
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Computational Sciences, 2006. IMSCCS '06. First International Multi-Symposiums on
Conference_Location :
Hangzhou, Zhejiang
Print_ISBN :
0-7695-2581-4
Type :
conf
DOI :
10.1109/IMSCCS.2006.245
Filename :
4673787