DocumentCode :
3134663
Title :
A Monte Carlo sampling method for drawing representative samples from large databases
Author :
Guo, Hong ; Hou, Wen-Chi ; Yan, Feng ; Zhu, Qiang
Author_Institution :
Dept. of Comput. Sci., Southern Illinois Univ., Carbondale, IL, USA
fYear :
2004
fDate :
21-23 June 2004
Firstpage :
419
Lastpage :
420
Abstract :
Sampling is important in areas such as data mining, OLAP, selectivity estimation, and clustering. It has also become a necessity in social, economic, engineering, scientific, and statistical studies where databases are too large to handle. In this paper, a sampling method based on the Metropolis algorithm is proposed. Unlike the conventional uniform sampling methods, this method is able to select objects consistent with the underlying probability distribution. It is a simple, efficient, and powerful method suitable for all distributions. We have performed experiments to examine the qualities of the samples by comparing their statistical properties with those of the underlying population. The experimental results show that the samples selected by our method are bona fide representative.
Keywords :
Monte Carlo methods; data mining; sampling methods; statistical databases; statistical distributions; very large databases; Metropolis algorithm; Monte Carlo sampling method; OLAP; data clustering; data mining; economical studies; engineering studies; large databases; object selection; probability distribution; representative samples; scientific studies; selectivity estimation; social studies; statistical property comparison; statistical studies; Clustering algorithms; Data engineering; Data mining; Databases; Engineering drawings; Monte Carlo methods; Power engineering and energy; Power generation economics; Probability distribution; Sampling methods;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
ISSN :
1099-3371
Print_ISBN :
0-7695-2146-0
Type :
conf
DOI :
10.1109/SSDM.2004.1311239
Filename :
1311239