DocumentCode :
2190115
Title :
Exploring the Influence of Sampling on Pattern Support Distribution
Author :
Xu, Luofeng ; Marsland, Stephen ; Wang, Ruili
Author_Institution :
Sch. of Eng. & Adv. Technol., Massey Univ., Palmerston North
fYear :
2008
fDate :
8-11 July 2008
Firstpage :
66
Lastpage :
71
Abstract :
Identifying the pattern support distribution (PSD) of a dataset is useful for many data mining tasks, such as market basket analysis. The support of a pattern is the frequency of its occurrence in a dataset. Calculating the distribution of these supports over an entire dataset is computationally expensive; this cost can be reduced by sampling from the dataset and computing the PSD on a relatively small sample. However, sampling may miscount patterns and cause significant changes in the identified distribution. Building on the observation that the PSD follows a power-law relationship, this paper investigates the influence of sampling on the characteristics of that relationship. We consider the effect of sampling under two assumptions: a uniform distribution of pattern supports, and independent and identically distributed (i.i.d.) distributions. We experimentally evaluate this influence on four real-world transaction datasets.
Keywords :
data mining; data mining tasks; independent identically distributed distributions; pattern support distribution sampling; pattern supports uniform distribution; power-law relationship; transaction datasets
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Technology Workshops, 2008. CIT Workshops 2008. IEEE 8th International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-0-7695-3242-4
Electronic_ISBN :
978-0-7695-3239-1
Type :
conf
DOI :
10.1109/CIT.2008.Workshops.91
Filename :
4568481