Title :
Finding interesting associations without support pruning
Author :
Cohen, Edith ; Datar, Mayur ; Fujiwara, Shinji ; Gionis, Aristides ; Indyk, Piotr ; Motwani, Rajeev ; Ullman, Jeffrey D. ; Yang, Cheng
Author_Institution :
AT&T Shannon Lab., Florham Park, NJ, USA
Abstract :
Association rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar Web documents, clustering and collaborative filtering, where the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even causal relationships between infrequent items. We develop a family of algorithms for solving this problem, employing a combination of random sampling and hashing techniques. We provide an analysis of the algorithms developed and conduct experiments on real and synthetic data to obtain a comparative performance analysis.
Keywords :
correlation methods; data mining; file organisation; importance sampling; pattern clustering; software performance evaluation; World Wide Web documents; a-priori algorithm; algorithm analysis; association rule mining; causal relationships; clustering; collaborative filtering; data instances; frequently-occurring relationships; hashing techniques; high support conditions; highly correlated items; infrequent items; interesting associations discovery; performance analysis; random sampling; similar document identification; support pruning; Application software; Association rules; Computer science; Data mining; Filters; Hip; Ores; Sampling methods
Conference_Title :
Data Engineering, 2000. Proceedings. 16th International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
0-7695-0506-6
DOI :
10.1109/ICDE.2000.839448