DocumentCode :
140922
Title :
SILVERBACK: Scalable association mining for temporal data in columnar probabilistic databases
Author :
Xie, Yusheng ; Palsetia, Diana ; Trajcevski, G. ; Agrawal, Ankit ; Choudhary, Alok
Author_Institution :
Voxsup Inc., Chicago, IL, USA
fYear :
2014
fDate :
March 31 2014-April 4 2014
Firstpage :
1072
Lastpage :
1083
Abstract :
We address the problem of large-scale probabilistic association rule mining and consider the trade-offs between the accuracy of the mining results and the quest for scalability on modest hardware infrastructure. We demonstrate how extensions and adaptations of research findings can be integrated in an industrial application, and we present the commercially deployed SILVERBACK framework, developed at Voxsup Inc. SILVERBACK tackles the storage efficiency problem by proposing a probabilistic columnar infrastructure and using Bloom filters and reservoir sampling techniques. In addition, a probabilistic pruning technique based on Apriori has been introduced for mining frequent item-sets. The proposed target-driven technique yields a significant reduction in the size of the frequent item-set candidates. We present extensive experimental evaluations which demonstrate the benefits of a context-aware incorporation of infrastructure limitations into corresponding research techniques. The experiments indicate that, when compared to the traditional Hadoop-based approach of improving scalability by adding more hosts, SILVERBACK, which has been developed and commercially deployed at Voxsup Inc. since May 2011, has much better run-time performance with negligible accuracy sacrifices.
Keywords :
data mining; data structures; probability; sampling methods; storage management; Apriori; Bloom filters; Hadoop-based approach; SILVERBACK framework; Voxsup Inc; columnar probabilistic databases; context-aware infrastructure limitation incorporation; frequent item-set mining; large scale probabilistic association rule mining; modest hardware infrastructure; probabilistic pruning technique; reservoir sampling techniques; scalable association mining; storage efficiency problem; temporal data; Accuracy; Databases;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
Type :
conf
DOI :
10.1109/ICDE.2014.6816724
Filename :
6816724