DocumentCode :
633081
Title :
Highly Scalable Sequential Pattern Mining Based on MapReduce Model on the Cloud
Author :
Chun-Chieh Chen ; Chi-Yao Tseng ; Ming-Syan Chen
Author_Institution :
Grad. Inst. of Networking & Multimedia, Nat. Taiwan Univ., Taipei, Taiwan
fYear :
2013
fDate :
June 27 2013-July 2 2013
Firstpage :
310
Lastpage :
317
Abstract :
Sequential pattern mining is an essential data mining technique that has been widely applied in many real-world applications. However, traditional algorithms generally scale poorly when dealing with big data. In this paper, we aim to substantially increase the scale that can be handled and propose the Sequential PAttern Mining algorithm based on the MapReduce model on the Cloud (abbreviated as SPAMC). Building on the earlier SPAM algorithm, we design an iterative MapReduce framework that efficiently generates and prunes candidate patterns while constructing the lexical sequence tree. This framework not only distributes the sub-tasks of tree construction to independent mappers in parallel, but also enables parallel processing of support counting. We conduct extensive experiments in a cloud environment of 32 virtual machines with up to 12.8 million transactional sequences. Experimental results show that SPAMC can significantly reduce mining time on big data, achieve extremely high scalability, and provide perfect load balancing on the cloud cluster.
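The parallel support counting mentioned in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' SPAMC implementation: SPAM uses bitmap representations, whereas this sketch swaps in a plain subsequence check, and it simulates the map and reduce phases in a single process instead of running on a real MapReduce cluster. All names here (`contains`, `map_partition`, `reduce_counts`, `frequent`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import chain


def contains(sequence, pattern):
    """Return True if pattern (a tuple of itemsets) is a subsequence of sequence."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest transaction that holds this itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must match a strictly later transaction
    return True


def map_partition(partition, candidates):
    """Map phase (simulated): emit (candidate, 1) for each supporting sequence."""
    for sequence in partition:
        for candidate in candidates:
            if contains(sequence, candidate):
                yield candidate, 1


def reduce_counts(pairs):
    """Reduce phase (simulated): sum the emitted counts per candidate."""
    counts = Counter()
    for candidate, value in pairs:
        counts[candidate] += value
    return counts


def frequent(partitions, candidates, min_support):
    """Count support across all partitions and prune infrequent candidates."""
    pairs = chain.from_iterable(map_partition(p, candidates) for p in partitions)
    counts = reduce_counts(pairs)
    return {c: n for c, n in counts.items() if n >= min_support}


# Toy database split into two partitions, as if handled by two mappers.
partitions = [
    [[{"a"}, {"b"}], [{"a", "b"}, {"c"}]],
    [[{"b"}, {"a"}], [{"a"}, {"b"}, {"c"}]],
]
A_THEN_B = (frozenset("a"), frozenset("b"))
A_WITH_B = (frozenset("ab"),)
B_THEN_C = (frozenset("b"), frozenset("c"))

result = frequent(partitions, [A_THEN_B, A_WITH_B, B_THEN_C], min_support=2)
```

In an iterative scheme of the kind the abstract describes, the surviving candidates from one round would be extended to form the next level of the lexical sequence tree, and the counting step would be repeated until no candidate meets the minimum support.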
Keywords :
cloud computing; data mining; parallel processing; resource allocation; trees (mathematics); virtual machines; SPAMC; big data; candidate pattern generation; candidate pattern pruning; cloud cluster; cloud environment; data mining technique; iterative MapReduce framework; lexical sequence tree; load balancing; parallel processing; scalability problem; sequential pattern mining algorithm; support counting; transactional sequences; virtual machines; Algorithm design and analysis; Data mining; Databases; Partitioning algorithms; Reactive power; Transforms; Unsolicited electronic mail; Big Data; Cloud Computing; MapReduce framework; Sequential Pattern Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5006-0
Type :
conf
DOI :
10.1109/BigData.Congress.2013.48
Filename :
6597152