Title :
The Parallel Improved Apriori Algorithm Research Based on Spark
Author :
Shaosong Yang; Guoyan Xu; Zhijian Wang; Fachao Zhou
Author_Institution :
Coll. of Comput. &
Abstract :
The Apriori algorithm is one of the classical algorithms in association rule mining. This paper analyzes the shortcomings of the classical Apriori algorithm and improves it by constructing a new data structure and optimizing the pre-pruning step. Building on the improved algorithm and on Spark's support for fine-grained data processing, we describe how the improved Apriori algorithm can be parallelized and propose the SIAP algorithm. Experiments comparing SIAP with Hadoop-based and Spark-based Apriori algorithms show that SIAP is more efficient.
Keywords :
"Itemsets","Algorithm design and analysis","Clustering algorithms","Sparks","Data mining","Heuristic algorithms"
Conference_Title :
Frontier of Computer Science and Technology (FCST), 2015 Ninth International Conference on
DOI :
10.1109/FCST.2015.28
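
Note :
For orientation, the classical Apriori algorithm that the abstract takes as its starting point can be sketched as below. This is a minimal single-machine sketch of the textbook level-wise method (join, prune, count), not the SIAP algorithm: the abstract does not describe the paper's new data structure, its pre-pruning optimization, or its Spark parallelization, so none of those are reproduced here. The function name `apriori` and the parameter `min_support` (an absolute transaction count) are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Textbook Apriori only; `min_support` is an absolute count (assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one full scan of the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

The repeated full scan in the count step is the usual cost of the classical algorithm; it is shown here only as the baseline, without any claim about how the paper addresses it.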