Title :
Complex statistical analysis of big data: Implementation and application of Apriori and FP-Growth algorithm based on MapReduce
Author :
Zhuobo Rong ; Dawen Xia ; Zili Zhang
Author_Institution :
Sch. of Comput. & Inf. Sci., Southwest Univ., Chongqing, China
Abstract :
In a single-machine environment, the Apriori and FP-Growth algorithms suffer from high memory consumption, low computing performance, and poor scalability and reliability when mining association rules from large-scale data. We therefore propose a new implementation based on the MapReduce parallel environment that mines frequent itemsets to generate association rules. The method is verified on real datasets of different sizes, with different numbers of nodes in the cluster, using speedup, scalability, and reliability as indicators. The results show that the method is feasible and valid, and that it improves the overall performance and efficiency of the Apriori and FP-Growth algorithms to meet the needs of large-scale association rule mining.
Keywords :
data analysis; data mining; parallel processing; statistical analysis; FP-Growth Algorithm; MapReduce parallel environment; apriori algorithm; complex statistical big data analysis; computing performance; frequent itemsets mining; large-scale data association rules mining; memory consumption; single machine environment; Educational institutions; Indexes; apriori; association analysis; big data statistics; FP-growth; mapreduce
Conference_Title :
Software Engineering and Service Science (ICSESS), 2013 4th IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-4997-0
DOI :
10.1109/ICSESS.2013.6615467