Title :
Incremental FP-Growth mining strategy for dynamic threshold value and database based on MapReduce
Author :
Xiaoting Wei ; Yunlong Ma ; Feng Zhang ; Min Liu ; Weiming Shen
Author_Institution :
Sch. of Electron. & Inf. Eng., Tongji Univ., Shanghai, China
Abstract :
With the arrival of the Big Data era, data mining faces new opportunities and challenges. Traditional association rule mining algorithms show limitations when applied to large-scale data. The Apriori algorithm scans external storage repeatedly, which causes high I/O load and poor performance. The FP-Growth algorithm is limited by internal memory size, because its mining process relies on a large tree-form data structure. Moreover, despite notable progress, problems remain in dynamic scenarios. This paper presents a parallelized incremental FP-Growth mining strategy based on MapReduce for processing large-scale data. The proposed incremental algorithm mines effectively when the threshold value and the original database change at the same time. The algorithm is implemented on Hadoop and shows clear advantages in the experimental results.
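The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea it names: map/reduce-style item counting that can absorb new transactions and a changed support threshold without rescanning the original database. It covers only the first, item-counting pass that FP-Growth needs before tree construction, and it is not the authors' method. All names (`map_count`, `reduce_counts`, `incremental_frequent_items`) and the choice to keep counts for every item are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def map_count(partition):
    """Map step (hypothetical): count, per item, the transactions in one
    partition that contain it. Each transaction is a list of items."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # an item counts once per transaction
    return counts


def reduce_counts(partials):
    """Reduce step (hypothetical): merge per-partition counts into one Counter."""
    return reduce(lambda a, b: a + b, partials, Counter())


def incremental_frequent_items(old_counts, old_size, new_partitions, min_support):
    """Merge stored counts with counts from newly arrived partitions, then
    apply a (possibly changed) relative support threshold.

    Assumption: old_counts holds counts for ALL items of the original
    database, not only the previously frequent ones, so lowering the
    threshold does not force a rescan of the original data.
    """
    new_counts = reduce_counts(map_count(p) for p in new_partitions)
    total_counts = old_counts + new_counts
    total_size = old_size + sum(len(p) for p in new_partitions)
    min_count = min_support * total_size
    frequent = {item: c for item, c in total_counts.items() if c >= min_count}
    return total_counts, total_size, frequent
```

In this sketch only the new partitions are scanned when the database grows; the stored counts stand in for the original data. A real Hadoop job would distribute the map and reduce steps across nodes and would still have to build or update the FP-tree from the frequent items, which this sketch leaves out.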
Keywords :
Big Data; data mining; parallel processing; tree data structures; FP-growth algorithm; I/O load; MapReduce; apriori algorithm; big data; data mining; dynamic threshold value; external storage scanning; internal memory size; large tree-form data structure; large-scale data; parallelized incremental FP-Growth mining strategy; rule mining algorithms; Algorithm design and analysis; Association rules; Classification algorithms; Heuristic algorithms; Itemsets; Association rule mining; FP-Growth algorithm; MapReduce; dynamic database; threshold value;
Conference_Title :
Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on
Conference_Location :
Hsinchu
DOI :
10.1109/CSCWD.2014.6846854