Title :
Distributed FP-ARMH algorithm in Hadoop map reduce framework
Author :
Natarajan, Sriraam ; Sehar, Sountharrajan
Author_Institution :
Comput. Sci. & Eng., Bannari Amman Inst. of Technol., Sathyamangalam, India
Abstract :
The evolution of cloud computing over the Internet and the drastic increase in data size and intensity (Big Data) have made Map Reduce and distributed file systems such as HDFS (Hadoop Distributed File System) the paradigm of choice for distributed data mining applications. As the size and complexity of data grow every day, distributed data mining algorithms have to be designed to handle Big Data in a way that is compatible with the latest distributed computing technology. Earlier research in data mining focused on improving the performance of single-task computing algorithms rather than on distributed computing, which would provide a faster and more scalable environment for processing large datasets. Existing algorithms for distributed frequent pattern mining include the TPFP-tree, the BTP tree, and CARM, but these algorithms suffer from unbalanced workload management among their clusters. In this paper, a novel algorithm named Association Rule Mining based on Hadoop (ARMH) is proposed to utilize the clusters effectively and to mine frequent patterns from large databases. The Hadoop distributed framework helps manage the workload among the clusters. ARMH was implemented in Hadoop using the Map Reduce programming paradigm.
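The abstract does not describe ARMH's internals, so the following is only a minimal sketch of the general idea it builds on: counting itemset support with a Map Reduce style pass (map per partition, shuffle by key, reduce by summing) and then deriving association rules with confidence = support(X ∪ Y) / support(X). It is not the authors' algorithm. The function names (`map_phase`, `shuffle`, `reduce_phase`, `mine_rules`, `run`) are hypothetical, the framework is simulated in memory rather than using Hadoop's Java API, and itemsets are capped at size 2 for brevity.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(partition, max_size=2):
    """Mapper: emit (itemset, 1) for every itemset of size 1..max_size in each transaction."""
    for transaction in partition:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                yield itemset, 1


def shuffle(pairs):
    """Group intermediate values by key, standing in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, min_support):
    """Reducer: sum counts per itemset and keep those meeting the minimum support count."""
    totals = {key: sum(values) for key, values in groups.items()}
    return {key: count for key, count in totals.items() if count >= min_support}


def mine_rules(frequent, min_confidence):
    """Derive rules X -> Y from frequent 2-itemsets with conf = support(X, Y) / support(X)."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) != 2:
            continue
        # Try both directions of the rule: (a -> b) and (b -> a).
        for antecedent, consequent in (itemset, itemset[::-1]):
            base = frequent.get((antecedent,))
            if base:
                confidence = count / base
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return sorted(rules)


def run(partitions, min_support, min_confidence):
    """Each partition stands in for one HDFS block handled by one mapper (hypothetical setup)."""
    intermediate = [pair for part in partitions for pair in map_phase(part)]
    frequent = reduce_phase(shuffle(intermediate), min_support)
    return frequent, mine_rules(frequent, min_confidence)


if __name__ == "__main__":
    partitions = [
        [["bread", "milk"], ["bread", "butter"]],
        [["bread", "milk", "butter"], ["milk"]],
    ]
    frequent, rules = run(partitions, min_support=2, min_confidence=0.6)
    print(frequent)
    for antecedent, consequent, confidence in rules:
        print(f"{antecedent} -> {consequent} (confidence {confidence:.2f})")
```

Emitting every small itemset from the mappers keeps the sketch short but does not scale; practical distributed miners prune candidates (for example with Apriori-style or FP-tree techniques) before the shuffle, which is where workload balancing across the cluster becomes important.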
Keywords :
Big Data; cloud computing; data mining; distributed algorithms; distributed databases; BTP tree; CARM; HDFS; Hadoop Distributed File System; Hadoop Map reduce framework; Hadoop distributed framework; Internet; TPFP-tree; association rule mining based on Hadoop; cloud computing technology; database; distributed FP-ARMH algorithm; distributed data mining applications; distributed frequent pattern data mining; single task computing algorithms; Algorithm design and analysis; Association rules; Distributed databases; File systems; Programming; Data Mining; Distributed Computing; Hadoop; Map Reduce;
Conference_Title :
Green Computing, Communication and Conservation of Energy (ICGCE), 2013 International Conference on
Conference_Location :
Chennai
DOI :
10.1109/ICGCE.2013.6823442