• DocumentCode
    685707
  • Title
    Distributed FP-ARMH algorithm in Hadoop map reduce framework
  • Author
    Natarajan, Sriraam ; Sehar, Sountharrajan
  • Author_Institution
    Comput. Sci. & Eng., Bannari Amman Inst. of Technol., Sathyamangalam, India
  • fYear
    2013
  • fDate
    12-14 Dec. 2013
  • Firstpage
    264
  • Lastpage
    270
  • Abstract
    The evolution of cloud computing over the Internet and the drastic increase in data size and intensity (Big Data) have made MapReduce and distributed file systems such as HDFS (Hadoop Distributed File System) the paradigm of choice for distributed data mining applications. With the size and complexity of data growing every day, distributed data mining algorithms must be designed to handle Big Data in a way that is compatible with the latest distributed computing technology. Earlier research in data mining focused on improving the performance of single-task computing algorithms rather than on distributed computing, which would provide a faster and more scalable environment for processing large datasets. Existing algorithms for distributed frequent pattern mining include TPFP-tree, BTP tree, and CARM, but these algorithms suffer from unbalanced workload management among their clusters. In this paper, a novel algorithm named Association Rule Mining based on Hadoop (ARMH) is proposed to utilize the clusters effectively and to mine frequent patterns from large databases. The Hadoop distributed framework helps manage the workload among the clusters. ARMH was implemented in Hadoop using the MapReduce programming paradigm.
  • Keywords
    Big Data; cloud computing; data mining; distributed algorithms; distributed databases; BTP tree; CARM; HDFS; Hadoop Distributed File System; Hadoop Map reduce framework; Hadoop distributed framework; Internet; TPFP-tree; association rule mining based on Hadoop; cloud computing technology; database; distributed FP-ARMH algorithm; distributed data mining applications; distributed frequent pattern data mining; single task computing algorithms; Algorithm design and analysis; Association rules; Distributed databases; File systems; Programming; Data Mining; Distributed Computing; Hadoop; Map Reduce;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Green Computing, Communication and Conservation of Energy (ICGCE), 2013 International Conference on
  • Conference_Location
    Chennai
  • Type
    conf
  • DOI
    10.1109/ICGCE.2013.6823442
  • Filename
    6823442