Title :
FedLoop: Looping on Federated MapReduce
Author :
Chun-Yu Wang ; Tzu-Li Tai ; Kuan-Chieh Huang ; Tse-En Liu ; Jyh-Biau Chang ; Ce-Kuen Shieh
Author_Institution :
Dept. of Electr. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
Abstract :
The challenges of the Big Data era have motivated many organizations to turn to distributed, large-scale processing platforms to handle their data. MapReduce and its open-source implementation, Hadoop, have grown highly popular thanks to a successful programming model for simplified cluster processing. As a result, many organizations deploy their own MapReduce/Hadoop clusters to store and process large amounts of useful data. This multicluster setting is gradually gaining attention. Numerous previous works have studied how to execute MapReduce across geographically distributed data in this setting. However, an important class of applications has not been explored for multicluster MapReduce: iterative computation. In this paper, we propose FedLoop, a composite system aimed at providing iterative MapReduce computation for geographically distributed data in multicluster settings. FedLoop is capable of transparently executing both iterative and non-iterative MapReduce jobs on either a single cluster or multiple clusters. For our performance evaluation, two well-known iterative algorithms, K-Means and PageRank, were executed over 4 independent clusters (16 physical nodes in total) using FedLoop. The results helped us discover how iterative applications may differ in execution efficiency in multicluster environments and how iterative multicluster computation systems like FedLoop can be optimized.
Keywords :
Big Data; data handling; distributed databases; parallel programming; pattern clustering; public domain software; Big Data; FedLoop; K-Means; MapReduce/Hadoop clusters; PageRank; cluster processing; composite system; data processing; data storage; distributed-large-scale processing platforms; execution efficiency; federated MapReduce; geographically distributed data; iterative MapReduce computation; iterative MapReduce jobs; iterative multicluster computation systems; multicluster MapReduce; noniterative MapReduce jobs; open-source implementation; performance evaluation; physical nodes; Data communication; Data models; Distributed databases; Iterative methods; Organizations; Programming; Synchronization; Cloud Computing; Iterative Computation; MapReduce; Multicluster;
Conference_Title :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2014 IEEE 13th International Conference on
Conference_Location :
Beijing
DOI :
10.1109/TrustCom.2014.99