Title :
IntegrityMR: Integrity assurance framework for big data analytics and management applications
Author :
Yongzhi Wang; Jinpeng Wei; Mudhakar Srivatsa; Yucong Duan; Wencai Du
Author_Institution :
Florida Int. Univ., Miami, FL, USA
Abstract :
Big data analytics and knowledge management are becoming hot topics with the emergence of cloud computing and big data computing models such as MapReduce. However, large-scale adoption of MapReduce applications on public clouds is hindered by the lack of trust in the participating virtual machines deployed on the public cloud. In this paper, we extend the existing hybrid cloud MapReduce architecture to multiple public clouds. Based on this architecture, we propose IntegrityMR, an integrity assurance framework for big data analytics and management applications. We explore result integrity check techniques at two alternative software layers: the MapReduce task layer and the application layer. We design and implement the system at both layers based on Apache Hadoop MapReduce and Pig Latin, and perform a series of experiments with popular big data analytics and management applications such as Apache Mahout and Pig on commercial public clouds (Amazon EC2 and Microsoft Azure) and a local cluster environment. The experimental results of the task layer approach show high integrity (98% with a credit threshold of 5) with non-negligible performance overhead (18% to 82% extra running time compared to the original MapReduce). The experimental results of the application layer approach show better performance than the task layer approach (less than 35% extra running time compared with the original MapReduce).
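The abstract cites a "credit threshold of 5" without describing how credit is earned. The sketch below is one plausible reading only, assuming a verification-based scheme in which results from an untrusted public-cloud worker are held back until spot checks by a trusted verifier (for example, the local cluster) have earned that worker enough credit, and a single failed check blacklists the worker and discards its unaccepted results. `Worker`, `on_result`, and `CREDIT_THRESHOLD` are hypothetical names, not IntegrityMR's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical: threshold value taken from the abstract's reported setting.
CREDIT_THRESHOLD = 5


@dataclass
class Worker:
    """A public-cloud VM executing map tasks (hypothetical model)."""
    name: str
    credit: int = 0
    blacklisted: bool = False
    pending: List[int] = field(default_factory=list)  # task ids held back


def on_result(worker: Worker, task_id: int, verified_ok: Optional[bool],
              threshold: int = CREDIT_THRESHOLD) -> List[int]:
    """Process one task result and return the task ids accepted by this call.

    verified_ok is None when the trusted verifier did not re-execute the
    task, True when its re-execution matched, and False when it did not.
    """
    if worker.blacklisted:
        return []  # results from a worker already caught are never accepted
    if verified_ok is False:
        worker.blacklisted = True
        worker.pending.clear()  # discard everything not yet accepted
        return []
    worker.pending.append(task_id)
    if verified_ok:
        worker.credit += 1  # each passed spot check earns one credit
    if worker.credit >= threshold:
        # Enough credit: release all buffered results at once.
        accepted, worker.pending = worker.pending, []
        return accepted
    return []
```

With every other result spot-checked, an honest worker's first nine results stay buffered until its fifth passed check, after which later results are accepted immediately; a worker that fails any check loses all buffered results. The threshold trades detection probability against how long results wait before being committed.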
Keywords :
Big Data; cloud computing; knowledge management; virtual machines; Amazon EC2; Apache Hadoop MapReduce; Apache Mahout; IntegrityMR; Microsoft Azure; Pig Latin; alternative software layers; big data analytics; big data computing model; commercial public clouds; hybrid cloud MapReduce architecture; integrity assurance framework; integrity check techniques; large-scale adoption; local cluster environment; management applications; nonnegligible performance overhead; Accuracy; Computer architecture; Data handling; Data storage systems; Error analysis; Information management; Integrity Assurance; MapReduce;
Conference_Title :
2013 IEEE International Conference on Big Data
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691780