Title :
MapReduce Analysis for Cloud-Archived Data
Author :
Palanisamy, Balaji ; Singh, Ashutosh ; Mandagere, Nagapramod ; Alatorre, Gabriel ; Liu, Ling
Abstract :
Public storage clouds have become a popular choice for archiving certain classes of enterprise data, for example, application and infrastructure logs. These logs contain sensitive information such as IP addresses or user logins, so regulatory and security requirements often mandate that the data be encrypted before it is moved to the cloud. To extract business value from such data, analytics systems (e.g., Hadoop/MapReduce) must first download it from these public clouds, decrypt it, and then process it at the secure enterprise site. We propose VNCache: an efficient solution for MapReduce analysis of such cloud-archived log data that does not require a priori data transfer and loading into the local Hadoop cluster. VNCache dynamically integrates cloud-archived data into a virtual namespace at the enterprise Hadoop cluster. Through a seamless data streaming and prefetching model, Hadoop jobs can begin execution as soon as they are launched, without any a priori downloading. With VNCache's accurate prefetching and caching, jobs often run on a locally cached copy of the data block, significantly improving performance. When no longer needed, data is safely evicted from the enterprise cluster, reducing the total storage footprint. Uniquely, VNCache is implemented with no changes to the Hadoop application stack.
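To make the streaming, prefetching, and eviction model described in the abstract more concrete, here is a minimal Python sketch. It is not the VNCache implementation: the class `PrefetchingBlockCache`, its `capacity` and `window` parameters, the FIFO eviction policy, the "evict once consumed" rule, and the pass-through `decrypt` hook are all hypothetical assumptions, and the "cloud" is simply a local dict of blocks.

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class PrefetchingBlockCache:
    """Hypothetical sketch of a VNCache-style local block cache (not the paper's code).

    A virtual namespace maps each file path to an ordered list of remote block
    ids. Reading block i triggers a prefetch of the next `window` blocks, so
    later reads can hit the local copy. A block is evicted once it has been
    consumed (assumes a single sequential reader) or when the cache exceeds
    `capacity` (oldest-inserted block first, i.e. FIFO).
    """

    def __init__(self, remote: Dict[str, bytes], namespace: Dict[str, List[str]],
                 capacity: int, window: int,
                 decrypt: Callable[[bytes], bytes] = lambda b: b):
        self.remote = remote          # block id -> stored bytes (simulated cloud store)
        self.namespace = namespace    # virtual path -> ordered list of block ids
        self.capacity = capacity      # max number of locally cached blocks (>= 1)
        self.window = window          # how many blocks to prefetch ahead
        self.decrypt = decrypt        # placeholder hook; identity by default, not real crypto
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.downloads = 0

    def _fetch(self, block_id: str) -> None:
        # Download and "decrypt" a block unless it is already cached locally.
        if block_id in self.cache:
            return
        self.downloads += 1
        self.cache[block_id] = self.decrypt(self.remote[block_id])
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the oldest inserted block

    def read(self, path: str, index: int) -> bytes:
        blocks = self.namespace[path]
        block_id = blocks[index]
        if block_id in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self._fetch(block_id)  # on-demand streaming on a cache miss
        data = self.cache.pop(block_id)  # consume the block and evict it
        # Prefetch the next `window` blocks of the same file.
        for ahead in blocks[index + 1: index + 1 + self.window]:
            self._fetch(ahead)
        return data


# Usage: a job reads one archived log file sequentially.
remote = {f"b{i}": f"block-{i}".encode() for i in range(4)}
namespace = {"logs/day1": ["b0", "b1", "b2", "b3"]}
vn = PrefetchingBlockCache(remote, namespace, capacity=2, window=2)
data = [vn.read("logs/day1", i) for i in range(4)]
```

In this toy run only the first read misses: every later block is already local because it was prefetched while the previous block was being consumed, and the cache is empty once the job finishes, mirroring the reduced storage footprint the abstract describes.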
Keywords :
cache storage; cloud computing; parallel programming; storage management; Hadoop application stack; Hadoop cluster; MapReduce analysis; VNCache; analytics systems; cloud-archived data; data streaming; encryption; enterprise data archiving; prefetching model; public storage clouds; regulatory requirements; security requirements; storage footprint reduction; virtual namespace; Cloud computing; Cryptography; Data models; Heuristic algorithms; Monitoring; Prefetching; Caching; Cloud Computing; Filesystem; MapReduce;
Conference_Title :
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Conference_Location :
Chicago, IL
DOI :
10.1109/CCGrid.2014.13