Title :
Towards an Optimized Big Data Processing System
Author :
Ghit, Bogdan ; Iosup, Alexandru ; Epema, Dick
Abstract :
Scalable by design to very large computing systems such as grids and clouds, MapReduce is currently a major big data processing paradigm. Nevertheless, existing performance models for MapReduce apply only to specific workloads that process a small fraction of the entire data set, and thus fail to assess the capabilities of the MapReduce paradigm under heavy workloads that process exponentially increasing data volumes. The goal of my PhD is to build and analyze a scalable and dynamic big data processing system, including storage (distributed file system), execution engine (MapReduce), and query language (Pig). My contributions for the first two years of PhD research are the following: 1) the design and implementation of a resource management system, as part of a MapReduce-based processing system, for deploying and resizing MapReduce clusters over multicluster systems, 2) the design and implementation of a benchmarking tool for the MapReduce processing system, and 3) the evaluation and modeling of MapReduce using workloads with very large data sets. Furthermore, based on the first two years of research, I will optimize the MapReduce system to efficiently process terabytes of data.
Keywords :
data handling; distributed processing; pattern clustering; MapReduce clusters; MapReduce paradigm; MapReduce-based processing system; Pig; distributed file system; execution engine; multicluster systems; optimized big data processing system; query language; resource management system; storage; Analytical models; Benchmark testing; Big data; Computational modeling; Data models; Dynamic scheduling; Processor scheduling
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on
Conference_Location :
Delft
Print_ISBN :
978-1-4673-6465-2
DOI :
10.1109/CCGrid.2013.53