Title :
MapReduce Framework Optimization via Performance Modeling
Author_Institution :
Inst. of Software, Beijing, China
Abstract :
The MapReduce framework has become the state-of-the-art paradigm for large-scale data processing. In our ongoing work, we attempt to solve three interrelated problems: how to build an accurate MapReduce performance model, how to use it to automatically detect and optimize slow-running MapReduce jobs, and how to use it to help the scheduler arrange the job execution sequence. Currently, we mainly study the job execution time model and its training method. We also present several policies for optimizing the job configuration and the scheduler.
Keywords :
data analysis; distributed processing; public domain software; scheduling; Apache Hadoop; MapReduce framework optimization; MapReduce performance model; job configuration; job execution sequence; job execution time model; job scheduler; large-scale data analysis; large-scale data processing; open-source implementation; performance modeling; training method; Degradation; Encyclopedias; Optimization; Predictive models; Resource management; Training;
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0974-5
DOI :
10.1109/IPDPSW.2012.313