DocumentCode
154096
Title
Reducing MapReduce Abstraction Costs for Text-centric Applications
Author
Chun-Hung Hsiao ; Cafarella, Michael ; Narayanasamy, Satish
Author_Institution
Univ. of Michigan, Ann Arbor, MI, USA
fYear
2014
fDate
9-12 Sept. 2014
Firstpage
40
Lastpage
49
Abstract
The MapReduce framework has become widely popular for programming large clusters, even though MapReduce jobs may use underlying resources relatively inefficiently. There has been substantial research in improving MapReduce performance for applications that were inspired by relational database queries, but almost none for text-centric applications, including inverted index construction, processing large log files, and so on. We identify two simple optimizations to improve MapReduce performance on text-centric tasks: frequency-buffering and spill-matcher. The former approach improves buffer efficiency for intermediate map outputs by identifying frequent keys, effectively shrinking the amount of work that the shuffle phase must perform. Spill-matcher is a runtime controller that improves parallelization of MapReduce framework background tasks. Together, our two optimizations improve the performance of text-centric applications by up to 39.1%. We demonstrate gains on both a small local cluster and Amazon's EC2 cloud service. Unlike other MapReduce optimizations, these techniques require no user code changes, and only small changes to the MapReduce system.
Keywords
cloud computing; optimisation; parallel programming; relational databases; text analysis; Amazon's EC2 cloud service; MapReduce abstraction cost reduction; MapReduce framework background task parallelization; MapReduce performance improvement; buffer efficiency; frequency-buffering; frequent keys; runtime controller; shuffle phase; spill-matcher; text-centric applications; text-centric tasks; Indexes; Instruction sets; Optimization; Parallel processing; Runtime; Sorting; Standards
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing (ICPP), 2014 43rd International Conference on
Conference_Location
Minneapolis, MN
ISSN
0190-3918
Type
conf
DOI
10.1109/ICPP.2014.13
Filename
6957213