Title :
Introducing SSDs to the Hadoop MapReduce Framework
Author :
Sangwhan Moon ; Jaehwan Lee ; Yang Suk Kee
Date :
June 27 - July 2, 2014
Abstract :
Solid State Drive (SSD) cost-per-bit continues to decrease. Consequently, system architects increasingly consider replacing Hard Disk Drives (HDDs) with SSDs to accelerate Hadoop MapReduce processing. When attempting this, system architects usually realize that SSD characteristics and today's Hadoop framework exhibit mismatches that impede indiscriminate SSD integration. Hence, cost-effective SSD utilization has proved challenging within many Hadoop environments. This paper compares SSD performance to HDD performance within a Hadoop MapReduce framework. It identifies extensible best practices that can exploit SSD benefits within Hadoop frameworks when combined with high network bandwidth and increased parallel storage access. Terasort benchmark results demonstrate that SSDs presently deliver significant cost-effectiveness when they store intermediate Hadoop data, leaving HDDs to store Hadoop Distributed File System (HDFS) source data.
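A rough sketch of the storage placement the abstract recommends (not taken from the paper itself): with Hadoop 2.x property names, intermediate shuffle/spill data can be directed to an SSD mount while HDFS block replicas stay on HDD. The /mnt/ssd and /mnt/hdd paths below are assumed mount points for illustration only.

import org.apache.hadoop.conf.Configuration;

public class StoragePlacementSketch {
    // Build a Configuration that places intermediate MapReduce data on SSD
    // and HDFS DataNode block storage on HDD.
    public static Configuration tieredStorageConf() {
        Configuration conf = new Configuration();
        // Map-side spill and shuffle files go to the (assumed) SSD mount.
        conf.set("mapreduce.cluster.local.dir", "/mnt/ssd/mapred/local");
        // HDFS source-data blocks remain on the (assumed) HDD mount.
        conf.set("dfs.datanode.data.dir", "/mnt/hdd/dfs/data");
        return conf;
    }
}

In a deployed cluster the same two properties would normally be set per node in mapred-site.xml and hdfs-site.xml rather than programmatically.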
Keywords :
disc drives; hard discs; parallel processing; storage management; HDD; HDFS source data; Hadoop MapReduce Framework; Hadoop distributed file system; SSD; cost-per-bit; hard disk drives; parallel storage access; solid state drive; Bandwidth; Benchmark testing; Media; Performance evaluation; Random access memory; Resource management; Throughput;
Conference_Title :
Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5062-1
DOI :
10.1109/CLOUD.2014.45