Title :
VENU: Orchestrating SSDs in Hadoop storage
Author :
Krish, K.R. ; Iqbal, M. Safdar ; Butt, Ali R.
Abstract :
A major obstacle in sustaining high performance and scalability in the Hadoop data processing framework is managing the growing data and the need for very high I/O rates. Solid State Disks (SSDs) are promising and are being employed alongside the slower hard disk drives (HDDs) in emerging storage architectures. However, we observed that SSDs are not always a cost-effective option for all Hadoop workloads, and there is a critical need to identify use cases where SSDs can help. To this end, we present VENU, a dynamic data management system for Hadoop. VENU aims to improve overall I/O throughput via effective use of SSDs as a cache for the slower HDDs, not for all data, but only for the workloads that are expected to benefit from SSDs. In addition, we design placement and retrieval schemes to efficiently use the SSD cache. We evaluate our implementation of VENU on a medium-sized cluster and show that it achieves an 11% improvement in application completion times when 10% of the available storage is provided by SSDs.
Keywords :
cache storage; data handling; parallel processing; Hadoop data processing framework; Hadoop storage; I/O throughput; SSD cache; VENU; dynamic data management system; solid state disks; Bandwidth; Benchmark testing; Distributed databases; Performance evaluation; Prefetching; Throughput; Venus;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004234