DocumentCode :
3403939
Title :
A New Schedule Strategy for Heterogeneous Workload-aware in Hadoop
Author :
Zhe Wang ; Zhengdong Zhu ; Pengfei Zheng ; Qiang Liu ; Xiaoshe Dong
Author_Institution :
Dept. of Comput. Sci. & Technol., Xi'an JiaoTong Univ., Xi'an, China
fYear :
2013
fDate :
22-23 Aug. 2013
Firstpage :
80
Lastpage :
85
Abstract :
Demand for large-scale data mining and data analysis has led both industry and academia to design highly scalable data-intensive computing platforms. MapReduce is a well-known programming model for processing large amounts of data. However, current implementations perform poorly and are inefficient, even when running a single MapReduce job. To manage and process enormous data sets, these platforms run multiple jobs concurrently rather than a single job. Jobs from different research areas and different kinds of job processing differ in how they request and use resources. Most scheduling strategies used in Hadoop ignore these differences, so resource utilization and job processing efficiency may suffer. To address this problem, in this paper we put forward a scheduling strategy based on job type classification. The strategy has two parts: 1) jobs are dynamically divided into two types, CPU-intensive and I/O-intensive, based on the cluster's historical operating data; 2) to remove the influence of noisy data on the reliability of the historical data, we put forward a scheduling strategy, CICS (CPU and I/O Characteristic Estimation Strategy), which is mainly based on classical FCFS and has been extensively modified to improve fairness.
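The abstract does not give the classification rule or the internals of CICS, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. Assumed (hypothetical): each historical run of a job is summarized by its CPU time and I/O time; the median CPU/I/O ratio over past runs stands in for noise handling (a substitute of our choosing, not CICS); the threshold of 1.0 and the rule "give a free slot to the type the node is running less of" are illustrative. Within each job type, jobs are still served first-come, first-served.

```python
from collections import deque
from dataclasses import dataclass
from statistics import median

CPU, IO = "CPU-intensive", "I/O-intensive"

@dataclass
class Run:
    """One historical run of a job (hypothetical metrics)."""
    cpu_seconds: float  # CPU time consumed by the run
    io_seconds: float   # time spent on disk/network I/O in the run

def classify(history, threshold=1.0):
    """Label a job from its past runs. The median (an assumed stand-in for
    noise filtering) keeps a single outlier run from flipping the label."""
    if not history:
        raise ValueError("need at least one historical run")
    ratios = [r.cpu_seconds / r.io_seconds if r.io_seconds > 0 else float("inf")
              for r in history]
    return CPU if median(ratios) >= threshold else IO

class TypeAwareFCFS:
    """Two FCFS queues, one per job type (hypothetical scheduler)."""

    def __init__(self):
        self.queues = {CPU: deque(), IO: deque()}
        self.arrivals = 0  # global arrival counter for FCFS ordering

    def submit(self, name, history):
        kind = classify(history)
        self.queues[kind].append((self.arrivals, name))
        self.arrivals += 1
        return kind

    def next_job(self, running_cpu, running_io):
        """Pick a job for a free slot, favouring the type the node runs less of."""
        cpu_q, io_q = self.queues[CPU], self.queues[IO]
        if not cpu_q and not io_q:
            return None
        if not io_q:
            return cpu_q.popleft()[1]
        if not cpu_q:
            return io_q.popleft()[1]
        if running_cpu < running_io:
            return cpu_q.popleft()[1]
        if running_io < running_cpu:
            return io_q.popleft()[1]
        # Balanced node: plain FCFS across both queues by arrival order.
        q = cpu_q if cpu_q[0][0] < io_q[0][0] else io_q
        return q.popleft()[1]

if __name__ == "__main__":
    sched = TypeAwareFCFS()
    sched.submit("wordcount", [Run(20, 80), Run(25, 90), Run(300, 10)])  # third run is noise
    sched.submit("pi-estimate", [Run(120, 5), Run(110, 4)])
    sched.submit("terasort", [Run(40, 200)])
    print(sched.next_job(running_cpu=2, running_io=0))  # prints "wordcount"
```

In the usage example, the noisy third run of "wordcount" would make a mean-based ratio look CPU-bound, but the median keeps it I/O-intensive, so a node already busy with two CPU-bound tasks receives it first. Mixing types on a node is one plausible way to raise utilization; the paper's actual placement and fairness rules may differ.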
Keywords :
data analysis; data mining; parallel programming; processor scheduling; resource allocation; CICS; CPU and I/O characteristic estimation strategy; CPU intensive job; FCFS; Hadoop; I/O intensive job; MapReduce job; MapReduce programming model; heterogeneous workload-aware; highly scalable data-intensive computing platforms; historical data reliability; job processing efficiencies; job type classification; large-scale data mining; resource utilization; scheduler strategy; Benchmark testing; Data models; Estimation; Hardware; Schedules; Scheduling; Tuning; Characteristic Estimation; Scheduler;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
ChinaGrid Annual Conference (ChinaGrid), 2013 8th
Conference_Location :
Changchun
Print_ISBN :
978-0-7695-5058-9
Type :
conf
DOI :
10.1109/ChinaGrid.2013.21
Filename :
6623871