Regional Information Center for Science and Technology - A New Schedule Strategy for Heterogenous Workload-aware in Hadoop

DocumentCode :

3403939

Title :

A New Schedule Strategy for Heterogenous Workload-aware in Hadoop

Author :

Zhe Wang ; Zhengdong Zhu ; Pengfei Zheng ; Qiang Liu ; Xiaoshe Dong

Author_Institution :

Dept. of Comput. Sci. & Technol., Xi'an JiaoTong Univ., Xi'an, China

fYear :

2013

fDate :

22-23 Aug. 2013

Firstpage :

Lastpage :

Abstract :

Demand for large-scale data mining and data analysis has led both industry and academia to design highly scalable, data-intensive computing platforms. MapReduce is a well-known programming model for processing large amounts of data. However, current implementations perform poorly and inefficiently, even when running a single MapReduce job. To manage and process enormous volumes of data, these platforms run multiple jobs rather than a single job. Jobs from different research areas and with different kinds of processing differ in how they request and utilize resources. Most scheduling strategies applied in Hadoop ignore these differences, which can impair resource utilization and job processing efficiency. To address this problem, this paper puts forward a scheduling strategy based on job type classification. The strategy has two parts: 1) jobs are dynamically divided into two types, CPU-intensive and I/O-intensive, based on historical cluster operating data; 2) to remove the influence of noisy data on the reliability of the historical data, we propose CICS (CPU and I/O Characteristic Estimation Strategy), which is based mainly on classical FCFS and has been extensively modified to improve fairness.
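The abstract only outlines the two-part scheme, so the following Python sketch is an illustration under stated assumptions, not the authors' CICS algorithm. It assumes each job's history is a list of past runs with measured CPU and I/O times (hypothetical fields `cpu_seconds` and `io_seconds`), labels a job by the median CPU-to-I/O ratio so that a single noisy run cannot flip the label, and keeps one FCFS queue per job type, alternating between the two queues as a simple fairness rule. The names `JobRun`, `classify`, and `TypeAwareFCFS`, the ratio threshold of 1.0, the median filter, and the alternation rule are all hypothetical choices made for this example.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from statistics import median


@dataclass
class JobRun:
    """One historical run of a job (hypothetical metrics, not from the paper)."""
    cpu_seconds: float  # total CPU time consumed by the job's tasks
    io_seconds: float   # total time spent waiting on disk/network I/O


def classify(history: list[JobRun], threshold: float = 1.0) -> str:
    """Label a job 'cpu' or 'io' from its historical runs.

    Uses the median CPU/I-O ratio so a few noisy runs do not change the
    label (an assumed noise filter, standing in for the paper's CICS).
    """
    ratios = [r.cpu_seconds / max(r.io_seconds, 1e-9) for r in history]
    return "cpu" if median(ratios) >= threshold else "io"


class TypeAwareFCFS:
    """FCFS within each job type; alternates between types for fairness."""

    def __init__(self) -> None:
        self.queues = {"cpu": deque(), "io": deque()}
        self.next_type = "cpu"  # type to try first on the next dispatch

    def submit(self, job_id: str, history: list[JobRun]) -> None:
        # Jobs of the same type keep their arrival order (FCFS).
        self.queues[classify(history)].append(job_id)

    def next_job(self) -> str | None:
        other = "io" if self.next_type == "cpu" else "cpu"
        # Prefer the type whose turn it is; fall back to the other type.
        for t in (self.next_type, other):
            if self.queues[t]:
                self.next_type = "io" if t == "cpu" else "cpu"
                return self.queues[t].popleft()
        return None  # both queues are empty


if __name__ == "__main__":
    s = TypeAwareFCFS()
    # The third wordcount run is an outlier; the median ignores it.
    s.submit("wordcount", [JobRun(10, 40), JobRun(12, 38), JobRun(90, 5)])
    s.submit("pi-estimate", [JobRun(80, 4), JobRun(75, 5)])
    s.submit("sort", [JobRun(20, 60)])
    print(s.next_job(), s.next_job(), s.next_job())  # pi-estimate wordcount sort
```

Alternating between the CPU-bound and I/O-bound queues is one simple way to mix complementary workloads on the same nodes; a production scheduler would instead match job types to the free CPU and I/O capacity of each node.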

Keywords :

data analysis; data mining; parallel programming; processor scheduling; resource allocation; CICS; CPU and I-O characteristic estimation strategy; CPU intensive job; FCFS; Hadoop; I-O intensive job; MapReduce job; MapReduce programming model; data analysis; heterogeneous workload-aware; highly scalable data-intensive computing platforms; historical data reliability; job processing efficiencies; job type classification; large-scale data mining; resource utilization; scheduler strategy; Benchmark testing; Data models; Estimation; Hardware; Schedules; Scheduling; Tuning; Characteristic Estimation; Hadoop; Scheduler;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

ChinaGrid Annual Conference (ChinaGrid), 2013 8th

Conference_Location :

Changchun

Print_ISBN :

978-0-7695-5058-9

Type :

conf

DOI :

10.1109/ChinaGrid.2013.21

Filename :

6623871

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3403939