DocumentCode
2009339
Title
PipeFlow Engine: Pipeline Scheduling with Distributed Workflow Made Simple
Author
Yin Li ; Chuang Lin
Author_Institution
Tsinghua Nat. Lab. for Inf. Sci. & Technol. (TNList), Tsinghua Univ., Beijing, China
fYear
2013
fDate
15-18 Dec. 2013
Firstpage
142
Lastpage
149
Abstract
A distributed computing system is a fundamental architecture for extending resources such as computation speed, storage capacity, and network bandwidth, which are limited on a single processor. Emerging big data processing techniques such as Hadoop take advantage of distributed servers to accomplish scalable parallel computations. Large-scale processing jobs can run interdependently on different servers, or even on different clusters, and be combined into a workflow to produce meaningful outputs. In this paper, we analyze the common demands of big-data processing and distributed big-data workflow processing. Based on this analysis, we design the PipeFlow Engine, whose features are matched to each of these demands. It orchestrates all involved jobs and schedules them in a batched pipeline mode. We also present two online ranking algorithms that use PipeFlow, and share our experience and best practices in using it.
Keywords
Big Data; parallel processing; pipeline processing; processor scheduling; Hadoop; big data processing techniques; distributed big-data workflow processing; distributed computing system; distributed servers; distributed workflow; fundamental architecture; large-scale processing jobs; online ranking algorithms; parallel computations; pipeflow engine; pipeline scheduling; Data handling; Data storage systems; Engines; Information management; Measurement; Pipelines; Servers; PipeFlow; performance; pipeline; workflow
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Systems (ICPADS), 2013 International Conference on
Conference_Location
Seoul
ISSN
1521-9097
Type
conf
DOI
10.1109/ICPADS.2013.31
Filename
6808168