DocumentCode
2991845
Title
A Highly Efficient Consolidated Platform for Stream Computing and Hadoop
Author
Matsuura, Hiroya ; Ganse, Masaru ; Suzumura, Toyotaro
Author_Institution
Tokyo Inst. of Technol., Tokyo, Japan
fYear
2012
fDate
21-25 May 2012
Firstpage
2026
Lastpage
2034
Abstract
Data Stream Processing, or stream computing, is a computing paradigm for processing massive amounts of streaming data in real time without storing the data in secondary storage. In this paper we propose an integrated execution platform for Data Stream Processing and Hadoop with a dynamic load balancing mechanism, aiming at efficient operation of computer systems and reduced latency for Data Stream Processing. Our implementation is built on top of System S, a distributed data stream processing system developed by IBM Research. Our experimental results show that our load balancing mechanism could increase CPU usage from 47.77% to 72.14% compared with the same platform without load balancing. Moreover, the results show that latency for stream processing jobs is kept low even in a bursty situation by dynamically allocating more compute resources to stream processing jobs.
Keywords
data handling; distributed processing; real-time systems; resource allocation; CPU usage; Hadoop; computer systems; distributed data stream processing system; dynamic load balancing mechanism; integrated execution platform; latency reduction; real-time data processing; stream computing; Batch production systems; Heuristic algorithms; Load management; Prediction algorithms; Processor scheduling; Real time systems; Time series analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location
Shanghai
Print_ISBN
978-1-4673-0974-5
Type
conf
DOI
10.1109/IPDPSW.2012.252
Filename
6270411