DocumentCode :
3435148
Title :
StreamCloud: A Large Scale Data Streaming System
Author :
Gulisano, Vincenzo ; Jimenez-Peris, Ricardo ; Patino-Martinez, Marta ; Valduriez, Patrick
Author_Institution :
Fac. de Inf., Univ. Politec. de Madrid, Madrid, Spain
fYear :
2010
fDate :
21-25 June 2010
Firstpage :
126
Lastpage :
137
Abstract :
Data streaming has become an important paradigm for the real-time processing of continuous data flows in domains such as finance, telecommunications, and networking. Some applications in these domains require processing massive data flows that current technology is unable to manage, that is, streams that, even for a single query operator, require the capacity of potentially many machines. Research efforts on data streaming have mainly focused on scaling in the number of queries or query operators, but have overlooked scalability with respect to the stream volume. In this paper, we present StreamCloud, a large scale data streaming system for processing large data stream volumes. We focus on how to parallelize continuous queries to obtain a highly scalable data streaming infrastructure. StreamCloud goes beyond the state of the art by using a novel parallelization technique that splits queries into subqueries that are allocated to independent sets of nodes in a way that minimizes the distribution overhead. StreamCloud is implemented as a middleware and is highly independent of the underlying data streaming engine. We explore and evaluate different strategies to parallelize data streaming and tackle the main bottlenecks and overheads to achieve scalability. The paper presents the system design, implementation, and a thorough evaluation of the scalability of the fully implemented system.
Keywords :
middleware; parallel processing; query processing; StreamCloud system; continuous data flow processing; continuous queries; data streaming system; middleware; parallelization technique; query operator; Distributed computing; Engines; Finance; Large-scale systems; Middleware; Real time systems; Scalability; System analysis and design; Technology management; Telephony;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on
Conference_Location :
Genova
ISSN :
1063-6927
Print_ISBN :
978-1-4244-7261-1
Type :
conf
DOI :
10.1109/ICDCS.2010.72
Filename :
5541697