DocumentCode :
2688479
Title :
Watershed: A High Performance Distributed Stream Processing System
Author :
De Souza Ramos, Thatyene Louise Alves ; Oliveira, Rodrigo Silva ; De Carvalho, Ana Paula ; Ferreira, Renato Antônio Celso ; Meira, Wagner, Jr.
Author_Institution :
Dept. of Comput. Sci., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
fYear :
2011
fDate :
26-29 Oct. 2011
Firstpage :
191
Lastpage :
198
Abstract :
The task of extracting information from datasets that grow larger on a daily basis, such as those collected from the web, is an increasing challenge, but also provides more interesting insights and analysis. Current analyses have gone beyond content and now focus on tracking and understanding users' relationships and interactions. Such computation is intensive both in terms of the processing demand imposed by the algorithms and the sheer amount of data that has to be handled. In this paper we introduce Watershed, a distributed computing framework designed to support the analysis of very large data streams online and in real time. Data are obtained from streams by the system's processing components, transformed, and directed to other streams, creating large flows of information. The processing components are decoupled from each other and their connections are strictly data-driven. They can be dynamically inserted and removed, providing an environment in which different applications can share intermediate results or cooperate toward a global purpose. Our experiments demonstrate the flexibility in creating a set of data analysis algorithms and their composition into a powerful stream analysis environment.
Keywords :
data analysis; distributed processing; Watershed; data analysis algorithms; distributed computing framework; high performance distributed stream processing system; information extraction; online data streams; Computer architecture; Data analysis; Distributed databases; Libraries; Parallel processing; XML; Data-driven architectures; Distributed systems; Dynamic application topology; High-performance computing; Stream processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Architecture and High Performance Computing (SBAC-PAD), 2011 23rd International Symposium on
Conference_Location :
Vitoria, Espirito Santo
ISSN :
1550-6533
Print_ISBN :
978-1-4577-2050-5
Type :
conf
DOI :
10.1109/SBAC-PAD.2011.31
Filename :
6106022