Title :
Storage Optimization for Large-Scale Distributed Stream Processing Systems
Author :
Hildrum, Kirsten ; Douglis, Fred ; Wolf, Joel L. ; Yu, Philip ; Fleischer, Lisa ; Katta, Akshay
Author_Institution :
IBM Thomas J. Watson Res. Center, Hawthorne, NY
Abstract :
We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, incoming data and intermediate results may need to be stored to enable future analyses. The quantity of such data would overwhelm even the largest storage system. Thus, a mechanism is needed to keep the most useful data. One recently introduced approach is to employ retention value functions, which effectively assign each data object a value that changes over time. Storage space is then reclaimed automatically by deleting the data of lowest current value. In such large systems, there can naturally be multiple file systems available, each with different properties. Choosing the right file system for a given incoming data stream presents a challenge. In this paper, we provide a novel and effective scheme for optimizing the placement of data within a distributed storage subsystem employing retention value functions. The goal is to keep the data of highest overall value, while simultaneously balancing the read load to the file system.
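The retention value idea described in the abstract can be illustrated with a minimal sketch. It covers only the reclamation step (deleting the data of lowest current value) and not the paper's placement scheme, which the abstract does not detail. The names `StoredObject`, `value_at`, and `reclaim`, the example retention functions, and the size and time units are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StoredObject:
    name: str
    size: int                            # storage units (hypothetical)
    created: float                       # creation time (hypothetical units)
    retention: Callable[[float], float]  # maps object age to current value

    def value_at(self, now: float) -> float:
        # Current value is the retention function evaluated at the object's age.
        return self.retention(now - self.created)


def reclaim(
    objects: List[StoredObject], capacity: int, now: float
) -> Tuple[List[StoredObject], List[StoredObject]]:
    """Delete lowest-current-value objects until the total size fits capacity."""
    kept = sorted(objects, key=lambda o: o.value_at(now))
    used = sum(o.size for o in kept)
    deleted: List[StoredObject] = []
    while kept and used > capacity:
        victim = kept.pop(0)             # object with the lowest current value
        used -= victim.size
        deleted.append(victim)
    return kept, deleted


# Illustrative retention functions (assumptions, not taken from the paper).
objs = [
    StoredObject("a", 40, 0.0, lambda age: 10.0),                      # constant value
    StoredObject("b", 40, 0.0, lambda age: max(0.0, 20.0 - age)),      # linear decay
    StoredObject("c", 40, 10.0, lambda age: max(0.0, 5.0 - 0.1 * age)),
]
kept, deleted = reclaim(objs, capacity=80, now=15.0)
print([o.name for o in deleted])
```

Because values change over time, the same call at a later `now` may pick different victims; a real system would also have to decide which file system each stream is written to, which this sketch leaves out.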
Keywords :
optimization; resource allocation; storage management; file system; large-scale distributed stream processing system; storage optimization; Application software; Distributed computing; Educational institutions; File systems; Information services; Internet; Large-scale systems; Relational databases; Streaming media; Web sites; file assignment problem; load balancing; streaming systems; theory
Conference_Title :
Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International
Conference_Location :
Long Beach, CA
Print_ISBN :
1-4244-0910-1
Electronic_ISBN :
1-4244-0910-1
DOI :
10.1109/IPDPS.2007.370633