DocumentCode :
2448921
Title :
BlobSeer: Efficient data management for data-intensive applications distributed at large-scale
Author :
Nicolae, Bogdan ; Antoniu, Gabriel ; Bougé, Luc
Author_Institution :
IRISA, Univ. of Rennes 1, Rennes, France
fYear :
2010
fDate :
19-23 April 2010
Firstpage :
1
Lastpage :
4
Abstract :
As the rate, scale and variety of data increase in complexity, the need for flexible applications that can crunch huge amounts of heterogeneous data quickly and cost-effectively is of utmost importance. Such applications are data-intensive: in a typical scenario, they continuously acquire massive datasets (e.g. by crawling the Web or analyzing access logs) while performing computations over these changing datasets (e.g. building up-to-date search indexes). In order to achieve scalability and performance, data acquisitions and computations need to be distributed at large scale in infrastructures comprising hundreds and thousands of machines. As these applications focus on data rather than on computation, a heavy burden is put on the storage service employed to handle data management, because it must efficiently deal with massively parallel data accesses. In order to achieve this, a series of issues needs to be addressed properly: scalable aggregation of storage space from the participating nodes with minimal overhead, the ability to store huge data objects, efficient fine-grain access to data subsets, high throughput even under heavy access concurrency, versioning, as well as fault tolerance and a high quality of service for access throughput. This paper introduces BlobSeer, an efficient distributed data management service that addresses the issues presented above. In BlobSeer, long sequences of bytes representing unstructured data are called blobs (Binary Large OBjects).
Keywords :
concurrency control; data structures; distributed databases; fault tolerant computing; object-oriented databases; query processing; BlobSeer; access concurrency; access throughput; binary large object; blobs; data acquisition; data objects; data-intensive applications; distributed data management service; fault tolerance; heterogeneous data; massively parallel data access; scalable storage space aggregation; storage service; unstructured data; up-to-date search index; Concurrent computing; Data acquisition; Distributed computing; Distribution strategy; Large-scale systems; Performance analysis; Proposals; Quality of service; Scalability; Throughput;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6533-0
Type :
conf
DOI :
10.1109/IPDPSW.2010.5470802
Filename :
5470802