Title :
Distributed data access in the Sequential Access Model at the D0 experiment at Fermilab
Author :
Terekhov, Igor ; White, Victoria
Author_Institution :
Fermi Nat. Accel. Lab., Batavia, IL, USA
Abstract :
Presents the Sequential Access Model (SAM), the data-handling system for D0, one of two primary high-energy physics experiments at Fermilab. During the next several years, the D0 experiment will store a total of about 1 PByte of data, including raw detector data and data processed at various levels. The design of SAM is not specific to the D0 experiment and carries few assumptions about the underlying mass storage level; its ideas are applicable to any sequential data access. By definition, in the sequential access mode, a user application processes a stream of data by accessing each data unit exactly once, and the order of the data units in the stream is irrelevant. The units of data are laid out sequentially in files. The adopted model allows for a significant optimization of system performance, a reduction in user file latency and an increase in the overall throughput. In particular, caching is done with knowledge of all the files that are needed “in the near future”, defined as all the files being used by already-running or submitted jobs. The bulk of the data is stored in files on tape in the mass storage system Enstore. All of the data managed by SAM is cataloged in great detail in a relational database (Oracle).
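The caching idea in the abstract can be illustrated with a minimal sketch. It is not SAM's implementation: the class `LookaheadCache`, its methods `submit_job` and `consume`, and the eviction rule (drop the oldest cached file with the fewest pending uses) are all assumptions made for illustration. The only point taken from the abstract is that the cache knows which files running or submitted jobs still need.

```python
from collections import Counter


class LookaheadCache:
    """Toy file cache (hypothetical, not SAM's API) that prefers to
    evict files that no known job still needs."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of cached files
        self.cached = []           # cached file names, oldest first
        self.pending = Counter()   # file name -> jobs that still need it

    def submit_job(self, files):
        """Register the files a running or submitted job will read."""
        self.pending.update(files)

    def consume(self, name):
        """Deliver one file to a job; return True on a cache hit."""
        hit = name in self.cached
        if not hit:
            if len(self.cached) >= self.capacity:
                self._evict()
            self.cached.append(name)  # stands in for a fetch from tape
        # Each job reads each file exactly once, so one pending use is done.
        if self.pending[name] > 0:
            self.pending[name] -= 1
        return hit

    def _evict(self):
        """Drop the oldest cached file with the fewest pending uses."""
        victim = min(self.cached, key=lambda f: self.pending[f])
        self.cached.remove(victim)


# Two jobs share file "b"; the cache holds only two files.
cache = LookaheadCache(capacity=2)
cache.submit_job(["b", "a"])
cache.submit_job(["c", "b"])
hits = [cache.consume(f) for f in ["b", "a", "c", "b"]]
```

In this run, fetching "c" evicts "a" rather than the older "b", because a submitted job still needs "b"; a purely age-based policy would have evicted "b" and forced a second fetch from tape.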
Keywords :
cache storage; data acquisition; data handling; distributed databases; high energy physics instrumentation computing; magnetic tape storage; relational databases; 1 PByte; Enstore mass storage system; Fermi National Accelerator Laboratory; Fermilab D0 experiment; Oracle relational database; Sequential Access Model; caching; data cataloguing; data files; data handling system; data stream; data units; distributed data access; high-energy physics experiment; magnetic tape storage; mass storage; processed data; raw detector data; running jobs; sequential data access; submitted jobs; system performance optimization; throughput; user file latency; Data handling; Delay; Information retrieval; Laboratories; Libraries; Relational databases; Samarium; Storage automation; System performance; Throughput;
Conference_Title :
High-Performance Distributed Computing, 2000. Proceedings. The Ninth International Symposium on
Conference_Location :
Pittsburgh, PA
Print_ISBN :
0-7695-0783-2
DOI :
10.1109/HPDC.2000.868672