Title :
Reclustering of high energy physics data
Author :
Schaller, Martin
Author_Institution :
EP Div., CERN, Geneva, Switzerland
Abstract :
The coming high energy physics experiments will store petabytes of data in object databases. Analysis jobs will frequently traverse collections containing millions of stored objects. Clustering is one of the most effective means of enhancing the performance of these applications. The paper presents a reclustering algorithm for independent objects contained in multiple, possibly overlapping collections on secondary storage. The algorithm decomposes the stored objects into a number of independent chunks and then maps these chunks to a traveling salesman problem. Under a set of realistic assumptions, the number of disk seeks is reduced almost to the theoretical minimum. Experimental results obtained from a prototype are included.
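The chunk-and-order idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how chunks are formed or how the traveling salesman instance is built, so everything below is an assumption made for illustration. Here a chunk is taken to be the set of objects with identical collection membership, the distance between two chunks is the number of collections in which they differ, and the tour is built with a greedy nearest-neighbour heuristic. The names `chunk_by_membership`, `order_chunks`, and `count_seeks` are hypothetical.

```python
from collections import defaultdict


def chunk_by_membership(collections):
    """Group objects that belong to exactly the same set of collections.

    Assumption: this is one plausible way to form independent chunks;
    the abstract does not specify the decomposition.
    """
    membership = defaultdict(set)
    for name, objects in collections.items():
        for obj in objects:
            membership[obj].add(name)
    chunks = defaultdict(list)
    for obj, names in membership.items():
        chunks[frozenset(names)].append(obj)
    return {key: sorted(objs) for key, objs in chunks.items()}


def order_chunks(chunks):
    """Order chunks with a greedy nearest-neighbour tour.

    Assumption: distance is the size of the symmetric difference of the
    two membership sets. This is a heuristic, not an optimal TSP solver.
    """
    remaining = sorted(chunks, key=lambda key: sorted(key))
    tour = [remaining.pop(0)]
    while remaining:
        last = tour[-1]
        nearest = min(remaining, key=lambda key: len(last ^ key))
        remaining.remove(nearest)
        tour.append(nearest)
    return tour


def count_seeks(layout, collection):
    """Count the contiguous runs of chunks a scan of `collection` visits."""
    seeks, inside = 0, False
    for key in layout:
        hit = collection in key
        if hit and not inside:
            seeks += 1
        inside = hit
    return seeks
```

In this toy model, a collection whose chunks end up adjacent in the layout can be read with a single seek, which is the kind of saving the abstract refers to when it speaks of approaching the theoretical minimum.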
Keywords :
data analysis; high energy physics instrumentation computing; object-oriented databases; scientific information systems; storage management; travelling salesman problems; analysis jobs; disk seeks; high energy physics data reclustering; high energy physics experiments; independent chunks; independent objects; object databases; overlapping collections; realistic assumptions; reclustering algorithm; secondary storage; stored objects; traveling salesman problem; Data analysis; Databases; Delay; Electrical capacitance tomography; Engines; Lapping; Physics; Prototypes; Query processing; Read only memory;
Conference_Title :
Scientific and Statistical Database Management, 1999. Eleventh International Conference on
Conference_Location :
Cleveland, OH
Print_ISBN :
0-7695-0046-3
DOI :
10.1109/SSDM.1999.787635