Author_Institution :
Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
Abstract :
Modern detectors used in high-energy physics experiments are complex instruments designed to register particle collisions at rates in the MHz range. The data corresponding to a single collision, referred to as an event, are acquired from millions of readout channels and filtered, first by dedicated hardware and then by computing farms running sophisticated filtering algorithms. In data acquisition systems with single-stage software filtration, the high input rate (on the order of 100 kHz) means that the data are usually distributed statically among the filtering nodes. However, static distribution rigidly constrains the system and reduces its fault tolerance. The main objective of the presented studies is to increase the system's overall fault tolerance through dynamic load balancing. The proposed method aims to balance the workload in heterogeneous systems as well as in homogeneous systems, where imbalance may be caused by faults. Moreover, our research includes developing a scalable load balancing protocol along with a distributed asynchronous load assignment policy. As a case study, we consider the data acquisition system of the Compact Muon Solenoid experiment at CERN's new Large Hadron Collider.
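The contrast between static and dynamic distribution can be illustrated with a small sketch. It is not the protocol or assignment policy proposed in the paper, which is described as distributed and asynchronous. It is a centralized greedy stand-in, and the function names, the capacity model (events processed per unit time, with 0 meaning a failed node), and the example values are all assumptions made for illustration.

```python
import heapq


def static_assign(num_events, capacities):
    """Static distribution: event i always goes to node i % n.

    Node state is ignored, so a failed node (capacity 0) still receives events.
    """
    n = len(capacities)
    counts = [0] * n
    for i in range(num_events):
        counts[i % n] += 1
    return counts


def dynamic_assign(num_events, capacities):
    """Greedy dynamic balancing (illustrative assumption, centralized).

    Each event goes to the live node that would finish it soonest,
    so faster nodes receive more events and failed nodes receive none.
    """
    counts = [0] * len(capacities)
    # Heap entries: (time at which the node would finish its next event, node index).
    heap = [(1.0 / c, idx) for idx, c in enumerate(capacities) if c > 0]
    if not heap:
        raise ValueError("no live filtering nodes")
    heapq.heapify(heap)
    for _ in range(num_events):
        finish, idx = heapq.heappop(heap)
        counts[idx] += 1
        heapq.heappush(heap, (finish + 1.0 / capacities[idx], idx))
    return counts


if __name__ == "__main__":
    # Hypothetical farm: node 2 has failed, node 3 is twice as fast as nodes 0 and 1.
    capacities = [1.0, 1.0, 0.0, 2.0]
    print("static :", static_assign(8, capacities))
    print("dynamic:", dynamic_assign(8, capacities))
```

Under the static scheme, the events routed to the failed node are stalled. Under the dynamic scheme, they are absorbed by the remaining nodes in proportion to their capacity.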
Keywords :
data acquisition; distributed processing; fault tolerant computing; filtering theory; physics computing; resource allocation; sensors; Compact Muon Solenoid experiment; complex instruments; distributed asynchronous load assignment policy; dynamic load balancing; fault tolerance; fault tolerant data acquisition system; filtering algorithms; high energy physics experiments; particle collisions; single-stage software filtration; Algorithm design and analysis; Data acquisition; Distributed databases; Fault tolerance; Fault tolerant systems; Filtering; Load management