DocumentCode :
1064172
Title :
Management of Online Processing Farms in the ATLAS Experiment
Author :
Dobson, Marc ; Malik, Usman Ahmad ; Elejabarrieta, Hegoi Garitaonandia
Author_Institution :
Eur. Organ. for Nucl. Res., Geneva
Volume :
55
Issue :
1
Year :
2008
Firstpage :
411
Lastpage :
416
Abstract :
The ATLAS experiment will use on the order of three thousand nodes for its online processing farms. Administering such a large cluster is a challenge. Major issues include the ability to power machines on and off quickly, especially after a power cut, and the ability to monitor hardware health remotely whether a machine is on or off. To solve these problems, ATLAS has decided to use the Intelligent Platform Management Interface (IPMI) for its nodes wherever possible. This paper presents the mechanisms developed to distribute management and monitoring commands to many machines. These commands were run simultaneously on the prototype farm, taking into account the specifics of the different IPMI versions and implementations, and the network topology. Timing measurements for distributing commands to many nodes, and for booting and shutting down the nodes, are shown with an extrapolation to the final cluster size.
Keywords :
computerised monitoring; high energy physics instrumentation computing; position sensitive particle detectors; ATLAS experiment; hardware monitoring; intelligent platform management interfaces; network topology; online processing farms; Condition monitoring; Hardware; Machine intelligence; Network topology; Personal communication networks; Pipelines; Prototypes; Remote monitoring; Sensor phenomena and characterization; Temperature sensors; ATLAS; Administration; cluster; farm;
Language :
English
Journal_Title :
Nuclear Science, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9499
Type :
jour
DOI :
10.1109/TNS.2007.913489
Filename :
4448473