DocumentCode :
2092997
Title :
Optimizing Distributed Architectures to Improve Performance on Checkpointing Applications
Author :
Núñez, Alberto ; Fernandez, J. ; Carretero, Jesús ; Prada, Laura ; Blaum, Mario
Author_Institution :
Comput. Sci. Dept., Univ. Carlos III de Madrid, Leganés, Spain
fYear :
2011
fDate :
2-4 Sept. 2011
Firstpage :
487
Lastpage :
492
Abstract :
Satisfying the global throughput targets of each application in High Performance Computing (HPC) systems is difficult because many architectural parameters have a considerable impact on overall system performance, such as the number of storage servers, the features of the communication links, and the number of CPU cores per node. In this paper we present a thorough comparative study of the performance of scaling up HPC cluster architectures using a checkpointing application model. The study focuses specifically on multi-core HPC clusters, and the scaling process is oriented towards the three main resources: computing power, communications and storage. The main goal of this work is to evaluate and analyze how both scalability and bottlenecks evolve across different HPC multi-core architectures under different architectural configurations. To achieve this goal, a set of simulation experiments has been carried out using SIMCAN, a simulation framework specifically designed for modeling and simulating HPC architectures. The results show that computing power is not the limiting factor, thanks to the multi-core processors, while the problems lie in the storage and communication channels, with the storage network being the main bottleneck.
Keywords :
checkpointing; computer architecture; mainframes; multiprocessing systems; performance evaluation; storage area networks; SIMCAN framework; architectural configuration; checkpointing application model; distributed architecture optimization; high performance computing system; multicore HPC cluster architecture; performance improvement; storage network; Bandwidth; Checkpointing; Computational modeling; Computer architecture; Servers; System performance; Throughput; Checkpointing applications; Performance; Simulation of HPC systems;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference on
Conference_Location :
Banff, AB
Print_ISBN :
978-1-4577-1564-8
Electronic_ISBN :
978-0-7695-4538-7
Type :
conf
DOI :
10.1109/HPCC.2011.172
Filename :
6063029