Title :
Computing Properties of Large Scalable and Fault-Tolerant Logical Networks
Author :
Cérin, Christophe ; Yu Lei ; Koskas, Michel
Author_Institution :
LIPN, Univ. Paris 13, Villetaneuse, France
Abstract :
As the number of processors embedded in high performance computing platforms keeps growing, developers must improve the scalability of their codes in order to exploit all the resources of these platforms. This often requires new algorithms, techniques and methods for code development that add new properties to the application code, because the presence of faults is no longer an occasional event but a challenge. Scalability and fault-tolerance issues also arise in the hidden parts of any platform: the overlay network that must be built to control the application, and the runtime support for messaging, which must itself be scalable and fault tolerant. In this paper, we focus on the computational challenges of experimenting with large-scale logical topologies (many millions of nodes). We compute fault-tolerance properties of different randomly generated variants of Binomial Graphs (BMG). For instance, we exhibit interesting properties relating the number of links to some desired fault-tolerance properties, and we compare different metrics using the Binomial Graph structure as the reference. A software tool has been developed for this study, and we show experimental results with topologies containing 21000 nodes. We also explain the computational challenges that arise with such large-scale topologies, and we introduce various probabilistic algorithms for computing the conventional metrics.
Keywords :
distributed processing; fault tolerance; graph theory; probability; software tools; binomial graph structure; code development; computing properties; fault-tolerant logical networks; high performance computing platforms; overlay network; probabilistic algorithms; scalable logical networks; software tool; Computational modeling; Fault tolerance; Fault tolerant systems; Measurement; Probabilistic logic; Programming; Topology;
Conference_Title :
Computer Architecture and High Performance Computing (SBAC-PAD), 2011 23rd International Symposium on
Conference_Location :
Vitoria, Espirito Santo
Print_ISBN :
978-1-4577-2050-5
DOI :
10.1109/SBAC-PAD.2011.22