DocumentCode :
3102680
Title :
Fault-tolerance for macro dataflow parallel computations on grid
Author :
Jafar, Samir ; Roch, Jean-Louis
Author_Institution :
Lab. ID-IMAG, Montbonnot, France
fYear :
2004
fDate :
19-23 April 2004
Firstpage :
583
Lastpage :
584
Abstract :
We present a portable fault-tolerant mechanism for executing macro dataflow parallel programs on a large-scale, distributed, heterogeneous grid that includes SMP nodes. The mechanism is based on portable checkpoint-rollback and supports both parallel programs with dependencies and the addition or resilience of heterogeneous resources. We have implemented this mechanism on top of the Athapascan programming interface and present experimental results.
Keywords :
checkpointing; data flow computing; fault tolerant computing; grid computing; macros; multiprocessing systems; parallel languages; parallel programming; Athapascan programming interface; heterogeneous grid resources; macro dataflow parallel computation; portable fault tolerant mechanism; Computer architecture; Concurrent computing; Distributed computing; Fault tolerance; Grid computing; Large-scale systems; Parallel languages; Parallel processing; Portable computers; Resilience;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies: From Theory to Applications, 2004. Proceedings. 2004 International Conference on
Print_ISBN :
0-7803-8482-2
Type :
conf
DOI :
10.1109/ICTTA.2004.1307897
Filename :
1307897