DocumentCode :
2260005
Title :
PCI-DMA/CPU handoff for increased effectiveness of checkpointing functionalities in CCL
Author :
Santoro, Andrea ; Quaglia, Francesco
Author_Institution :
Dipt. di Informatica e Sistemistica, Universita di Roma, Italy
fYear :
2003
fDate :
23-25 Oct. 2003
Firstpage :
120
Lastpage :
127
Abstract :
Checkpointing and Communication Library (CCL) is a recently developed software library supporting optimistic parallel discrete event simulation on Myrinet clusters. Beyond low-latency message delivery, CCL also offers non-blocking checkpointing supported by the programmable PCI (Peripheral Component Interconnect) DMA (direct memory access) engine on board Myrinet cards. CCL employs re-synchronization between PCI DMA activities and CPU activities to keep checkpointed information consistent (i.e., to prevent the CPU from updating information that still needs to be copied via DMA). If re-synchronization is invoked before the checkpoint operation has completed, simulation activities carried out by the CPU may be forced to wait for checkpoint completion. Since copying data through the PCI DMA engine is slower than copying it with the CPU, in pathological situations a re-synchronization period may last longer than an entire checkpoint operation performed by the CPU, thus nullifying the potential benefit of offloading checkpointing from the CPU. This paper tackles this issue by presenting the design and implementation of a mechanism that hands off checkpoint operations between the PCI DMA engine and the CPU, enhancing the effectiveness of CCL's checkpointing functionalities. Although a checkpoint operation is initially entrusted to the PCI DMA engine, whenever re-synchronization forces the simulation application to wait for its completion, the operation is dynamically switched to the CPU, the fastest available device, because its timely completion has become performance-critical for the simulation application.
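The abstract does not describe CCL's interfaces, so the following is only a hypothetical, single-threaded model of the handoff idea. The names `CheckpointOp`, `dma_step`, `resync`, and `CHUNK` are illustrative assumptions, not CCL's API. In a real system the DMA engine runs concurrently, and the driver would have to stop it and read back its progress before the CPU takes over; here each `dma_step` call stands in for one completed DMA burst, so progress is always known exactly.

```python
from dataclasses import dataclass

CHUNK = 64  # hypothetical DMA burst size in bytes (assumption, not from the paper)


@dataclass
class CheckpointOp:
    src: bytearray              # live simulation state being checkpointed
    dst: bytearray              # checkpoint buffer, same length as src
    copied: int = 0             # first byte not yet copied
    handed_off: bool = False    # True if the CPU finished the copy

    @property
    def done(self) -> bool:
        return self.copied >= len(self.src)

    def dma_step(self) -> bool:
        """Model one DMA burst: copy at most CHUNK bytes; return True when complete."""
        if not self.done:
            end = min(self.copied + CHUNK, len(self.src))
            self.dst[self.copied:end] = self.src[self.copied:end]
            self.copied = end
        return self.done

    def resync(self) -> None:
        """The CPU is about to update the state, so the checkpoint must be complete.
        Instead of waiting for the slower DMA, the CPU copies the remainder (handoff)."""
        if not self.done:
            self.dst[self.copied:] = self.src[self.copied:]
            self.copied = len(self.src)
            self.handed_off = True


# Usage: DMA copies part of a 1024-byte state, then re-synchronization hands off.
state = bytearray(range(256)) * 4
op = CheckpointOp(src=state, dst=bytearray(len(state)))
for _ in range(3):      # DMA copies 3 * 64 = 192 bytes
    op.dma_step()
op.resync()             # CPU copies the remaining 832 bytes
state[0] = 0xFF         # updating the live state is now safe
print(op.handed_off, op.dst[0], op.dst == bytearray(range(256)) * 4)
```

If the DMA engine finishes before re-synchronization is invoked, `resync` does nothing and `handed_off` stays `False`, matching the case in which offloading to the DMA engine costs the CPU no waiting time.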
Keywords :
discrete event simulation; parallel processing; peripheral interfaces; system recovery; CCL; Checkpointing and Communication Library; PDES; Peripheral Component Interconnect; direct memory access; handoff mechanism; message delivery; myrinet clusters; nonblocking checkpointing; parallel discrete event simulation; programmable PCI DMA; resynchronization; software support; Checkpointing; Clocks; Delay; Discrete event simulation; Engines; Pathology; Remuneration; Software libraries; Synchronization;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Distributed Simulation and Real-Time Applications, 2003. Proceedings. Seventh IEEE International Symposium on
ISSN :
1530-1990
Print_ISBN :
0-7695-2036-7
Type :
conf
DOI :
10.1109/DISRTA.2003.1243005
Filename :
1243005