Title :
Monitoring remotely executing shared memory programs in software DSMs
Author :
Fei, Long ; Fang, Xing ; Hu, Y. Charlie ; Midkiff, Samuel P.
Author_Institution :
Purdue Univ., West Lafayette, IN
Abstract :
Peer-to-peer (P2P) cycle sharing over the Internet has become increasingly popular as a way to share idle cycles. A fundamental problem faced by P2P cycle sharing systems is how to incrementally monitor and verify, with low overhead, the execution of jobs submitted to a remote untrusted hosting machine or cluster of machines. In this paper, we present the design and implementation of GripCop DSM, a novel incremental execution monitoring and verification scheme for software distributed shared memory (SDSM) programs running on remote clusters. Our scheme maximally leverages the shared memory abstraction provided by the SDSM system by extending it to the monitoring process, which replicates one of the processes running on the host cluster to verify intermediate results at runtime. GripCop DSM employs two monitoring schemes: (i) a full-scale monitoring scheme that completely replicates the computation of a process running on the cluster; and (ii) a decoy monitoring scheme that deceives the host cluster into believing that full-scale monitoring is being performed without it ever actually being done, thereby incurring negligible overhead. Experiments show that the combined use of full-scale and decoy monitoring ensures faithful execution with low performance impact, even over a wide area network.
Keywords :
peer-to-peer computing; program verification; shared memory systems; system monitoring; GripCop; P2P; decoy monitoring; full-scale monitoring; incremental execution monitoring; peer-to-peer cycle sharing; remotely executing shared memory program monitoring; shared memory abstraction; software distributed shared memory; Central Processing Unit; Computer networks; Computerized monitoring; Condition monitoring; IP networks; Internet; Peer to peer computing; Remote monitoring; Runtime; Workstations
Conference_Title :
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Conference_Location :
Rhodes Island
Print_ISBN :
1-4244-0054-6
DOI :
10.1109/IPDPS.2006.1639276