DocumentCode :
3042555
Title :
Highly efficient synchronization based on active memory operations
Author :
Zhang, Lixin ; Fang, Zhen ; Carter, John B.
Author_Institution :
IBM Austin Res. Lab., TX, USA
fYear :
2004
fDate :
26-30 April 2004
Firstpage :
58
Abstract :
Summary form only given. Synchronization is a crucial operation in many parallel applications. As network latency approaches thousands of processor cycles for large-scale multiprocessors, conventional synchronization techniques are failing to keep up with the increasing demand for scalable and efficient synchronization operations. We present a mechanism that allows atomic synchronization operations to be executed on the home memory controller of the synchronization variable. By performing atomic operations near where the data resides, our proposed mechanism can significantly reduce the number of network messages required by synchronization operations. Our proposed design also enhances performance by using fine-grained updates to selectively "push" the results of offloaded synchronization operations back to processors when they complete (e.g., when a barrier count reaches the desired value). We use the proposed mechanism to optimize two of the most widely used synchronization operations, barriers and spin locks. Our simulation results show that the proposed mechanism outperforms conventional implementations based on load-linked/store-conditional, processor-centric atomic instructions, conventional memory-side atomic instructions, or active messages. It speeds up conventional barriers by a factor of up to 2.1 (4 processors) to 61.9 (256 processors) and spin locks by a factor of up to 2.0 (4 processors) to 10.4 (256 processors).
Keywords :
distributed memory systems; parallel processing; synchronisation; active memory operation; active messages; home memory controller; large scale multiprocessor; memory-side atomic instructions; network latency approach; parallel application; processor cycles; processor-centric atomic instruction; synchronization techniques; Cities and towns; Concurrent computing; Delay; Distributed processing; Large-scale systems; Libraries; Protection; Random access memory; Reduced instruction set computing; System performance
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International
Print_ISBN :
0-7695-2132-0
Type :
conf
DOI :
10.1109/IPDPS.2004.1302981
Filename :
1302981