DocumentCode :
2721341
Title :
Run-time optimization of sends, receives and file I/O
Author :
Natvig, Thorvald ; Elster, Anne C.
Author_Institution :
Norwegian Univ. of Sci. & Technol. (NTNU), Trondheim, Norway
fYear :
2010
fDate :
20-24 Sept. 2010
Firstpage :
1
Lastpage :
8
Abstract :
On today's commodity clusters, achieving near-optimal speedup is very hard. We have previously shown that parallel applications using synchronous sequential MPI calls can be optimized by runtime replacement with corresponding asynchronous MPI operations. This may be achieved by protecting memory used by the MPI call from application writes until the operation is complete. However, changing the protection bits of memory pages, which alters the page table and flushes the CPU caches, may introduce a significant overhead. In the case of overlapping requests, which are common for applications with domain decompositions using border exchanges of ghost cells, this would have to be done multiple times for each communications phase. In this paper, we overcome this problem by mapping the same memory twice into the virtual address space, but with different protection bits set for each view. This allows us to optimize overlapping MPI requests without changing the page protection bits until all the requests are finished. The same method is also extended to cover basic file I/O, overlapping file operations with computation without rewriting the original application. We also add distributed locking to I/O operations, which allows aggressive read-ahead and write-merging for parallel applications, reducing the wall-clock time of the file I/O phases that commonly surround the computation of the solution.
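The dual-mapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Unix-like system, uses a temporary file as the shared backing store, and the helper name `make_dual_views` is hypothetical. It only shows that one region of memory can be mapped twice, once writable and once read-only, so that neither view's protection has to be changed later.

```python
import mmap
import tempfile


def make_dual_views(size=mmap.PAGESIZE):
    """Map the same backing store twice with different protection bits.

    Hypothetical helper for illustration only. Returns the backing file,
    a read-write view, and a read-only view of the same bytes.
    """
    backing = tempfile.TemporaryFile()
    backing.truncate(size)  # give the file a size so it can be mapped

    # View 1: readable and writable (a runtime could use this internally).
    rw_view = mmap.mmap(backing.fileno(), size,
                        flags=mmap.MAP_SHARED,
                        prot=mmap.PROT_READ | mmap.PROT_WRITE)

    # View 2: read-only mapping of the very same pages.
    ro_view = mmap.mmap(backing.fileno(), size,
                        flags=mmap.MAP_SHARED,
                        prot=mmap.PROT_READ)
    return backing, rw_view, ro_view


if __name__ == "__main__":
    backing, rw_view, ro_view = make_dual_views()
    rw_view[0:5] = b"hello"      # write through the writable view
    print(ro_view[0:5])          # the read-only view sees the same data
    try:
        ro_view[0:5] = b"HELLO"  # writes through the read-only view fail
    except TypeError as exc:
        print("write rejected:", exc)
```

In this sketch Python rejects the write on the read-only view with a `TypeError`. A native runtime such as the one the abstract describes would presumably see a page fault on the protected view instead and handle it there; that part is outside what this example shows.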
Keywords :
application program interfaces; message passing; parallel processing; CPU cache; asynchronous MPI operation; file I/O; message passing interface; overlapping MPI request; overlapping file operation; parallel application; read-ahead; run-time optimization; runtime replacement; synchronous sequential MPI call; virtual address space; write-merging; Asynchronous communication; Instruction sets; Kernel; Layout; Optimization; Resource management;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010 IEEE International Conference on
Conference_Location :
Heraklion, Crete
Print_ISBN :
978-1-4244-8395-2
Electronic_ISBN :
978-1-4244-8397-6
Type :
conf
DOI :
10.1109/CLUSTERWKSP.2010.5613104
Filename :
5613104