DocumentCode :
2800750
Title :
Improving Resource Availability by Relaxing Network Allocation Constraints on Blue Gene/P
Author :
Desai, Narayan ; Buntinas, Darius ; Buettner, Daniel ; Balaji, Pavan ; Chan, Anthony
Author_Institution :
Math. & Comput. Sci. Div., Argonne Nat. Lab., Argonne, IL, USA
fYear :
2009
fDate :
22-25 Sept. 2009
Firstpage :
333
Lastpage :
339
Abstract :
High-end computing (HEC) systems have passed the petaflop barrier and continue to move toward the next frontier of exascale computing. As companies and research institutes continue to work toward architecting these enormous systems, it is becoming increasingly clear that these systems will share a significant amount of hardware between processing units, including caches, memory management engines, and network infrastructure. While these systems are optimized to use all of the available hardware in a dedicated manner to achieve the best performance, in practice the shared nature of this hardware makes scheduling applications on it difficult and wasteful. For example, while the IBM Blue Gene/P system has been designed to use a torus network for efficient communication, some of the torus links (especially those connecting different racks) are shared between multiple racks. Thus, a job running on one rack might preclude another job from running on a second rack, even though the second rack's compute resources are completely idle. In this paper, we assess the relative performance degradation observed by real applications when such shared network hardware is left entirely unused in some cases. Our measurements on Intrepid, one of the largest Blue Gene/P installations in the world, demonstrate less than 5% degradation for several leadership applications commonly run on the Intrepid system. Further, we demonstrate that the additional scheduling flexibility offered by not sharing such hardware can improve the overall job turnaround time by nearly 40% in some cases.
Keywords :
computer architecture; multiprocessing systems; performance evaluation; resource allocation; Blue Gene/P installation; Intrepid implementation; exascale computing; high-end computing system; network allocation constraint; performance degradation; resource availability improvement; scheduling flexibility; shared network hardware performance evaluation; torus network; Availability; Computer networks; Concurrent computing; Degradation; Hardware; Joining processes; Laboratories; Memory management; Resource management; Scheduling; Job Scheduling; Networking;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing, 2009. ICPP '09. International Conference on
Conference_Location :
Vienna
ISSN :
0190-3918
Print_ISBN :
978-1-4244-4961-3
Electronic_ISBN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2009.33
Filename :
5362384