Title :
Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms
Author :
McCurdy, Collin ; Vetter, Jeffrey
Author_Institution :
Future Technol. Group, Oak Ridge Nat. Lab., Oak Ridge, TN, USA
Abstract :
Until recently, most high-end scientific applications have been immune to performance problems caused by Non-Uniform Memory Access (NUMA). However, current trends in micro-processor design are pushing NUMA to smaller and smaller scales. This paper examines the current state of NUMA and makes several contributions. First, we summarize the performance problems that NUMA can present for multi-threaded applications and describe methods of addressing them. Second, we demonstrate that NUMA can indeed be a significant problem for scientific applications, showing that it can mean the difference between an application scaling perfectly and failing to scale at all. Third, we describe, in increasing order of usefulness, three methods of using hardware performance counters to aid in finding NUMA-related problems. Finally, we introduce Memphis, a data-centric toolset that uses Instruction Based Sampling to help pinpoint problematic memory accesses, and demonstrate how we used it to improve the performance of several production-level codes (HYCOM, XGC1 and CAM) by 13%, 23% and 24%, respectively.
Keywords :
multiprocessing programs; performance evaluation; Memphis; NUMA related performance problems; data-centric toolset; hardware performance counters; instruction based sampling; micro-processor design; multi-core platforms; non-uniform memory access; problematic memory accesses; Automatic control; CADCAM; Computer aided manufacturing; Counting circuits; Delay; Hardware; Laboratories; Programming profession; Sampling methods; Sockets
Conference_Title :
Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on
Conference_Location :
White Plains, NY
Print_ISBN :
978-1-4244-6023-6
Electronic_ISBN :
978-1-4244-6024-3
DOI :
10.1109/ISPASS.2010.5452060