Title :
A lightweight hybrid hardware/software approach for object-relative memory profiling
Author :
Chen, Licheng ; Cui, Zehan ; Bao, Yungang ; Chen, Mingyu ; Huang, Yongbing ; Tan, Guangming
Author_Institution :
State Key Lab. of Comput. Archit., Inst. of Comput. Technol., Beijing, China
Abstract :
Memory profiling is the process of collecting memory address traces during the execution of a program, then analyzing and characterizing the program's memory behavior offline. As more and more cores are integrated into a single processor chip, the “Memory Wall” problem will become more serious in chip multiprocessor (CMP) systems. Accurate and effective memory profiling is therefore becoming one of the keys to identifying the source of memory system bottlenecks. A large body of work has addressed memory profiling; however, most of it relies on instrumentation or simulation, which suffer from heavy overhead, or on hardware performance counters, which lack detailed trace information. Furthermore, correlating raw memory address traces with object-relative information makes it possible to separate the regular access pattern of a particular object from the irregular mixed trace, which helps guide optimization. In this paper, we propose a lightweight hybrid hardware/software approach for object-relative memory profiling. We monitor physical memory addresses through hardware snooping with negligible overhead; meanwhile, we dump the Linux kernel page tables of processes, as well as object-relative memory allocation information. Our approach supports not only collecting applications' full memory traces with detailed object-relative information, but also identifying, at object level, hardware-generated memory accesses such as page memory walks caused by TLB misses. Experimental results on a real system show that our approach is highly accurate (the largest error is 2.04%) and incurs low overhead (the average overhead is 1.60%). Furthermore, we profile two multi-threaded applications in detail and successfully identify hot TLB-miss objects. With object-targeted optimization, we improve the applications' performance by nearly 6.86%.
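The correlation step described in the abstract (physical addresses from the snooped trace, joined with dumped page tables and allocation records) can be illustrated with a minimal sketch. This is not the authors' implementation: the record layouts, the function names (build_reverse_map, build_object_index, attribute, profile), the 4 KiB page size, and the sample data are all assumptions made for illustration. It also assumes that each physical frame is mapped by a single process and that allocation ranges do not overlap.

```python
from bisect import bisect_right
from collections import Counter

PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def build_reverse_map(page_table):
    """Invert a dumped page table {(pid, vpn): pfn} into {pfn: (pid, vpn)}.

    Simplification: a frame shared by several mappings keeps only one owner.
    """
    return {pfn: key for key, pfn in page_table.items()}


def build_object_index(allocations):
    """Group (pid, name, start_vaddr, size) records per pid, sorted by start."""
    index = {}
    for pid, name, start, size in allocations:
        index.setdefault(pid, []).append((start, start + size, name))
    for ranges in index.values():
        ranges.sort()
    return index


def attribute(paddr, reverse_map, object_index):
    """Map one physical address to (pid, object name), or None if unmapped."""
    hit = reverse_map.get(paddr >> PAGE_SHIFT)
    if hit is None:
        return None                  # frame not found in any dumped page table
    pid, vpn = hit
    vaddr = (vpn << PAGE_SHIFT) | (paddr & PAGE_MASK)
    ranges = object_index.get(pid, [])
    # Find the last range whose start address is <= vaddr.
    i = bisect_right(ranges, (vaddr, float("inf"), "")) - 1
    if i >= 0 and ranges[i][0] <= vaddr < ranges[i][1]:
        return pid, ranges[i][2]
    return pid, None                 # mapped page, but no known object


def profile(trace, page_table, allocations):
    """Count accesses per (pid, object) over a physical address trace."""
    reverse_map = build_reverse_map(page_table)
    object_index = build_object_index(allocations)
    return Counter(attribute(p, reverse_map, object_index) for p in trace)


if __name__ == "__main__":
    # Hypothetical inputs: one process (pid 100), two pages, two objects.
    page_table = {(100, 0x400): 0x1A, (100, 0x401): 0x2B}
    allocations = [
        (100, "matrix_a", 0x400000, 0x1800),
        (100, "vec_b", 0x401800, 0x800),
    ]
    trace = [0x1A010, 0x1AFF8, 0x2B100, 0x2B900, 0x3C000]
    for owner, count in profile(trace, page_table, allocations).items():
        print(owner, count)
```

In this sketch the per-object counters are what would let a hot object stand out; the abstract's identification of TLB-miss page walks at object level would need additional information about page table locations that is not modeled here.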
Keywords :
Linux; multi-threading; multiprocessing systems; object-oriented programming; operating system kernels; program diagnostics; storage allocation; CMP system; Linux kernel page tables; TLB miss; TLB-miss objects; chip multiprocessor system; detail trace information; full memory traces; hardware performance counter; hardware snooping; hardware-generated memory accesses; lightweight hybrid hardware-software approach; memory behavior; memory system bottlenecks; memory wall problem; multithread applications; negligible overhead; object level; object relative information; object-relative information; object-relative memory allocation information; object-relative memory profiling; object-targeted optimization; page memory walks; physical memory addresses; processor chip; program execution; program offline; raw memory address traces; Arrays; Hardware; Kernel; Linux; Monitoring; Optimization; full memory traces; hybrid; memory profiling; object; page memory walks;
Conference_Title :
Performance Analysis of Systems and Software (ISPASS), 2012 IEEE International Symposium on
Conference_Location :
New Brunswick, NJ
Print_ISBN :
978-1-4673-1143-4
Electronic_ISBN :
978-1-4673-1145-8
DOI :
10.1109/ISPASS.2012.6189205