DocumentCode :
1490598
Title :
An Analytic Framework for Detailed Resource Profiling in Large and Parallel Programs and Its Application for Memory Use
Author :
Finkler, Ulrich
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
59
Issue :
3
fYear :
2010
fDate :
3/1/2010 12:00:00 AM
Firstpage :
358
Lastpage :
370
Abstract :
Profiling is an essential and widely used technique to understand the resource use of applications. For example, the memory use of large applications is becoming an important cost factor. Very large systems are typically sized to accommodate designated tasks, and thus, the price, as well as cache and TLB efficiency, depends significantly on the memory footprint of the target applications. Importantly, the increasing use of multicore systems magnifies the problem since memory use grows with the number of parallel tasks. Additionally, the presence of multiple tasks or threads makes the problem of correlating resource use to the program structure harder. Thus, tools that correlate resource use with program structure with quantitative error margins are essential for optimizing the resource use of complex software applications. While efficient tools for the profiling of execution time are available, the choices for detailed profiling of memory use or other hardware resources are very limited. We were unable to find tools that provided sufficiently accurate insight into, e.g., memory use without adding unacceptable overhead in memory use and execution time for the performance analysis of very large applications. In this paper, we present a highly efficient probabilistic method for profiling that provides detailed resource usage information R_λ(t) indexed by the full location descriptor λ (e.g., process id, thread id, and call chain) and time t. Importantly, we provide an analytical framework, which provides error estimates and allows one to analyze and quantitatively optimize a wide variety of profiling scenarios. We employed the probabilistic approach to implement a memory profiling tool that adds minimal overhead and does not require recompilation or relinking. The tool provides the memory use M_λ(t) for all location descriptors λ over the execution time for single and multithreaded programs.
Experimental results confirm that execution time and memory overhead are less than 10 percent of the unprofiled, optimized execution. Importantly, the technique is sufficiently general to be applicable to profiling of other hardware resources, such as cache or TLB misses, over time for all location descriptors with similarly low overhead and across multiple processes, threads, and processors.
Keywords :
cache storage; multi-threading; program diagnostics; resource allocation; TLB efficiency; analytic framework; cache efficiency; complex software applications; cost factor; detailed resource profiling; full location descriptor; memory footprint; memory use; multithreaded programs; parallel programs; performance analysis; probabilistic method; quantitative error margins; resource usage information; Application software; Computer errors; Costs; Hardware; Multicore processing; Performance analysis; Software tools; Yarn; Resource usage; call chain; memory usage; numerical; probabilistic; profiling
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2009.149
Filename :
5276794