DocumentCode :
1340379
Title :
Understanding why correlation profiling improves the predictability of data cache misses in nonnumeric applications
Author :
Mowry, Todd C. ; Luk, Chi-Keung
Author_Institution :
Dept. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Volume :
49
Issue :
4
fYear :
2000
fDate :
4/1/2000
Firstpage :
369
Lastpage :
384
Abstract :
Latency-tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully exploit the benefit of these techniques, one must be careful to apply them only to the dynamic references that are likely to suffer cache misses; otherwise, the runtime overheads can potentially offset any gains. In this paper, we focus on isolating dynamic miss instances in nonnumeric applications, which is a difficult but important problem. Although compilers cannot statically analyze data locality in nonnumeric applications, one viable approach is to use profiling information to measure the actual miss behavior. Unfortunately, the state of the art in cache miss profiling (which we call summary profiling) is inadequate for references with intermediate miss ratios: it either misses opportunities to hide latency, or else inserts overhead that is unnecessary. To overcome this problem, we propose and evaluate a new profiling technique that helps predict which dynamic instances of a static memory reference will hit or miss in the cache: correlation profiling. Our experimental results demonstrate that roughly half of the 21 nonnumeric applications we study can potentially enjoy significant reductions in memory stall time by exploiting at least one of the three forms of correlation profiling we consider: control-flow correlation, self correlation, and global correlation. In addition, our detailed case studies illustrate that self correlation succeeds because a given reference's cache outcomes often contain repeated patterns, and control-flow correlation succeeds because cache outcomes are often call-chain dependent. Finally, we suggest a number of ways to exploit correlation profiling in practice and demonstrate that software prefetching can achieve better performance on a modern superscalar processor when directed by correlation profiling rather than summary profiling information.
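The abstract's argument against summary profiling for references with intermediate miss ratios can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 0.5 decision threshold, and the hit/miss traces are hypothetical, and for simplicity the history table is trained and evaluated on the same trace. A summary profile keeps only one miss ratio per static reference, so it must make a single decision for every dynamic instance. A self-correlation table instead keys each prediction on that reference's own recent outcomes, which captures repeated patterns.

```python
from collections import Counter, defaultdict

# Outcomes of one static load, in dynamic order: True = cache miss, False = hit.
# All traces and names here are hypothetical illustrations, not data from the paper.


def summary_accuracy(outcomes):
    """Summary profiling (sketch): one static decision for the whole reference.
    Hypothetical policy: treat every instance as a miss if the overall miss
    ratio exceeds 0.5, otherwise treat every instance as a hit."""
    miss_ratio = sum(outcomes) / len(outcomes)
    predict_miss = miss_ratio > 0.5
    correct = sum(1 for o in outcomes if o == predict_miss)
    return correct / len(outcomes)


def train_self_correlation(outcomes, k):
    """Self correlation (sketch): map each length-k history of this reference's
    own outcomes to the next outcome that most often followed it."""
    counts = defaultdict(Counter)
    for i in range(k, len(outcomes)):
        counts[tuple(outcomes[i - k:i])][outcomes[i]] += 1
    return {hist: c.most_common(1)[0][0] for hist, c in counts.items()}


def self_correlation_accuracy(outcomes, k, table):
    """Fraction of instances (after the first k) whose outcome the table predicts."""
    correct = total = 0
    for i in range(k, len(outcomes)):
        hist = tuple(outcomes[i - k:i])
        if hist in table:
            total += 1
            correct += table[hist] == outcomes[i]
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Alternating miss/hit: an intermediate (50%) miss ratio.
    trace = [True, False] * 50
    table = train_self_correlation(trace, k=1)
    print("summary:", summary_accuracy(trace))
    print("self correlation:", self_correlation_accuracy(trace, 1, table))
```

For the alternating trace, any single summary decision is wrong for half of the instances (accuracy 0.5), whereas a one-outcome history predicts every instance after the first correctly (accuracy 1.0). In practice, the decision threshold would depend on the relative cost of a prefetch versus a miss; 0.5 is only a placeholder.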
Keywords :
cache storage; program compilers; software performance evaluation; control-flow correlation; correlation profiling; data cache misses; data locality; global correlation; high-performance processors; latency-tolerance techniques; memory subsystem; nonnumeric applications; self correlation; software prefetching; summary profiling; superscalar processor; Data analysis; Delay; Information analysis; Prefetching; Runtime; Software performance;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/12.844349
Filename :
844349