Abstract :
The performance of many important commercial workloads, such as on-line transaction processing, is limited by the frequent stalls due to off-chip instruction and data accesses. These applications are characterized by irregular control flow and complex data access patterns that render many low-cost prefetching schemes, such as stream-based and stride-based prefetching, ineffective. For such applications, correlation-based prefetching, which is capable of capturing complex data access patterns, has been shown to be a more promising approach. However, the large instruction and data working sets of these applications require extremely large correlation tables, making these tables impractical to be implemented on-chip. This paper proposes the epoch-based correlation prefetcher, which cost-effectively stores its correlation table in main memory and exploits the concept of epochs to hide the long latency of its correlation table access, and which attempts to eliminate entire epochs instead of individual instruction and data misses. Experimental results demonstrate that the epoch-based correlation prefetches which requires minimal on-chip real estate to implement, improves the performance of a suite of important commercial benchmarks by 13% to 31% and significantly outperforms previously proposed correlation prefetchers.