Regional Information Center for Science and Technology (RICeST) - Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors

DocumentCode :

2746024

Title :

Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors

Author :

Kim, Dongkeun ; Liao, Steve Shih-wei ; Wang, Perry H. ; Del Cuvillo, Juan ; Tian, Xinmin ; Zou, Xiang ; Wang, Hong ; Yeung, Donald ; Girkar, Milind ; Shen, John P.

Author_Institution :

Microarchitecture Res. Lab., Intel Corp., USA

fYear :

2004

fDate :

20-24 March 2004

Firstpage :

Lastpage :

Abstract :

Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising results on simulators that model a simultaneous multithreading (SMT) processor. We apply the helper threading idea on a real multithreaded machine, i.e., an Intel Pentium 4 processor with hyper-threading technology, and show that it can indeed provide wall-clock speedup on real silicon. To achieve further performance improvements via helper threads, we investigate three helper threading scenarios that are driven by automated compiler infrastructure, and identify several key challenges and opportunities for novel hardware and software optimizations. Our study shows that program behavior changes dynamically during execution. In addition, certain critical hardware structures in the hyper-threaded processors are either shared or partitioned in multithreading mode, and thus the tradeoffs regarding resource contention can be intricate. Therefore, it is essential to judiciously invoke helper threads by adapting to the dynamic program behavior so that we can alleviate potential performance degradation due to resource contention. Moreover, since adapting to the dynamic behavior requires frequent thread synchronization, having light-weight thread synchronization mechanisms is important.

Keywords :

cache storage; multi-threading; optimising compilers; program diagnostics; resource allocation; synchronisation; Intel Pentium 4 processor; Intel's hyper-threaded processors; automated compiler infrastructure; cache blocks; hardware optimization; helper threads prefetching; hyper-threading technology; memory latency; multithreaded machine; pre-execution techniques; program behavior; resource contention; simultaneous multithreading processor; software optimizations; thread synchronization; Degradation; Delay; Hardware; Multithreading; Optimizing compilers; Prefetching; Silicon; Software performance; Surface-mount technology; Yarn;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Code Generation and Optimization, 2004. CGO 2004. International Symposium on

Print_ISBN :

0-7695-2102-9

Type :

conf

DOI :

10.1109/CGO.2004.1281661

Filename :

1281661

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2746024