Regional Information Center for Science and Technology - High performance and alleviated hot-spot problem in processor frontend with enhanced instruction fetch bandwidth utilization

DocumentCode :

1901497

Title :

High performance and alleviated hot-spot problem in processor frontend with enhanced instruction fetch bandwidth utilization

Author :

Rajamani, Prabhu ; Shah, Jatan P. ; Sankaranarayanan, Vadhiraj ; Sangireddy, Rama

Author_Institution :

Dept. of Electr. Eng., Texas Univ., Dallas, TX

fYear :

2006

fDate :

10-12 April 2006

Lastpage :

Abstract :

Current-day wide-issue processors require the fetch engine in the frontend to continuously supply instructions to the issue queue in the backend in order to extract the maximum possible amount of instruction-level parallelism (ILP). Further, because the level-1 instruction cache (IL1) is accessed continuously to fetch instructions, the power dissipation due to switching activity in IL1 is overwhelmingly high and continuous, which makes IL1 one of the prominent hot-spots on the chip. In this paper, we alleviate the effect of control dependencies whenever a branch instruction forms a small loop. We use the Replicator architecture, a novel mechanism that supplies twice the number of loop instructions in the same cycle, leading to a faster supply of instructions to the backend. This improves processor throughput, measured in instructions committed per cycle (IPC), through the extraction of higher ILP. Further, the mechanism results in a significant reduction in the total number of IL1 accesses. Implementation of the proposed technique in an 8-wide out-of-order issue processor results in a 19% improvement in IPC and an 8.5% reduction in overall energy consumption on average, for various processor evaluation benchmark programs. Further, an enhanced Replicator mechanism results in a larger reduction in IL1 accesses, leading to a 16% reduction in the overall energy consumed. The enhanced architecture removes the continuity of access to the IL1 by feeding the instructions to the backend all by itself whenever a loop occurs. This gives a break to switching activity in IL1 and hence mitigates the hot-spot problem in the frontend of the processor.
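The access-count argument in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' simulator: the function name `il1_accesses`, the mode names, the rule that a fetch group ends at the taken loop branch, and the rule that a loop counts as "small" when two copies fit in one fetch group are all hypothetical simplifications introduced here.

```python
from math import ceil


def il1_accesses(loop_len, iterations, fetch_width=8, mode="baseline"):
    """Toy count of IL1 accesses needed to fetch a loop body `iterations` times.

    Assumptions (illustrative, not taken from the paper):
    - a fetch group ends at the taken loop branch, so the baseline needs
      ceil(loop_len / fetch_width) accesses per iteration;
    - a loop is "small" when two copies of its body fit in one fetch group;
    - "replicate" delivers two iterations per IL1 access for small loops;
    - "enhanced" reads the loop body from IL1 once, then supplies every
      later iteration without touching IL1.
    """
    if mode not in ("baseline", "replicate", "enhanced"):
        raise ValueError(f"unknown mode: {mode}")

    baseline = ceil(loop_len / fetch_width) * iterations
    is_small = 2 * loop_len <= fetch_width

    # Loops that are not small fall back to ordinary fetching in this model.
    if mode == "baseline" or not is_small:
        return baseline
    if mode == "replicate":
        return ceil(iterations / 2)
    return 1  # enhanced: a single access captures the loop body


if __name__ == "__main__":
    for mode in ("baseline", "replicate", "enhanced"):
        print(mode, il1_accesses(loop_len=3, iterations=100, mode=mode))
```

In this model a 3-instruction loop run 100 times on an 8-wide fetch engine needs 100, 50, and 1 IL1 accesses in the three modes. The percentages reported in the abstract come from full benchmark programs and cannot be derived from this sketch.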

Keywords :

bandwidth allocation; benchmark testing; cache storage; instruction sets; parallel architectures; queueing theory; IL1; ILP; backend queue; bandwidth utilization; benchmark program; branch instruction; energy consumption; fetch engine; frontend processor; hot-spot problem; instruction level parallelism; level-1 instruction cache; processor evaluation; replicator architecture; switching activity; Bandwidth; Computer aided instruction; Concurrent computing; Engines; Frequency; Hardware; High performance computing; Parallel processing; Throughput; Transistors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Performance, Computing, and Communications Conference, 2006. IPCCC 2006. 25th IEEE International

Conference_Location :

Phoenix, AZ

Print_ISBN :

1-4244-0198-4

Type :

conf

DOI :

10.1109/.2006.1629391

Filename :

1629391

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1901497