Author_Institution :
Dept. of Electr. Eng., Texas Univ., Richardson, TX
Abstract :
With the advent of clustered microarchitectures, the rename map table at the front-end is shared by the clusters, and hence its critical-path delay should not become a bottleneck in determining the processor clock cycle time. Also, the renaming logic in the front-end is one of the largest contributors to peak temperatures on the chip, and so its power consumption demands attention. Analysis of the characteristics of Spec2000int programs reveals that, when the programs are processed in a 4-wide (8-wide) processor, at most one two-source instruction (an instruction with two source registers) is renamed per cycle for 94% (92%) of the total execution time. In this paper, we propose a novel technique to significantly reduce the number of ports in the rename map table. The novelty of the technique is that it is easy to implement and succeeds in reducing the access time, power, and area of the rename logic without any additional power, area, or delay overhead in any other logic on the chip. With this technique in an 8-wide processor, a rename map table with 9 read ports, instead of 16, reduces access time, power, and area by 14%, 42%, and 49%, respectively, with only a 4.7% loss in IPC. Similar gains are also seen in a 4-wide processor.
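The port arithmetic behind these figures can be illustrated with a small sketch. It assumes, as one reading of the abstract rather than the paper's stated mechanism, that each source register consumes one read port and that a rename group is cut short as soon as the next instruction would exceed the available ports. Under that assumption, 9 ports in an 8-wide machine cover eight instructions of which one has two sources. The function name `split_into_rename_groups` and the greedy in-order policy are hypothetical.

```python
def split_into_rename_groups(src_counts, width, read_ports):
    """Greedily pack instructions into per-cycle rename groups.

    src_counts: number of source registers per instruction, in program order.
    width:      maximum instructions renamed per cycle (assumed front-end width).
    read_ports: read ports on the rename map table (one per source, assumed).
    """
    groups, current, ports_used = [], [], 0
    for n_src in src_counts:
        if n_src > read_ports:
            raise ValueError("instruction needs more read ports than exist")
        # Close the group when it is full or the ports would be oversubscribed.
        if len(current) == width or ports_used + n_src > read_ports:
            groups.append(current)
            current, ports_used = [], 0
        current.append(n_src)
        ports_used += n_src
    if current:
        groups.append(current)
    return groups


# One two-source instruction among eight fits in a single cycle with 9 ports.
one_cycle = split_into_rename_groups([2, 1, 1, 1, 1, 1, 1, 1], width=8, read_ports=9)
# Two two-source instructions exceed 9 ports, so the last instruction slips a cycle.
two_cycles = split_into_rename_groups([2, 2, 1, 1, 1, 1, 1, 1], width=8, read_ports=9)
```

In this model, the slipped instruction in the second case is the kind of event that would account for an IPC loss; the abstract reports that such cycles are rare (about 8% of execution time in an 8-wide processor).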
Keywords :
circuit complexity; high-speed integrated circuits; logic design; low-power electronics; microprocessor chips; multiport networks; Spec2000int programs; clustered microarchitectures; instruction level parallelism; logic circuit complexity; processor clock cycle time; processor front-end; rename map table; Clocks; Data mining; Delay effects; Energy consumption; Logic circuits; Logic design; Microarchitecture; Read-write memory; Registers; Temperature;