Abstract :
As processor width grows, the complexity of a multiported register file (RF) grows more than linearly, leading to longer register access time and higher power consumption. Analysis of the Spec2000 benchmark programs running on an 8-wide processor reveals that two or fewer two-source instructions (those that require both source registers) are executed per cycle for a significant portion of total execution time (more than 98% of the time for Spec2000 integer and 93% of the time for Spec2000 floating-point). Register port bandwidth is therefore highly underutilized for a significant portion of the time in general-purpose computing. In this paper, we propose a novel technique that significantly reduces the number of register ports through a very minor modification of the select logic, which issues only a limited number of two-source instructions per cycle. This is achieved with no significant impact on the processor's performance. The novelty of the technique is that it is easy to implement and reduces the access time, power, and area of the register file without shifting the burden, in terms of these factors, to any other logic on the chip. With this technique in an 8-wide processor, as compared to a conventional 128-entry RF with 16 read ports, a register file for Spec2000 integer programs can be designed with 11 or 10 read ports, as these configurations result in instructions per cycle (IPC) degradation of only 0.929% and 3.38%, respectively. This small IPC degradation is achieved while reducing the register access time by 9% and 12%, respectively, and reducing power by 35% and 50%, respectively. For Spec2000 floating-point programs, a register file can be designed with 12 read ports (1.16% IPC loss, 8% less access time, and 28% less power) or with 11 read ports (3.5% IPC loss, 9% less access time, and 35% less power).
The paper analyzes the performance of all possible flavors of the proposed technique for the register file in both 4-wide and 8-wide processors, and offers the designer a choice of performance and register port complexity combinations.
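The port arithmetic behind these configurations can be illustrated with a minimal sketch. It is not the paper's select logic: the names `Instr`, `read_ports_needed`, and `select` are hypothetical, and the sketch assumes that every instruction reads at most two source registers, that selection is oldest-first, and that an instruction not counted as two-source needs at most one read port. Under those assumptions, capping an 8-wide machine at three two-source instructions per cycle would need 11 read ports, and a cap of two would need 10.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    name: str
    num_sources: int  # number of register source operands: 0, 1, or 2


def read_ports_needed(width: int, max_two_source: int) -> int:
    """Worst-case read ports when at most `max_two_source` of the `width`
    issued instructions read two registers and the rest read at most one."""
    k = min(max_two_source, width)
    return 2 * k + (width - k)


def select(ready: List[Instr], width: int, max_two_source: int) -> List[Instr]:
    """Pick up to `width` ready instructions in the order given (assumed
    oldest first), skipping two-source instructions once the cap is hit."""
    issued: List[Instr] = []
    two_source_count = 0
    for instr in ready:
        if len(issued) == width:
            break
        if instr.num_sources == 2:
            if two_source_count == max_two_source:
                continue  # not issued this cycle; waits for a later one
            two_source_count += 1
        issued.append(instr)
    return issued
```

For example, `select` applied to three ready two-source instructions followed by a one-source instruction, with `width=4` and `max_two_source=2`, issues the first two two-source instructions and the one-source instruction, and leaves the third two-source instruction for a later cycle.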
Keywords :
computational complexity; file organisation; instruction sets; 8-wide processor; Spec2000 benchmark program; Spec2000 floating-point program; Spec2000 integer program; high performance processor; instructions per cycle degradation; processor width; register access time; register file complexity; register port bandwidth; register port complexity combination; register port complexity reduction; two-source instruction format; Bandwidth; Degradation; Energy consumption; High performance computing; Laboratories; Logic; Performance analysis; Power engineering computing; Radio frequency; Registers;