Title : 
Zero-cycle loads: microarchitecture support for reducing load latency
         
        
            Author : 
Austin, Todd M. ; Sohi, Gurindar S.
         
        
            Author_Institution : 
Wisconsin Univ., Madison, WI, USA
         
        
        
            fDate : 
29 Nov-1 Dec 1995
         
        
        
        
            Abstract : 
Untolerated load instruction latencies often have a significant impact on overall program performance. As one means of mitigating this effect, we present an aggressive hardware-based mechanism that provides effective support for reducing the latency of load instructions. Through the judicious use of instruction predecode, base register caching, and fast address calculation, it becomes possible to complete load instructions up to two cycles earlier than in traditional pipeline designs. For a pipeline with one-cycle data cache access, this results in what we term a zero-cycle load. A zero-cycle load produces a result prior to reaching the execute stage of the pipeline, allowing subsequent dependent instructions to issue unfettered by load dependencies. Programs executing on processors with support for zero-cycle loads experience significantly fewer pipeline stalls due to load instructions and increased overall performance. We present two pipeline designs supporting zero-cycle loads: one for pipelines with a single stage of instruction decode, and another for pipelines with multiple decode stages. We evaluate these designs in a number of contexts: with and without software support, in-order vs. out-of-order issue, and on architectures with many and few registers. We find that our approach is quite effective at reducing the impact of load latency, even more so on architectures with in-order issue and few registers.
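The latency arithmetic behind the term "zero-cycle load" can be illustrated with a toy model. This is only a sketch, not the paper's simulator: it assumes a traditional pipeline spends one cycle on address calculation before the data cache access, and the function name `load_latency` and its parameters are hypothetical, introduced here for illustration.

```python
def load_latency(cache_access_cycles, cycles_saved=0, address_calc_cycles=1):
    """Cycles a dependent instruction waits on a load result (toy model).

    Assumption: the baseline latency is the address calculation
    (one cycle by default) followed by the data cache access.
    """
    baseline = address_calc_cycles + cache_access_cycles
    # Latency cannot go below zero, however many cycles are saved.
    return max(0, baseline - cycles_saved)


# Traditional pipeline with a one-cycle data cache access.
traditional = load_latency(cache_access_cycles=1)

# Same pipeline with the load completing two cycles earlier.
accelerated = load_latency(cache_access_cycles=1, cycles_saved=2)

print(traditional, accelerated)
```

Under these assumptions the two-cycle saving exactly cancels the address calculation and the one-cycle cache access, which matches the abstract's statement that the result is available before the execute stage. With a slower cache, the same saving would leave some latency exposed.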
         
        
            Keywords : 
parallel architectures; performance evaluation; pipeline processing; base register caching; instruction predecode; load instruction latencies; load instructions; load latency; microarchitecture support; pipeline designs; pipeline stalls; program performance; zero-cycle loads; Computer architecture; Decoding; Delay; Hazards; Impedance; Microarchitecture; Out of order; Pipelines; Processor scheduling; Registers
         
        
        
        
            Conference_Title : 
Proceedings of the 28th Annual International Symposium on Microarchitecture, 1995
         
        
            Conference_Location : 
Ann Arbor, MI
         
        
        
            Print_ISBN : 
0-8186-7349-4
         
        
        
            DOI : 
10.1109/MICRO.1995.476815