• DocumentCode
    964056
  • Title
    Complexity-effective reorder buffer designs for superscalar processors
  • Author
    Kucuk, Gurhan ; Ponomarev, Dmitry V. ; Ergin, Oguz ; Ghose, Kanad
  • Author_Institution
    Dept. of Comput. Sci., State Univ. of New York, Binghamton, NY, USA
  • Volume
    53
  • Issue
    6
  • fYear
    2004
  • fDate
    6/1/2004
  • Firstpage
    653
  • Lastpage
    665
  • Abstract
    All contemporary dynamically scheduled processors support register renaming to cope with false data dependencies. One way to implement register renaming is to use the slots within the reorder buffer (ROB) as physical registers. In such designs, the ROB is a large, multiported structure that occupies a significant portion of the die area and dissipates a sizable fraction of the total chip power. The heavily ported ROB is also likely to have a large delay that can limit the processor clock rate. We consider several approaches for reducing the ROB complexity in processors that use the ROB slots to implement physical registers. The first approach exploits the fact that the bulk of the source operand reads are satisfied through forwarding or by reading the committed register values. Our technique completely eliminates the read ports needed on the ROB for reading source operands. A small set of associatively addressed retention latches is used to compensate for the resulting performance degradation by caching the most recently produced results. The second technique relies on a distributed implementation that spreads the centralized ROB structure across the function units (FUs), with each distributed component sized to match the FU workload and with one write port and two read ports on each component. The third approach combines the use of retention latches and a distributed ROB implementation that uses minimally ported distributed components. The net result of combining the two techniques is a distributed ROB with minimal conflicts over the read ports and no conflicts over the write ports. Our designs are evaluated using simulation of the SPEC 2000 benchmarks and measurements of the actual ROB layouts in a 0.18 micron CMOS process.
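    To make the retention-latch idea more concrete, the following is a minimal behavioral sketch, not the authors' design. It assumes that latches are tagged by ROB slot index, that the oldest entry is evicted first (FIFO replacement), and that a latch miss for an operand whose producer has not yet committed is simply reported to the caller (how the real pipeline resolves such misses is not modeled). Forwarding of results that are produced after dispatch is also not modeled. The names RetentionLatches and read_source_operand are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple


class RetentionLatches:
    """Hypothetical model of a small, fully associative set of retention latches.

    Assumptions (not taken from the paper): each entry is tagged by the ROB
    slot of the producing instruction, and the oldest entry is evicted first.
    """

    def __init__(self, num_latches: int) -> None:
        if num_latches <= 0:
            raise ValueError("num_latches must be positive")
        self.num_latches = num_latches
        self._entries: "OrderedDict[int, int]" = OrderedDict()

    def write_result(self, rob_slot: int, value: int) -> None:
        # A function unit produced a result: cache it in the latches.
        if rob_slot in self._entries:
            # Rewriting a tag moves it to the "newest" position.
            del self._entries[rob_slot]
        elif len(self._entries) >= self.num_latches:
            # All latches occupied: evict the oldest entry (FIFO).
            self._entries.popitem(last=False)
        self._entries[rob_slot] = value

    def lookup(self, rob_slot: int) -> Optional[int]:
        # Associative search by tag; None models a miss.
        return self._entries.get(rob_slot)


def read_source_operand(
    arch_reg: int,
    rename_table: Dict[int, int],
    committed_regs: Dict[int, int],
    latches: RetentionLatches,
) -> Tuple[str, Optional[int]]:
    """Read a source operand without any ROB read port (sketch).

    rename_table maps an architectural register to the ROB slot of its
    in-flight producer; registers absent from it have no in-flight producer.
    """
    slot = rename_table.get(arch_reg)
    if slot is None:
        # No in-flight producer: the committed register value is current.
        return "committed", committed_regs[arch_reg]
    value = latches.lookup(slot)
    if value is not None:
        return "latch", value
    # Producer is in flight but its result has left the latches.
    return "miss", None
```

    For example, with two latches, writing results for ROB slots 5, 6 and 7 evicts slot 5, so a consumer of slot 7 hits in the latches while a consumer of slot 5 misses. The sketch uses a dictionary for brevity; in hardware the associative search would be a tag comparison across all latches in parallel.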
  • Keywords
    buffer storage; computational complexity; parallel architectures; CMOS process; complexity-effective design; false data dependencies; low-power datapath; register file; register renaming; reorder buffer; superscalar processor; Clocks; Degradation; Delay; Dynamic scheduling; Microprocessors; Out of order; Process design; Processor scheduling; Registers
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2004.5
  • Filename
    1288541