• DocumentCode
    2817415
  • Title
    Decoupling local variable accesses in a wide-issue superscalar processor
  • Author
    Cho, Sangyeun ; Pen-Chung Yew ; Gyungho Lee
  • Author_Institution
    Samsung Electronics Co., Yongin-City, South Korea
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    100
  • Lastpage
    110
  • Abstract
    Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data cache, however, becomes increasingly inefficient and can add significantly to the hardware complexity. This paper takes an alternative or complementary approach to providing more data bandwidth, called the data-decoupled architecture. The approach, with support from the compiler and/or hardware, partitions the memory stream into two independent streams early in the processor pipeline, and feeds each stream to a separate memory access queue and cache. Under this model, the paper studies the potential of decoupling memory accesses to a program's local variables that are allocated on the run-time stack. Using a set of integer and floating-point programs from the SPEC95 benchmark suite, it is shown that local variable accesses constitute a large portion of all the memory references, while their reference space is very small, averaging around 7 words per (static) procedure. To service local variable accesses quickly, two optimizations, fast data forwarding and access combining, are proposed and studied. Some of the important design parameters, such as the cache size, the number of cache ports, and the degree of access combining, are studied based on simulations. The potential performance of the proposed scheme is measured using various configurations, and it is concluded that the scheme can become a viable alternative to building a single multi-ported data cache.
  • Keywords
    fault tolerant computing; parallel architectures; performance evaluation; program compilers; SPEC95 benchmark suite; cache size; compiler; data bandwidth; data cache; data-decoupled architecture; floating-point programs; hardware complexity; local variable accesses decoupling; multi-ported data cache; performance potential; wide-issue superscalar processor; Bandwidth; Clocks; Delay; Feeds; Large scale integration; Pipelines; Read only memory; Runtime;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture, 1999. Proceedings of the 26th International Symposium on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1063-6897
  • Print_ISBN
    0-7695-0170-2
  • Type
    conf
  • DOI
    10.1109/ISCA.1999.765943
  • Filename
    765943