• DocumentCode
    821608
  • Title

    Moving address translation closer to memory in distributed shared-memory multiprocessors

  • Author

    Qiu, Xiaogang ; Dubois, Michel

  • Author_Institution
    Sun Microsyst. Inc., Palo Alto, CA, USA
  • Volume
    16
  • Issue
    7
  • fYear
    2005
  • fDate
    7/1/2005 12:00:00 AM
  • Firstpage
    612
  • Lastpage
    623
  • Abstract
    To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (translation lookaside buffer), before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in the context of distributed shared memory (DSM) multiprocessors, including CC-NUMAs (cache-coherent non-uniform memory access architectures) and COMAs (cache only memory access architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea because page placement, migration, and replication are all constrained by the virtual page address, which greatly affects processor node access locality. In the context of COMAs, the allocation of pages to processor nodes is not as critical because memory blocks can dynamically migrate and replicate freely among nodes. As the address translation is done deeper in the memory hierarchy, the frequency of translations drops because of the filtering effect. We also observe that the TLB is very effective when it is merged with the shared memory, because of the sharing and prefetching effects and because there is no need to maintain TLB consistency. Even if the effectiveness of the TLB merged with the shared memory is very high, we also show that the TLB can be removed in a system with address translation done in memory because the frequency of translations is very low.
  • Keywords
    cache storage; distributed shared memory systems; memory architecture; paged storage; storage allocation; CC-NUMA; COMA; TLB; cache only memory access architecture; cache-coherent nonuniform memory access architecture; distributed shared memory multiprocessors; memory prefetching; page placement; translation lookaside buffer; virtual address translation; Bandwidth; Delay; Filtering; Frequency; Large-scale systems; Memory architecture; Multiprocessing systems; Prefetching; Scalability; Space technology; Multiprocessors; distributed shared memory; dynamic address translation; simulations; virtual memory; virtual-address caches
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2005.84
  • Filename
    1435339