DocumentCode :
2788084
Title :
DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory
Author :
Villavieja, Carlos ; Karakostas, Vasileios ; Vilanova, Lluis ; Etsion, Yoav ; Ramirez, Alex ; Mendelson, Avi ; Navarro, Nacho ; Cristal, Adrián ; Unsal, Osman S.
Author_Institution :
Comput. Archit. Dept., Univ. Politec. de Catalunya (UPC), Barcelona, Spain
fYear :
2011
fDate :
10-14 Oct. 2011
Firstpage :
340
Lastpage :
349
Abstract :
Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to cache virtual-to-physical mappings and, as they are looked up on every memory access, are paramount to performance scalability. The emergence of chip-multiprocessors (CMPs) with per-core TLBs has brought the problem of TLB coherence to front stage. TLBs are kept coherent at the software level by the operating system (OS). Whenever the OS modifies page permissions in a page table, it must initiate a coherency transaction among TLBs, a process known as a TLB shootdown. Current CMPs rely on the OS to approximate the set of TLBs caching a mapping and synchronize TLBs using costly Inter-Processor Interrupts (IPIs) and software handlers. In this paper, we characterize the impact of TLB shootdowns on multiprocessor performance and scalability, and present the design of a scalable TLB coherency mechanism. First, we show that both TLB shootdown cost and frequency increase with the number of processors and project that software-based TLB shootdowns would thwart the performance of large multiprocessors. We then present a scalable architectural mechanism that couples a shared TLB directory with load/store queue support for lightweight TLB invalidation, and thereby eliminates the need for costly IPIs. Finally, we show that the proposed mechanism reduces the fraction of machine cycles wasted on TLB shootdowns by an order of magnitude.
Keywords :
cache storage; interrupts; microprocessor chips; multiprocessing systems; DiDi; TLB caching; TLB shootdown; chip multiprocessor; coherency transaction; interprocessor interrupt; load-store queue support; machine cycle; memory access; operating system; per-core TLB; scalable TLB coherency mechanism; scalable architectural mechanism; shared TLB directory; software handler; translation lookaside buffer; virtual-to-physical mapping; Benchmark testing; Coherence; Hardware; Linux; Memory management; Scalability; Software; Shared TLB; Shootdown characterization; TLB Shootdown;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
Conference_Location :
Galveston, TX
ISSN :
1089-795X
Print_ISBN :
978-1-4577-1794-9
Type :
conf
DOI :
10.1109/PACT.2011.65
Filename :
6113842