Author :
Bruening, Derek ; Kiriansky, Vladimir ; Garnett, Timothy ; Banerji, Sanjeev
Abstract :
Software code caches are increasingly being used to amortize the runtime overhead of dynamic optimizers, simulators, emulators, dynamic translators, dynamic compilers, and other tools. Despite the now-widespread use of code caches, techniques for efficiently sharing them across multiple threads have not been fully explored. Some systems simply do not support threads, while others resort to thread-private code caches. Although thread-private caches are much simpler to manage, synchronize, and provide scratch space for, they simply do not scale when applied to many-threaded programs. Thread-shared code caches are needed to target server applications, which employ hundreds of worker threads all performing similar tasks. Yet, those systems that do share their code caches often have brute-force, inefficient solutions to the challenges of concurrent code cache access: a single global lock on runtime system code and suspension of all threads for any cache management action. This limits the possibilities for cache design and has performance problems with applications that require frequent cache invalidations to maintain cache consistency. In this paper, we discuss the design choices when building thread-shared code caches and enumerate the difficulties of thread-local storage, synchronization, trace building, in-cache lookup tables, and cache eviction. We present efficient solutions to these problems that both scale well and do not require thread suspension. We evaluate our results in DynamoRIO, an industrial-strength dynamic binary translation system, on real-world server applications. On these applications our thread-shared caches use an order of magnitude less memory and improve throughput by up to four times compared to thread-private caches.
Keywords :
cache storage; DynamoRIO; brute-force solution; cache eviction; cache management; concurrent code cache access; dynamic compilers; dynamic emulators; dynamic optimizers; dynamic simulators; dynamic translators; in-cache lookup tables; industrial-strength dynamic binary translation system; runtime overhead; runtime system code; server applications; thread suspension; thread-local storage; thread-shared software code caches; trace building; Buildings; Dynamic compiler; File servers; Optimizing compilers; Runtime; Software tools; Table lookup; Throughput; Yarn