DocumentCode
1564824
Title
Decoupled software pipelining with the synchronization array
Author
Rangan, Ram ; Vachharajani, Neil ; Vachharajani, Manish ; August, David I.
Author_Institution
Dept. of Comput. Sci., Princeton Univ., NJ, USA
fYear
2004
Firstpage
177
Lastpage
188
Abstract
Despite the success of instruction-level parallelism (ILP) optimizations in increasing the performance of microprocessors, certain codes remain elusive. In particular, codes containing recursive data structure (RDS) traversal loops have been largely immune to ILP optimizations, due to the fundamental serialization and variable latency of the loop-carried dependence through a pointer-chasing load. To address these and other situations, we introduce decoupled software pipelining (DSWP), a technique that statically splits a single-threaded sequential loop into multiple nonspeculative threads, each of which performs useful computation essential for overall program correctness. The resulting threads execute on thread-parallel architectures such as simultaneous multithreaded (SMT) cores or chip multiprocessors (CMP), expose additional instruction-level parallelism, and tolerate latency better than the original single-threaded RDS loop. To reduce overhead, these threads communicate using a synchronization array, a dedicated hardware structure for pipelined inter-thread communication. DSWP used in conjunction with the synchronization array achieves an 11% to 76% speedup in the optimized functions on both statically and dynamically scheduled processors.
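The loop split described in the abstract can be sketched in software. The following is a minimal illustration only, not the authors' implementation: it assumes a linked-list traversal as the RDS loop, uses a bounded `queue.Queue` as a stand-in for the hardware synchronization array, and all names (`Node`, `build_list`, `pipelined_sum_of_squares`, `capacity`) are hypothetical. One thread performs the pointer-chasing traversal and the other performs the per-node computation; both are nonspeculative, and both are needed for the correct result.

```python
import queue
import threading


class Node:
    """One element of a singly linked list (the recursive data structure)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a linked list holding `values` in order and return its head."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def sequential_sum_of_squares(head):
    """Original single-threaded loop: traversal and computation interleaved."""
    total = 0
    node = head
    while node is not None:
        total += node.value * node.value  # per-node work
        node = node.next                  # loop-carried pointer-chasing load
    return total


_DONE = object()  # sentinel marking the end of the traversal


def pipelined_sum_of_squares(head, capacity=32):
    """Same loop split into a traversal thread and a computation thread.

    The bounded queue is an assumed software substitute for the
    synchronization array; `capacity` is a hypothetical parameter.
    """
    channel = queue.Queue(maxsize=capacity)
    result = []

    def traverse():
        # Stage 1: only the pointer chase; forwards each node downstream.
        node = head
        while node is not None:
            channel.put(node)
            node = node.next
        channel.put(_DONE)

    def compute():
        # Stage 2: only the per-node work; never follows `next` itself.
        total = 0
        while True:
            node = channel.get()
            if node is _DONE:
                break
            total += node.value * node.value
        result.append(total)

    threads = [threading.Thread(target=traverse),
               threading.Thread(target=compute)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Calling `pipelined_sum_of_squares(build_list(range(1, 101)))` returns the same value as the sequential version. The sketch shows only the one-directional producer/consumer structure of the split; it is not expected to reproduce the reported speedups, since a software queue in an interpreted language carries far more overhead than dedicated hardware.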
Keywords
data structures; dynamic scheduling; instruction sets; multi-threading; optimising compilers; parallel architectures; pipeline processing; processor scheduling; program control structures; synchronisation; chip multiprocessors; decoupled software pipelining; dynamic processor scheduling; instruction-level parallelism optimizations; microprocessor performance; multiple nonspeculative threads; pipelined inter-thread communication; pointer-chasing load; recursive data structure traversal loops; simultaneous multithreaded cores; single-threaded RDS loop; single-threaded sequential loop; static processor scheduling; synchronization array; thread-parallel architectures; Computer architecture; Data structures; Delay; Hardware; Microprocessors; Parallel processing; Pipeline processing; Software performance; Surface-mount technology; Yarn;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Architecture and Compilation Techniques, 2004. PACT 2004. Proceedings. 13th International Conference on
ISSN
1089-795X
Print_ISBN
0-7695-2229-7
Type
conf
DOI
10.1109/PACT.2004.1342552
Filename
1342552