Title :
Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors
Author :
Zhang, Ye ; Rauchwerger, Lawrence ; Torrellas, Josep
Author_Institution :
Illinois Univ., Urbana, IL, USA
Abstract :
Recently, we introduced a novel framework for speculative parallelization in hardware (Y. Zhang et al., 1998). The scheme is based on a software-based run-time parallelization scheme that we proposed earlier (L. Rauchwerger and D. Padua, 1995). The idea is to execute the code (loops) speculatively in parallel. As parallel execution proceeds, extra hardware added to the directory-based cache coherence protocol of the DSM machine detects whether a dependence violation occurs. If such a violation occurs, execution is interrupted, the state is rolled back in software to the most recent safe state, and the code is re-executed serially from that point. The safe state is typically established at the beginning of the loop. Such a scheme is somewhat related to speculative parallelization inside a multiprocessor chip, which also relies on extending the cache coherence protocol to detect dependence violations. Our scheme, however, is targeted to large-scale DSM parallelism. In addition, it does not have some of the limitations of the proposed chip-multiprocessor schemes. Such limitations include the need to bound the size of the speculative state to fit in a buffer or L1 cache, and a strict in-order task commit policy that may result in load imbalance among processors. Unfortunately, our scheme has higher recovery costs if a dependence violation is detected, because execution has to backtrack to a safe state that is usually the beginning of the loop. Therefore, the aim of the paper is to extend our previous hardware scheme to effectively handle codes (loops) with a modest number of cross-iteration dependences.
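The speculate, detect, roll back, and re-execute cycle described in the abstract can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the hardware scheme of the paper: concurrency is simulated by letting every iteration read from the checkpoint taken at the start of the loop, a violation is flagged when an iteration reads an element written by an earlier iteration, and writes are held in per-iteration dictionaries purely for simplicity. The names run_speculative, double, and prefix_sum are hypothetical and chosen for illustration.

```python
def run_speculative(data, iterations, body):
    """Toy model of speculative loop execution (illustrative assumption only).

    body(i, read, write) runs iteration i using read(idx) and write(idx, value).
    iterations must be re-iterable (for example a range or a list).
    Returns "parallel" if speculation succeeded, "serial" after a rollback.
    """
    checkpoint = list(data)  # safe state established at the beginning of the loop
    reads_by_iter, writes_by_iter = [], []

    # Speculative phase: each iteration sees only the checkpoint plus its own
    # writes, as if all iterations had started at the same time.
    for i in iterations:
        local, reads = {}, set()

        def read(idx, local=local, reads=reads):
            if idx in local:          # value produced by this same iteration
                return local[idx]
            reads.add(idx)            # exposed read: value comes from the checkpoint
            return checkpoint[idx]

        def write(idx, value, local=local):
            local[idx] = value

        body(i, read, write)
        reads_by_iter.append(reads)
        writes_by_iter.append(local)

    # Detection phase: a cross-iteration flow dependence is violated when an
    # iteration read an element that an earlier iteration wrote.
    written = set()
    for reads, local in zip(reads_by_iter, writes_by_iter):
        if reads & written:
            # Roll back to the safe state and re-execute the loop serially.
            data[:] = checkpoint
            for i in iterations:
                body(i, lambda idx: data[idx],
                     lambda idx, value: data.__setitem__(idx, value))
            return "serial"
        written.update(local)

    # No violation: commit the buffered writes in iteration order.
    for local in writes_by_iter:
        for idx, value in local.items():
            data[idx] = value
    return "parallel"


def double(i, read, write):
    """Fully parallel body: iteration i touches only element i."""
    write(i, read(i) * 2)


def prefix_sum(i, read, write):
    """Body with a cross-iteration dependence: iteration i reads element i - 1."""
    write(i, read(i - 1) + read(i))
```

Calling run_speculative(a, range(len(a)), double) commits the speculative results, whereas run_speculative(b, range(1, len(b)), prefix_sum) detects a violation and falls back to serial execution from the start of the loop. That full restart is the recovery cost the abstract identifies as the motivation for handling partially-parallel loops more efficiently.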
Keywords :
distributed shared memory systems; parallel programming; program control structures; DSM machine; DSM multiprocessors; L1 cache; cache coherence protocol; chip-multiprocessor schemes; code re-execution; cross-iteration dependences; dependence violation; dependence violations; directory based cache coherence; large scale DSM parallelism; load imbalance; multiprocessor chip; parallel execution; partially-parallel loops; recovery costs; safe state; software based run time parallelization scheme; speculative parallelization; speculative state; strict in-order task commit policy; Hardware
Conference_Title :
High-Performance Computer Architecture, 1999. Proceedings. Fifth International Symposium On
Conference_Location :
Orlando, FL
Print_ISBN :
0-7695-0004-8
DOI :
10.1109/HPCA.1999.744351