Abstract :
The microprocessor industry has moved toward chip multiprocessor (CMP) designs as a means of utilizing increasing transistor counts in the face of physical and micro-architectural limitations. Despite this move, CMPs do not directly improve the performance of single-threaded codes, which make up most applications. To support parallelization of general-purpose applications, computer architects have proposed CMPs with lightweight scalar communication mechanisms (R. Rangan et al., 2004), (K. Sankaralingam et al., 2003), (M.B. Taylor et al., 2005). Despite such support, most existing compiler multi-threading techniques have demonstrated little effectiveness in extracting parallelism from non-scientific applications (W. Lee et al., 1998), (W. Lee et al., 2002), (K. Rich and M. Farrens, 2004). The main reason is that such techniques are mostly restricted to extracting parallelism within straight-line regions of code. In this paper, we first propose a general framework that enables global multi-threaded instruction scheduling. We then describe GREMIO, a scheduler built using this framework. GREMIO operates at a global scope, at the procedure level, and uses control dependence analysis to extract non-speculative thread-level parallelism from sequential codes. Using a fully automatic compiler implementation of GREMIO and a validated processor model, this paper demonstrates gains for a dual-core CMP model running a variety of codes. Our experiments demonstrate the advantage of exploiting global scheduling for multi-threaded architectures, and present gains in a detailed comparison with the decoupled software pipelining (DSWP) multi-threading technique (G. Ottoni et al., 2005).
Furthermore, our experiments show that adding GREMIO to a compiler with DSWP improves the average speedup from 16.5% to 32.8% for important benchmark functions when utilizing two cores, indicating the importance of this technique in making compilers extract threads effectively.
Keywords :
multi-threading; processor scheduling; GREMIO; chip multiprocessor designs; compiler multithreading techniques; global multithreaded instruction scheduling; lightweight scalar communication mechanisms; microprocessor industry; multithreaded architectures; nonspeculative thread-level parallelism; sequential codes; single-threaded codes; Application software; Automatic control; Computer applications; Computer architecture; Concurrent computing; Job shop scheduling; Microprocessors; Parallel processing; Pipeline processing; Processor scheduling;