Title :
Multigrain parallel processing on OSCAR CMP
Author :
Kimura, Keiji ; Kodaka, Takeshi ; Obata, Motoki ; Kasahara, Hironori
Author_Institution :
Adv. Res. Inst. for Sci. & Eng., Waseda Univ., Tokyo, Japan
Abstract :
The instruction level parallelism (ILP) approach, long used by superscalar and VLIW processors, appears to be reaching its limit for performance improvement. To obtain scalable performance improvement, cost effectiveness, and high productivity even in the era of one billion transistors, cooperation between software and hardware is becoming increasingly important. For this reason, the authors have developed the OSCAR (Optimally SCheduled Advanced multiprocessoR) Chip Multiprocessor (OSCAR CMP) and the OSCAR multigrain compiler simultaneously. To preserve scalability in the future, OSCAR CMP has mechanisms for efficient use of parallelism and data locality and for hiding data transfer overhead. These mechanisms can be fully controlled by the OSCAR multigrain compiler. This paper focuses on multigrain parallel processing on OSCAR CMP, which exploits loop iteration level parallelism and coarse grain task parallelism in addition to ILP across an entire program. The performance of multigrain parallel processing on the OSCAR CMP architecture is evaluated using the SPEC fp 2000/95 benchmark suite. With a microSPARC-like single-issue core, OSCAR CMP achieves speedups of 1.77 to 3.96 times on four processors relative to a single processor. In addition, OSCAR CMP is compared with a Sun UltraSPARC II-like processor to evaluate cost effectiveness. OSCAR CMP gives 1.66 times better performance on average when OSCAR CMP and the UltraSPARC II-like processor are built from almost the same number of transistors.
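To put the four-processor speedup range reported in the abstract in context, the short sketch below converts each figure into a parallel efficiency. It assumes the standard definition, efficiency = speedup / processor count, which the abstract itself does not state; the function and variable names are illustrative only.

```python
# Parallel efficiency implied by the speedups quoted in the abstract.
# Assumption: efficiency = speedup / processor count (the usual definition;
# the abstract does not define or report efficiency itself).

def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return the fraction of ideal linear speedup that was achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors

PROCESSORS = 4  # four-processor configuration from the abstract
REPORTED_SPEEDUPS = {"lowest": 1.77, "highest": 3.96}  # range from the abstract

efficiencies = {
    label: parallel_efficiency(speedup, PROCESSORS)
    for label, speedup in REPORTED_SPEEDUPS.items()
}

for label, efficiency in efficiencies.items():
    print(f"{label} reported speedup: efficiency {efficiency:.1%}")
```

Under that assumption, the reported range corresponds to roughly 44% to 99% of ideal linear scaling on four cores.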
Keywords :
instruction sets; microprocessor chips; multiprocessing systems; parallel architectures; processor scheduling; program compilers; ILP approach; OSCAR CMP; OSCAR multigrain compiler; Optimally SCheduled Advanced multiprocessoR Chip Multiprocessor; SPEC fp 2000/95 benchmark suite; UltraSPARC II; VLIW processors; coarse grain task parallelism; data locality; data transfer overhead; instruction level parallelism; loop iteration level parallelism; multigrain parallel processing; scalable performance improvement; superscalar processors; transistors; Collaborative work; Costs; Hardware; Parallel processing; Processor scheduling; Productivity; Scalability; Software performance; Transistors; VLIW;
Conference_Title :
Innovative Architecture for Future Generation High-Performance Processors and Systems, 2003
Print_ISBN :
0-7695-2019-7
DOI :
10.1109/IWIA.2003.1262783