DocumentCode :
3757201
Title :
On the Performance Improvement of an Architecture towards Sharing FPUs across Cores for the Design of Multithreading Multicore CPUs
Author :
Kazuhiro Takaki;Takanori Kurihara;Yamin Li
Author_Institution :
Grad. Sch. of CIS, Hosei Univ., Tokyo, Japan
fYear :
2015
Firstpage :
408
Lastpage :
411
Abstract :
Multithreading and multicore techniques are widely adopted in the design of modern high-performance CPUs. Multithreading allows multiple threads to share the functional units (FUs) within a core for better utilization of the FUs. As a result, conflicts can arise over the use of some FUs, the floating-point unit (FPU) for instance. In such a case, some floating-point instructions are suspended until the FPU becomes available. The multicore technique implements a small-scale multiprocessor on a chip, but a thread that runs on one core cannot use the FUs of other cores. This results in poor utilization of the FPUs in some cores when the threads running on those cores contain no floating-point instructions at all, while in other cores the threads are struggling to compete for the FPU. Unlike traditional multiprocessors, which are built from multiple CPU chips, a multicore CPU implements the multiprocessor on a single chip, so it becomes possible to let the threads in a group of cores share all the FPUs in that group. When a conflict over the use of an FPU occurs, some floating-point operations can be redirected to cores of the same group whose FPUs are idle, improving the overall performance of the multicore CPU. This paper investigates such a group architecture and compares its performance with that of the traditional multicore architecture. Our experimental results show that, on average over the floating-point benchmarks, performance improvements of 53.2%, 71.0%, and 72.5% can be achieved by redirecting floating-point operations to other cores within the group for group sizes of two, four, and eight, respectively.
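The redirection idea in the abstract can be illustrated with a toy cycle-level model. This is a minimal sketch under stated assumptions, not the authors' simulator: each core has one non-pipelined FPU with a fixed latency, each core issues at most one floating-point operation per cycle, groups are formed from contiguous cores, and the function name `simulate` and its workload format are hypothetical. A `group_size` of 1 models the traditional multicore CPU with no FPU sharing.

```python
def simulate(fp_ops_per_core, group_size, latency=4):
    """Toy model (hypothetical): cycles until all queued FP ops finish.

    fp_ops_per_core[i] is the number of independent FP ops waiting on core i.
    Assumes one non-pipelined FPU per core with a fixed latency and at most
    one FP op issued per core per cycle. group_size=1 disables sharing.
    """
    n = len(fp_ops_per_core)
    pending = list(fp_ops_per_core)
    free_at = [0] * n  # cycle at which each core's FPU becomes idle
    cycle = 0
    while any(pending):
        issued = set()
        # Pass 1: every core first tries its own local FPU.
        for core in range(n):
            if pending[core] and free_at[core] <= cycle:
                free_at[core] = cycle + latency
                pending[core] -= 1
                issued.add(core)
        # Pass 2: cores stalled on a busy local FPU redirect the operation
        # to an idle FPU belonging to another core of the same group.
        for core in range(n):
            if pending[core] and core not in issued:
                base = (core // group_size) * group_size
                for fpu in range(base, min(base + group_size, n)):
                    if free_at[fpu] <= cycle:
                        free_at[fpu] = cycle + latency
                        pending[core] -= 1
                        break
        cycle += 1
    # The last operation completes when the latest-freed FPU becomes idle.
    return max(free_at, default=0)


if __name__ == "__main__":
    # Core 0 is floating-point heavy, core 1 has no FP work (illustrative only).
    print(simulate([8, 0], group_size=1), simulate([8, 0], group_size=2))
```

With this made-up workload the run prints `32 17`: sharing the idle FPU of core 1 nearly halves the completion time, whereas a workload such as `[4, 4]` gains nothing because neither FPU is ever idle. These numbers only illustrate the mechanism and are unrelated to the benchmark results reported in the paper.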
Keywords :
"Multicore processing","Pipelines","Multithreading","Instruction sets","Clocks","Performance evaluation","Benchmark testing"
Publisher :
ieee
Conference_Title :
Computing and Networking (CANDAR), 2015 Third International Symposium on
Electronic_ISBN :
2379-1896
Type :
conf
DOI :
10.1109/CANDAR.2015.48
Filename :
7424748