Regional Information Center for Science and Technology - On the Performance Improvement of an Architecture towards Sharing FPUs across Cores for the Design of Multithreading Multicore CPUs

Abstract:

Multithreading and multicore techniques are widely adopted in the design of modern high-performance CPUs. Multithreading allows multiple threads to share the functional units (FUs) within a core, which improves FU utilization. As a result, threads may conflict over some FUs, the floating-point unit (FPU) for instance; when they do, some floating-point instructions are stalled until the FPU becomes available. The multicore technique implements a small-scale multiprocessor on a chip, in which a thread running on one core cannot use the FUs of other cores. This leads to poor FPU utilization in cores whose threads contain no floating-point instructions at all, even while threads in other cores are struggling to compete for the FPU. Unlike traditional multiprocessors, which are built from multiple CPU chips, a multicore CPU implements the multiprocessor on a single chip, so it becomes possible to let the threads in a group of cores share all the FPUs in that group. When a conflict over an FPU occurs, some floating-point operations can be redirected to cores in the same group whose FPUs are idle, improving the overall performance of the multicore CPU. This paper investigates such a group architecture and reports its performance improvement over the traditional multicore architecture. Our experimental results show that, on average for the floating-point benchmarks, performance improvements of 53.2%, 71.0%, and 72.5% can be achieved by redirecting floating-point operations to other cores within the group, for group sizes of two, four, and eight, respectively.
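
The redirection idea can be illustrated with a toy model. The sketch below is not the simulator used in the paper, and it does not reproduce the reported figures. It assumes one FPU per core, one floating-point operation completed per FPU per cycle, and zero latency for redirecting an operation to another core in the same group. The function name `cycles_to_drain` and its parameters are hypothetical.

```python
from typing import List


def cycles_to_drain(pending: List[int], group_size: int) -> int:
    """Cycles until every core's queue of floating-point ops is empty.

    Toy assumptions (not taken from the paper): each core owns one FPU,
    an FPU completes one op per cycle, and redirecting an op to an idle
    FPU in the same group costs no extra cycles.

    pending[i] is the number of FP ops queued on core i. Cores are split
    into consecutive groups of `group_size`; group_size == 1 models the
    traditional multicore CPU with no FPU sharing.
    """
    if group_size < 1 or len(pending) % group_size != 0:
        raise ValueError("group_size must be >= 1 and divide the core count")

    worst = 0
    for start in range(0, len(pending), group_size):
        queues = list(pending[start:start + group_size])
        cycles = 0
        while any(queues):
            idle_fpus = 0
            # Each core first issues one of its own ops to its local FPU.
            for i, queued in enumerate(queues):
                if queued > 0:
                    queues[i] -= 1
                else:
                    idle_fpus += 1
            # Each idle FPU then takes one redirected op from the core
            # that currently has the longest queue in the group.
            for _ in range(idle_fpus):
                busiest = max(range(len(queues)), key=queues.__getitem__)
                if queues[busiest] == 0:
                    break
                queues[busiest] -= 1
            cycles += 1
        worst = max(worst, cycles)
    return worst
```

In this model, a four-core chip with all the floating-point work on one core, `pending = [8, 0, 0, 0]`, needs 8 cycles without sharing, 4 cycles with groups of two, and 2 cycles with one group of four. A balanced load such as `[3, 3, 3, 3]` gains nothing from grouping, because no FPU is ever idle while another core still has work queued.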