Title :
Revisiting accelerator-rich CMPs: Challenges and solutions
Author :
Teimouri, Nasibeh ; Tabkhi, Hamed ; Schirner, Gunar
Author_Institution :
Dept. of Electr. & Comput. Eng., Northeastern Univ. Boston, Boston, MA, USA
Abstract :
Heterogeneous Chip Multi-Processors (CMPs), which combine processor cores with specialized HW accelerators, are one main approach to high-performance, low-power computing. While this approach is promising for a few accelerators, scalability becomes a major challenge as the number of accelerators increases. Shared resources, including memory, the communication fabric, and the processor, turn into bottlenecks that leave accelerators under-utilized and cripple performance. This paper analyzes the scalability of heterogeneous CMPs with many accelerators and identifies the bottlenecks and their impact on system performance. It introduces an analytical method for scalability/bottleneck analysis that is backed up by a simulation-based performance analysis (using automatically generated virtual platforms). The paper proposes a novel architecture template, Transparent Self-Synchronizing (TSS) accelerators, for the efficient and scalable realization of streaming applications. TSS achieves efficiency and scalability through configurable point-to-point connections and self-synchronization between HW accelerators, and through efficient management of accelerator memory. This article demonstrates the benefits of TSS using both analytical and simulation methods. TSS significantly reduces the pressure on the communication fabric, the processor load, and the memory requirements, which improves scalability. Even with an increasing number of accelerators, TSS can achieve more than 85% accelerator utilization. In contrast, in ACC-based CMPs, accelerator utilization drops quickly: to less than 40% with six accelerators, and lower still with more accelerators. The scalability benefits of TSS become more pronounced as the number of hardware accelerators increases.
Keywords :
integrated circuit reliability; low-power electronics; microprocessor chips; ACC; TSS accelerators; accelerator-rich CMP; analytical method; automatically generated virtual platforms; bottleneck analysis; communication fabric; configurable point-to-point connections; heterogeneous chip multi processors; high-performance low-power computing; memory; memory requirements; processor cores; processor load; processor turn; scalability analysis; self synchronization; simulation-based performance analysis; specialized HW accelerators; streaming applications; transparent self-synchronizing accelerators; Fabrics; Logic gates; Ports (Computers); Runtime; Synchronization;
Conference_Title :
Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE
Conference_Location :
San Francisco, CA
DOI :
10.1145/2744769.2744902