Title :
A Flexible Heterogeneous Multi-Core Architecture
Author :
Pericàs, Miquel ; Cristal, Adrian ; Cazorla, Francisco J. ; González, Ruben ; Jiménez, Daniel A. ; Valero, Mateo
Author_Institution :
Univ. Polytech. de Catalunya, Barcelona
Abstract :
Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge, as application mixes in this environment are nonuniform. Thus, multi-core processors should be flexible enough to provide high throughput for uniform parallel applications as well as high performance for more general workloads. Heterogeneous architectures are a first step in this direction, but partitioning remains static and only roughly fits application requirements. This paper proposes the Flexible Heterogeneous MultiCore processor (FMC), the first dynamic heterogeneous multi-core architecture capable of reconfiguring itself to fit application requirements without programmer intervention. The basic building block of this microarchitecture is a scalable, variable-size window microarchitecture that exploits the concept of Execution Locality to provide large-window capabilities. This allows it to overcome the memory wall for applications with high memory-level parallelism (MLP). The microarchitecture contains a set of small and fast cache processors that execute high-locality code and a network of small in-order memory engines that together exploit low-locality code. Single-threaded applications can use the entire network of cores, while multi-threaded applications can efficiently share the resources. The sizing of critical structures remains small enough to handle current power envelopes. In single-threaded mode, this processor outperforms previous state-of-the-art high-performance processor research by 12% on SpecFP. We show how, in a quad-threaded/quad-core environment, the processor outperforms a statically allocated configuration by around 2-4% in both throughput and harmonic mean, two commonly used metrics to evaluate SMT performance. This is achieved while using a very simple sharing algorithm.
Keywords :
multi-threading; multiprocessing systems; parallel architectures; reconfigurable architectures; cache processor; flexible heterogeneous multicore architecture; instruction-level parallelism; memory-level parallelism; reconfigurable architecture; thread-level parallelism; Application software; Computer architecture; Concurrent computing; Delay; Engines; Microarchitecture; Multicore processing; Parallel processing; Throughput; Yarn;
Conference_Title :
Parallel Architecture and Compilation Techniques, 2007. PACT 2007. 16th International Conference on
Conference_Location :
Brasov
Print_ISBN :
978-0-7695-2944-8
DOI :
10.1109/PACT.2007.4336196