Regional Information Center for Science and Technology - Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

DocumentCode :

3355026

Title :

Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Author :

Gibert, Enric ; Sanchez, Javier ; González, Antonio

Author_Institution :

Dept. d'Arquitectura de Computadors, Univ. Politècnica de Catalunya, Barcelona, Spain

fYear :

2003

fDate :

23-26 March 2003

Firstpage :

193

Lastpage :

203

Abstract :

Clustering is a common technique for dealing with wire delays. Fully distributed architectures, where the register file, the functional units and the cache memory are partitioned, are particularly effective under these constraints and are also highly scalable. However, distributing the data cache introduces a new problem: memory instructions may reach the cache in an order different from the sequential program order, possibly corrupting its contents. In this paper, two local scheduling mechanisms that guarantee the serialization of aliased memory instructions are proposed and evaluated: the construction of memory dependent chains (MDC solution), and two transformations (store replication and load-store synchronization) applied to the original data dependence graph (DDGT solution). These solutions do not require any extra hardware. The proposed scheduling techniques are evaluated for a clustered VLIW processor with a word-interleaved cache, although they can also be used with any other distributed cache configuration. Results for the Mediabench benchmark suite demonstrate the effectiveness of these techniques. In particular, the DDGT solution increases the proportion of local accesses by 16% compared to MDC, and stall time is reduced by 32%, since load instructions can be freely scheduled in any cluster. However, the MDC solution reduces compute time and often outperforms the DDGT solution. Finally, the impact of both techniques on an architecture with attraction buffers is studied and evaluated.
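
The two ideas named in the abstract, a word-interleaved cache and memory dependent chains, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the word size, the cluster count, and the names `home_cluster` and `memory_dependent_chains` are assumptions made for illustration. The sketch also assumes that a chain is simply a set of memory instructions connected by may-alias pairs, which a scheduler could then place in a single cluster so that they are issued in program order.

```python
from collections import defaultdict

WORD_BYTES = 4      # assumed word size in bytes (not stated in the abstract)
NUM_CLUSTERS = 4    # assumed number of clusters (not stated in the abstract)


def home_cluster(address: int) -> int:
    """Return the cluster whose cache module holds the word at `address`.

    In a word-interleaved layout, consecutive words map to consecutive
    clusters in round-robin fashion.
    """
    return (address // WORD_BYTES) % NUM_CLUSTERS


def memory_dependent_chains(mem_ops, may_alias):
    """Group memory instructions into sets connected by may-alias pairs.

    `mem_ops` is a list of instruction names and `may_alias` is a list of
    (a, b) pairs that may touch the same location. Both are hypothetical
    inputs; a real compiler would derive them from alias analysis.
    """
    parent = {op: op for op in mem_ops}

    def find(op):
        # Union-find lookup with path halving.
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    for a, b in may_alias:
        parent[find(a)] = find(b)

    chains = defaultdict(list)
    for op in mem_ops:
        chains[find(op)].append(op)
    return sorted(chains.values())
```

For example, `home_cluster(4)` gives cluster 1 under these assumed parameters, so a load of that word scheduled in cluster 1 would be a local access. Calling `memory_dependent_chains(["st1", "ld2", "st3", "ld4"], [("st1", "ld2"), ("ld2", "st3")])` puts `st1`, `ld2` and `st3` in one chain and leaves `ld4` free to be scheduled in any cluster.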

Keywords :

cache storage; interleaved storage; parallel architectures; parallel memories; processor scheduling; synchronisation; Local Scheduling Techniques; Mediabench benchmark suite; Memory Coherence; aliased memory instructions; architecture; attraction buffers; cache memory; clustering; data dependence graph; distributed data cache; fully distributed architectures; functional units; load instructions; load-store synchronization; local accesses; memory dependent chains; memory instructions; register file; serialization; store replication; transformations; wire delays; word-interleaved cache clustered VLIW processor; Algorithm design and analysis; Distributed computing; Electronic mail; Hardware; Interleaved codes; Memory architecture; Processor scheduling; Registers; Scheduling algorithm; VLIW;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Code Generation and Optimization, 2003. CGO 2003. International Symposium on

Print_ISBN :

0-7695-1913-X

Type :

conf

DOI :

10.1109/CGO.2003.1191545

Filename :

1191545

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3355026