DocumentCode :
1995771
Title :
Latency and bandwidth efficient communication through system customization for embedded multiprocessors
Author :
Yu, Chenjie ; Petrov, Peter
Author_Institution :
Maryland Univ., College Park, MD
fYear :
2008
fDate :
8-13 June 2008
Firstpage :
766
Lastpage :
771
Abstract :
We present a cross-layer customization methodology for latency- and bandwidth-efficient inter-core communication in embedded multiprocessors. The methodology integrates compiler, operating system, and hardware support to achieve bandwidth-efficient, snoop-free, and coherence-miss-free shared-memory communication between synchronized producer and consumer cores. A compiler-driven code transformation is introduced that relies on simple ISA support in the form of a special write-through store instruction. The transformation ensures that producer writes are propagated to the consumers with a single bus transaction per cache block, issued when the producer performs its last write to that cache line before exiting its synchronization region. Information about the shared buffers involved in the communication is captured by the OS and provided to the cores so that they can filter bus traffic and perform remote updates when necessary. The end result is a single bus transaction per shared cache block and snoop-free communication between a producer and a set of consumers, with no intervening coherence misses in the consumer caches. Our experiments demonstrate significant reductions in both bus traffic and cache misses for a set of multiprocessor benchmarks.
Keywords :
buffer circuits; cache storage; embedded systems; multiprocessing systems; operating systems (computers); program compilers; synchronisation; system buses; bandwidth efficient inter-core communication; bus traffic filtering; coherence cache miss-free shared memory communication; compiler; compiler-driven code transformation; cross-layer customization methodology; operating system; shared cache block; single bus transaction per cache block; snoop-free communication; synchronization region; write-through store instruction; Bandwidth; Broadcasting; Delay; Educational institutions; Hardware; Information filtering; Information filters; Instruction sets; Operating systems; Protocols; Embedded Multiprocessor; Snoop Protocol;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Design Automation Conference, 2008. DAC 2008. 45th ACM/IEEE
Conference_Location :
Anaheim, CA
ISSN :
0738-100X
Print_ISBN :
978-1-60558-115-6
Type :
conf
Filename :
4555922