Abstract :
With current FPGAs, designers can now instantiate several embedded processors, memory units, and a wide variety of IP blocks to build a single-chip, high-performance multiprocessor embedded system. Furthermore, multi-FPGA systems can be built to provide massive parallelism, given an efficient programming model. In this paper, we present a lightweight subset implementation of the standard Message Passing Interface (MPI) that is suitable for embedded processors. It does not require an operating system and has a small memory footprint. With our MPI implementation (TMD-MPI), we provide a programming model capable of using multiple FPGAs that hides hardware complexities from the programmer, facilitates the development of parallel code, and promotes code portability. To enable intra-FPGA and inter-FPGA communications, a simple network-on-chip is also developed using a low-overhead network packet protocol. Together, TMD-MPI and the network provide a homogeneous view of a cluster of embedded processors to the programmer. Performance parameters such as link latency, link bandwidth, and synchronization cost are measured by executing a set of microbenchmarks.
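As an illustration of how link latency and link bandwidth might be derived from a ping-pong style microbenchmark, the following C sketch fits a simple linear cost model, time = latency + bytes / bandwidth, through two timed message sizes. The abstract does not describe the microbenchmarks or the fitting method, so the model, the `link_estimate` type, and the function names `estimate_link` and `one_way_time` are assumptions made for this example only.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not taken from the paper. */
typedef struct {
    double latency_s;      /* estimated fixed cost per message, in seconds */
    double bandwidth_Bps;  /* estimated sustained rate, in bytes per second */
} link_estimate;

/*
 * Convert the total time of `iterations` ping-pong round trips into an
 * average one-way message time (assumes both directions cost the same).
 */
double one_way_time(double round_trip_total_s, int iterations)
{
    return round_trip_total_s / (2.0 * (double)iterations);
}

/*
 * Fit t = latency + bytes / bandwidth through two one-way timings.
 * Returns 0 on success, or -1 if the inputs cannot yield a positive
 * bandwidth (the larger message must also take strictly longer).
 */
int estimate_link(size_t small_bytes, double small_time_s,
                  size_t large_bytes, double large_time_s,
                  link_estimate *out)
{
    if (out == NULL || large_bytes <= small_bytes ||
        large_time_s <= small_time_s) {
        return -1;
    }

    double delta_bytes = (double)(large_bytes - small_bytes);
    double delta_time_s = large_time_s - small_time_s;

    /* Slope of the line gives seconds per byte; invert for bandwidth. */
    out->bandwidth_Bps = delta_bytes / delta_time_s;

    /* Intercept at zero bytes gives the fixed per-message latency. */
    out->latency_s = small_time_s - (double)small_bytes / out->bandwidth_Bps;
    return 0;
}
```

In practice the two timings would come from timed send/receive exchanges between two processors, for example with standard MPI point-to-point calls; whether a given MPI subset provides the timer and calls needed is something to check against that implementation.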
Keywords :
application program interfaces; embedded systems; field programmable gate arrays; message passing; multiprocessing systems; network-on-chip; transport protocols; TMD-MPI; code portability; embedded processors; inter-FPGA communications; intra-FPGA communications; message-passing interface; multiple FPGA; multiple processors; network packet protocol; network-on-chip; parallel code; programming model; Bandwidth; Delay; Embedded system; Field programmable gate arrays; Hardware; Network-on-a-chip; Operating systems; Parallel programming; Programming profession; Protocols;