Title :
Design and implementation of a modular, low latency, fault-aware, FPGA-based network interface
Author :
Ammendola, Roberto ; Biagioni, Andrea ; Frezza, Ottorino ; Lo Cicero, Francesca ; Lonardo, Alessandro ; Paolucci, Pier Stanislao ; Rossetti, Davide ; Simula, Francesco ; Tosoratto, Laura ; Vicini, Piero
Abstract :
We describe our hands-on experience in developing a network-centric IP core that supports the RDMA protocol and serves as the engine of an FPGA-based PCIe NIC targeted at GPU-accelerated HPC clusters with a 3D-toroidal network topology. We report on three development areas related to our IP: the optimizations required to bring the NIC to its current performance level (highlights include the development of an RDMA engine with a dedicated translation lookaside buffer and a first-of-its-kind IP module that exploits the peer-to-peer protocol of NVIDIA GPUs); the addition of a component called LO|FA|MO IP that provides systemic fault-awareness to the network; and the modifications to the core IP that turn it into a low-latency interface, called NaNet, between a read-out board and a GPU farm in the data acquisition system of the low-level trigger of a particle-physics experiment. Taking into account the forecast evolution of the FPGA platform (28 nm, PCIe Gen3, etc.), we conclude with the future directions we envision for our IP.
Keywords :
fault tolerant computing; field programmable gate arrays; graphics processing units; optimisation; 3D-toroidal network topology; FPGA based network interface; FPGA-based PCIe NIC; GPU accelerated HPC clusters; GPU farm; IP core; NVIDIA GPU; RDMA engine; RDMA protocol; data acquisition system; network centric IP core; read out board; Bandwidth; Graphics processing units; Peer-to-peer computing; Ports (Computers); Protocols; Random access memory; Routing
Conference_Title :
Reconfigurable Computing and FPGAs (ReConFig), 2013 International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4799-2078-5
DOI :
10.1109/ReConFig.2013.6732275