A coarse-grained reconfigurable processing unit (RPU) consisting of
multi-functional processing elements (PEs) interconnected by area-efficient line-switched mesh connect (LSMC) routing is implemented on a
die in TSMC 65 nm LP1P8M CMOS technology. A hierarchical configuration context (HCC) organization scheme is proposed to reduce both the implementation overhead and the energy dissipated by fast reconfiguration. The proposed RPU is integrated into two systems-on-a-chip (SoCs) targeting multi-standard video decoding. The high-performance chip (named REMUS_HPP), comprising two RPU processors, can decode
H.264 video streams at 30 frames per second (fps) at 200 MHz. REMUS_HPP achieves a 25% performance gain over the XPP-III reconfigurable processor while consuming only 280 mW, resulting in a
improvement in energy efficiency. The other chip (named REMUS_LPP), targeting low-power applications, integrates only one RPU processor. REMUS_LPP can decode
H.264 video streams at 35 fps while consuming 24.5 mW at 75 MHz, achieving a 76% reduction in power dissipation and a
improvement in energy efficiency compared with the ADRES reconfigurable processor.
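
As a rough illustration of how the reported figures relate, the following sketch derives the average energy per decoded frame from the stated power and frame rate, and the ADRES power level implied by the stated 76% reduction. The function names are hypothetical, and energy per frame is an assumed proxy, not necessarily the energy-efficiency metric used in the comparisons above. Because the two chips decode different streams, the two per-frame values are not directly comparable.

```python
def energy_per_frame_mj(power_mw: float, fps: float) -> float:
    """Average energy per decoded frame in millijoules.

    mW / (frames/s) = (mJ/s) / (frames/s) = mJ per frame.
    """
    return power_mw / fps


def implied_baseline_power_mw(power_mw: float, reduction: float) -> float:
    """Power of the comparison design implied by a fractional power reduction."""
    return power_mw / (1.0 - reduction)


# Figures taken from the text: power (mW) and frame rate (fps).
hpp_mj = energy_per_frame_mj(280.0, 30.0)    # REMUS_HPP at 200 MHz
lpp_mj = energy_per_frame_mj(24.5, 35.0)     # REMUS_LPP at 75 MHz

# Assumption: the 76% reduction is relative to the ADRES power figure.
adres_mw = implied_baseline_power_mw(24.5, 0.76)

print(f"REMUS_HPP: {hpp_mj:.2f} mJ/frame")
print(f"REMUS_LPP: {lpp_mj:.2f} mJ/frame")
print(f"Implied ADRES power: {adres_mw:.1f} mW")
```

Under these assumptions, REMUS_HPP spends roughly 9.3 mJ per frame and REMUS_LPP roughly 0.7 mJ per frame, and the 76% reduction corresponds to an ADRES power of about 102 mW.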