Abstract:
The authors designed an accelerator architecture for large-scale neural networks, with an emphasis on the impact of memory on accelerator design, performance, and energy. In this article, they present a concrete design at 65 nm that can perform 496 16-bit fixed-point operations in parallel every 1.02 ns, that is, 452 GOP/s, in a 3.02 mm², 485 mW footprint (excluding main memory accesses).
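For context, a minimal sketch of the arithmetic behind the quoted figures, assuming (this is an assumption, not stated in the abstract) that the 1.02 ns interval is the clock period and that each of the 496 parallel lanes counts as one operation per cycle; the exact operation-counting convention is the authors', so the small gap between this raw peak and the quoted 452 GOP/s is expected:

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# Assumption (not from the abstract): "every 1.02 ns" is the clock
# period, and each parallel lane contributes one operation per cycle.

ops_per_cycle = 496      # parallel 16-bit fixed-point operations
cycle_time_s = 1.02e-9   # one cycle every 1.02 ns

clock_ghz = 1 / cycle_time_s / 1e9
peak_gops = ops_per_cycle / cycle_time_s / 1e9

print(f"clock ~ {clock_ghz:.2f} GHz")          # ~0.98 GHz
print(f"peak throughput ~ {peak_gops:.0f} GOP/s")  # ~486 GOP/s raw peak
# The abstract's 452 GOP/s is the paper's reported figure, slightly
# below this raw per-lane peak.
```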
Keywords:
neural networks; artificial neural networks; accelerator architecture; accelerator design; hardware accelerator; high-throughput neural network accelerator; 65 nm; computer architecture; graphics processing units; machine learning