• DocumentCode
    254728
  • Title
    A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
  • Author
    Gokhale, Vinayak ; Jin, Jonghoon ; Dundar, Aysegul ; Martini, Ben ; Culurciello, Eugenio
  • Author_Institution
    Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
  • fYear
    2014
  • fDate
    23-28 June 2014
  • Firstpage
    696
  • Lastpage
    701
  • Abstract
    Deep networks are state-of-the-art models used for understanding the content of images, videos, audio and raw input data. Current computing systems are not able to run deep network models in real-time with low power consumption. In this paper we present nn-X: a scalable, low-power coprocessor for enabling real-time execution of deep neural networks. nn-X is implemented on programmable logic devices and comprises an array of configurable processing elements called collections. These collections perform the most common operations in deep networks: convolution, subsampling and non-linear functions. The nn-X system includes 4 high-speed direct memory access interfaces to DDR3 memory and two ARM Cortex-A9 processors. Each port is capable of a sustained throughput of 950 MB/s in full duplex. nn-X is able to achieve a peak performance of 227 G-ops/s, a measured performance in deep learning applications of up to 200 G-ops/s while consuming less than 4 watts of power. This translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.
  • Keywords
    coprocessors; neural nets; programmable logic devices; ARM Cortex-A9 processors; DDR3 memory; configurable processing elements; convolution operation; deep neural networks; desktop processors; memory access interface; mobile coprocessor; mobile processors; nn-X coprocessor; nonlinear function operation; power consumption; subsampling operation; Artificial neural networks; Convolution; Coprocessors; Memory management; Performance evaluation; Program processors; Computer vision; convolutional neural networks; embedded vision system; hardware acceleration; machine learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on
  • Conference_Location
    Columbus, OH
  • Type
    conf
  • DOI
    10.1109/CVPRW.2014.106
  • Filename
    6910056