• DocumentCode
    3613052
  • Title
    An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications
  • Author
    Park, Seong-Wook ; Park, Junyoung ; Bong, Kyeongryeol ; Shin, Dongjoo ; Lee, Jinmook ; Choi, Sungpill ; Yoo, Hoi-Jun
  • Author_Institution
    Department of Electrical Engineering, KAIST, Daejeon, Korea (the Republic of)
  • Volume
    9
  • Issue
    6
  • fYear
    2015
  • Firstpage
    838
  • Lastpage
    848
  • Abstract
    Deep learning algorithms are widely used in pattern recognition applications such as text recognition, object recognition, and action recognition because their recognition accuracy is best in class compared to hand-crafted and shallow-learning-based algorithms. However, the long learning time caused by their complex structure has so far limited their use to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works, which have adopted massively parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated in 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power and 213.1 mW peak power at a 200 MHz operating frequency and a 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state of the art.
  • Keywords
    Big data; Convolution; Graphics processing units; Machine learning; Neurons; Parallel architectures; Pattern recognition; Convolutional deep belief networks; deep inference; deep learning; fog computing; semi-supervised learning;
  • fLanguage
    English
  • Journal_Title
    Biomedical Circuits and Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1932-4545
  • Type
    jour
  • DOI
    10.1109/TBCAS.2015.2504563
  • Filename
    7384530