DocumentCode :
3605845
Title :
On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices
Author :
Jae-sun Seo ; Binbin Lin ; MinKyu Kim ; Pai-Yu Chen ; Deepak Kadetotad ; Zihan Xu ; Abinash Mohanty ; Sarma Vrudhula ; Shimeng Yu ; Jieping Ye ; Yu Cao
Author_Institution :
Dept. of Electr., Arizona State Univ., Tempe, AZ, USA
Volume :
14
Issue :
6
fYear :
2015
Firstpage :
969
Lastpage :
979
Abstract :
Many recent advances in sparse coding have led to its wide adoption in signal processing, pattern classification, and object recognition applications. Even with improved performance in state-of-the-art algorithms and the hardware platform of CPUs/GPUs, solving a sparse coding problem still requires expensive computations, making real-time large-scale learning a very challenging problem. In this paper, we co-optimize algorithm, architecture, circuit, and device for real-time energy-efficient on-chip hardware acceleration of sparse coding. The principle of hardware acceleration is to recognize the properties of learning algorithms, which involve many parallel operations of data fetch and matrix/vector multiplication/addition. Today's von Neumann architecture, however, is not suitable for such parallelization, due to the separation of memory and the computing unit that makes sequential operations inevitable. This principle drives both the selection of algorithms and the design evolution from CPU to CMOS application-specific integrated circuits (ASIC) to the parallel architecture with resistive crosspoint array (PARCA) that we propose. The CMOS ASIC scheme implements sparse coding with SRAM dictionaries and all-digital circuits, and PARCA employs resistive-RAM dictionaries with special read and write circuits. We show that 65 nm implementations of the CMOS ASIC and PARCA schemes accelerate sparse coding computation by 394× and 2140×, respectively, compared to software running on an eight-core CPU. Simulated power for both hardware schemes lies in the milliwatt range, making them viable for portable single-chip learning applications.
Keywords :
CMOS integrated circuits; SRAM chips; application specific integrated circuits; compressed sensing; learning (artificial intelligence); low-power electronics; resistive RAM; CMOS ASIC scheme; CMOS application-specific integrated circuits; CPU/GPU; PARCA; SRAM dictionaries; all-digital circuits; cooptimize algorithm; learning algorithms; object recognition; on-chip hardware acceleration; on-chip sparse learning acceleration; parallel architecture with resistive crosspoint array; pattern classification; read and write circuits; resistive-RAM dictionaries; signal processing; size 65 nm; sparse coding; synaptic devices; von Neumann architecture; Dictionaries; Hardware; Unsupervised learning; Very large scale integration; Crossbar array; VLSI; hardware acceleration; low power; resistive memory
fLanguage :
English
Journal_Title :
IEEE Transactions on Nanotechnology
Publisher :
IEEE
ISSN :
1536-125X
Type :
jour
DOI :
10.1109/TNANO.2015.2478861
Filename :
7268884