DocumentCode :
2061507
Title :
4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications
Author :
Seongwook Park ; Kyeongryeol Bong ; Dongjoo Shin ; Jinmook Lee ; Sungpill Choi ; Hoi-Jun Yoo
Author_Institution :
KAIST, Daejeon, South Korea
fYear :
2015
fDate :
22-26 Feb. 2015
Firstpage :
1
Lastpage :
3
Abstract :
Recently, deep learning (DL) has become a popular approach for big-data analysis in image retrieval with high accuracy [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D image and motion recognition, use DL due to its best-in-class recognition accuracy. There are two types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With unsupervised DL, most of the learning time is spent in massively iterative weight updates for a restricted Boltzmann machine [2]. For a ~100MB training dataset, >100 TOP computational capability and ~40GB/s IO and SRAM data bandwidth are required. So, a 3.4GHz CPU needs >10 hours of learning time with a ~100K input-vector dataset and takes ~1 second for recognition, which is far from real-time processing. Thus, DL is typically done using cloud servers or high-performance GPU environments with learning-on-server capability. However, the wide use of smart portable devices, such as smartphones and tablets, results in many applications that need big-data processing with machine learning, such as tagging private photos in personal devices. A high-performance and energy-efficient DL/DI (deep inference) processor is required to realize user-centric pattern recognition in portable devices.
Keywords :
Big Data; Boltzmann machines; SRAM chips; graphics processing units; image retrieval; inference mechanisms; learning (artificial intelligence); parallel architectures; Boltzmann machine; CPU; GPU environment; IO; SRAM; big data applications; frequency 3.4 GHz; image retrieval; inference processor; input vector dataset; iterative weight update; labeled data; learning-on-server capability; machine learning; personal devices; private photos tagging; scalable deep learning processor; smart portable devices; supervised DL; tetra-parallel MIMD architecture; unlabeled data; unsupervised DL; user centric pattern recognition; Bandwidth; Multicore processing; Parallel processing; Pipelines; Program processors; Scalability;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Solid-State Circuits Conference (ISSCC), 2015 IEEE International
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4799-6223-5
Type :
conf
DOI :
10.1109/ISSCC.2015.7062935
Filename :
7062935