Regional Information Center for Science and Technology (RICEST) - 4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications

DocumentCode :

2061507

Title :

4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications

Author :

Seongwook Park ; Kyeongryeol Bong ; Dongjoo Shin ; Jinmook Lee ; Sungpill Choi ; Hoi-Jun Yoo

Author_Institution :

KAIST, Daejeon, South Korea

Year :

2015

Date :

22-26 Feb. 2015

Firstpage :

Lastpage :

Abstract :

Recently, deep learning (DL) has become a popular approach for high-accuracy big-data analysis in image retrieval [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D-image, and motion recognition, use DL for its best-in-class recognition accuracy. There are two types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With unsupervised DL, most of the learning time is spent in massively iterative weight updates for a restricted Boltzmann machine [2]. For a ~100MB training dataset, >100 TOP of computational capability and ~40GB/s of IO and SRAM data bandwidth are required. Consequently, a 3.4GHz CPU needs >10 hours of learning time with a ~100K input-vector dataset and takes ~1 second for recognition, which is far from real-time processing. Thus, DL is typically done on cloud servers or in high-performance GPU environments with learning-on-server capability. However, the wide use of smart portable devices, such as smartphones and tablets, has produced many applications that need big-data processing with machine learning, such as tagging private photos on personal devices. A high-performance and energy-efficient DL/DI (deep inference) processor is required to realize user-centric pattern recognition in portable devices.
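
The figures quoted in the abstract can be sanity-checked with a short back-of-envelope calculation. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the CPU is assumed to sustain one operation per clock cycle, and the chip is assumed to sustain the 1.93 TOPS/W efficiency given in the title.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# Assumptions (NOT from the paper): the CPU retires a fixed number of
# operations per clock cycle, and the chip sustains its headline efficiency.

TOTAL_OPS = 100e12            # ">100 TOP" of training work, from the abstract
CPU_CLOCK_HZ = 3.4e9          # "3.4GHz CPU", from the abstract
CHIP_OPS_PER_JOULE = 1.93e12  # 1.93 TOPS/W from the title (1 TOPS/W = 1e12 ops per joule)


def cpu_hours(total_ops: float, clock_hz: float, ops_per_cycle: float = 1.0) -> float:
    """Hours needed if the CPU sustains `ops_per_cycle` operations every cycle."""
    seconds = total_ops / (clock_hz * ops_per_cycle)
    return seconds / 3600.0


def energy_joules(total_ops: float, ops_per_joule: float) -> float:
    """Energy needed to execute `total_ops` at a constant efficiency."""
    return total_ops / ops_per_joule


if __name__ == "__main__":
    print(f"CPU time at 1 op/cycle: {cpu_hours(TOTAL_OPS, CPU_CLOCK_HZ):.1f} h")
    print(f"Energy at 1.93 TOPS/W: {energy_joules(TOTAL_OPS, CHIP_OPS_PER_JOULE):.1f} J")
```

Under these assumptions, 100 TOP takes about 8.2 hours on the CPU, the same order of magnitude as the ">10 hours" quoted for a workload of more than 100 TOP, and about 51.8 J at the title's efficiency figure.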

Keywords :

Big Data; Boltzmann machines; SRAM chips; graphics processing units; image retrieval; inference mechanisms; learning (artificial intelligence); parallel architectures; Boltzmann machine; CPU; GPU environment; IO; SRAM; big data applications; frequency 3.4 GHz; image retrieval; inference processor; input vector dataset; iterative weight update; labeled data; learning-on-server capability; machine learning; personal devices; private photos tagging; scalable deep learning processor; smart portable devices; supervised DL; tetra-parallel MIMD architecture; unlabeled data; unsupervised DL; user centric pattern recognition; Bandwidth; Multicore processing; Parallel processing; Pipelines; Program processors; Scalability;

Language :

English

Publisher :

IEEE

Conference_Title :

2015 IEEE International Solid-State Circuits Conference (ISSCC)

Conference_Location :

San Francisco, CA

Print_ISBN :

978-1-4799-6223-5

Type :

conf

DOI :

10.1109/ISSCC.2015.7062935

Filename :

7062935

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2061507