Title :
An Energy-Efficient Processor Core for Massively Parallel Computing
Author :
Yang, Qianming ; Wu, Nan ; Guan, Maolin ; Zhang, Chunyuan ; Cai, Jun
Author_Institution :
Comput. Sch., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
With the evolution of more sophisticated communication standards and algorithms, embedded applications impose demanding performance and efficiency requirements. Massively parallel computing based on many simple cores and a few powerful cores is becoming the mainstream method of building high-performance, low-power processors. Targeting the design of the simple core, this paper proposes an energy-efficient processor architecture named Smart Core. Following the idea of explicitly parallel and accurate computing, Smart Core uses an exposed, shallow (non-deep) pipeline to eliminate pipeline registers and reduce the cost of executing instructions. A multi-level data memory organization, consisting of streaming memory, a multi-mode register file, and a fully distributed tiny operand register file, captures various forms of data reuse and locality to reduce the cost of delivering data. To reduce the cost of delivering instructions, an asymmetric, fully distributed instruction register file captures the locality and reuse of instructions within loops. Preliminary results show that Smart Core achieves an energy efficiency 25x higher than that of a traditional embedded RISC processor. When scaled to 40 nm CMOS technology, a single-chip multiprocessor consisting of many cores like Smart Core is capable of providing more than 1 TOPS of performance while achieving an efficiency of 100 GOPS/W or more.
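As a rough consistency check on the closing figures (a sketch that assumes both quoted lower bounds hold exactly at the same operating point, which the abstract does not state), chip power is throughput divided by energy efficiency:

```latex
P = \frac{\text{throughput}}{\text{efficiency}}
  = \frac{1\ \text{TOPS}}{100\ \text{GOPS/W}}
  = \frac{10^{12}\ \text{ops/s}}{10^{11}\ \text{ops}/(\text{s}\cdot\text{W})}
  = 10\ \text{W}
```

Because both figures are stated as lower bounds, this 10 W value describes only that nominal operating point, not a measured power budget.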
Keywords :
energy conservation; multiprocessing systems; parallel processing; pipeline processing; 1TOPS performance; CMOS technology; Smart Core; data locality; data reuse; distributed instruction register file; distributed tiny operand register file; embedded RISC processor; energy-efficient processor core architecture; mainstream method; massively parallel computing; multilevel data memory organization; multimode register file; nondeep pipeline; pipeline register elimination; single chip multiprocessor; size 40 nm; streaming memory; Computer architecture; Distributed databases; Kernel; Pipelines; Reduced instruction set computing; Registers; VLIW; Energy-Efficient; Smart Core; explicitly parallel; many cores;
Conference_Title :
Computer Science & Service System (CSSS), 2012 International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4673-0721-5
DOI :
10.1109/CSSS.2012.582