Regional Information Center for Science and Technology (RICEST) - GRAPE-MPs: Implementation of an SIMD for Quadruple/Hexuple/Octuple-Precision Arithmetic Operation on a Structured ASIC and an FPGA

DocumentCode :

2255002

Title :

GRAPE-MPs: Implementation of an SIMD for Quadruple/Hexuple/Octuple-Precision Arithmetic Operation on a Structured ASIC and an FPGA

Author :

Nakasato, N. ; Daisaka, H. ; Fukushige, Takashi ; Kawai, A. ; Makino, J. ; Ishikawa, Takaaki ; Yuasa, F.

Author_Institution :

Univ. of Aizu, Aizu-Wakamatsu, Japan

fYear :

2012

fDate :

20-22 Sept. 2012

Firstpage :

Lastpage :

Abstract :

We describe the design and performance of GRAPE-MPs, a series of SIMD accelerator boards for quadruple/hexuple/octuple-precision arithmetic operations. In the basic design, a GRAPE-MPs processor consists of a number of processing elements (PEs) and memory components that handle quadruple/hexuple/octuple-precision data. The processor is implemented on a structured ASIC chip and on an FPGA chip. GRAPE-MP (quadruple-precision) uses a structured ASIC chip from eASIC Corp., which has 6 PEs and operates at a clock frequency of 100 MHz. The theoretical peak quadruple-precision performance of a single board is 1.2 Gflops, and the achieved performance for Feynman loop integrals is about 0.5 Gflops. GRAPE-MP4/6/8 (quadruple/hexuple/octuple-precision) use an FPGA chip from Altera Corporation. For example, in the current implementation, MP8 has 10 PEs and operates at a clock frequency of 70 MHz. We also present performance results for multiple GRAPE-MPs boards. The achieved performance of four MP8 boards is about 1.6 Gflops, roughly 90 times faster than a single CPU core at comparable precision. We show that our hardware-based approach to evaluating Feynman loop integrals with high-precision arithmetic is highly effective.
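
The figures quoted in the abstract can be cross-checked with a short calculation. The sketch below is illustrative only: the value of 2 floating-point operations per PE per cycle is an assumption (for example, one add and one multiply per cycle) chosen because it reproduces the stated 1.2 Gflops peak; the abstract does not state it. The function and variable names are hypothetical.

```python
def peak_gflops(num_pes, clock_mhz, flops_per_pe_per_cycle=2):
    """Theoretical peak in Gflops.

    flops_per_pe_per_cycle=2 is an ASSUMPTION, not stated in the abstract;
    it is the value that reproduces the quoted 1.2 Gflops for GRAPE-MP.
    """
    return num_pes * clock_mhz * 1e6 * flops_per_pe_per_cycle / 1e9


# GRAPE-MP: 6 PEs at 100 MHz (both figures from the abstract).
mp_peak = peak_gflops(num_pes=6, clock_mhz=100)

# Fraction of peak reached on the Feynman loop integrals (0.5 Gflops achieved).
mp_efficiency = 0.5 / mp_peak

# Single-core CPU rate implied by "about 1.6 Gflops, roughly 90 times faster".
implied_cpu_gflops = 1.6 / 90

print(f"GRAPE-MP peak: {mp_peak:.2f} Gflops")
print(f"GRAPE-MP efficiency: {mp_efficiency:.1%}")
print(f"Implied CPU core rate: {implied_cpu_gflops * 1000:.1f} Mflops")
```

Under that assumption, GRAPE-MP reaches a little over 40% of its peak on the loop integrals, and the CPU baseline works out to roughly 18 Mflops at comparable precision.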

Keywords :

application specific integrated circuits; digital arithmetic; field programmable gate arrays; parallel processing; FPGA; FPGA chip; Feynman loop integrals; GRAPE-MP processor; PE; SIMD accelerator boards; computer speed 0.5 GFLOPS; computer speed 1.2 GFLOPS; computer speed 1.6 GFLOPS; eASIC corp; frequency 100 MHz; hardware based approach; memory components; processing elements; quadruple-hexuple-octuple-precision arithmetic operation; structured ASIC chip; Application specific integrated circuits; Clocks; Computer architecture; Field programmable gate arrays; Pipelines; Process control; Registers;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Embedded Multicore SoCs (MCSoC), 2012 IEEE 6th International Symposium on

Conference_Location :

Aizu-Wakamatsu

Print_ISBN :

978-1-4673-2535-6

Electronic_ISBN :

978-0-7695-4800-5

Type :

conf

DOI :

10.1109/MCSoC.2012.31

Filename :

6354681

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2255002