DocumentCode :
2255002
Title :
GRAPE-MPs: Implementation of an SIMD for Quadruple/Hexuple/Octuple-Precision Arithmetic Operation on a Structured ASIC and an FPGA
Author :
Nakasato, N. ; Daisaka, H. ; Fukushige, Takashi ; Kawai, A. ; Makino, J. ; Ishikawa, Takaaki ; Yuasa, F.
Author_Institution :
Univ. of Aizu, Aizu-Wakamatsu, Japan
fYear :
2012
fDate :
20-22 Sept. 2012
Firstpage :
75
Lastpage :
83
Abstract :
We describe the design and performance of GRAPE-MPs, a series of SIMD accelerator boards for quadruple/hexuple/octuple-precision arithmetic operations. In the basic design, a GRAPE-MPs processor consists of a number of processing elements (PEs) and memory components that handle quadruple/hexuple/octuple-precision data. GRAPE-MPs processors are implemented on a structured ASIC chip and on an FPGA chip. GRAPE-MP (quadruple-precision) uses a structured ASIC chip from eASIC Corp., which has 6 PEs and operates at a clock frequency of 100 MHz. The theoretical peak quadruple-precision performance of a single board is 1.2 Gflops, and the achieved performance for the Feynman loop integrals is about 0.5 Gflops. GRAPE-MP4/6/8 (quadruple/hexuple/octuple-precision) uses an FPGA chip from Altera Corporation. For example, in the current implementation, MP8 has 10 PEs operating at a clock frequency of 70 MHz. We also present performance results for multiple GRAPE-MPs boards. The achieved performance of four MP8 boards is about 1.6 Gflops, which is roughly 90 times faster than a single CPU core working at comparable precision. We show that our hardware-based approach to evaluating the Feynman loop integrals with high-precision arithmetic operations is highly effective.
Keywords :
application specific integrated circuits; digital arithmetic; field programmable gate arrays; parallel processing; FPGA; FPGA chip; Feynman loop integrals; GRAPE-MP processor; PE; SIMD accelerator boards; computer speed 0.5 GFLOPS; computer speed 1.2 GFLOPS; computer speed 1.6 GFLOPS; eASIC corp; frequency 100 MHz; hardware based approach; memory components; processing elements; quadruple-hexuple-octuple-precision arithmetic operation; structured ASIC chip; Application specific integrated circuits; Clocks; Computer architecture; Field programmable gate arrays; Pipelines; Process control; Registers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Embedded Multicore Socs (MCSoC), 2012 IEEE 6th International Symposium on
Conference_Location :
Aizu-Wakamatsu
Print_ISBN :
978-1-4673-2535-6
Electronic_ISBN :
978-0-7695-4800-5
Type :
conf
DOI :
10.1109/MCSoC.2012.31
Filename :
6354681