Title :
Modelling a fast BLAS level-1 inspired vectorized FPU for ARM devices
Author :
Vigliar, M. ; Raiconi, G. ; D'Auria, Amedeo ; Del Mastro, Giuseppe
Author_Institution :
Univ. degli Studi di Salerno, Salerno, Italy
Abstract :
Modern collections of algorithms for DSP and multimedia often rely on linear algebra operators to perform massive numerical transformations on vectorized data. Embedded developers often face the worst case of having no FPU at all in their low-power systems, as many device producers consider FP math an expensive option in terms of gates and power consumption. The main aim of this work is to propose a lightweight structure, designed for use in an ARM-based environment but easily retargetable to different architectures, capable of efficiently performing vectorized FP operations as described in the BLAS Level 1 specification. A distinctive feature of the coprocessor is its ability to work in a fully pipelined mode. Both single and double precision calculations can be performed. Many different CPU offloading techniques have been implemented in order to enable reactive power management policies during idle/waiting time slices. A VHDL implementation is presented, with synthesis and placement results in different technologies. An FPGA+ARM9 prototype is presented and benchmarked. Results are compared to functionally equivalent solutions running in different environments and using different sets of processing primitives (up to x86's SSE2/3/4). Finally, a complex application for Hidden Markov Model (HMM) training and evaluation is used as a test case to evaluate the real-world performance of the proposed approach.
Keywords :
coprocessors; digital signal processing chips; embedded systems; field programmable gate arrays; floating point arithmetic; hardware description languages; hidden Markov models; linear algebra; low-power electronics; microcontrollers; multiprocessing systems; pipeline arithmetic; power aware computing; ARM devices; CPU offloading technique; DSP; FPGA-ARM9 prototype; HMM training; VHDL; coprocessor; double precision calculation; embedded developers; fast BLAS level-1 inspired vectorized FPU modelling; floating-point operation; hidden Markov model; linear algebra operator; low-power system; numerical transformation; pipeline processing; power consumption; reactive power management; single precision calculation; Clocks; Coprocessors; Hidden Markov models; Performance evaluation; Switches; Timing; ARM9; BLAS; Floating-point; architectures; context switching time; coprocessing; vector processors;
Conference_Title :
Circuits and Systems (MWSCAS), 2011 IEEE 54th International Midwest Symposium on
Conference_Location :
Seoul
Print_ISBN :
978-1-61284-856-3
Electronic_ISBN :
1548-3746
DOI :
10.1109/MWSCAS.2011.6026644