Title :
Accelerating Householder bidiagonalization with ARM NEON technology
Author :
Wenjun Yang ; Zhenyu Liu
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
Householder bidiagonalization is the first step of Singular Value Decomposition (SVD), an important algorithm in numerical linear algebra that is widely used in video processing. NEON is a general-purpose Single Instruction Multiple Data (SIMD) engine introduced in the ARMv7 architecture and targeted at accelerating multimedia and signal processing on mobile platforms. In this paper, we propose a NEON-based implementation and optimization of Householder bidiagonalization, aiming to test NEON's potential for handling low-dimensional macroblocks should it be applied to future computing-intensive video codecs. Intrinsics and inline assembly, the two most commonly used ways to utilize NEON, are compared in terms of performance. Solutions to the problem of leftover elements in vectorization are also discussed. Our study shows that, with hand-coded inline assembly and all of the optimizations applied, our NEON implementation of Householder bidiagonalization achieves a speedup of 2.3 over the plain C version.
Keywords :
instruction sets; matrix algebra; microcontrollers; mobile computing; multimedia communication; optimisation; parallel processing; singular value decomposition; video coding; ARM NEON technology; ARMv7 architecture; SIMD engine; SVD; computing-intensive video codecs; Householder bidiagonalization; low-dimensional macroblocks; mobile platform; multimedia processing; numerical linear algebra; single instruction multiple data engine; singular value decomposition; video signal processing; Acceleration; Assembly; Computer architecture; Registers; Signal processing algorithms; Vectors;
Conference_Title :
Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
Conference_Location :
Hollywood, CA
Print_ISBN :
978-1-4673-4863-8