Abstract :
This paper analyses the performance of state-of-the-art media ISA (instruction set architecture) extensions in a general-purpose processor when executing a video encoder based on an affine motion model. In addition to SIMD (single instruction multiple data) fixed-point instructions, these ISA extensions include SIMD floating-point instructions, special-purpose SIMD fixed-point instructions, and cacheability control instructions. In this study, eight time-consuming kernels of the video encoder were hand-optimized using instructions from all four categories of these media ISA extensions (the FLP version). The same kernels were also hand-optimized using only the SIMD fixed-point ISA extensions, without special-purpose instructions (the FXP version). The FLP version achieved an average kernel-level speedup of 1.37X and an application-level speedup of 1.11X over the FXP version, and an application-level speedup of 3.41X over the C version.
Keywords :
general purpose computers; instruction sets; motion compensation; motion estimation; parallel processing; video codecs; video coding; C version; FXP version; SIMD fixed-point ISA extensions; SIMD floating-point instructions; advanced video codec; affine motion model; application-level speedup; average kernel-level speedup; cacheability control instructions; general-purpose processor; hand-optimized kernels; instruction set architecture; media ISA extensions; performance analysis; single instruction multiple data; special-purpose SIMD fixed-point instructions; video encoder; Digital signal processing; Hardware; High performance computing; Kernel; Motion analysis; Prefetching