DocumentCode :
738162
Title :
Fast Motion Estimation Algorithm and Design for Real Time QFHD High Efficiency Video Coding
Author :
Shiaw-Yu Jou ; Shan-Jung Chang ; Tian-Sheuan Chang
Author_Institution :
PixArt, Hsinchu, Taiwan
Volume :
25
Issue :
9
fYear :
2015
Firstpage :
1533
Lastpage :
1544
Abstract :
Motion estimation (ME) in the latest High Efficiency Video Coding standard adopts the quadtree coding structure and up to a 64 × 64 prediction unit (PU) size to improve the coding gain. However, these techniques also have serious design problems regarding the complexity, data dependency, external memory bandwidth, and on-chip buffer size compared with previous standards, especially for real-time ultrahigh-definition video coding. To solve these problems, this paper proposes an efficient ME design with a joint algorithm and architecture optimization. To reduce complexity, we propose a predictive integer ME (IME) algorithm that selects the most probable search directions and steps through a statistical analysis to reduce the number of search points by 90.5%. We also employ a PU size-dependent fractional ME (FME) algorithm to reduce the interpolation filtering by 62.4% compared with the reference software. To resolve the corresponding dependency, we cascade the IME and FME computations via interlaced scheduling and propose an early motion vector prediction candidate approach. We use this scheduling with a 16 × 16 processing unit to compute the partial matching cost of all PUs with the same 16 × 16 current block in an interlaced order and share their common reference block to reduce the on-chip buffer size and off-chip memory bandwidth. The bandwidth is further reduced by a cache with double Z-scan indexed addressing to simplify the cache controller. Implementation with a Taiwan Semiconductor Manufacturing Company 90-nm CMOS process supports the real-time encoding of 4K × 2K at 60 frames/s operated at 270 MHz with 778.7k logic gates and 17.4 KB of on-chip memory.
Keywords :
CMOS integrated circuits; VLSI; cache storage; filtering theory; integrated circuit design; interpolation; logic gates; motion estimation; quadtrees; search problems; statistical analysis; video coding; IME algorithm; PU size-dependent fractional ME algorithm; Taiwan Semiconductor Manufacturing Company 90-nm CMOS process; cache controller; coding gain improvement; complexity reduction; data dependency; design problems; early motion vector prediction candidate approach; external memory bandwidth; fast motion estimation algorithm; interlaced scheduling; interpolation filtering reduction; joint algorithm optimization; joint architecture optimization; most probable search directions; off-chip memory bandwidth reduction; on-chip buffer size reduction; partial matching cost computation; prediction unit size; predictive integer ME algorithm; processing unit; quadtree coding structure; real time QFHD high efficiency video coding; Algorithm design and analysis; Bandwidth; Complexity theory; Encoding; Prediction algorithms; Standards; System-on-chip; HEVC; High Efficiency Video Coding (HEVC); Motion estimation; VLSI architecture; motion estimation (ME); very-large-scale integration (VLSI) architecture
fLanguage :
English
Journal_Title :
Circuits and Systems for Video Technology, IEEE Transactions on
Publisher :
IEEE
ISSN :
1051-8215
Type :
jour
DOI :
10.1109/TCSVT.2015.2389472
Filename :
7005433