Regional Information Center for Science and Technology (RICeST) - Memory Efficient Modular VLSI Architecture for Highthroughput and Low-Latency Implementation of Multilevel Lifting 2-D DWT

DocumentCode :

1441388

Title :

Memory Efficient Modular VLSI Architecture for Highthroughput and Low-Latency Implementation of Multilevel Lifting 2-D DWT

Author :

Mohanty, Basant K. ; Meher, Pramod Kumar

Author_Institution :

Dept. of Electron. & Commun. Eng., Jaypee Univ. of Eng. & Technol., Guna, India

Volume :

Issue :

Year :

2011

Date :

5/1/2011

Firstpage :

2072

Lastpage :

2084

Abstract :

In this paper, we present a modular and pipelined architecture for lifting-based multilevel 2-D DWT that uses neither a line-buffer nor a frame-buffer. The overall area-delay product is reduced in the proposed design by appropriate partitioning and scheduling of the computation of individual decomposition levels. The processing for different levels is performed by a cascaded pipeline structure to maximize the hardware utilization efficiency (HUE). Moreover, the proposed structure is scalable for high-throughput and area-constrained implementation. We have removed all the redundancies resulting from decimated wavelet filtering to maximize the HUE. The proposed design involves L pyramid algorithm (PA) units and one recursive pyramid algorithm (RPA) unit, where R = N/P, L = ⌈log₄ P⌉, P is the input block size, and M and N are, respectively, the height and width of the image. The entire multilevel DWT is computed by the proposed structure in MR cycles. The proposed structure has O(8R×2L) cycles of output latency, which is very small compared to the latency of the existing structures. Interestingly, the proposed structure does not require any line-buffer or frame-buffer, unlike the existing folded structures, which require a line-buffer of size O(N) and a frame-buffer of size O(M/2×N/2) for multilevel 2-D computation. Instead of those buffers, the proposed structure involves only local registers and RAM of size O(N). The saving of line-buffer and frame-buffer achieved by the proposed design is an important advantage, since the image size could very often be as large as 512 × 512. From the simulation results, we find that the proposed scalable structure offers a better slice-delay-product (SDP) for higher-throughput implementations, since the on-chip memory of this structure remains almost unchanged with input block size. On average, it has 17% less SDP than the best of the corresponding existing structures for different input-block sizes and image sizes. It involves 1.92 times more transistors, but offers 12.2 times higher throughput and consumes 52% less power per output (PPO) than the other structure, on average, for different input sizes.
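
As a rough illustration of the parameters quoted in the abstract, the sketch below evaluates R = N/P, L = ⌈log₄ P⌉ and the MR-cycle computation time for a given image and block size. The function names (`ceil_log4`, `dwt_parameters`) and the dictionary keys are hypothetical, and the requirement that P divide N evenly is an assumption made here for simplicity, not a condition stated in the abstract.

```python
def ceil_log4(p: int) -> int:
    """Return the smallest integer k with 4**k >= p (i.e. ceil(log4(p))).

    Integer arithmetic is used to avoid floating-point rounding.
    """
    if p < 1:
        raise ValueError("block size P must be a positive integer")
    k, power = 0, 1
    while power < p:
        power *= 4
        k += 1
    return k


def dwt_parameters(m: int, n: int, p: int) -> dict:
    """Evaluate the quantities named in the abstract for an M x N image.

    m, n : image height and width (M and N in the abstract)
    p    : input block size (P in the abstract)
    Assumption (not from the abstract): P divides N evenly.
    """
    if n % p != 0:
        raise ValueError("this sketch assumes P divides N evenly")
    r = n // p                 # R = N / P
    pa_units = ceil_log4(p)    # L = ceil(log4 P), the number of PA units
    cycles = m * r             # the whole multilevel DWT takes MR cycles
    return {"R": r, "L": pa_units, "cycles": cycles}


if __name__ == "__main__":
    # 512 x 512 image (the size mentioned in the abstract), block size 16
    print(dwt_parameters(512, 512, 16))
```

For a 512 × 512 image with P = 16, this gives R = 32, L = 2 and 512 × 32 = 16384 cycles. Larger block sizes reduce R, and hence the cycle count, in proportion.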

Keywords :

VLSI; discrete wavelet transforms; image coding; pipeline processing; random-access storage; RAM; cascaded pipeline structure; decimated wavelet filtering; discrete wavelet transform; frame-buffer; hardware utilization efficiency; line-buffer; memory efficient modular VLSI architecture; modular architecture; multilevel lifting 2D DWT; on-chip memory; recursive pyramid algorithm; slice-delay-product; Computer architecture; Delay; Discrete wavelet transforms; Hardware; Pipelines; System-on-a-chip; Throughput; 2-dimensional (2-D) DWT; Discrete wavelet transform (DWT); VLSI; lifting; systolic array;

Language :

English

Journal_Title :

Signal Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1053-587X

Type :

jour

DOI :

10.1109/TSP.2011.2109953

Filename :

5706375

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1441388