DocumentCode
927300
Title
Algorithm and architecture for a high density, low power scalar product macrocell
Author
Gu, J. ; Chang, C.-H. ; Yeo, K.-S.
Author_Institution
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
Volume
151
Issue
2
fYear
2004
fDate
3/19/2004 12:00:00 AM
Firstpage
161
Lastpage
172
Abstract
The authors present a design approach for an arithmetic macrocell that computes the scalar product of two vectors, an operation ubiquitous in the solution of many communications and digital signal processing problems. The core of the proposed architecture is a fully combinational design containing a partial product generator, a partial product accumulator and a vector accumulator. The design addresses the competing optimisation goals of VLSI area, power dissipation and latency in the deep submicron regime. Compared with conventional merged arithmetic architectures, the proposed macrocell design represents a substantial improvement in the VLSI layout, with little area wastage, a high degree of regularity and good scalability for different vector lengths and operand widths. A theoretical analysis shows that the design of a 16-bit scalar product multiplier for input vectors with 16 elements, in comparison with a traditionally designed architecture, achieves a saving of 38.6% in the silicon area, an increase of up to 73% in the area usage efficiency and a 29.4% saving in the interconnect delay. Post-layout simulations of the proposed circuit, based on a 0.18 μm CMOS process, show an average power dissipation of 64.96 mW and a latency of 6.92 ns at a standard supply voltage of 1.8 V, a superior performance for a single cycle instruction in a high-speed, low voltage 16-bit digital signal processor operating at 144 MHz. The use of shorter interconnects and more equalised interconnect delays substantially reduces the power dissipation and delay incurred by the interconnects. Post-layout simulation of the proposed circuit at supply voltages ranging from 0.7 to 3.3 V shows a significant power reduction of 6 to 13% over the pre-layout simulation results of the conventional design.
Keywords
VLSI; circuit optimisation; circuit simulation; delay estimation; digital arithmetic; digital signal processing chips; integrated circuit design; integrated logic circuits; logic simulation; multiplying circuits; 16-bit scalar product multiplier; CMOS process; VLSI area; arithmetic macrocell design; combinational design; deep submicron regime; digital signal processing problems; interconnect delays; merged arithmetic architectures; optimisation goals; partial product accumulator; partial product generator; postlayout simulations; power dissipation; prelayout simulation; single cycle instruction; supply voltages; vector accumulator
fLanguage
English
Journal_Title
Computers and Digital Techniques, IEE Proceedings -
Publisher
iet
ISSN
1350-2387
Type
jour
DOI
10.1049/ip-cdt:20040328
Filename
1274033