Algorithm and architecture for a high density, low power scalar product macrocell

Author

Gu, J. ; Chang, C.-H. ; Yeo, K.-S.

Author_Institution

Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore

Volume

151

Issue

2

fYear

2004

fDate

3/19/2004

Firstpage

161

Lastpage

172

Abstract

The authors present a design approach for an arithmetic macrocell that computes the scalar product of two vectors, an operation ubiquitous in communications and digital signal processing. The core of the proposed architecture is a fully combinational design containing a partial product generator, a partial product accumulator and a vector accumulator. The design addresses the competing optimisation goals of VLSI area, power dissipation and latency in the deep submicron regime. Compared with conventional merged arithmetic architectures, the proposed macrocell design represents a substantial improvement in the VLSI layout, with little area wastage, a high degree of regularity and good scalability for different vector lengths and operand widths. A theoretical analysis shows that a 16-bit scalar product multiplier for input vectors with 16 elements, compared with a traditionally designed architecture, achieves a saving of 38.6% in silicon area, an increase of up to 73% in area usage efficiency and a saving of 29.4% in interconnect delay. Post-layout simulations of the proposed circuit, based on a 0.18 μm CMOS process, show an average power dissipation of 64.96 mW and a latency of 6.92 ns at a standard supply voltage of 1.8 V, a superior performance for a single-cycle instruction in a high-speed, low-voltage 16-bit digital signal processor operating at 144 MHz. The use of shorter interconnects and more equalised interconnect delays substantially reduces the power dissipation and delay incurred by the interconnects. Post-layout simulation of the proposed circuit at supply voltages ranging from 0.7 to 3.3 V shows a significant power reduction of 6 to 13% over the pre-layout simulation results of the conventional design.
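
As a rough illustration of the three stages named in the abstract, the following Python sketch models a scalar product as partial product generation, accumulation of the partial products of each element pair, and accumulation across the vector. It is a behavioural sketch only, not the authors' circuit: the function names are hypothetical, and unsigned operands are assumed because the abstract does not state the number format. The last function checks that the reported 6.92 ns latency is consistent with single-cycle operation at about 144 MHz.

```python
def partial_products(a, b, width=16):
    """Generate the shifted partial products of unsigned a * b.

    Assumption: operands are unsigned and truncated to `width` bits.
    Bit i of b selects a copy of a shifted left by i positions.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def scalar_product(xs, ys, width=16):
    """Sum the partial products of every element pair into one result.

    The inner sum stands in for the partial product accumulator and the
    outer sum for the vector accumulator; in hardware these would be
    adder arrays rather than sequential additions.
    """
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    total = 0
    for x, y in zip(xs, ys):
        total += sum(partial_products(x, y, width))
    return total


def max_clock_mhz(latency_ns):
    """Highest single-cycle clock rate (MHz) for a combinational latency in ns."""
    return 1000.0 / latency_ns


if __name__ == "__main__":
    print(scalar_product([1, 2, 3], [4, 5, 6]))
    print(round(max_clock_mhz(6.92), 1))
```

For 16-element vectors of 16-bit unsigned operands the result can need up to 36 bits, which Python's arbitrary-precision integers absorb without any explicit width handling.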

Keywords

VLSI; circuit optimisation; circuit simulation; delay estimation; digital arithmetic; digital signal processing chips; integrated circuit design; integrated logic circuits; logic simulation; multiplying circuits; 16-bit scalar product multiplier; CMOS process; VLSI area; arithmetic macrocell design; combinational design; deep submicron regime; digital signal processing problems; interconnect delays; merged arithmetic architectures; optimisation goals; partial product accumulator; partial product generator; postlayout simulations; power dissipation; prelayout simulation; single cycle instruction; supply voltages; vector accumulator;

fLanguage

English

Journal_Title

Computers and Digital Techniques, IEE Proceedings -

Publisher

iet

ISSN

1350-2387

Type

jour

DOI

10.1049/ip-cdt:20040328

Filename

1274033