• DocumentCode
    1220325
  • Title
    Architecture and implementation of a vector/SIMD multiply-accumulate unit
  • Author
    Danysh, Albert ; Tan, Dimitri
  • Author_Institution
    Freescale Semicond., Austin, TX, USA
  • Volume
    54
  • Issue
    3
  • fYear
    2005
  • fDate
    3/1/2005 12:00:00 AM
  • Firstpage
    284
  • Lastpage
    293
  • Abstract
    This work presents a 64-bit fixed-point vector multiply-accumulator (MAC) architecture capable of supporting multiple precisions. The vector MAC can perform one 64×64, two 32×32, four 16×16, or eight 8×8-bit signed/unsigned multiplies using essentially the same hardware as a scalar 64-bit MAC and with only a small increase in delay. The scalar MAC architecture is "vectorized" by inserting mode-dependent multiplexing into the partial product generation and by inserting mode-dependent kills in the carry chain of the reduction tree and the final carry-propagate adder. This is an example of "shared segmentation," in which the existing scalar structure is segmented and then shared between vector modes. The vector MAC is area efficient and can be fully pipelined, which makes it suitable for high-performance processors and, possibly, dynamically reconfigurable processors. The "shared segmentation" method is compared to an alternative method, referred to as the "shared subtree" method, by implementing vector MAC designs using two different technologies and three different vector widths.
  • Keywords
    adders; carry logic; parallel architectures; 64 bit; SIMD; VLSI; data-path design; final carry-propagate adder; fixed-point vector multiply-accumulator; high-speed arithmetic; multimedia; multiplier; reduction tree; shared segmentation; Delay; Fixed-point arithmetic; Hardware; Helium; Multiplexing; Very large scale integration
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2005.41
  • Filename
    1388193
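
  • Illustration (not part of the original record)
    The Abstract describes inserting mode-dependent kills into the carry chain so that one 64-bit adder can serve one 64-bit, two 32-bit, four 16-bit, or eight 8-bit lanes. The sketch below is only a software analogue of that idea, not the circuit from the paper: it adds two 64-bit words and suppresses the carry at each lane boundary. The names segmented_add and lane_msb_mask are hypothetical, and the masking trick used here is a common software technique chosen for illustration.

```python
MASK64 = (1 << 64) - 1


def lane_msb_mask(lane_bits):
    """Return a 64-bit mask with a 1 at the top bit of every lane."""
    mask = 0
    for top in range(lane_bits - 1, 64, lane_bits):
        mask |= 1 << top
    return mask


def segmented_add(a, b, lane_bits):
    """Add two 64-bit words as independent lanes of lane_bits each.

    The carry out of each lane's top bit is discarded ("killed"),
    so no lane can disturb its neighbour. Assumed lane widths are
    the four modes named in the abstract: 8, 16, 32, and 64 bits.
    """
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("lane_bits must be 8, 16, 32, or 64")
    msb = lane_msb_mask(lane_bits)
    low = ~msb & MASK64
    # Add all bits except each lane's top bit. Because the top-bit
    # positions are zero in both operands, a carry can reach a top-bit
    # position but can never cross into the next lane.
    partial = (a & low) + (b & low)
    # Fold the top bits back in with XOR, which yields the sum bit
    # without producing a carry out of the lane.
    return (partial ^ ((a ^ b) & msb)) & MASK64


if __name__ == "__main__":
    a, b = 0x00FF, 0x0001
    for width in (8, 16, 32, 64):
        print(width, hex(segmented_add(a, b, width)))
```

    In 8-bit mode the low lane wraps around on its own, while in the wider modes the same operands carry into bit 8, which is the behavioural difference a carry kill produces.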