Title :
Mechanical Derivation of Fused Multiply–Add Algorithms for Linear Transforms
Author :
Voronenko, Yevgen ; Püschel, Markus
Author_Institution :
Carnegie Mellon Univ., Pittsburgh
Abstract :
Several computer architectures offer fused multiply-add (FMA), also called multiply-and-accumulate (MAC) instructions, that are as fast as a single addition or multiplication. For the efficient implementation of linear transforms, such as the discrete Fourier transform or discrete cosine transforms, this poses a challenge to algorithm developers as standard transform algorithms have to be manipulated into FMA algorithms that make optimal use of FMA instructions. We present a general method to convert any transform algorithm into an FMA algorithm. The method works with both algorithms given as directed acyclic graphs (DAGs) and algorithms given as structured matrix factorizations. We prove bounds on the efficiency of the method. In particular, we show that it removes all single multiplications except at most as many as the transform has outputs. We implemented the DAG-based version of the method and show that we can generate many of the best-known hand-derived FMA algorithms from the literature as well as a few novel FMA algorithms.
Keywords :
digital arithmetic; directed graphs; discrete Fourier transforms; discrete cosine transforms; instruction sets; matrix decomposition; optimising compilers; FMA algorithm mechanical derivation; FMA optimization; automatic program generation; computer architectures; directed acyclic graphs; discrete Fourier transform; discrete cosine transforms; fused multiply-add algorithms; linear transforms; multiply-and-accumulate instructions; structured matrix factorizations; Computer aided instruction; Computer architecture; Costs; Discrete Fourier transforms; Discrete cosine transforms; Discrete transforms; Fourier transforms; Hardware; Matrix converters; Standards development; Automatic program generation; discrete Fourier transform (DFT); discrete cosine transform (DCT); fast algorithm; implementation; multiply and accumulate (MAC); multiply-and-accumulate (MAC) instruction;
Journal_Title :
Signal Processing, IEEE Transactions on
DOI :
10.1109/TSP.2007.896116