Title :
Correctly Rounded Multiplication by Arbitrary Precision Constants
Author :
Brisebarre, Nicolas ; Muller, Jean-Michel
Author_Institution :
Univ. Jean Monnet, Saint-Etienne
Abstract :
We introduce an algorithm for multiplying a floating-point number x by a constant C that is not exactly representable in floating-point arithmetic. Our algorithm uses one multiplication and one fused multiply-add (FMA) instruction. Such instructions are available in some modern processors, such as the IBM PowerPC and the Intel/HP Itanium. We give three methods for checking whether, for a given value of C and a given floating-point format, our algorithm returns a correctly rounded result for any x. When it does not, some of our methods return all of the values x for which the algorithm fails. The three methods are complementary: the first two do not always yield a conclusion, yet they are simple enough to be used at compile time, while the third always either proves that our algorithm returns a correctly rounded result for any x or gives all of the counterexamples. We generalize our study to the case where a wider internal format is used for the intermediate calculations, which gives a fourth method. Our programs and some additional information (such as the case where an arbitrary nonbinary even radix is used), as well as examples of runs of our programs, can be downloaded from http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html.
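The abstract states only that the algorithm uses one multiplication and one fused multiply-add. The sketch below illustrates one way such a scheme could look, under assumptions that are not taken from the abstract: C is split into a leading part C_h = RN(C) and a correction C_l = RN(C - C_h), and the product is formed as RN(C_h * x + RN(C_l * x)). The helper names (rn, split_constant, mul_by_constant), the choice of pi as the constant, and the use of exact rational arithmetic to emulate the FMA in binary64 are all hypothetical choices made for illustration. The final loop is a brute-force spot check on random inputs, not one of the verification methods described in the abstract.

```python
import math
import random
from fractions import Fraction

# Assumed example constant: pi to 50 decimal digits, held exactly as a rational.
C = Fraction("3.14159265358979323846264338327950288419716939937510")


def rn(q):
    """Round an exact rational to the nearest binary64 value (ties to even)."""
    return float(Fraction(q))


def split_constant(c):
    """Assumed splitting: C_h = RN(C), C_l = RN(C - C_h)."""
    ch = rn(c)
    cl = rn(c - Fraction(ch))
    return ch, cl


def mul_by_constant(ch, cl, x):
    """One rounded multiplication followed by one (emulated) FMA."""
    u1 = rn(Fraction(cl) * Fraction(x))               # u1 = RN(C_l * x)
    # The FMA rounds C_h * x + u1 only once; exact rationals emulate that.
    return rn(Fraction(ch) * Fraction(x) + Fraction(u1))


def correctly_rounded_product(c, x):
    """Reference value RN(C * x), computed from the exact rational product."""
    return rn(c * Fraction(x))


ch, cl = split_constant(C)
random.seed(0)
samples = [random.uniform(1.0, 2.0) for _ in range(1000)]
mismatches = [x for x in samples
              if mul_by_constant(ch, cl, x) != correctly_rounded_product(C, x)]
print(len(mismatches), "mismatches out of", len(samples), "sampled inputs")
```

A spot check like this can only find counterexamples; it cannot prove that the result is correctly rounded for every x, which is the question the methods in the abstract address. On hardware with an FMA instruction, the emulated step would be replaced by the native operation (for example math.fma in Python 3.13 or later, or fma from the C math library).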
Keywords :
floating-point arithmetic; matrix multiplication; arbitrary precision constants; correctly rounded multiplication; fused multiply-add instruction; Costs; Digital arithmetic; Fast Fourier transforms; Polynomials; Roundoff errors; computer arithmetic
Journal_Title :
Computers, IEEE Transactions on
DOI :
10.1109/TC.2007.70813