• DocumentCode
    960048
  • Title

    On the Precision Attainable with Various Floating-Point Number Systems

  • Author

    Brent, Richard P.

  • Author_Institution
    Mathematical Sciences Department, IBM T. J. Watson Research Center, Yorktown Heights, N.Y. 10598.; Computer Centre, Australian National University, Canberra, A.C.T., Australia.
  • Issue
    6
  • fYear
    1973
  • fDate
    6/1/1973 12:00:00 AM
  • Firstpage
    601
  • Lastpage
    607
  • Abstract
    For scientific computations on a digital computer the set of real numbers is usually approximated by a finite set F of "floating-point" numbers. We compare the numerical accuracy possible with different choices of F having approximately the same range and requiring the same word length. In particular, we compare different choices of base (or radix) in the usual floating-point systems. The emphasis is on the choice of F, not on the details of the number representation or the arithmetic, but both rounded and truncated arithmetic are considered. Theoretical results are given, and some simulations of typical floating-point computations (forming sums, solving systems of linear equations, finding eigenvalues) are described. If the leading fraction bit of a normalized base-2 number is not stored explicitly (saving a bit), and the criterion is to minimize the mean square roundoff error, then base 2 is best. If unnormalized numbers are allowed, so the first bit must be stored explicitly, then base 4 (or sometimes base 8) is the best of the usual systems.
  • Keywords
    Computational modeling; Eigenvalues and eigenfunctions; Equations; Floating-point arithmetic; Roundoff errors; Scientific computing; Base; floating-point arithmetic; radix; representation error; rms error; rounding error; simulation;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.1973.5009113
  • Filename
    5009113