  • DocumentCode
    3489035
  • Title
    Accurate floating-point operation using controlled floating-point precision
  • Author
    Zaki, A.M. ; Bahaa-Eldin, A.M. ; El-Shafey, M.H. ; Aly, G.M.
  • Author_Institution
    Dept. of Comput. & Syst. Eng., Ain Shams Univ., Cairo, Egypt
  • fYear
    2011
  • fDate
    23-26 Aug. 2011
  • Firstpage
    696
  • Lastpage
    701
  • Abstract
    Rounding and accumulation of errors when using floating-point numbers are important factors in computer arithmetic, and many applications suffer from these problems. The underlying machine architecture and the representation of floating-point numbers play the major role in the level and value of errors in calculations of this type. A quantitative measure of a system's error level is the machine epsilon. In the current representation of floating-point numbers, the machine epsilon can be as small as 9.63E-35 in the 128-bit version of the IEEE standard floating-point representation system. In this work, a novel solution that guarantees achieving the desired minimum error regardless of the machine architecture is presented. The proposed model can achieve a machine epsilon of about 4.94E-324. A new representation model is given and a complete arithmetic system with basic operations is presented. The accuracy of the proposed method is verified by inverting a high-order Hilbert matrix, an ill-conditioned matrix that cannot be solved in the traditional floating-point standard. Finally, some comparisons are given.
  • Keywords
    Hilbert transforms; IEEE standards; floating point arithmetic; matrix algebra; Hilbert matrix; IEEE standard floating point representation system; computer arithmetic; floating point numbers; ill-conditioned matrix; machine architecture; machine epsilon; minimum error; word length 128 bit; Accuracy; Linear systems; MATLAB; Manganese; Measurement uncertainty; Software algorithms; accurate multiplication; accurate sum; dot-Product; floating-point; machine-epsilon; relative error
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications, Computers and Signal Processing (PacRim), 2011 IEEE Pacific Rim Conference on
  • Conference_Location
    Victoria, BC
  • ISSN
    1555-5798
  • Print_ISBN
    978-1-4577-0252-5
  • Electronic_ISBN
    1555-5798
  • Type
    conf
  • DOI
    10.1109/PACRIM.2011.6032978
  • Filename
    6032978
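
The quantities named in the abstract can be illustrated with a short Python sketch. This is not the paper's representation model or arithmetic system, which the record does not describe. It only checks, under stated assumptions, that 2^-113 (the unit roundoff of IEEE binary128) prints as 9.63E-35, that the smallest positive subnormal binary64 value prints as 4.94E-324 (a numerical coincidence with the abstract's figure, noted here without any claim about how the paper obtains it), and that inverting a Hilbert matrix in ordinary double precision loses many digits compared with an exact rational inverse. The helper names `hilbert` and `invert`, the order n = 10, and the use of Gauss-Jordan elimination are all choices made for this illustration.

```python
import sys
from fractions import Fraction

# Reference values (illustrative; not taken from the paper's method).
eps_double = sys.float_info.epsilon                 # 2**-52, binary64 spacing at 1.0
u_quad = 2.0 ** -113                                # binary128 unit roundoff
tiny = sys.float_info.min * sys.float_info.epsilon  # 2**-1074, smallest subnormal

def hilbert(n, one):
    """n x n Hilbert matrix, H[i][j] = 1 / (i + j + 1) with 0-based indices."""
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]

def invert(a):
    """Gauss-Jordan inversion with partial pivoting (float or Fraction entries)."""
    n = len(a)
    kind = type(a[0][0])
    # Augment each row with the matching row of the identity matrix.
    m = [row[:] + [kind(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]

n = 10  # hypothetical order, chosen only for this sketch
exact = invert(hilbert(n, Fraction(1)))   # exact rational arithmetic
approx = invert(hilbert(n, 1.0))          # ordinary binary64 arithmetic

# Largest entrywise deviation, relative to the largest exact entry.
scale = max(abs(float(x)) for row in exact for x in row)
rel_err = max(
    abs(approx[i][j] - float(exact[i][j])) for i in range(n) for j in range(n)
) / scale

print(f"binary64 epsilon:        {eps_double:.2e}")
print(f"binary128 unit roundoff: {u_quad:.2e}")
print(f"smallest binary64 value: {tiny:.2e}")
print(f"relative error of the double-precision inverse (n={n}): {rel_err:.1e}")
```

The exact inverse of a Hilbert matrix has integer entries, so the `Fraction` result serves as a reference against which the double-precision result can be compared; the gap between the two grows quickly as n increases.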