Regional Information Center for Science and Technology (RICEST) - Accurate floating-point operation using controlled floating-point precision

DocumentCode :

3489035

Title :

Accurate floating-point operation using controlled floating-point precision

Author :

Zaki, A.M. ; Bahaa-Eldin, A.M. ; El-Shafey, M.H. ; Aly, G.M.

Author_Institution :

Dept. of Comput. & Syst. Eng., Ain Shams Univ., Cairo, Egypt

fYear :

2011

fDate :

23-26 Aug. 2011

Firstpage :

696

Lastpage :

701

Abstract :

Rounding and the accumulation of errors are important factors in floating-point computer arithmetic, and many applications suffer from these problems. The underlying machine architecture and the representation of floating-point numbers play the major role in the magnitude of the errors in this type of calculation. A quantitative measure of a system's error level is the machine epsilon. In the current representation of floating-point numbers, the machine epsilon can be as small as 9.63E-35 in the 128-bit version of the IEEE standard floating-point representation. In this work, a novel solution is presented that guarantees achieving the desired minimum error regardless of the machine architecture. The proposed model can achieve a machine epsilon of about 4.94E-324. A new representation model is given, and a complete arithmetic system with basic operations is presented. The accuracy of the proposed method is verified by inverting a high-order Hilbert matrix, an ill-conditioned matrix that cannot be solved in the traditional floating-point standard. Finally, some comparisons are given.
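
The two ideas the abstract relies on, machine epsilon and the ill-conditioning of the Hilbert matrix, can be illustrated with a minimal sketch. This is not the authors' representation model or arithmetic system, which the record does not describe. It assumes standard binary64 floats and uses Python's `fractions.Fraction` as a stand-in for exact arithmetic. The function names (`machine_epsilon`, `hilbert`, `solve`, `max_float_error`) are hypothetical and chosen only for this illustration.

```python
import sys
from fractions import Fraction


def machine_epsilon():
    """Halve eps until adding half of it to 1.0 no longer changes 1.0."""
    eps = 1.0
    while 1.0 + eps / 2 != 1.0:
        eps /= 2
    return eps


def hilbert(n, exact=False):
    """n x n Hilbert matrix with entries 1 / (i + j + 1), 0-based indices."""
    one = Fraction(1) if exact else 1.0
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Gaussian elimination with partial pivoting (floats or Fractions)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0] * n
    for i in reversed(range(n)):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def max_float_error(n):
    """Solve H x = b in floats, where b is built so the true x is all ones."""
    b_exact = [sum(row) for row in hilbert(n, exact=True)]
    x_float = solve(hilbert(n), [float(v) for v in b_exact])
    return max(abs(v - 1.0) for v in x_float)


if __name__ == "__main__":
    print("binary64 machine epsilon:", machine_epsilon())
    for n in (3, 6, 10):
        print(n, max_float_error(n))
```

Running the script prints the float error for a few matrix orders. Under these assumptions the error grows quickly with the order, while the same `solve` routine applied to `Fraction` inputs returns the exact all-ones solution.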

Keywords :

Hilbert transforms; IEEE standards; floating point arithmetic; matrix algebra; Hilbert matrix; IEEE standard floating point representation system; computer arithmetic; floating point numbers; ill-conditioned matrix; machine architecture; machine epsilon; minimum error; word length 128 bit; Accuracy; IEEE standards; Linear systems; MATLAB; Manganese; Measurement uncertainty; Software algorithms; Hilbert matrix; accurate multiplication; accurate sum; dot-Product; floating-point; ill-conditioned matrix; machine-epsilon; relative error;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Communications, Computers and Signal Processing (PacRim), 2011 IEEE Pacific Rim Conference on

Conference_Location :

Victoria, BC

ISSN :

1555-5798

Print_ISBN :

978-1-4577-0252-5

Electronic_ISBN :

1555-5798

Type :

conf

DOI :

10.1109/PACRIM.2011.6032978

Filename :

6032978

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3489035