

Further Details: Floating Point Arithmetic

Roundoff error is bounded in terms of the machine precision $\epsilon$, which is the smallest value satisfying

\begin{displaymath}
\vert fl(a \oplus b ) - (a \oplus b ) \vert \leq \epsilon \cdot \vert a \oplus b \vert \; \; ,
\end{displaymath}

where a and b are floating-point numbers, $\oplus$ is any one of the four operations $+$, $-$, $\times$ and $\div$, and $fl(a \oplus b)$ is the floating-point result of $a \oplus b$. Machine epsilon, $\epsilon$, is the smallest value for which this inequality is true for all $\oplus$, and for all a and b such that $a \oplus b$ is neither too large (magnitude exceeds the overflow threshold) nor too small (is nonzero with magnitude less than the underflow threshold) to be represented accurately in the machine. We also assume $\epsilon$ bounds the relative error in unary operations like square root:

\begin{displaymath}
\vert fl( \sqrt{a} ) - (\sqrt{a} ) \vert \leq \epsilon \cdot \vert \sqrt{a} \vert \; .
\end{displaymath}
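Both error models above can be checked exactly on a given machine. The following Python sketch compares each computed result with the exact result, using rational arithmetic from `fractions.Fraction`. It assumes that Python floats are IEEE double precision numbers with round-to-nearest, so that $\epsilon = 2^{-53}$. The helper names `satisfies_bound` and `sqrt_within_bound` are illustrative and are not LAPACK routines. Because $\sqrt{a}$ is usually irrational, the square-root test uses an equivalent squared form of the inequality.

```python
import math
import operator
import random
from fractions import Fraction

# Assumption: Python floats are IEEE double precision with round-to-nearest,
# so EPS is the double precision machine epsilon 2**-53.
EPS = Fraction(1, 2 ** 53)

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def satisfies_bound(a: float, b: float, op: str) -> bool:
    """Check |fl(a op b) - (a op b)| <= EPS * |a op b| using exact rationals."""
    f = OPS[op]
    computed = Fraction(f(a, b))          # fl(a op b): the rounded float result
    exact = f(Fraction(a), Fraction(b))   # a op b computed exactly
    return abs(computed - exact) <= EPS * abs(exact)

def sqrt_within_bound(a: float) -> bool:
    """Check |fl(sqrt(a)) - sqrt(a)| <= EPS * sqrt(a) exactly, for a > 0.

    sqrt(a) is usually irrational, so use the equivalent squared condition
    s**2 / (1 + EPS)**2 <= a <= s**2 / (1 - EPS)**2 with s = fl(sqrt(a)).
    """
    s = Fraction(math.sqrt(a))
    return s * s / (1 + EPS) ** 2 <= Fraction(a) <= s * s / (1 - EPS) ** 2

# Operands in [-1000, 1000] keep every result far from overflow and underflow.
rng = random.Random(0)
pairs = [(rng.uniform(-1e3, 1e3), rng.uniform(-1e3, 1e3)) for _ in range(2000)]
binary_ok = all(
    satisfies_bound(a, b, op)
    for a, b in pairs
    for op in OPS
    if not (op == "/" and b == 0.0)
)
sqrt_ok = all(sqrt_within_bound(abs(a)) for a, _ in pairs if a != 0.0)
print(binary_ok, sqrt_ok)  # True True on IEEE 754 platforms
```

The operands are drawn from a modest range, so the sketch stays inside the region where the bound is claimed to hold and never tests overflow or underflow.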

A precise characterization of $\epsilon$ depends on the details of the machine arithmetic and sometimes even of the compiler. For example, if addition and subtraction are implemented without a guard digit, we must redefine $\epsilon$ to be the smallest number such that

\begin{displaymath}
\vert fl(a \pm b ) - (a \pm b ) \vert \leq \epsilon \cdot ( \vert a\vert + \vert b\vert ).
\end{displaymath}
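The next sketch shows why the weaker bound is needed. It simulates subtraction in a hypothetical toy format with P = 3 significant decimal digits and no guard digit. The smaller operand is shifted to the exponent of the larger one, its trailing digits are truncated, and only then is the subtraction performed. The format, the constant `P`, and the helpers `exponent10` and `sub_no_guard` are illustrative assumptions, not LAPACK code. Exact rationals (`fractions.Fraction`) stand in for the digit hardware.

```python
from fractions import Fraction

P = 3  # hypothetical toy format: 3 significant decimal digits, no guard digit

def exponent10(x: Fraction) -> int:
    """Return e such that 10**e <= |x| < 10**(e+1); x must be nonzero."""
    x = abs(x)
    e = 0
    while x >= Fraction(10) ** (e + 1):
        e += 1
    while x < Fraction(10) ** e:
        e -= 1
    return e

def sub_no_guard(a: Fraction, b: Fraction) -> Fraction:
    """Compute a - b for P-digit decimals with a > b > 0, without a guard digit.

    b is aligned to a's exponent and truncated to a's last digit position
    before subtracting, so any digits of b below that position are lost.
    """
    ulp = Fraction(10) ** (exponent10(a) - P + 1)  # weight of a's last digit
    b_shifted = (b // ulp) * ulp                    # truncate b at that position
    return a - b_shifted                            # a multiple of ulp below a: fits in P digits

a, b = Fraction(1), Fraction(999, 1000)             # 1.00 and 0.999
computed = sub_no_guard(a, b)                       # 0.999 is truncated to 0.99
exact = a - b
rel_err = abs(computed - exact) / abs(exact)        # error relative to |a - b|
weak = abs(computed - exact) / (abs(a) + abs(b))    # error relative to |a| + |b|
print(computed, exact, rel_err, weak)               # 1/100 1/1000 9 9/1999
```

Computing 1.00 - 0.999 this way returns 0.01 instead of 0.001. That is a relative error of 9 (900%), so no small $\epsilon$ can satisfy the relative bound for $\pm$ in this format. Measured against $\vert a\vert + \vert b\vert$, as in the redefined bound, the same error is only 9/1999, or about 0.0045.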

To ensure portability, machine parameters such as machine epsilon and the overflow and underflow thresholds are computed at run time by the auxiliary routine xLAMCH. The alternative, keeping a fixed table of machine parameter values, would degrade portability because the table would have to be changed when moving from one machine, or even one compiler, to another.
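The Python sketch below shows the idea of run-time probing. It repeatedly halves a power of two until adding it to 1 no longer changes the result. It assumes a binary base and round-to-nearest, and LAPACK's own xLAMCH is more thorough than this. The function name `probe_machine_epsilon` is illustrative only. Note also that Python's `sys.float_info.epsilon`, like C's `DBL_EPSILON`, is the spacing $2^{-52}$ between 1 and the next double. That is twice the $\epsilon$ used in this section.

```python
import sys

def probe_machine_epsilon():
    """Probe floating-point precision at run time by repeated halving.

    Returns (spacing, eps): spacing is the smallest power of two delta for which
    fl(1 + delta) > 1, and eps = spacing / 2 bounds the relative rounding error,
    assuming a binary base and round-to-nearest arithmetic.
    """
    delta = 1.0
    while 1.0 + delta / 2 > 1.0:
        delta /= 2
    return delta, delta / 2

spacing, eps = probe_machine_epsilon()
print(spacing == sys.float_info.epsilon, eps == 2.0 ** -53)  # True True on IEEE doubles
```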

In practice, most machines, though not yet all, have the same machine parameters because they implement IEEE Standard Floating Point Arithmetic [4,5], which exactly specifies floating-point number representations and operations. For these machines, including all modern workstations and PCs, the values of these parameters are given in Table 4.1.


Table 4.1: Values of Machine Parameters in IEEE Floating Point Arithmetic
\begin{tabular}{lll}
Machine parameter & Single Precision (32 bits) & Double Precision (64 bits) \\
\hline
Machine epsilon $\epsilon$ = xLAMCH('E') & $2^{-24} \approx 5.96 \cdot 10^{-8}$ & $2^{-53} \approx 1.11 \cdot 10^{-16}$ \\
Underflow threshold = xLAMCH('U') & $2^{-126} \approx 1.18 \cdot 10^{-38}$ & $2^{-1022} \approx 2.23 \cdot 10^{-308}$ \\
Overflow threshold = xLAMCH('O') & $2^{128}(1-\epsilon) \approx 3.40 \cdot 10^{38}$ & $2^{1024}(1-\epsilon) \approx 1.79 \cdot 10^{308}$ \\
\end{tabular}
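The entries of Table 4.1 can be checked against the IEEE bit patterns they stand for. The following Python sketch assumes that Python floats are IEEE doubles and that the `struct` format code `'f'` rounds to IEEE single precision with round-to-nearest-even. The helpers `bits` and `to_single` are illustrative only. The overflow thresholds are evaluated as $2^{e_{\max}}(2 - 2\epsilon)$, which is equal to $2^{e_{\max}+1}(1-\epsilon)$, because $2^{1024}$ is itself not a finite double.

```python
import struct
import sys

def bits(x: float, fmt: str) -> int:
    """IEEE 754 bit pattern of x stored as 'f' (single) or 'd' (double)."""
    return int.from_bytes(struct.pack("<" + fmt, x), "little")

def to_single(x: float) -> float:
    """Round a Python float (a double) to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

EPS_S, EPS_D = 2.0 ** -24, 2.0 ** -53

# Machine epsilon: 1 + eps lies exactly halfway between 1 and the next
# floating-point number, and round-to-nearest-even sends it back to 1.
assert to_single(1.0 + EPS_S) == 1.0 and to_single(1.0 + 2 * EPS_S) > 1.0
assert 1.0 + EPS_D == 1.0 and 1.0 + 2 * EPS_D > 1.0

# Underflow threshold: smallest normalized number
# (biased exponent field 1, fraction field 0).
UNF_S, UNF_D = 2.0 ** -126, 2.0 ** -1022
assert bits(UNF_S, "f") == 0x0080_0000
assert bits(UNF_D, "d") == 0x0010_0000_0000_0000
assert UNF_D == sys.float_info.min

# Overflow threshold: largest finite number 2**(emax+1) * (1 - eps), evaluated
# as 2**emax * (2 - 2*eps) so that no intermediate result overflows.
OVF_S = 2.0 ** 127 * (2 - 2 * EPS_S)
OVF_D = 2.0 ** 1023 * (2 - 2 * EPS_D)
assert bits(OVF_S, "f") == 0x7F7F_FFFF
assert bits(OVF_D, "d") == 0x7FEF_FFFF_FFFF_FFFF
assert OVF_D == sys.float_info.max
```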

As stated above, we will ignore overflow and underflow in discussing error bounds. References [24,67] discuss extending error bounds to include underflow, and show that for many common computations, when underflow occurs it is less significant than roundoff. With some important exceptions described below, overflow usually means that a computation has failed, so the error bounds do not apply.

Therefore, most of our error bounds will simply be proportional to machine epsilon. This means, for example, that if the same problem is solved in double precision and in single precision, the error bound in double precision will be $\epsilon_{\rm double} / \epsilon_{\rm single}$ times the error bound in single precision. In IEEE arithmetic, this ratio is $2^{-53}/2^{-24} = 2^{-29} \approx 1.9 \cdot 10^{-9}$, meaning that one expects the double precision answer to have approximately nine more correct decimal digits than the single precision answer.
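The arithmetic behind "approximately nine more decimal digits" can be spelled out as follows, using the machine epsilons from Table 4.1. The digit count is $-\log_{10}(\epsilon_{\rm double}/\epsilon_{\rm single}) = 29 \log_{10} 2$. This compares error bounds only. The actual errors of a given computation may be smaller than either bound.

```python
import math

EPS_SINGLE = 2.0 ** -24  # Table 4.1, single precision
EPS_DOUBLE = 2.0 ** -53  # Table 4.1, double precision

ratio = EPS_DOUBLE / EPS_SINGLE     # factor by which the error bound shrinks
extra_digits = -math.log10(ratio)   # equals 29 * log10(2)

print(f"ratio = {ratio:.3g}, extra decimal digits = {extra_digits:.2f}")
# ratio = 1.86e-09, extra decimal digits = 8.73
```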

LAPACK routines are generally insensitive to the details of rounding and exception handling, like their counterparts in LINPACK and EISPACK. One algorithm, xLASV2, can return significantly more accurate results if addition and subtraction have a guard digit, but is still quite accurate if they do not (see the end of section 4.9).

However, several LAPACK routines do make assumptions about details of the floating point arithmetic. We list these routines here.


Susan Blackford
1999-10-01