Channel: Sanity Phailed.me » floating point

Fixing floating point cancellation I


The graph of f(x) is not smooth around 1e7
This sounds like an easy problem, but it turns out that the solution isn’t so simple. If you plot f(x) = \sqrt{x^2+1}-x, you’ll see that its graph is fairly smooth early on, but once x approaches 10^{7}, it no longer is.
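To see the problem without a plot, here is a minimal Python sketch (the names `f_naive` and `GRID` are ours, not from the original post) that evaluates the expression directly in double precision at a few integers near 10^7. It relies on the fact that doubles between 2^23 and 2^24 are spaced 2^-29 apart.

```python
import math

def f_naive(x: float) -> float:
    """Evaluate sqrt(x^2 + 1) - x directly in double precision."""
    return math.sqrt(x * x + 1.0) - x

# A few consecutive integers near 1e7 (all exactly representable as doubles).
xs = [1.0e7 + k for k in range(5)]
values = [f_naive(x) for x in xs]

# Doubles between 2^23 and 2^24 are spaced 2^-29 apart, so every computed
# value is an integer multiple of 2^-29 (about 1.9e-9).
GRID = 2.0 ** -29
steps = [v / GRID for v in values]  # dividing by a power of two is exact

for x, v, s in zip(xs, values, steps):
    print(f"x = {x:.0f}  f_naive(x) = {v:.6e}  = {s:.0f} * 2^-29")
```

The true f(x) is about 5 \times 10^{-8} here, so the computed values can only land on a handful of grid levels. That is one way to read the jagged graph.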

1. Uh Oh

This is puzzling, since f(x) is clearly smooth everywhere. We can only conclude that floating point roundoff is to blame, and seeing the error manifest already at a scale of just 10^{-8} means that evaluating this expression as written isn’t enough to solve our problem. In fact, using arbitrary precision floating point arithmetic, we see that f(x) is indeed smooth within this region, so we know who the culprit is.
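The post does not say which arbitrary precision tool was used. As a rough stand-in, the sketch below uses Python's standard `decimal` module with 50 significant digits (an arbitrary, generous choice); `f_reference` is a hypothetical helper name.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant decimal digits (assumed to be plenty here)

def f_reference(x: int) -> Decimal:
    """Evaluate sqrt(x^2 + 1) - x with 50-digit decimal arithmetic."""
    d = Decimal(x)
    return (d * d + 1).sqrt() - d

xs = [10**7 + k for k in range(5)]
refs = [f_reference(x) for x in xs]

# Neighbouring values differ by roughly 1/(2 x^2), about 5e-15 here,
# which 50-digit arithmetic resolves easily.
strictly_decreasing = all(a > b for a, b in zip(refs, refs[1:]))
print(strictly_decreasing)
for x, r in zip(xs, refs):
    print(x, f"{r:.20e}")
```

At this precision the values decrease steadily from one integer to the next, which is the smooth behavior we expect from f(x).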

[Figure: f(x) evaluated with arbitrary precision arithmetic]

We’re plagued by roundoff because \sqrt{x^2+1} and x are extremely close together. Curiously, for two floating point numbers this close (within a factor of two of each other), IEEE arithmetic guarantees that their subtraction is exact. Well, if this subtraction doesn’t incur any error whatsoever, why is there roundoff? While the subtraction itself is error-free, the same cannot be said of the computation of \sqrt{x^2+1}. A crude bound analysis of the relative error uses rules of the form

    \[\textsc{Flt}(\textsf{expr}_1 \circ \textsf{expr}_2) = \left(\textsc{Flt}(\textsf{expr}_1) \circ \textsc{Flt}(\textsf{expr}_2)\right)(1 + \delta_{\circ})\]

and

    \[\textsc{Flt}(\textsf{num}) = \textsf{num}(1 + \delta_{num})\]

where \textsc{Flt} means that whatever is in its argument is evaluated under floating point arithmetic, \circ represents one of the arithmetic operations or standard transcendental functions, and \delta_{\cdot} represents the relative error associated with an operation or with the representation of a number. Let’s apply this to the example in the problem. We’re going to assume that the subtraction gives no error, so that \delta_{minus} = 0. Furthermore, for the sake of simplicity, let’s assume that x is an integer that is perfectly represented (so that \delta_x = 0 as well; as long as x^2 stays below 2^{53}, which holds for x up to roughly 9 \times 10^7, \textsc{Flt}(x^2) is computed exactly too):

    \begin{align*} \textsc{Flt}\left(\sqrt{x^2+1}-x\right) &= \textsc{Flt}\left(\sqrt{x^2+1}\right) - \textsc{Flt}\left(x\right) \\ &= \sqrt{\textsc{Flt}\left(x^2+1\right)}(1+\delta_{\sqrt{}}) - x \\ &= \sqrt{\left(\textsc{Flt}(x^2)+\textsc{Flt}(1)\right)(1+\delta_+)}(1+\delta_{\sqrt{}}) - x \\ &\approx \sqrt{\left(\textsc{Flt}(x^2)+\textsc{Flt}(1)\right)}\left(1+\frac{\delta_+}{2}\right)(1+\delta_{\sqrt{}}) - x \\ &\approx \sqrt{x^2 + 1}\left(1+\frac{\delta_+}{2} + \delta_{\sqrt{}}\right) - x \\ &= \sqrt{x^2 + 1} - x + \sqrt{x^2 + 1}\left(\frac{\delta_+}{2} + \delta_{\sqrt{}}\right) \\ &= \left(\sqrt{x^2 + 1} - x\right)\left(1 + \frac{\sqrt{x^2 + 1}\left(\frac{\delta_+}{2} + \delta_{\sqrt{}}\right)}{\sqrt{x^2 + 1} - x}\right) \\ &\implies \textsf{relative error is bounded by }\frac{\sqrt{x^2 + 1}\left(\frac{\delta_+}{2} + \delta_{\sqrt{}}\right)}{\sqrt{x^2 + 1} - x} \end{align*}

This analysis is intriguing: it tells us that, at the very worst, the relative error of the entire computation is bounded by \frac{\sqrt{x^2 + 1}\left(\frac{\delta_+}{2} + \delta_{\sqrt{}}\right)}{\sqrt{x^2 + 1} - x}, where each \delta_{\cdot} term is bounded above by \epsilon_{mach}, so the entire \frac{\delta_+}{2} + \delta_{\sqrt{}} is on the order of 10^{-16}. This spells out why the error gets blown up at around x=10^7. Our relative error is basically \epsilon \frac{\sqrt{x^2 + 1}}{\sqrt{x^2 + 1} - x}, and the absolute error is then just \epsilon \frac{\sqrt{x^2 + 1}}{\sqrt{x^2 + 1} - x} \left(\sqrt{x^2 + 1} - x\right) = \epsilon\sqrt{x^2 + 1} \approx \epsilon \times x. At x = 10^7, the absolute error is around 10^{-16}\times 10^7 \approx 10^{-9}, which is definitely not good enough when f(x) itself is only about \frac{1}{2x} \approx 5 \times 10^{-8}.
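The estimate can be checked numerically. The sketch below is only an illustration: it assumes a 50-digit `decimal` computation is an adequate stand-in for the exact value, and the helper names (`f_naive`, `f_reference`, `errors`) are ours.

```python
import math
import sys
from decimal import Decimal, getcontext

getcontext().prec = 50
EPS = sys.float_info.epsilon  # 2^-52, about 2.2e-16

def f_naive(x: float) -> float:
    """Direct double precision evaluation of sqrt(x^2 + 1) - x."""
    return math.sqrt(x * x + 1.0) - x

def f_reference(x: float) -> Decimal:
    """High precision reference value (assumed accurate enough to compare against)."""
    d = Decimal(x)  # exact conversion of the double
    return (d * d + 1).sqrt() - d

def errors(x: float):
    """Return (absolute error, relative error) of f_naive at x, as floats."""
    ref = f_reference(x)
    abs_err = abs(Decimal(f_naive(x)) - ref)
    return float(abs_err), float(abs_err / ref)

for x in (1.0e5, 1.0e6, 1.0e7):
    abs_err, rel_err = errors(x)
    print(f"x = {x:.0e}  abs = {abs_err:.2e}  eps*x = {EPS * x:.2e}  rel = {rel_err:.2e}")
```

For these inputs the absolute error stays below \epsilon \times x, as the bound predicts, while the relative error grows quickly with x.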

Hold on, computing \sqrt{x^2 + 1} only incurred a relative error of about 10^{-16}, so why did the final error get blown up by that many orders of magnitude? Here’s a more intuitive way of seeing this: when we computed \sqrt{x^2 + 1}, we got a small \epsilon amount of relative error on a large number, which amounts to \epsilon x of absolute error. When we subtract off x, we’re still left with an absolute error of \epsilon x, but this is now the error on a much smaller number (if you do the calculation, \sqrt{x^2 + 1} - x \approx \frac{1}{2x}), so the original absolute error becomes a rather large relative error, on the order of \epsilon x \cdot 2x = 2\epsilon x^2.
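Both halves of this argument can be checked directly at x = 10^7. The sketch below leans on the fact that `fractions.Fraction` converts a double exactly, so the comparisons themselves involve no rounding; the variable names are ours.

```python
import math
from fractions import Fraction

x = 1.0e7
s = math.sqrt(x * x + 1.0)  # rounded square root
diff = s - x                # floating point subtraction

# Fraction(float) is an exact conversion, so these checks are exact.
subtraction_is_exact = Fraction(diff) == Fraction(s) - Fraction(x)
sqrt_is_exact = Fraction(s) ** 2 == Fraction(x) ** 2 + 1

print(subtraction_is_exact)  # the subtraction adds no new error
print(sqrt_is_exact)         # the square root had to be rounded
```

The subtraction faithfully returns the difference of its two inputs. The trouble is that one of those inputs was already slightly wrong.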

2. The Fix?

Intuition plays a large role here, since finding workarounds to floating point issues is pretty much a dark art. Here, the subtraction is actively amplifying our error, so what if we find some equivalent form of this expression that doesn’t subtract two large but close numbers? Through simple algebra, we see that by multiplying by the conjugate we get

    \begin{align*} \sqrt{x^2+1}-x &= \left(\sqrt{x^2+1}-x\right)\frac{\sqrt{x^2+1}+x}{\sqrt{x^2+1}+x} \\ &= \frac{x^2 + 1 - x^2}{\sqrt{x^2+1}+x} \\ &= \frac{1}{\sqrt{x^2+1}+x} \end{align*}

Is this form better? Let’s try it out.
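One way to try it out is a small Python sketch comparing both forms against a high precision reference. It assumes that 50-digit `decimal` arithmetic is accurate enough to play the role of the exact value; `f_stable`, `f_naive`, `f_reference`, and `rel_error` are hypothetical names.

```python
import math
import sys
from decimal import Decimal, getcontext

getcontext().prec = 50
EPS = sys.float_info.epsilon  # 2^-52, about 2.2e-16

def f_naive(x: float) -> float:
    """Original form: subtracts two large, nearly equal numbers."""
    return math.sqrt(x * x + 1.0) - x

def f_stable(x: float) -> float:
    """Conjugate form: no subtraction of nearby quantities."""
    return 1.0 / (math.sqrt(x * x + 1.0) + x)

def f_reference(x: float) -> Decimal:
    """High precision reference value (assumed accurate enough to compare against)."""
    d = Decimal(x)
    return (d * d + 1).sqrt() - d

def rel_error(approx: float, x: float) -> float:
    ref = f_reference(x)
    return float(abs(Decimal(approx) - ref) / ref)

for x in (1.0e5, 1.0e6, 1.0e7):
    print(f"x = {x:.0e}  naive rel = {rel_error(f_naive(x), x):.2e}  "
          f"stable rel = {rel_error(f_stable(x), x):.2e}")
```

In the conjugate form the only rounding comes from one square root, one addition, and one division, so its relative error should stay within a few multiples of machine epsilon for these inputs.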

[Figure: graph of the corrected form]

[Figure: error of the corrected form]

In this case, getting rid of the subtraction brought our method’s error down to around 10^{-20}, which is far better than what we need.

