0x5f400000: Understanding Fast Inverse Sqrt the Easy(ish) Way!
TL;DR – Start Here!

1. Let \g{f2b} be the float-to-bits function (which takes a floating point number and outputs the 32-bit long holding its IEEE 754 encoding, as used on your everyday computer) and \g{b2f} be the bits-to-float function; then the fast inverse square root method can be rewritten as

    \[\sqrt{\tilde x^{-1}} \approx \g{b2f}(\g{0x5f3759df} - \g{f2b}(x)/2)\]

2. It turns out that \g{b2f}(\alpha + \g{f2b}(x)\times \beta) = C(x,\alpha, \beta) \times x^{\beta} for some non-linear function C(x,\alpha,\beta) \approx C(\alpha,\beta) that doesn’t vary a lot with x, so it becomes an optimization game to find \alpha such that C(\alpha, \beta) = 1.
3. We know that x^{\beta} = 1 when x = 1 no matter what \beta is, so if we want to find \alpha such that \g{b2f}(\alpha + \beta\g{f2b}(x)) \approx x^{\beta}, all we need to do is set x = 1 because then

    \[\g{b2f}(\alpha + \beta\g{f2b}(1)) = 1 = \g{b2f}(\g{f2b}(1))\]

so \alpha = \underbrace{\g{f2b}(1)}_{\g{0x3f800000}}(1-\beta), which lets us make arbitrary fast \beta^{th} power methods on demand.
4. ???
5. Profit!



1. Introduction

The first time I saw the magical fast inverse square root method, I was a freshman in college and the code pretty much killed my brain. I’d just learned about algorithms and data structures and I thought that binary search trees were like the coolest things ever. Turns out I didn’t know them well enough to tackle this problem back then.

WTF?

If you’re not familiar with the fast inverse square root method, here’s what it looks like in C (source: wikipedia)

float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;
 
	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;                        // evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//      y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
 
	return y;
}

[Note: The use of pointer aliasing here is illegal, as Maristic pointed out in the followup discussion on reddit. Use unions instead.]
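
For what it’s worth, here’s a sketch of the same routine rewritten with a union instead of the pointer casts (my rewrite, not the Quake source; I’ve also used a fixed-width integer type, since long is not 32 bits everywhere):

#include <stdint.h>

float Q_rsqrt_union( float number )
{
	union { float f; int32_t i; } conv;                 // one object, two views of the same bits
	const float threehalfs = 1.5F;
	float x2 = number * 0.5F;

	conv.f = number;                                    // read the float's bits through the union...
	conv.i = 0x5f3759df - ( conv.i >> 1 );              // ...same magic, no pointer aliasing
	conv.f = conv.f * ( threehalfs - ( x2 * conv.f * conv.f ) );   // 1st Newton iteration
	return conv.f;
}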
Okay, so first you declare i, x2, y, and a constant threehalfs. Next, you set x2 to be \frac{x}{2} and set y = x. So far so good!

Okay, next, i gets the value you get by taking the float* pointer to y, casting it to a long* pointer, and dereferencing that?? Okay…

Next, i = 0x5f3759df - (i/2)??? Huh?

y = *(float*)&i????

y = y * ( threehalfs - ( x2 * y * y ) )??????

What. The. Fucking. Fuck.

Obsession

Before we go on, I should probably warn you that this code was a constant obsession of mine for the better part of my adult life. Because of this, a lot of what I’m about to say may sound preachy, cheesy, or even outright annoying. I’ll try to tone it down, but truth be told, I am proud of this work; I’ve come to see the fast inverse square root method as my baby and because of that, I’ve become quite possessive of it. As much as this is a post on the possible derivation of 0x5f3759df, this is also a memoir of my college life (minus the fun parts ;).

From the very beginning, I tried to wrap my head around this strange game of algebraic reasoning, and for the next 4 years, my academic life largely revolved around this mystery: I spent my winter break freshman year learning C and figured out that the “bit level” hack gave you the array of bits that your computer encoded floating point numbers as; sophomore year, I took a first course on numerical methods and understood how iterating the sequence

    \[y_{k+1} = y_k \pa{\frac{3}{2} - \frac{x}{2}y_k^2}\]

ad infinitum will eventually converge to y_{\infty} = \frac{1}{\sqrt{x}}; then, during the summer before my junior year, I worked through a lot of tedious algebra dealing with the quirks of IEEE 754 and found another bit-level hack to quickly compute the square root à la the spirit of the fast inverse square root method, but this time exploiting the distribution of floats and a convenient analogy to second-order Taylor expansions.
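
If you want to see the convergence for yourself, here’s a tiny stand-alone sketch of that recurrence (my own illustration, not anything from the original code):

#include <stdio.h>

/* Iterate y_{k+1} = y_k * (3/2 - (x/2) * y_k^2); its fixed point is 1/sqrt(x). */
static float rsqrt_newton(float x, float y0, int iterations)
{
    float y = y0;
    for (int k = 0; k < iterations; k++)
        y = y * (1.5f - 0.5f * x * y * y);
    return y;
}

int main(void)
{
    printf("%f\n", rsqrt_newton(4.0f, 0.4f, 5));    /* settles on 0.5 = 1/sqrt(4) */
    return 0;
}

The catch, of course, is that you need a decent starting guess y_0 for the iteration to converge quickly, which is exactly what the bit-level hack supplies.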

Indeed, my obsession with this method has given me an appetite for numerical analysis, programming languages, software verification, compilers, and even combinatorics and dynamical systems. Without it, I would have never explored most of the areas of computer science that I currently consider my home.

But through it all, there was always a constant question at the back of my mind that I’ve failed to answer time and time again: where did the 0x5f3759df come from?

I never got to a point where I could claim that I have a satisfactory answer to that question, but I think I did come to discover something along the road: a different magical constant, 0x5f400000, that I believe lay at the very beginning of the history of this method, and it is the joy of discovering this constant that I would like to gift to my readers.

2. Switching Strategy

Towards the end of my college years, I realized that there was a big problem in the way I approached the problem of looking at 0x5f3759df. Most of the traditional research into the subject has been done from the perspective that whoever wrote the algorithm went after the best method possible. In the process of hunting for the jackpot, a lot of great intuition was left behind because those ideas weren’t efficient or effective enough. This is what led me down the path of looking for inefficient but elegant methods that could have inspired the fast inverse square root method.

But first, let’s ramp up on some preliminaries. A lot of the work and papers on this matter jump immediately to the bit-level representation of floating point numbers; I will completely forgo that, instead relying on mathematical intuition to do the job. (This is in part why I didn’t become a mathematician :P)

Converting between a Float and its Bits in memory

Let’s first consider two functions: \g{f2b}(n), which stands for float-to-bits and returns the IEEE 754 bit array representation of the floating point number n, effectively performing *(long*)&y; and \g{b2f}(b), which stands for bits-to-float and takes a bit array and turns it back into a float. Note that \g{f2b} = \g{b2f}^{-1} and vice versa, so that \g{b2f}(\g{f2b}(n)) = n.

It turns out that the magical fast inverse square root method is equivalent to

    \[\g{qsqrt}(n) = \g{b2f}\pa{\g{0x5f3759df} - \frac{\g{f2b}(n)}{2}}\]
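
In code, with union-based helpers standing in for \g{f2b} and \g{b2f} (the names just mirror the notation here; this is a sketch of mine, not anybody’s library), the method really does collapse to that one line:

#include <stdint.h>
#include <stdio.h>

/* f2b: reinterpret a float's bits as a 32-bit integer (its IEEE 754 encoding). */
static int32_t f2b(float x) { union { float f; int32_t i; } u = { x }; return u.i; }

/* b2f: the inverse; reinterpret a 32-bit integer as the float with those bits. */
static float b2f(int32_t i) { union { int32_t i; float f; } u = { i }; return u.f; }

/* qsqrt(n) = b2f(0x5f3759df - f2b(n)/2): the bit hack alone, without the Newton polish. */
static float qsqrt(float n) { return b2f(0x5f3759df - f2b(n) / 2); }

int main(void)
{
    printf("%f %f\n", qsqrt(4.0f), qsqrt(0.25f));    /* roughly 0.5 and 2.0 */
    return 0;
}

Going through a union for the reinterpretation also sidesteps the strict-aliasing problem mentioned in the note earlier.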

Looking at b2f(f2b(x)*b)

With the conversion functions out of the way, I next turned my attention to everything but the magic constant. Specifically, what are the effects of multiplying inside the \g{b2f}? I fired up my favorite plotting library and looked at

    \[\g{b2f}(\g{f2b}(x) \times 0.5).\]

After a little bit of wrestling around with the plot, I realized that when plotting \g{b2f}(\g{f2b}(x) \times 0.5) in log-log style, it has the same shape as x^{0.5} = \sqrt{x}. This means that

    \[\g{b2f}(\g{f2b}(x) \times 0.5) \approx C \times \sqrt{x}\]

for some constant C.

I fitted a “regression” line (this isn’t a real regression however, so don’t be misled) through the data onto \sqrt{x} and found that the relative error is quite small, as can be seen in the figure below.

    [Interactive figure: relative error of \g{b2f}(\g{f2b}(x)\times\beta) against x^{\beta}, with a slider to change \beta]

In fact, if you tick through the slider at the bottom of the figure, you’ll come to find that for any arbitrary \beta, it seems to be the case that

    \[\g{b2f}(\g{f2b}(x) \times \beta) \approx C_{\beta} \times x^{\beta}\]

for some constant C_{\beta}.
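
If you’d rather not take a plot’s word for it, a throwaway loop (again, a sketch of my own) makes the same point numerically: for \beta = 0.5 the ratio below stays within a few percent of a constant even as x sweeps across six orders of magnitude.

#include <math.h>
#include <stdint.h>
#include <stdio.h>

static int32_t f2b(float x) { union { float f; int32_t i; } u = { x }; return u.i; }
static float   b2f(int32_t i) { union { int32_t i; float f; } u = { i }; return u.f; }

int main(void)
{
    const double beta = 0.5;    /* try other exponents here */
    for (double x = 0.01; x < 20000.0; x *= 10.0) {
        float approx = b2f((int32_t)(f2b((float)x) * beta));
        printf("x = %-8g   b2f(f2b(x)*beta) / x^beta = %g\n", x, approx / pow(x, beta));
    }
    return 0;
}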

3. Homing In on 0x5F400000

Adding Alpha

Let’s now consider the following function:

    \[\gamma(x, \alpha,\beta) = \g{b2f}\pa{\g{f2b}(x)\times \beta + \alpha}\]

Notice here that we now have added an additive term \alpha to the equation. Again, we can rewrite the fast inverse square root method as

    \[\sqrt{\tilde x^{-1}}= \gamma\pa{x, \g{0x5f3759df}, -\frac{1}{2}}\]

A natural question one might consider is the effect of varying \alpha on the behavior of \gamma. It turns out that on the log-log graph of \gamma(x, \alpha, \beta), varying \alpha does not change the slope of the line, but only its y-intercept. In other words, varying \alpha will only affect the magnitude of the constant C_{\beta}, as we can see below.

    [Interactive figure: log-log plot of \gamma(x, \alpha, \beta), with a slider to change \alpha]

This then suggests that \beta determines the degree of \gamma while \alpha determines its magnitude. Symbolically:

    \[\gamma(x, \alpha, \beta) \approx C_\beta(\alpha) x^{\beta}\]

where C_{\beta}(\alpha) is some constant.

0x3F800000 – Generalized fast power method

We are now left at an interesting point in the puzzle. We have on one hand the knowledge that

    \[\gamma(x, \alpha, \beta) \approx C_\beta(\alpha) x^{\beta}\]

for some function C_\beta(\alpha) and we want to determine \alpha such that

    \[\gamma(x,\alpha,\beta) \approx C_\beta(\alpha) x^{\beta} = x^{\beta}\]

Canceling out the x^{\beta} on either side, we are left with the functional equation

    \[C_\beta(\alpha) = 1\]

So right now we need to find \alpha such that C_\beta(\alpha) = 1. But for x = 1, x^{\beta} = 1 for any \beta, so the naive thing to do here is to reduce our approximation to

    \begin{align*} \gamma(1, \alpha, \beta) &\approx C_\beta(\alpha) = 1 \\ \g{b2f}(\g{f2b}(1)\times \beta + \alpha) &\approx 1 \\ & = \g{b2f}(\g{f2b}(1)) \\ \g{f2b}(1)\times \beta + \alpha &\approx \g{f2b}(1) \\ \alpha &\approx \g{f2b}(1) \times (1-\beta) \\ &= \g{0x3f800000} \times (1-\beta) \end{align*}

Let’s take a moment and appreciate this statement:

    \[\alpha \approx \g{0x3f800000} \times (1-\beta)\]

This says that for any exponent \beta (positive or negative, integer or fractional), we can approximate x^{\beta} with

    \[\g{b2f}\pa{\g{f2b}(x)\times \beta + \g{0x3f800000} \times (1-\beta)}\]

That’s mind blowing because we can immediately generate “fast \beta^{th} power” methods using nothing but \beta!

float fast_pow(float x, float n) {
  union { float f; long i; } u = { x };            // union instead of the illegal pointer aliasing (see the note above)
  u.i = (long)(u.i * n + 0x3f800000 * (1 - n));    // f2b(x)*n + 0x3f800000*(1-n), computed in float and truncated back to bits
  return u.f;                                      // assumes a 32-bit long, just like the original Q_rsqrt
}

Going back to inverse square roots, we have \alpha = \g{0x3f800000} \times (1- (-0.5)) = \g{0x5f400000}, which is the namesake of this article.
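
Just to double-check that arithmetic, here’s a throwaway snippet (nothing more than a sanity check):

#include <stdio.h>

int main(void)
{
    /* alpha = f2b(1) * (1 - beta) with beta = -1/2, i.e. 0x3f800000 * 1.5 */
    long alpha = (long)(0x3f800000L * 1.5);
    printf("0x%lx\n", alpha);    /* prints 0x5f400000 */
    return 0;
}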

4. Epilogue – The Hard Parts

Closing In On 0x5f3759df

After discovering 0x5f400000, things became much clearer to me. However, it was still quite far off from SGI’s 0x5f3759df or Lomont’s 0x5f375a86, and so the question of how this constant came to be still nagged at the back of my head. Nevertheless, I am certain that 0x5f400000 appeared, at least in some shape or form, in the earlier versions of this numerical recipe.

Many others have previously discussed the possible origins of 0x5f3759df, and in the rest of this post, I will offer my own views on the matter.

For the most part, I believe that the discovery of 0x5f3759df was an iterative process: it starts with 0x5f40, then it gets refined to 0x5f375X, and finally someone must have decided on 0x5f3759df. As engineers are practitioners by nature, I don’t believe that anyone who worked on 0x5f3759df cared much for its optimality; therefore, I believe that intuition and visualization played the most important part in its discovery.

In this section, I will first outline one path to refine 0x5f40 to 0x5f375X through intuition about the maximum error bounds of the recipe, followed by a discussion of the final discovery as a product of analyzing one step of Newton’s iteration.

Refinement – 0x5f376000

If we look at the graph of the relative error of \gamma\pa{x, \g{0x5f400000}, -\frac{1}{2}} on a logarithmic x-axis below, you will notice a couple of peculiar things.

First and foremost, changing the value of the magic constant by small amounts seems to lift the relative error up. Next, the error is periodic with respect to \log_2(x) with a period of 2 (so that \g{re}(x) = \g{re}(2^{\log_2(x)+2}) = \g{re}(4x)). Finally, the positions of the maximum and the minimum of the relative error are largely unchanged when we vary the magic constant.

We can use these observations to our advantage. Notice that the error for \g{magic} = \g{0x5f400000} is completely negative. However, if we use a smaller magic constant, we can shift part of the error up into the positive zone, which reduces the absolute error, as can be seen below.

We are trying to reduce the error bounds as much as possible (minimizing the infinity norm, as we would say). In order to do so, we need to ensure that the sum of the maximum and the minimum of the signed (non-absolute) relative error is as close to 0 as possible.

Here’s what we’re going to do. First, we’re going to zoom in on the section of the graph within the interval x \in [1,4], since the error is periodic. Next, we will find the maximum point and the minimum point and look at their sum.
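
Here is roughly what that procedure looks like in code, a sketch of the search on my part rather than a reconstruction of what anyone actually did back then:

#include <math.h>
#include <stdint.h>
#include <stdio.h>

static int32_t f2b(float x) { union { float f; int32_t i; } u = { x }; return u.i; }
static float   b2f(int32_t i) { union { int32_t i; float f; } u = { i }; return u.f; }

/* Signed relative error of the raw bit hack (no Newton step) over x in [1, 4]. */
static void scan(int32_t magic)
{
    double max_err = -1.0, min_err = 1.0;
    for (double x = 1.0; x <= 4.0; x += 1e-4) {
        double approx = b2f(magic - f2b((float)x) / 2);
        double err = (approx - 1.0 / sqrt(x)) * sqrt(x);    /* (approx - true) / true */
        if (err > max_err) max_err = err;
        if (err < min_err) min_err = err;
    }
    printf("magic = 0x%08x  max = %+.6f  min = %+.6f  max+min = %+.6f\n",
           (unsigned)magic, max_err, min_err, max_err + min_err);
}

int main(void)
{
    scan(0x5f400000);    /* the vanilla constant: the error stays negative */
    scan(0x5f376000);    /* a smaller constant balances the maximum and the minimum better */
    return 0;
}

The point of the game is to pick the constant that drags that max + min sum toward zero.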

    [Interactive figure: signed relative error over x \in [1,4], with a slider to change the magic constant m]

You will see that the optimal constant here lies somewhere close to 0x5f376000, which is about 10^3 away from the SGI constant.

Newton Iteration – 0x5f375a1a

In our final trick tonight, we will refine this constant down further, to within 60 of SGI’s 0x5f3759df.

One of the important things to notice in the Quake code is that the second Newton iteration was commented out. This suggests that whoever worked on the latest version of the fast inverse square root method must have optimized the constant with the effects of the Newton iteration applied. Let’s look at how well the constants do with the help of an iteration.

Wow, using 0x5f3759df as the magic constant improved the accuracy nearly tenfold! This likely suggests that whoever worked on Quake’s version of the fast inverse square root method was after 4 digits of accuracy. In order to achieve this with the vanilla constant 0x5f400000, they would need to run two Newton iterations.

Let’s dig in a little bit and try to find the magic constant that minimizes the maximum error after one Newton iteration.
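
In the same sketchy spirit, here is one way such a search could look in code; the exact minimizer you get will depend on how finely you sample x and the candidate constants, so treat it as an illustration rather than a faithful reconstruction:

#include <math.h>
#include <stdint.h>
#include <stdio.h>

static int32_t f2b(float x) { union { float f; int32_t i; } u = { x }; return u.i; }
static float   b2f(int32_t i) { union { int32_t i; float f; } u = { i }; return u.f; }

/* Maximum |relative error| over x in [1, 4] after the bit hack plus one Newton iteration. */
static double max_err_one_iter(int32_t magic)
{
    double worst = 0.0;
    for (double xd = 1.0; xd <= 4.0; xd += 1e-4) {
        float x = (float)xd;
        float y = b2f(magic - f2b(x) / 2);
        y = y * (1.5f - 0.5f * x * y * y);              /* one Newton iteration */
        double err = fabs((y - 1.0 / sqrt(xd)) * sqrt(xd));
        if (err > worst) worst = err;
    }
    return worst;
}

int main(void)
{
    /* Coarse scan over a window of candidate constants; shrink the step to refine. */
    int32_t best = 0;
    double best_err = 1.0;
    for (int32_t m = 0x5f370000; m <= 0x5f380000; m += 0x10) {
        double e = max_err_one_iter(m);
        if (e < best_err) { best_err = e; best = m; }
    }
    printf("best magic ~ 0x%08x  max relative error = %g\n", (unsigned)best, best_err);
    return 0;
}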

    [Interactive figure: maximum relative error after one Newton iteration, with a slider to change the magic constant m]

Incidentally, this occurs when the rightmost two peaks in the above graph collide. We find that the magic constant 0x5f375a1a works best here, which is only 59 away from 0x5f3759df.

We can look at the mean error as well (the one-norm):

    [Interactive figure: mean (1-norm) relative error, with a slider to change the magic constant m]

but since the bulk of the mass of the error is tucked under the two humps, the 1-norm/mean metric (and in fact the first few reasonable L^p-norms) will tend to disregard the significance of the outer edges and their maximum bounds. Minimizing these norms will therefore try to decrease the constant by as much as possible.

Finally, let’s look at the maximum relative error after two iterations:

    [Interactive figure: maximum relative error after two Newton iterations, with a slider to change the magic constant m]

This ended up giving us the same magic constant of 0x5f375a1a, which I guess makes sense.

Closing Thoughts

In this article, we’ve come across a couple of different constants: 0x5f400000, 0x5f376000, 0x5f375a1a. None of these are exactly the 0x5f3759df that we’ve been searching for, but the last one is really close, which leads me to believe that the discovery of 0x5f3759df went through a similar process: 1. intuition, and 2. refinement.

The dark art of fast reciprocal square roots has largely been done away with nowadays thanks to faster intrinsics. If anything, this article contributes little technical value besides proposing a half-assed solution to a great puzzle. Nevertheless, I sincerely hope that, having made it this far, you’ve also found some value in this research. Good luck!

[Figure requested by averageisnothalfway: the differences between 0x5f375a1a and 0x5f3759df]


