A while ago I was browsing the tubes and came across an article on Quake's fast inverse square root function. It was pretty interesting (and very clever...). It calculates 1/sqrt(x) using only bit shift and multiplication. I was thinking about how this could maybe be used in Spring to speed things up. I struck up a conversation in #sy and discussed it a little bit but nothing came out of it other than learning that Spring doesn't use approximations at all.
Today I coded up a fast sqrt functions using the magic number from Quake and newton's method. Note that the sqrt is just an approximation - which is what makes it fast.
Code: Select all
float bsqrt(float number) {
long i;
float x, y;
const float f = 1.5F;
x = number * 0.5F; // half number used for approximation
y = number;
i = * ( long * ) &y;
i = 0x5f3759df - ( i >> 1 ); // magic number from Quake 3. this is the initial guess
y = * ( float * ) &i;
y = y * ( f - ( x * y * y ) ); // Newton's Method (This is 2 iterations, use more means more accuracy
y = y * ( f - ( x * y * y ) ); // but it is slower. Less iterations isn't as accurate but faster.
return number * y;
}
So far the tests have been great when comparing with streflop's sqrt. When calculating sqrts of 1-100000, 100 times, the fast sqrt is roughly 25x faster.
See here (Times in milliseconds):
Code: Select all
Streflop Sqrt time: 7563
Fast Sqrt 1 Iteration: 265
Fast Sqrt 2 Iterations: 344
Fast Sqrt 3 Iterations: 453
(Note that 'error' is the difference between the fast sqrt and the real square root. so an error of 1 would mean, say, sqrt(x) was 5 and fastsqrt(x) was 6).
Code: Select all
1 Iteration:
Average error: 1.9649329910134016863310080225346609950065612792969
Highest error: 4.830078125 at sqrt(9999917)
2 Iterations:
Average Error: 0.0036126651906251906927469708108446866390295326709747
Highest Error: 0.01123046875 at sqrt(9975537)
3 Iterations:
Average Error: 0.000043552740287780761837405085756813605257775634527206
Highest Error: 0.000244140625 at sqrt(4196813)
Now, I was wondering if this would be worth implementing, and if so, how I would go about implementing and testing its performance against the old sqrt.
Also thanks to lurker for a lot of help.
Comments, etc, please?