Optimizing Math Functions

Optimizing Math Functions

Discuss the source code and development of Spring Engine in general from a technical point of view. Patches go here too.

Moderator: Moderators

Jonanin
Posts: 107
Joined: 13 Jan 2008, 21:34

Optimizing Math Functions

Post by Jonanin »

Hi all.

A while ago I was browsing the tubes and came across an article on Quake's fast inverse square root function. It was pretty interesting (and very clever...). It calculates 1/sqrt(x) using only bit shift and multiplication. I was thinking about how this could maybe be used in Spring to speed things up. I struck up a conversation in #sy and discussed it a little bit but nothing came out of it other than learning that Spring doesn't use approximations at all.

Today I coded up a fast sqrt functions using the magic number from Quake and newton's method. Note that the sqrt is just an approximation - which is what makes it fast.

Code: Select all

float bsqrt(float number) {
    long i;
    float x, y; 
    const float f = 1.5F; 
    x = number * 0.5F; // half number used for approximation
    y  = number; 
    i  = * ( long * ) &y; 
    i  = 0x5f3759df - ( i >> 1 );    // magic number from Quake 3. this is the initial guess
    y  = * ( float * ) &i; 
    y  = y * ( f - ( x * y * y ) );  // Newton's Method (This is 2 iterations, use more means more accuracy
    y  = y * ( f - ( x * y * y ) );  // but it is slower. Less iterations isn't as accurate but faster.
    return number * y; 
}
As you can see the last two lines are the same. These are the iterations of Newton's method. The more iterations you use, the more accurate the guess. However, more iterations is slower.

So far the tests have been great when comparing with streflop's sqrt. When calculating sqrts of 1-100000, 100 times, the fast sqrt is roughly 25x faster.
See here (Times in milliseconds):

Code: Select all

Streflop Sqrt time: 7563
Fast Sqrt 1 Iteration: 265
Fast Sqrt 2 Iterations: 344
Fast Sqrt 3 Iterations: 453
Now, about how accurate it is. Going from 1 to 10000000:
(Note that 'error' is the difference between the fast sqrt and the real square root. so an error of 1 would mean, say, sqrt(x) was 5 and fastsqrt(x) was 6).

Code: Select all

1 Iteration:
     Average error: 1.9649329910134016863310080225346609950065612792969
     Highest error: 4.830078125 at sqrt(9999917)
2 Iterations:
     Average Error: 0.0036126651906251906927469708108446866390295326709747
     Highest Error: 0.01123046875 at sqrt(9975537)
3 Iterations:
     Average Error: 0.000043552740287780761837405085756813605257775634527206
     Highest Error: 0.000244140625 at sqrt(4196813)

Now, I was wondering if this would be worth implementing, and if so, how I would go about implementing and testing its performance against the old sqrt.

Also thanks to lurker for a lot of help.

Comments, etc, please?
Last edited by Jonanin on 03 Mar 2008, 06:09, edited 1 time in total.
User avatar
ILMTitan
Spring Developer
Posts: 410
Joined: 13 Nov 2004, 08:35

Re: Optimizing Math Functions

Post by ILMTitan »

Sounds great. I know we could use it in some of the float3 methods at least.
Jonanin
Posts: 107
Joined: 13 Jan 2008, 21:34

Re: Optimizing Math Functions

Post by Jonanin »

Also, I forgot to add.

Has been tested on both windows and linux, it runs exactly the same on both.
User avatar
LordMatt
Posts: 3393
Joined: 15 May 2005, 04:26

Re: Optimizing Math Functions

Post by LordMatt »

So it gives the same error on Linux and windows or runs at the same speed?
User avatar
Peet
Malcontent
Posts: 4384
Joined: 27 Feb 2006, 22:04

Re: Optimizing Math Functions

Post by Peet »

LordMatt wrote:So it gives the same error on Linux and windows or runs at the same speed?
Yes.
Jonanin
Posts: 107
Joined: 13 Jan 2008, 21:34

Re: Optimizing Math Functions

Post by Jonanin »

Helpful peet :D

Heh, the code never runs the same speed every time in the SAME platform :P

I meant it returns the same values on both platforms, so yes, same error.
User avatar
KDR_11k
Game Developer
Posts: 8293
Joined: 25 Jun 2006, 08:44

Re: Optimizing Math Functions

Post by KDR_11k »

It might still be useful to keep fastinvsqrt around, things like normalizing need that a lot.
User avatar
yuritch
Spring 1944 Developer
Posts: 1018
Joined: 11 Oct 2005, 07:18

Re: Optimizing Math Functions

Post by yuritch »

Absolute error values are not that important, it's relative error that says much about approximate functions usefulness. Can you show relative (i.e. percentage) errors of this compared to 'slow' sqrt? I.e. like that: max relative error - 50%, min relative error - 0.0000001%?
Why is this important: compare an absolute error of 1 when computing sqrt(1) and sqrt(10000). In the first case it's a 100% error (0 or 2 instead of 1), in the second case it's 1% error (99 or 101 instead of 100), but in absolute value it's always a 1.
User avatar
koshi
Lobby Developer
Posts: 1059
Joined: 14 Aug 2007, 16:15

Re: Optimizing Math Functions

Post by koshi »

research paper wrote:the maximum relative error over all floating point numbers was 0.00175228
Link: http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
imbaczek
Posts: 3629
Joined: 22 Aug 2006, 16:19

Re: Optimizing Math Functions

Post by imbaczek »

given that sqrt consistently shows up in profiles as one of the most time-consuming functions, I'm all for including and using it.

edit: paper states that 0x5f375a86 is a better initial guess, but it doesn't matter _that_ much.
Auswaschbar
Spring Developer
Posts: 1254
Joined: 24 Jun 2007, 08:34

Re: Optimizing Math Functions

Post by Auswaschbar »

I did some investigations in math functions in the past and found out that the system's math functions are about 20 times faster than streflop's, but its more accurate than this approximation, but its unsynced.
How fast is this approximation compared with your systems sqrt()-implementation?
User avatar
AF
AI Developer
Posts: 20687
Joined: 14 Sep 2004, 11:32

Re: Optimizing Math Functions

Post by AF »

We could always put it in a branch and test, that way we wouldn't need to speculate.
User avatar
lurker
Posts: 3842
Joined: 08 Jan 2007, 06:13

Re: Optimizing Math Functions

Post by lurker »

The system's? With math.h, it came out at the same speed as with streflop, and I'm reasonable sure there were no auto-includes as not having the "streflop_cond.h" line made it not compile. Do I need to do something else to match the mode Spring's streflop runs in?

Edit: After many errors trying to get streflop_init<streflop::Simple>(); to run, I gave up and put the test code directly into game.cpp.
results in a minute
Kloot
Spring Developer
Posts: 1867
Joined: 08 Oct 2006, 16:58

Re: Optimizing Math Functions

Post by Kloot »

@Jonanin

Why long instead of int as in Lomont's paper? Even
approximations should always have equal precision
on all hardware (even though in this case it doesn't
matter). ;)
Last edited by Kloot on 03 Mar 2008, 16:03, edited 2 times in total.
User avatar
lurker
Posts: 3842
Joined: 08 Jan 2007, 06:13

Re: Optimizing Math Functions

Post by lurker »

Okay, finally got it going in spring. A two iteration run is ~8 times faster than streflop.

For ten runs of each number 1 - 10,000,000
Streflop: Sqrt Total Time: 11297ms
Fast Sqrt, 1 iteration: 1156ms
Fast Sqrt, 2 iterations: 1516ms
Fast Sqrt, 3 iterations: 1828ms
Last edited by lurker on 03 Mar 2008, 15:59, edited 1 time in total.
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Re: Optimizing Math Functions

Post by Tobi »

What about __builtin_fsqrt() ?

It's one assembler instruction only, and only one line of code so it's easier to maintain too. (Basically it's what GCC would generate with -ffast-math, I think.)

It may even be (a lot) faster then the Quake bsqrt.
User avatar
lurker
Posts: 3842
Joined: 08 Jan 2007, 06:13

Re: Optimizing Math Functions

Post by lurker »

streflop: 11610ms
quake, 2 iterations: 1515ms
__builtin_sqrt: 1344ms
1 / __builtin_sqrt: 2219ms

Why don't we use that again? syncing?
Tobi
Spring Developer
Posts: 4598
Joined: 01 Jun 2005, 11:36

Re: Optimizing Math Functions

Post by Tobi »

Yes.

It could be that streflop isn't really needed even for sync since we use only GCC anyway, but it's VERY hard to test.

Basically you need to generate a sync matrix like I once did, ie. test every version of GCC you can get with every optimization flag (0, 1, 2, 3, s) vs each other (well vs one reference version should be enough, especially since the old sync matrix is known: every GCC >= 4.0 with every optimization option syncs against each other), to check whether there are any regressions.

(I think there's a script for that somewhere in tools/, when configured correctly it should be possible to run the script and let it run until it finishes (takes a few days); then you can create a matrix from the log file.)

Of course this isn't needed if you make synced code use streflop and only convert unsynced code, but the problem with this is to figure out the possible context in which each sqrt is executed. (for example the one in float3::Normalize(), which is probably one of the most used sqrt calls, is called from both synced and unsynced contexts, so you can't just change it without extensively testing for sync regressions)

EDIT: Before anyone suggests it, SyncedFloat3 is NOT the solution to the latter issue, since the operations are defined in a way to minimize impact on mixed sync calculations: ie. SyncedFloat3::operator+(SyncedFloat3, SyncedFloat3) returns float3, because it is entirely valid to add two synced floats in unsynced context. Would you then do (syncedFloat1 + syncedFloat2).Normalize() and expect SyncedFloat3::Normalize() to get called, you are wrong ;-)
User avatar
AF
AI Developer
Posts: 20687
Joined: 14 Sep 2004, 11:32

Re: Optimizing Math Functions

Post by AF »

we could add a global boolean check specifying wetehr we're in synced or unsynced mode but then I dont know if the overhead of that would negate any benefit in a faster sqrt operation. It would also be useful for other things, perhaps its already done somewhere, bah I dont know enough to comment.
User avatar
det
Moderator
Posts: 737
Joined: 26 Nov 2005, 11:22

Re: Optimizing Math Functions

Post by det »

Today's computers have an SSE function to do an equivalent approximation of inverse square root at greater speeds.
Post Reply

Return to “Engine”