Optimizing Math Functions

Tobi · Post by **Tobi** » 03 Mar 2008, 20:14

It was done but was unmaintained for a long time even before I joined the community. (ENTER_SYNCED, ENTER_UNSYNCED, ENTER_MIXED, PUSH_CODE_MODE and POP_CODE_MODE macros)

So basically one would have to start from scratch marking code as synced or unsynced (at times I tried to figure out a way to programmatically do that but haven't found an easy way yet.)

Jonanin · Post by **Jonanin** » 03 Mar 2008, 20:53

Alright, well, it seems this conversation has gone way above my head

However, about the earlier comments:

I have however updated the code a bit:

Code: Select all

float bsqrt(float number) {
    const float f = 1.5F;
    float x = number * 0.5F;
    float y  = number;
    int i  = * ( int * ) &y;
    i  = 0x5f375a86 - ( i >> 1 );
    y  = * ( float * ) &i;
    y  = y * ( f - ( x * y * y ) );
	y  = y * ( f - ( x * y * y ) );
    return number * y;
}

imbaczek wrote:edit: paper states that 0x5f375a86 is a better initial guess, but it doesn't matter _that_ much.

Ah, that is correct. Changed above.

Kloot wrote:Why long instead of int as in Lomont's paper? Even
approximations should always have equal precision
on all hardware (even though in this case it doesn't
matter).

Sorry, actually quake used a long, not sure why tho. fixed above.

On my tests, I believe streflop is 25x slower where Math.h's is about 10x slower.

Here are the percents of error I got [tested against math.h]:

Code: Select all

0 Iterations:
    Sqrt Total Time: 969
    Fast Sqrt Total: 94
    Average Error: 2.3142415708400920770770881063072010874748229980469 percent
    Highest Error: 3.4364936758788369175476873351726680994033813476562 percent at sqrt(977746)
1 Iteration:
    Sqrt Total Time: 968
    Fast Sqrt Total: 141
    Average Error: 0.09285881584973530722404433390693156979978084564209 percent
    Highest Error: 0.17513806048635793821688366733724251389503479003906 percent at sqrt(2701137)
2 Iterations:
    Sqrt Total Time: 968
    Fast Sqrt Total: 188
    Average Error: 0.00017350704573369476243355213895824817882385104894638 percent
    Highest Error: 0.00046997129594361061787413658130674321000697091221809 percent at sqrt(2677679)
3 Iterations:
    Sqrt Total Time: 953
    Fast Sqrt Total: 250
    Average Error: 1.9868740745263981125451856202257516770259826444089e-06 percent
    Highest Error: 1.1917364517739414403051274871092601870259386487305e-05 percent at sqrt(4196813)

I found it very interesting how well 0 iterations performs - very fast, and pretty darn accurate. Maybe it could be used for something like collisions which needs little accuracy. An average error of only 2 percent is a lot lower than I expected.

So I suppose this method is better for n<2 iterations. Also we could use the inverse sqrt too.

Post by **Kloot** » 04 Mar 2008, 02:51

FWIW, I added this function to the float3 class (inlined so the
stackframe creation/destruction overhead doesn't defeat its
purpose). Since synthetic benchmarks almost never tell the
whole story, anyone wishing to investigate its potential proper
can do so by grep'ping all (significant) instances of Normalize()
by ANormalize() and then setting up some realistic (ie., likely to
arise during normal large-scale games) test scenario. Note that
you will want to ensure your tests are CPU-limited first, don't run
them with dynamic water and shadows enabled.

Jonanin · Post by **Jonanin** » 04 Mar 2008, 02:59

Hmm, nice.

You implemented InvSqrt, what about regular sqrt? Also, your initial guess magic number seems weird, why 0x5f3759d5? I would use 0x5f375a86.

Also the number of iterations needed must be considered. 0 iterations could sometimes be useful - it is very fast, and still relatively accurate.

Also, currently there are 2 iters, you could consider only 1 as it is 2x faster, still pretty accurate.

Jonanin · Post by **Jonanin** » 05 Mar 2008, 06:40

OK I see it was updated.

here is a simple patch which shows a fast sqrt using InvSqrt.
Though it doesn't actually implement it anywhere,someone can do that though.

Also, sqrt isn't the only thing that can have fast approximations, there are good ones for sine and cosine as well, and some others. Maybe that is going overboard though, just a heads up.

I will try to test these later...

EDIT: humm I'm not sure which types of file extensions are allowed, ill just put the patch inside code tags...

Code: Select all

Index: float3.h
===================================================================
--- float3.h	(revision 5559)
+++ float3.h	(working copy)
@@ -387,6 +387,16 @@
 		// x = x * (1.5f - xh * (x * x));
 		return x;
 	}
+	
+	/**
+	 * @brief fast approximation of sqrt(x)
+	 * @return approximation of sqrt(x)
+	 *
+	 * Uses InvSqrt and multiplies by x to get sqrt.
+	 */
+	inline float ASqrt(float x) {
+		return InvSqrt(x) * x;
+	}
 
 	/**
 	 * @brief normalizes the vector approximately

Peet · Post by **Peet** » 05 Mar 2008, 06:42

I don't think sin/cos approximations would work out, as they're inherent in things like rotation.

Post by **ILMTitan** » 05 Mar 2008, 13:16

One thing you need to remember is that sin/cos on a computer are (nearly) always approximations due to the maximum size of a float. If we can increase speed significantly and keep the error below 1%, I think we should go for it.

BoredJoe · Post by **BoredJoe** » 05 Mar 2008, 15:58

Jonanin wrote:Hmm, nice.

You implemented InvSqrt, what about regular sqrt? Also, your initial guess magic number seems weird, why 0x5f3759d5? I would use 0x5f375a86.

Also the number of iterations needed must be considered. 0 iterations could sometimes be useful - it is very fast, and still relatively accurate.

Also, currently there are 2 iters, you could consider only 1 as it is 2x faster, still pretty accurate.

umm...did you change your number from the Quake 3 number to the one imbaczek suggested would be better?

I remember reading about this a long time ago, and essentially, even though there are numbers that are theoretically better initial guesses than the quake 3 guess, they actually perform worse than the Quake 3 number.

Tobi · Post by **Tobi** » 06 Mar 2008, 10:04

Both float3::InvSqrt and float3::ASqrt should really be static; it doesn' t make sense to require user of the function to create a float3 object on the stack just to be able to calculate a (inverse) square root of a single float.

Ideally the functions don't belong in float3 at all anyway, since they have nothing to do with float3.

Post by **Kloot** » 06 Mar 2008, 12:56

Indeed, I only put InvSqrt() there because ANormalize() makes
use of it and there aren't many other places where reciprocal
square roots have to be calculated. Introducing a MathHelper
compilation unit would be the better option though.

Jonanin · Post by **Jonanin** » 13 Mar 2008, 20:30

Well, I've tested by simply replacing float3::Normalize with float3::ANormalize, and it seems to work perfect. I don't know if this is necessarily a good test, because I don't know where the approximation would be used if not everywhere.

However I haven't tested for speed improvements. Time profiler in float3 isn't working, however lurker said he got it working once.

What's a good way to test speed then?

EDIT: Not working because TimeProfiler includes float3.h itself.

imbaczek · Post by **imbaczek** » 13 Mar 2008, 22:37

compile with debug=yes profile=yes (if you're using gcc) and check what's using the most cycles. normally sqrt shows up there near the top.

you could also try a replay of a big fight... it won't be in sync, but the framerates should be possible to compare objectively.

Jonanin · Post by **Jonanin** » 13 Mar 2008, 23:15

Well, I get about ~18 fps with 40 krog vs 40 krog on delta siege revx3 with fast math, and about ~15 avg fps with regular sqrt.

Can someone test this also and see what they get?

Note this does not include when they all blow up in a chain reaction, I get a brief 1 fps for both then.

lurker · Post by **lurker** » 14 Mar 2008, 02:28

(I told him to compile with profiling repeatedly, and the specific command. I think he's deaf to it.)

Jonanin, I'll find my changes to get timeprofiler working and pastebin them.

Jonanin · Post by **Jonanin** » 14 Mar 2008, 05:10

Yes, I heard you, but the time profiling wasn't working because of the reasons I stated above. Compiling 'with time profiling' didn't help at all.

lurker · Post by **lurker** » 14 Mar 2008, 06:06

I believe this is the fourth time I've said this, so I'll use bold text.

Compiling with profiling is not using timeprofiler.h

Jonanin · Post by **Jonanin** » 14 Mar 2008, 22:42

Apparently it isn't clear to you, so I'll use bold text.

I know that.

Compiling 'with time profiling' didn't help at all.

Also note that the flag that you gave me was different from what imbaczek said above, you told me to use "use_profiling=yes" or something similiar.

Tobi · Post by **Tobi** » 15 Mar 2008, 21:19

Seems it's time for me to drop in the discussion as author of current buildsystem.

How To Profile Spring:

Code: Select all

scons configure profile=yes && scons spring

then test

Code: Select all

gprof spring.exe > profile.txt

To actually compare data the test & environment of course must be identical everytime.

This profiling should not be confused with profile_generate and profile_use flags, which, respectively, generate a profiling binary and a binary that uses the profiling data generated from running the first binary to make better guesses for branch prediction / loop unrolling etc. Hence, these flags are purely for optimization (and didn't particularly help; no measurable speed difference in my test), while the other is purely aid in manual optimizing.

Jonanin · Post by **Jonanin** » 16 Mar 2008, 02:09

Hmm, when I do gprof spring.exe > profile.txt I get:

Code: Select all

gmon.out: no such file or directory

Also, exactly how identical do the tests have to be?

Is it fine to just create units at the exact same location both times?

imbaczek · Post by **imbaczek** » 16 Mar 2008, 10:25

best way is to play a replay.

you must run gprof after you've ran the game, gmon.out is a file that should get created when you exit spring.

Spring RTS Engine

Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions

Re: Optimizing Math Functions