Optimizing Math Functions
Moderator: Moderators
Re: Optimizing Math Functions
It was done but was unmaintained for a long time even before I joined the community. (ENTER_SYNCED, ENTER_UNSYNCED, ENTER_MIXED, PUSH_CODE_MODE and POP_CODE_MODE macros)
So basically one would have to start from scratch marking code as synced or unsynced (at times I tried to figure out a way to programmatically do that but haven't found an easy way yet.)
So basically one would have to start from scratch marking code as synced or unsynced (at times I tried to figure out a way to programmatically do that but haven't found an easy way yet.)
Re: Optimizing Math Functions
Alright, well, it seems this conversation has gone way above my head
However, about the earlier comments:
I have however updated the code a bit:
On my tests, I believe streflop is 25x slower where Math.h's is about 10x slower.
Here are the percents of error I got [tested against math.h]:
I found it very interesting how well 0 iterations performs - very fast, and pretty darn accurate. Maybe it could be used for something like collisions which needs little accuracy. An average error of only 2 percent is a lot lower than I expected.
So I suppose this method is better for n<2 iterations. Also we could use the inverse sqrt too.

However, about the earlier comments:
I have however updated the code a bit:
Code: Select all
float bsqrt(float number) {
const float f = 1.5F;
float x = number * 0.5F;
float y = number;
int i = * ( int * ) &y;
i = 0x5f375a86 - ( i >> 1 );
y = * ( float * ) &i;
y = y * ( f - ( x * y * y ) );
y = y * ( f - ( x * y * y ) );
return number * y;
}
Ah, that is correct. Changed above.imbaczek wrote:edit: paper states that 0x5f375a86 is a better initial guess, but it doesn't matter _that_ much.
Sorry, actually quake used a long, not sure why tho. fixed above.Kloot wrote:Why long instead of int as in Lomont's paper? Even
approximations should always have equal precision
on all hardware (even though in this case it doesn't
matter).
On my tests, I believe streflop is 25x slower where Math.h's is about 10x slower.
Here are the percents of error I got [tested against math.h]:
Code: Select all
0 Iterations:
Sqrt Total Time: 969
Fast Sqrt Total: 94
Average Error: 2.3142415708400920770770881063072010874748229980469 percent
Highest Error: 3.4364936758788369175476873351726680994033813476562 percent at sqrt(977746)
1 Iteration:
Sqrt Total Time: 968
Fast Sqrt Total: 141
Average Error: 0.09285881584973530722404433390693156979978084564209 percent
Highest Error: 0.17513806048635793821688366733724251389503479003906 percent at sqrt(2701137)
2 Iterations:
Sqrt Total Time: 968
Fast Sqrt Total: 188
Average Error: 0.00017350704573369476243355213895824817882385104894638 percent
Highest Error: 0.00046997129594361061787413658130674321000697091221809 percent at sqrt(2677679)
3 Iterations:
Sqrt Total Time: 953
Fast Sqrt Total: 250
Average Error: 1.9868740745263981125451856202257516770259826444089e-06 percent
Highest Error: 1.1917364517739414403051274871092601870259386487305e-05 percent at sqrt(4196813)
So I suppose this method is better for n<2 iterations. Also we could use the inverse sqrt too.
Re: Optimizing Math Functions
FWIW, I added this function to the float3 class (inlined so the
stackframe creation/destruction overhead doesn't defeat its
purpose). Since synthetic benchmarks almost never tell the
whole story, anyone wishing to investigate its potential proper
can do so by grep'ping all (significant) instances of Normalize()
by ANormalize() and then setting up some realistic (ie., likely to
arise during normal large-scale games) test scenario. Note that
you will want to ensure your tests are CPU-limited first, don't run
them with dynamic water and shadows enabled.
stackframe creation/destruction overhead doesn't defeat its
purpose). Since synthetic benchmarks almost never tell the
whole story, anyone wishing to investigate its potential proper
can do so by grep'ping all (significant) instances of Normalize()
by ANormalize() and then setting up some realistic (ie., likely to
arise during normal large-scale games) test scenario. Note that
you will want to ensure your tests are CPU-limited first, don't run
them with dynamic water and shadows enabled.

Re: Optimizing Math Functions
Hmm, nice.
You implemented InvSqrt, what about regular sqrt? Also, your initial guess magic number seems weird, why 0x5f3759d5? I would use 0x5f375a86.
Also the number of iterations needed must be considered. 0 iterations could sometimes be useful - it is very fast, and still relatively accurate.
Also, currently there are 2 iters, you could consider only 1 as it is 2x faster, still pretty accurate.
You implemented InvSqrt, what about regular sqrt? Also, your initial guess magic number seems weird, why 0x5f3759d5? I would use 0x5f375a86.
Also the number of iterations needed must be considered. 0 iterations could sometimes be useful - it is very fast, and still relatively accurate.
Also, currently there are 2 iters, you could consider only 1 as it is 2x faster, still pretty accurate.
Re: Optimizing Math Functions
OK I see it was updated.
here is a simple patch which shows a fast sqrt using InvSqrt.
Though it doesn't actually implement it anywhere,someone can do that though.
Also, sqrt isn't the only thing that can have fast approximations, there are good ones for sine and cosine as well, and some others. Maybe that is going overboard though, just a heads up.
I will try to test these later...
EDIT: humm I'm not sure which types of file extensions are allowed, ill just put the patch inside code tags...
here is a simple patch which shows a fast sqrt using InvSqrt.
Though it doesn't actually implement it anywhere,someone can do that though.
Also, sqrt isn't the only thing that can have fast approximations, there are good ones for sine and cosine as well, and some others. Maybe that is going overboard though, just a heads up.
I will try to test these later...
EDIT: humm I'm not sure which types of file extensions are allowed, ill just put the patch inside code tags...
Code: Select all
Index: float3.h
===================================================================
--- float3.h (revision 5559)
+++ float3.h (working copy)
@@ -387,6 +387,16 @@
// x = x * (1.5f - xh * (x * x));
return x;
}
+
+ /**
+ * @brief fast approximation of sqrt(x)
+ * @return approximation of sqrt(x)
+ *
+ * Uses InvSqrt and multiplies by x to get sqrt.
+ */
+ inline float ASqrt(float x) {
+ return InvSqrt(x) * x;
+ }
/**
* @brief normalizes the vector approximately
Last edited by Jonanin on 05 Mar 2008, 07:15, edited 2 times in total.
Re: Optimizing Math Functions
I don't think sin/cos approximations would work out, as they're inherent in things like rotation.
Re: Optimizing Math Functions
One thing you need to remember is that sin/cos on a computer are (nearly) always approximations due to the maximum size of a float. If we can increase speed significantly and keep the error below 1%, I think we should go for it.
Re: Optimizing Math Functions
umm...did you change your number from the Quake 3 number to the one imbaczek suggested would be better?Jonanin wrote:Hmm, nice.
You implemented InvSqrt, what about regular sqrt? Also, your initial guess magic number seems weird, why 0x5f3759d5? I would use 0x5f375a86.
Also the number of iterations needed must be considered. 0 iterations could sometimes be useful - it is very fast, and still relatively accurate.
Also, currently there are 2 iters, you could consider only 1 as it is 2x faster, still pretty accurate.
I remember reading about this a long time ago, and essentially, even though there are numbers that are theoretically better initial guesses than the quake 3 guess, they actually perform worse than the Quake 3 number.
Re: Optimizing Math Functions
Both float3::InvSqrt and float3::ASqrt should really be static; it doesn' t make sense to require user of the function to create a float3 object on the stack just to be able to calculate a (inverse) square root of a single float.
Ideally the functions don't belong in float3 at all anyway, since they have nothing to do with float3.
Ideally the functions don't belong in float3 at all anyway, since they have nothing to do with float3.
Re: Optimizing Math Functions
Indeed, I only put InvSqrt() there because ANormalize() makes
use of it and there aren't many other places where reciprocal
square roots have to be calculated. Introducing a MathHelper
compilation unit would be the better option though.
use of it and there aren't many other places where reciprocal
square roots have to be calculated. Introducing a MathHelper
compilation unit would be the better option though.
Re: Optimizing Math Functions
Well, I've tested by simply replacing float3::Normalize with float3::ANormalize, and it seems to work perfect. I don't know if this is necessarily a good test, because I don't know where the approximation would be used if not everywhere.
However I haven't tested for speed improvements. Time profiler in float3 isn't working, however lurker said he got it working once.
What's a good way to test speed then?
EDIT: Not working because TimeProfiler includes float3.h itself.
However I haven't tested for speed improvements. Time profiler in float3 isn't working, however lurker said he got it working once.
What's a good way to test speed then?
EDIT: Not working because TimeProfiler includes float3.h itself.
Re: Optimizing Math Functions
compile with debug=yes profile=yes (if you're using gcc) and check what's using the most cycles. normally sqrt shows up there near the top.
you could also try a replay of a big fight... it won't be in sync, but the framerates should be possible to compare objectively.
you could also try a replay of a big fight... it won't be in sync, but the framerates should be possible to compare objectively.
Re: Optimizing Math Functions
Well, I get about ~18 fps with 40 krog vs 40 krog on delta siege revx3 with fast math, and about ~15 avg fps with regular sqrt.
Can someone test this also and see what they get?
Note this does not include when they all blow up in a chain reaction, I get a brief 1 fps for both then.
Can someone test this also and see what they get?
Note this does not include when they all blow up in a chain reaction, I get a brief 1 fps for both then.
Re: Optimizing Math Functions
(I told him to compile with profiling repeatedly, and the specific command. I think he's deaf to it.)
Jonanin, I'll find my changes to get timeprofiler working and pastebin them.
Jonanin, I'll find my changes to get timeprofiler working and pastebin them.
Re: Optimizing Math Functions
Yes, I heard you, but the time profiling wasn't working because of the reasons I stated above. Compiling 'with time profiling' didn't help at all.
Re: Optimizing Math Functions
I believe this is the fourth time I've said this, so I'll use bold text.
Compiling with profiling is not using timeprofiler.h
Compiling with profiling is not using timeprofiler.h
Re: Optimizing Math Functions
Apparently it isn't clear to you, so I'll use bold text.
I know that.
I know that.
Also note that the flag that you gave me was different from what imbaczek said above, you told me to use "use_profiling=yes" or something similiar.Compiling 'with time profiling' didn't help at all.
Re: Optimizing Math Functions
Seems it's time for me to drop in the discussion as author of current buildsystem.
How To Profile Spring:
then test
To actually compare data the test & environment of course must be identical everytime.
This profiling should not be confused with profile_generate and profile_use flags, which, respectively, generate a profiling binary and a binary that uses the profiling data generated from running the first binary to make better guesses for branch prediction / loop unrolling etc. Hence, these flags are purely for optimization (and didn't particularly help; no measurable speed difference in my test), while the other is purely aid in manual optimizing.
How To Profile Spring:
Code: Select all
scons configure profile=yes && scons spring
Code: Select all
gprof spring.exe > profile.txt
This profiling should not be confused with profile_generate and profile_use flags, which, respectively, generate a profiling binary and a binary that uses the profiling data generated from running the first binary to make better guesses for branch prediction / loop unrolling etc. Hence, these flags are purely for optimization (and didn't particularly help; no measurable speed difference in my test), while the other is purely aid in manual optimizing.
Re: Optimizing Math Functions
Hmm, when I do gprof spring.exe > profile.txt I get:
Also, exactly how identical do the tests have to be?
Is it fine to just create units at the exact same location both times?
Code: Select all
gmon.out: no such file or directory
Is it fine to just create units at the exact same location both times?
Re: Optimizing Math Functions
best way is to play a replay.
you must run gprof after you've ran the game, gmon.out is a file that should get created when you exit spring.
you must run gprof after you've ran the game, gmon.out is a file that should get created when you exit spring.