[engine] Comparing two float3s

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

[engine] Comparing two float3s

Post by jK »

I've been thinking about the fastest way to compare two vectors.

The code Spring uses atm:

Code:

return std::fabs(x - f.x) <= CMP_EPS
 && std::fabs(y - f.y) <= CMP_EPS
 && std::fabs(z - f.z) <= CMP_EPS;
Currently Spring compares each component individually, and you always hear that branching is slow. On the other hand, CPUs predict branches; GPUs don't.

So I thought: instead of comparing all 3 components individually, why not do `Length(vecA-vecB)^2 <= CMP_EPS`? I assumed the additional multiplies & additions from the dot product would be "for free".
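For readers who don't want to open the links, the two variants can be sketched roughly like this. It is a minimal illustration only, not Spring's actual code: the `Vec3` struct, the function names and the `CMP_EPS` value are assumptions made for the example.

```cpp
#include <cmath>

// Hypothetical stand-ins; Spring's real float3 and CMP_EPS may differ.
struct Vec3 { float x, y, z; };
static const float CMP_EPS = 1e-4f; // assumed value, for illustration only

// Variant 1: three independent compares; && stops at the first failure.
inline bool EqualsPerComponent(const Vec3& a, const Vec3& b) {
    return std::fabs(a.x - b.x) <= CMP_EPS
        && std::fabs(a.y - b.y) <= CMP_EPS
        && std::fabs(a.z - b.z) <= CMP_EPS;
}

// Variant 2: squared length of the difference, one compare at the end.
inline bool EqualsSquaredLength(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return (dx * dx + dy * dy + dz * dz) <= CMP_EPS;
}
```

Note that the two tests are not equivalent tolerances: the first accepts differences inside a cube of half-width `CMP_EPS`, the second inside a sphere of radius `sqrt(CMP_EPS)`. With the assumed value above, a difference of 0.005 in one component fails variant 1 but passes variant 2.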

So here is the code:
http://ideone.com/B9KS0f (3 compares)
http://ideone.com/zrAY3X (DOT FPU)
http://ideone.com/9eLaHD (SSE)

with the results:
3 compares: 0.98s
dot FPU: 1.3s
SSE: >5sec (Time limit exceeded)

-> another assumption turned out wrong :mrgreen:
Possible reason: the early-out advantage of the individual compares (the CPU can stop the bool check as soon as the first comparison fails) seems to win against Length()^2.

PS: dunno why the SSE version is so damn slow (even slower than FPU). Anyone got an idea?

PPS: to compile the code I used `g++ -o foo.bin -O2 -mfpmath=sse -msse -msse2 foo_float3.c -DUSE_...`.
Peet
Malcontent
Posts: 4383
Joined: 27 Feb 2006, 22:04

Re: Blog: Comparing two float3s

Post by Peet »

Code:

peet@starscream ~/floats> g++ -o sse.bin -O2 -mfpmath=sse -msse -msse2 floats.cpp -DUSE_SSE
peet@starscream ~/floats> g++ -o dot.bin -O2 -mfpmath=sse -msse -msse2 floats.cpp -DUSE_DOT
peet@starscream ~/floats> g++ -o branch.bin -O2 -mfpmath=sse -msse -msse2 floats.cpp
peet@starscream ~/floats> time ./sse.bin 
16777216.000000 8388608.000000 16777216.000000
0.35user 0.00system 0:00.36elapsed 99%CPU (0avgtext+0avgdata 1008maxresident)k
0inputs+0outputs (0major+315minor)pagefaults 0swaps
peet@starscream ~/floats> time ./dot.bin 
16777216.000000 8388608.000000 16777216.000000
0.53user 0.00system 0:00.53elapsed 99%CPU (0avgtext+0avgdata 1004maxresident)k
0inputs+0outputs (0major+313minor)pagefaults 0swaps
peet@starscream ~/floats> time ./branch.bin 
16777216.000000 8388608.000000 16777216.000000
0.26user 0.00system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 1004maxresident)k
0inputs+0outputs (0major+314minor)pagefaults 0swaps
For me USE_SSE is notably faster than USE_DOT. Perhaps your USE_SSE compiled one is not actually using simd instructions - I believe it's sometimes a bit more effort than one would assume to convince the compiler to utilize them. At least with MSVC I've found that explicitly specifying the type's alignment is definitely a factor.

Also, I imagine it might also be pertinent to do this test with arrays of float3s (where the computation is repeated without each calculation depending on the one immediately preceding it) so that pipelining can be more of a factor.
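An array-based variant of the test could look something like the sketch below. It is not the code that was actually run; `Vec3`, `CMP_EPS` and `CountEqualPairs` are hypothetical names chosen for the example. The point is only that each loop iteration reads its own pair of vectors and does not depend on the result of the previous one.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; Spring's real float3 and CMP_EPS may differ.
struct Vec3 { float x, y, z; };
static const float CMP_EPS = 1e-4f; // assumed value, for illustration only

// Counts how many pairs (a[i], b[i]) compare equal within CMP_EPS.
// Iterations are independent, so the CPU is free to overlap them.
inline std::size_t CountEqualPairs(const std::vector<Vec3>& a,
                                   const std::vector<Vec3>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::size_t equal = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const bool eq = std::fabs(a[i].x - b[i].x) <= CMP_EPS
                     && std::fabs(a[i].y - b[i].y) <= CMP_EPS
                     && std::fabs(a[i].z - b[i].z) <= CMP_EPS;
        if (eq) ++equal;
    }
    return equal;
}
```

Returning the count (and printing it in a real benchmark) also keeps the compiler from discarding the loop as dead code.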
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: Blog: Comparing two float3s

Post by jK »

Peet wrote: For me USE_SSE is notably faster than USE_DOT.
You've got an Intel, right? I've got an AMD, and it seems IdeOne does too. Single SSE instructions seem to perform very poorly on AMDs (it differs a bit when running on arrays, afaik).
Peet wrote: Perhaps your USE_SSE compiled one is not actually using simd instructions - I believe it's sometimes a bit more effort than one would assume to convince the compiler to utilize them. At least with MSVC I've found that explicitly specifying the type's alignment is definitely a factor.
I used gcc's native vector extension, which sets alignment etc. automatically.
It can also generate fallback FPU code for all ops, but that doesn't seem to be what is happening here. A proper -march=amdfam10 doesn't change anything either.
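For reference, a compare written with the GCC/Clang vector extension might look roughly like the following. This is a sketch under assumptions, not the code behind the ideone links: `v4sf`, `MakeVec`, `EqualsVec` and the `CMP_EPS` value are names invented for the example, and the fourth lane is treated as padding.

```cpp
// GCC/Clang vector extension: four packed floats; the type carries its own
// 16-byte alignment, so no manual alignment attributes are needed.
typedef float v4sf __attribute__((vector_size(16)));

static const float CMP_EPS = 1e-4f; // assumed value, for illustration only

// Hypothetical helper: packs x, y, z into a vector with a zero 4th lane.
inline v4sf MakeVec(float x, float y, float z) {
    v4sf v = {x, y, z, 0.0f};
    return v;
}

// Squared-length compare on packed floats.
inline bool EqualsVec(v4sf a, v4sf b) {
    const v4sf d  = a - b; // one packed subtract
    const v4sf sq = d * d; // one packed multiply
    // The horizontal sum below reads single lanes, i.e. scalar work again.
    return (sq[0] + sq[1] + sq[2]) <= CMP_EPS;
}
```

Only the subtract and the multiply are packed operations here; summing the three lanes still happens one element at a time, so the packed part covers just a small share of the work per compare.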
Peet wrote: Also, I imagine it might also be pertinent to do this test with arrays of float3s (where the computation is repeated without each calculation depending on the one immediately preceding it) so that pipelining can be more of a factor.
That would be a different test; implementing array-driven computations in Spring code would be a heavy modification.
Peet
Malcontent
Posts: 4383
Joined: 27 Feb 2006, 22:04

Re: Blog: Comparing two float3s

Post by Peet »

Yeah, I am running on an i7 2630QM. FWIW I did a naive arrayified version and SSE/branching performed almost identically. Pretty disappointing that SSE works so poorly on AMD... sounds like this has implications for more than just comparison operations. I suppose SIMD vs. non-SIMD has sync implications as well, so we can't just ditch it for AMD users...
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: Blog: Comparing two float3s

Post by jK »

Peet wrote:FWIW i did a naive arrayified version and sse/branching performed almost identically.
http://ideone.com/joQ7f8 (SSE 2.02s)
http://ideone.com/3kBhPe (FPU 0.13s)
gajop
Moderator
Posts: 3051
Joined: 05 Aug 2009, 20:42

Re: Blog: Comparing two float3s

Post by gajop »

not sure your original test is "correct", as the if will always be false and it'll be short circuited at the first vector component
example with different numbers (different CMP_EPS & starting a value):
http://ideone.com/TFarBl
http://ideone.com/psA6zO
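To make the short-circuit concern concrete, here is a small instrumented sketch that counts how many component checks actually run. It is not taken from the linked code; `Vec3`, `CMP_EPS`, `CheckComponent` and `ChecksEvaluated` are hypothetical names for illustration.

```cpp
#include <cmath>

// Hypothetical stand-ins; Spring's real float3 and CMP_EPS may differ.
struct Vec3 { float x, y, z; };
static const float CMP_EPS = 1e-4f; // assumed value, for illustration only

// Records one component check and returns its result.
inline bool CheckComponent(float a, float b, int& checks) {
    ++checks;
    return std::fabs(a - b) <= CMP_EPS;
}

// Returns how many component checks ran before && stopped or finished.
inline int ChecksEvaluated(const Vec3& a, const Vec3& b) {
    int checks = 0;
    const bool equal = CheckComponent(a.x, b.x, checks)
                    && CheckComponent(a.y, b.y, checks)
                    && CheckComponent(a.z, b.z, checks);
    (void)equal; // only the count matters in this sketch
    return checks;
}
```

If the benchmark data always differs in x, `ChecksEvaluated` returns 1 for every pair, so the "3 compares" variant is really timed as one compare. Equal vectors, or vectors that differ only in z, cost all three checks.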
Beherith
Posts: 5145
Joined: 26 Oct 2007, 16:21

Re: Blog: Comparing two float3s

Post by Beherith »

I thought you were going to use an eps variable, or are you planning to inline to an immediate epsilon value?
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: Blog: Comparing two float3s

Post by jK »

gajop wrote:not sure your original test is "correct", as the if will always be false and it'll be short circuited at the first vector component
example with different numbers (different CMP_EPS & starting a value):
http://ideone.com/TFarBl
http://ideone.com/psA6zO
Seems gcc optimized something away in the if-clause because of `a = b;`. Replacing it with `a += b;` again gives 3comps the advantage:
http://ideone.com/GS2b4H (dot 1.30s)
http://ideone.com/QnPH9S (3comps 0.89s)

I also tried a different kind of for-loop in the hope that gcc cannot optimize it away:
http://ideone.com/Strg3k (dot 3.48s)
http://ideone.com/pfol5Z (3comps 2.69s)


edit: much better version:
http://ideone.com/cq3pjw (3comps 1.03s)
http://ideone.com/DK0HWb (dot 1.26s)
Last edited by jK on 16 Dec 2012, 04:40, edited 1 time in total.
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: Blog: Comparing two float3s

Post by jK »

Beherith wrote:I thought you were going to use an eps variable, or are you planning to inline to an immediate epsilon value?
Everything in the code is auto-inlined.
Beherith
Posts: 5145
Joined: 26 Oct 2007, 16:21

Re: Blog: Comparing two float3s

Post by Beherith »

Scream if you want access to an idle Ubuntu server with a Sandy Bridge G620 CPU.
a1983
Posts: 55
Joined: 02 Dec 2009, 12:01

Re: Blog: Comparing two float3s

Post by a1983 »

Maybe try speeding up not the vector comparison but the float equality check itself.
Like, for example, here:
http://www.cygnus-software.com/papers/c ... floats.htm
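One well-known family of techniques for this compares the floats' integer bit patterns and measures the distance in units in the last place (ULPs). Whether that is exactly what the linked article proposes is an assumption, since the URL above is truncated. The sketch below uses hypothetical names (`OrderedBits`, `AlmostEqualUlps`), assumes 32-bit IEEE-754 floats, and does not handle NaN.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "assumes 32-bit float");

// Maps a float's bit pattern to an integer whose ordering matches the
// ordering of the float values (assumes IEEE-754 binary32).
inline std::int32_t OrderedBits(float f) {
    std::int32_t i;
    std::memcpy(&i, &f, sizeof(i)); // well-defined way to read the bits
    if (i < 0) {
        // Negative floats are sign-magnitude; remap so that -0.0f becomes 0
        // and more negative floats become more negative integers.
        i = static_cast<std::int32_t>(0x80000000u - static_cast<std::uint32_t>(i));
    }
    return i;
}

// True if a and b are at most maxUlps representable floats apart.
inline bool AlmostEqualUlps(float a, float b, std::int64_t maxUlps) {
    // 64-bit arithmetic avoids overflow when the operands have opposite signs.
    const std::int64_t diff =
        static_cast<std::int64_t>(OrderedBits(a)) - OrderedBits(b);
    return (diff < 0 ? -diff : diff) <= maxUlps;
}
```

Unlike a fixed `CMP_EPS`, this tolerance scales with the magnitude of the values, so it is a different notion of "equal" and not a drop-in replacement for the existing compare.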
PicassoCT
Journeywar Developer & Mapper
Posts: 10450
Joined: 24 Jan 2006, 21:12

Re: Blog: Comparing two float3s

Post by PicassoCT »

The timings in seconds matched at home; here on the laptop they differ. My guess at the culprit is OS scheduling.
zerver
Spring Developer
Posts: 1358
Joined: 16 Dec 2006, 20:59

Re: Blog: Comparing two float3s

Post by zerver »

Interesting blog.

Indeed when doing AND

Code:

if (A && B && C())
you should put the statement that is most likely to be false first, and when doing OR

Code:

if (A || B || C())
you should put the statement that is most likely to be true first.

Actually I have had friends who call themselves C programmers that were totally unaware of the optimizations that are in effect and their implications. I.e. why the f-k does C() not get called? LoL
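A tiny sketch of why `C()` may never run. Strictly speaking this is not an optional compiler optimization: for the built-in `&&` and `||` operators, C and C++ guarantee left-to-right evaluation that stops as soon as the result is known. The counter and function names below are made up for the example.

```cpp
// Hypothetical counter to observe whether C() actually runs.
int g_callsToC = 0;

bool C() {
    ++g_callsToC;
    return true;
}

// && evaluates left to right and stops at the first false operand.
bool AllOf(bool a, bool b) {
    return a && b && C();
}

// || evaluates left to right and stops at the first true operand.
bool AnyOf(bool a, bool b) {
    return a || b || C();
}
```

`AllOf(false, true)` returns false without touching the counter, and `AnyOf(true, false)` returns true without touching it either. `C()` only runs when the operands before it leave the result undecided.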
PicassoCT
Journeywar Developer & Mapper
Posts: 10450
Joined: 24 Jan 2006, 21:12

Re: Blog: Comparing two float3s

Post by PicassoCT »

But it's glaringly obvious once you've done assembler for a semester... jmpIfEquals, I'm looking at you.