[engine] Comparing two float3s

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

[engine] Comparing two float3s

Post by jK » 15 Dec 2012, 21:05

I thought about the fastest way to compare 2 vectors.

The code Spring currently uses:

Code:

return std::fabs(x - f.x) <= CMP_EPS
 && std::fabs(y - f.y) <= CMP_EPS
 && std::fabs(z - f.z) <= CMP_EPS;
Currently Spring compares each component individually, and you always hear that branching is slow. On the other hand, CPUs predict branches; GPUs don't.

So I thought: instead of comparing all 3 components individually, why not check `Length(vecA-vecB)^2 <= CMP_EPS`? I assumed the additional multiplies and additions of the dot product would come `for free`.
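A minimal sketch of the squared-length idea, assuming a stripped-down float3 and an illustrative CMP_EPS of 1e-4f (neither is Spring's actual definition). Two things differ from the per-component check: the squared length has to be compared against CMP_EPS*CMP_EPS to get a sphere of radius CMP_EPS, and even then a sphere is not the same region as the per-component box, so the two versions can disagree near the box corners.

```cpp
#include <cmath>

// Illustrative value only; Spring's real CMP_EPS may differ.
constexpr float CMP_EPS = 1e-4f;

// Hypothetical minimal float3, not Spring's class.
struct float3 {
	float x, y, z;
	float3 operator-(const float3& f) const { return {x - f.x, y - f.y, z - f.z}; }
	float SqLength() const { return x * x + y * y + z * z; }
};

// Squared distance against a squared radius: one final compare, no
// per-component branches. Comparing against CMP_EPS itself (not squared)
// would accept a sphere of radius sqrt(CMP_EPS) = 0.01, a much looser test.
inline bool EqualsSqLen(const float3& a, const float3& b) {
	return (a - b).SqLength() <= CMP_EPS * CMP_EPS;
}

// Per-component test: accepts a box of half-width CMP_EPS around b.
inline bool EqualsBox(const float3& a, const float3& b) {
	return std::fabs(a.x - b.x) <= CMP_EPS
	    && std::fabs(a.y - b.y) <= CMP_EPS
	    && std::fabs(a.z - b.z) <= CMP_EPS;
}
```

For example, a difference of 0.9e-4 on every axis passes EqualsBox but fails EqualsSqLen, since its length is about 1.56e-4.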

So here is the code:
http://ideone.com/B9KS0f (3 compares)
http://ideone.com/zrAY3X (DOT FPU)
http://ideone.com/9eLaHD (SSE)

with the results:
3 compares: 0.98s
dot FPU: 1.3s
SSE: >5sec (Time limit exceeded)

-> another assumption turned out wrong :mrgreen:
Possible reason: the early-out advantage of the individual compares (the && check stops as soon as the first comparison fails) seems to outweigh Length()^2.
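To make the early-out concrete, here is a sketch (again assuming a minimal float3 and an illustrative CMP_EPS) contrasting && with a variant that combines the three bool results with bitwise &, which always evaluates every component. The bitwise variant is only an illustration, not something proposed in the thread; compilers may also turn either form into branch-free code on their own when the operands have no side effects, so only measuring on the target CPU tells which is faster.

```cpp
#include <cmath>

constexpr float CMP_EPS = 1e-4f; // illustrative value

struct float3 { float x, y, z; }; // hypothetical minimal type

// Early-out: && stops at the first failing component, so a mismatch
// in x costs one subtraction, one fabs and one compare.
inline bool EqualsEarlyOut(const float3& a, const float3& b) {
	return std::fabs(a.x - b.x) <= CMP_EPS
	    && std::fabs(a.y - b.y) <= CMP_EPS
	    && std::fabs(a.z - b.z) <= CMP_EPS;
}

// No early-out: bitwise & on the bool results always evaluates all
// three components, which gives the compiler an easy branch-free form.
inline bool EqualsNoEarlyOut(const float3& a, const float3& b) {
	return (std::fabs(a.x - b.x) <= CMP_EPS)
	     & (std::fabs(a.y - b.y) <= CMP_EPS)
	     & (std::fabs(a.z - b.z) <= CMP_EPS);
}
```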

PS: dunno why the SSE version is so damn slow (even slower than the FPU one). Anyone got an idea?

PPS: to compile the code I used `g++ -o foo.bin -O2 -mfpmath=sse -msse -msse2 foo_float3.c -DUSE_...`.
Peet
Malcontent
Posts: 4375
Joined: 27 Feb 2006, 22:04

Re: Blog: Comparing two float3s

Post by Peet » 15 Dec 2012, 22:22

Code:

peet@starscream ~/floats> g++ -o sse.bin -O2 -mfpmath=sse -msse -msse2 floats.cpp -DUSE_SSE
peet@starscream ~/floats> g++ -o dot.bin -O2 -mfpmath=sse -msse -msse2 floats.cpp -DUSE_DOT
peet@starscream ~/floats> g++ -o branch.bin -O2 -mfpmath=sse -msse -msse2 floats.cpp
peet@starscream ~/floats> time ./sse.bin 
16777216.000000 8388608.000000 16777216.000000
0.35user 0.00system 0:00.36elapsed 99%CPU (0avgtext+0avgdata 1008maxresident)k
0inputs+0outputs (0major+315minor)pagefaults 0swaps
peet@starscream ~/floats> time ./dot.bin 
16777216.000000 8388608.000000 16777216.000000
0.53user 0.00system 0:00.53elapsed 99%CPU (0avgtext+0avgdata 1004maxresident)k
0inputs+0outputs (0major+313minor)pagefaults 0swaps
peet@starscream ~/floats> time ./branch.bin 
16777216.000000 8388608.000000 16777216.000000
0.26user 0.00system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 1004maxresident)k
0inputs+0outputs (0major+314minor)pagefaults 0swaps
For me USE_SSE is notably faster than USE_DOT. Perhaps your USE_SSE build is not actually using SIMD instructions; I believe it sometimes takes more effort than one would assume to convince the compiler to use them. At least with MSVC, I've found that explicitly specifying the type's alignment is definitely a factor.
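For reference, a hand-written SSE version of the per-component test might look like the sketch below (x86 only, using the <xmmintrin.h> intrinsics). The padded, 16-byte aligned float4 type and the EqualsSSE name are assumptions for illustration; the alignment matters because _mm_load_ps requires a 16-byte aligned address, which is the kind of detail Peet mentions for MSVC. All four lanes are compared at once and the result is read back with _mm_movemask_ps, so there are no per-component branches.

```cpp
#include <xmmintrin.h> // SSE intrinsics (x86)

constexpr float CMP_EPS = 1e-4f; // illustrative value

// Hypothetical SIMD-friendly vector: padded to 4 floats and 16-byte
// aligned so the aligned load _mm_load_ps is legal on it.
struct alignas(16) float4 {
	float x, y, z, w;
};

inline bool EqualsSSE(const float4& a, const float4& b) {
	const __m128 va = _mm_load_ps(&a.x);
	const __m128 vb = _mm_load_ps(&b.x);
	// |a - b| per lane: clear the sign bit using an and-not with -0.0f.
	const __m128 diff = _mm_andnot_ps(_mm_set1_ps(-0.0f), _mm_sub_ps(va, vb));
	// Each lane becomes all-ones where |a - b| <= CMP_EPS.
	const __m128 le = _mm_cmple_ps(diff, _mm_set1_ps(CMP_EPS));
	// Collect the lane sign bits; only x, y, z (bits 0..2) matter.
	return (_mm_movemask_ps(le) & 0x7) == 0x7;
}
```

The w lane is ignored by the 0x7 mask, but it should still be initialized so the load never reads an indeterminate value.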

Also, I imagine it would be pertinent to run this test with arrays of float3s (where each computation does not depend on the one immediately preceding it) so that pipelining can play a bigger role.
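A sketch of what such an array-based test could look like, assuming a minimal float3, std::vector storage and an illustrative CMP_EPS. Each iteration only reads a[i] and b[i], so consecutive comparisons do not depend on each other's results and can overlap in the pipeline; whether the compiler also vectorizes the loop depends on the compiler and flags.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float CMP_EPS = 1e-4f; // illustrative value

struct float3 { float x, y, z; }; // hypothetical minimal type

// Counts how many pairs (a[i], b[i]) are equal within CMP_EPS per axis.
// The comparisons are independent; only the running count is shared.
inline std::size_t CountEqual(const std::vector<float3>& a,
                              const std::vector<float3>& b) {
	const std::size_t n = a.size() < b.size() ? a.size() : b.size();
	std::size_t count = 0;
	for (std::size_t i = 0; i < n; ++i) {
		// Bitwise & keeps the loop body free of early-out branches.
		const bool eq = (std::fabs(a[i].x - b[i].x) <= CMP_EPS)
		              & (std::fabs(a[i].y - b[i].y) <= CMP_EPS)
		              & (std::fabs(a[i].z - b[i].z) <= CMP_EPS);
		count += eq;
	}
	return count;
}
```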
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: Blog: Comparing two float3s

Post by jK » 15 Dec 2012, 22:41

Peet wrote:For me USE_SSE is notably faster than USE_DOT.
You've got an Intel, right? I've got an AMD, and it seems Ideone does too. Single SSE instructions seem to perform very poorly on AMD CPUs (afaik it's a bit different when running on arrays).
Peet wrote:Perhaps your USE_SSE build is not actually using SIMD instructions; I believe it sometimes takes more effort than one would assume to convince the compiler to use them. At least with MSVC, I've found that explicitly specifying the type's alignment is definitely a factor.
I used GCC's native vector extensions; they set the alignment etc. automatically.
GCC can also generate fallback FPU code for all ops, but that doesn't seem to be what's happening here. A proper -march=amdfam10 doesn't change anything either.
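jK's actual SSE code is only on Ideone; as a rough illustration, the squared-length test written with GCC's vector extension (also accepted by Clang) might look like this. The v4sf typedef and EqualsVecExt are hypothetical names. The extension leaves alignment and instruction selection to the compiler, which uses SIMD instructions when the target supports them and synthesizes scalar code otherwise.

```cpp
// GCC/Clang vector extension: 4 floats in one 16-byte vector.
typedef float v4sf __attribute__((vector_size(16)));

constexpr float CMP_EPS = 1e-4f; // illustrative value

// Subtract and square all four lanes at once, then sum x, y, z
// horizontally; the w lane (index 3) is ignored.
inline bool EqualsVecExt(v4sf a, v4sf b) {
	const v4sf d = a - b;
	const v4sf sq = d * d;
	return sq[0] + sq[1] + sq[2] <= CMP_EPS * CMP_EPS;
}
```

The horizontal sum at the end is scalar work, which may be part of why a single-vector SIMD compare does not automatically beat plain FPU code.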
Peet wrote:Also, I imagine it would be pertinent to run this test with arrays of float3s (where each computation does not depend on the one immediately preceding it) so that pipelining can play a bigger role.
That would be a different test; implementing array-driven computations in the Spring code would be a heavy modification.
Peet
Malcontent
Posts: 4375
Joined: 27 Feb 2006, 22:04

Re: Blog: Comparing two float3s

Post by Peet » 15 Dec 2012, 23:12

Yeah, I am running on an i7-2630QM. FWIW, I did a naive arrayified version and SSE/branching performed almost identically. Pretty disappointing that SSE works so poorly on AMD; sounds like this has implications for more than just comparison operations. I suppose SIMD vs. non-SIMD has sync implications as well, so we can't just ditch it for AMD users...
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: Blog: Comparing two float3s

Post by jK » 15 Dec 2012, 23:49

Peet wrote:FWIW, I did a naive arrayified version and SSE/branching performed almost identically.
http://ideone.com/joQ7f8 (SSE 2.02s)
http://ideone.com/3kBhPe (FPU 0.13s)

gajop
Moderator
Posts: 2952
Joined: 05 Aug 2009, 20:42

Re: Blog: Comparing two float3s

Post by gajop » 16 Dec 2012, 00:59

Not sure your original test is "correct", as the if will always be false and will be short-circuited at the first vector component.
Example with different numbers (different CMP_EPS and a different starting value for a):
http://ideone.com/TFarBl
http://ideone.com/psA6zO
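To see gajop's point, a hypothetical instrumented compare can count how many component tests actually run. If every pair in a benchmark differs in x, each call stops after one test, so the timing mostly measures a single fabs and compare rather than the full three-component check. The float3, CMP_EPS and EqualsCounted names here are assumptions for illustration.

```cpp
#include <cmath>

constexpr float CMP_EPS = 1e-4f; // illustrative value

struct float3 { float x, y, z; }; // hypothetical minimal type

// Same logic as the per-component test, but increments `tests` for every
// component comparison that is actually evaluated.
inline bool EqualsCounted(const float3& a, const float3& b, int& tests) {
	auto close = [&tests](float u, float v) {
		++tests;
		return std::fabs(u - v) <= CMP_EPS;
	};
	return close(a.x, b.x) && close(a.y, b.y) && close(a.z, b.z);
}
```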
Beherith
Moderator
Posts: 4925
Joined: 26 Oct 2007, 16:21

Re: Blog: Comparing two float3s

Post by Beherith » 16 Dec 2012, 01:58

I thought you were going to use an eps variable, or are you planning to inline to an immediate epsilon value?
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: Blog: Comparing two float3s

Post by jK » 16 Dec 2012, 04:16

gajop wrote:Not sure your original test is "correct", as the if will always be false and will be short-circuited at the first vector component.
Example with different numbers (different CMP_EPS and a different starting value for a):
http://ideone.com/TFarBl
http://ideone.com/psA6zO
Seems gcc optimized something away in the if-clause because of `a = b;`. Replacing it with `a += b;` gives 3comps the advantage again:
http://ideone.com/GS2b4H (dot 1.30s)
http://ideone.com/QnPH9S (3comps 0.89s)

I also tried a different kind of for-loop in the hope that gcc can't optimize it away:
http://ideone.com/Strg3k (dot 3.48s)
http://ideone.com/pfol5Z (3comps 2.69s)
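The loops jK benchmarked are only on Ideone, so the sketch below just illustrates the general idea behind switching to `a += b`: if the input changes every iteration and the result is actually used (here, returned as a count), the compiler has much less room to compute the comparison once or drop the loop as dead code. The float3, Equals and CountMatches names are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>

constexpr float CMP_EPS = 1e-4f; // illustrative value

struct float3 {
	float x, y, z;
	float3& operator+=(const float3& f) { x += f.x; y += f.y; z += f.z; return *this; }
};

inline bool Equals(const float3& a, const float3& b) {
	return std::fabs(a.x - b.x) <= CMP_EPS
	    && std::fabs(a.y - b.y) <= CMP_EPS
	    && std::fabs(a.z - b.z) <= CMP_EPS;
}

// Hypothetical benchmark kernel: `a` moves by `step` every iteration and
// each comparison result feeds into the returned count, so the work is
// observable and cannot simply be hoisted out of the loop.
inline std::size_t CountMatches(float3 a, const float3& step,
                                const float3& target, std::size_t iterations) {
	std::size_t matches = 0;
	for (std::size_t i = 0; i < iterations; ++i) {
		a += step;
		matches += Equals(a, target);
	}
	return matches;
}
```

Printing the returned count (as the Ideone programs print their results) is what keeps the measured work from being discarded.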


edit: much better version:
http://ideone.com/cq3pjw (3comps 1.03s)
http://ideone.com/DK0HWb (dot 1.26s)
Last edited by jK on 16 Dec 2012, 04:40, edited 1 time in total.
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: Blog: Comparing two float3s

Post by jK » 16 Dec 2012, 04:17

Beherith wrote:I thought you were going to use an eps variable, or are you planning to inline to an immediate epsilon value?
Everything in the code is auto-inlined.
Beherith
Moderator
Posts: 4925
Joined: 26 Oct 2007, 16:21

Re: Blog: Comparing two float3s

Post by Beherith » 16 Dec 2012, 12:47

Scream if you want access to an idle Ubuntu server with a Sandy Bridge G620 CPU.

a1983
Posts: 55
Joined: 02 Dec 2009, 12:01

Re: Blog: Comparing two float3s

Post by a1983 » 17 Dec 2012, 06:53

Maybe try to speed up the float equality check itself rather than the vector comparison, like, for example, here:
http://www.cygnus-software.com/papers/c ... floats.htm
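The link above is truncated, so the following is not necessarily what it describes. One well-known alternative to a fixed absolute epsilon is comparing floats by their distance in ULPs (units in the last place): the bit patterns are reinterpreted as integers that are ordered like the floats, and the integer difference counts how many representable floats lie between the two values. OrderedBits and AlmostEqualUlps are hypothetical names for this sketch.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a float's bits as an int32 and remaps negative values so the
// integers are ordered like the floats: adjacent floats map to adjacent
// integers, and -0.0f and +0.0f both map to 0.
inline std::int32_t OrderedBits(float f) {
	std::int32_t i;
	std::memcpy(&i, &f, sizeof i); // bit copy, avoids strict-aliasing issues
	return i < 0 ? INT32_MIN - i : i;
}

// True if a and b are at most maxUlps representable floats apart.
// This sketch does not special-case NaN or infinity.
inline bool AlmostEqualUlps(float a, float b, std::int32_t maxUlps) {
	const std::int64_t diff =
		static_cast<std::int64_t>(OrderedBits(a)) - OrderedBits(b);
	return (diff < 0 ? -diff : diff) <= maxUlps;
}
```

Near zero a ULP tolerance becomes extremely tight, so in practice it is usually combined with a small absolute epsilon.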
PicassoCT
Journeywar Developer & Mapper
Posts: 9928
Joined: 24 Jan 2006, 21:12

Re: Blog: Comparing two float3s

Post by PicassoCT » 17 Dec 2012, 10:49

The timings in seconds were consistent at home; here on the laptop they differ. My guess is that OS scheduling is to blame.

zerver
Spring Developer
Posts: 1358
Joined: 16 Dec 2006, 20:59

Re: Blog: Comparing two float3s

Post by zerver » 17 Dec 2012, 16:01

Interesting blog.

Indeed, when doing AND

Code:

if (A && B && C())
you should put the statement that is most likely to be false first, and when doing OR

Code:

if (A || B || C())
you should put the statement that is most likely to be true first.

Actually, I have had friends who call themselves C programmers who were totally unaware of short-circuit evaluation and its implications, i.e. "why the f-k does C() not get called?" LoL
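A tiny sketch of the behaviour zerver describes, with A and B as plain bools and a hypothetical C() that records whether it ran. Note that this is guaranteed by the C and C++ language rules rather than being an optional compiler optimization, which is why ordering the operands is safe to rely on.

```cpp
// Records whether C() was evaluated.
struct Trace {
	bool calledC = false;
};

inline bool C(Trace& t) {
	t.calledC = true;
	return true;
}

// && stops at the first false operand, || at the first true one, so C()
// is skipped whenever the result is already decided by A or B.
inline bool AndChain(bool A, bool B, Trace& t) { return A && B && C(t); }
inline bool OrChain(bool A, bool B, Trace& t) { return A || B || C(t); }
```

So AndChain(false, true, t) returns false without calling C(), while AndChain(true, true, t) has to call it.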
PicassoCT
Journeywar Developer & Mapper
Posts: 9928
Joined: 24 Jan 2006, 21:12

Re: Blog: Comparing two float3s

Post by PicassoCT » 17 Dec 2012, 18:20

But it's glaringly obvious once you've done assembler for a semester... jmpIfEquals, I'm looking at you.