This is the code Spring uses at the moment:
return std::fabs(x - f.x) <= CMP_EPS
&& std::fabs(y - f.y) <= CMP_EPS
&& std::fabs(z - f.z) <= CMP_EPS;
So I thought: instead of comparing all 3 components individually, why not test `Length(vecA-vecB)^2 <= CMP_EPS`? I assumed the additional multiplies and additions of the dot product would come practically "for free".
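To make the two variants concrete, here is a minimal sketch of both side by side. It is not the benchmark code linked in this post: the `float3` layout, the method names and the `CMP_EPS` value are assumptions for illustration. I also compare the squared length against `CMP_EPS * CMP_EPS` here so that both sides of the test are in squared units.

```cpp
#include <cmath>

// Assumed tolerance; the real CMP_EPS value in Spring may differ.
static const float CMP_EPS = 1e-4f;

// Hypothetical stand-in for Spring's float3, reduced to what the test needs.
struct float3 {
    float x, y, z;

    // Variant 1: per-component test (same shape as the code quoted above).
    // && short-circuits, so y and z are only checked if x already matches.
    bool equalsComponents(const float3& f) const {
        return std::fabs(x - f.x) <= CMP_EPS
            && std::fabs(y - f.y) <= CMP_EPS
            && std::fabs(z - f.z) <= CMP_EPS;
    }

    // Variant 2: squared length of the difference vector.
    // Always costs 3 subtractions, 3 multiplies, 2 additions and 1 compare.
    bool equalsSqLength(const float3& f) const {
        const float dx = x - f.x;
        const float dy = y - f.y;
        const float dz = z - f.z;
        return (dx * dx + dy * dy + dz * dz) <= CMP_EPS * CMP_EPS;
    }
};
```

Note that the two tests are not strictly equivalent: variant 1 accepts everything inside a cube with half-width `CMP_EPS`, variant 2 everything inside a sphere with radius `CMP_EPS`. A difference of 0.9 * `CMP_EPS` in each component passes the first test but fails the second.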
So here is the code:
http://ideone.com/B9KS0f (3 compares)
http://ideone.com/zrAY3X (DOT FPU)
http://ideone.com/9eLaHD (SSE)
with the results:
3 compares: 0.98s
dot FPU: 1.3s
SSE: >5sec (Time limit exceeded)
-> another assumption that turned out to be wrong
Possible reason: the early-out of the individual compares (`&&` short-circuits, so the CPU can skip the remaining checks as soon as the first comparison fails) seems to win against computing Length()^2, which always has to do all of its arithmetic.
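The early-out itself is easy to demonstrate with a small sketch that counts how many component compares actually run. The counter `g_compares` and the helper names are made up for illustration, and the `CMP_EPS` value is an assumption. This only shows the language-level behaviour: whether the optimizer keeps the branches at -O2, and whether that explains the timings above, would have to be checked in the generated assembly.

```cpp
#include <cmath>

static const float CMP_EPS = 1e-4f; // assumed value
static int g_compares = 0;          // hypothetical counter, illustration only

// One component compare; bumps the counter every time it is evaluated.
static bool nearEq(float a, float b) {
    ++g_compares;
    return std::fabs(a - b) <= CMP_EPS;
}

// Same shape as the 3-compare version: stops at the first failing component.
static bool equalsEarlyOut(const float v[3], const float w[3]) {
    return nearEq(v[0], w[0])
        && nearEq(v[1], w[1])
        && nearEq(v[2], w[2]);
}
```

If the two vectors already differ in x, only one compare is evaluated; all three run only when x and y both match. So in a benchmark where most pairs are unequal, the 3-compare version does roughly a third of its worst-case work per call.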
PS: I don't know why the SSE version is so damn slow (even slower than the FPU version). Anyone got an idea?
PPS: to compile the code I used `g++ -o foo.bin -O2 -mfpmath=sse -msse -msse2 foo_float3.c -DUSE_...`.