For shifts:
on the Pentium 4 and later, shifts are no longer faster than multiplications!
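In practice this means you can write the multiplication you actually mean and let the compiler choose the instruction. A minimal sketch, assuming unsigned operands: `x * 8` and `x << 3` give the same result for every unsigned 32-bit value, so an optimizing compiler is free to emit whichever form is cheaper on the target CPU. The helper names `times8_mul` and `times8_shift` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: both compute x * 8 for unsigned 32-bit values.
// Unsigned arithmetic wraps modulo 2^32, so the two forms agree for every input,
// which is what lets the compiler pick either instruction sequence.
inline std::uint32_t times8_mul(std::uint32_t x) { return x * 8u; }
inline std::uint32_t times8_shift(std::uint32_t x) { return x << 3; }
```

The equivalence is specific to unsigned values; for signed integers, division and right shifts round negative numbers differently, so do not swap them by hand there.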
Rule for inline:
if a member function is declared and defined inside the class {} scope,
it is implicitly inline and the compiler will try to inline it. If it is defined outside,
you should add the inline keyword. Inlining saves roughly
3 clock cycles for the call itself, plus one cycle per parameter pushed onto the stack,
plus one cycle per parameter read back from the stack inside the
callee, plus any copy constructors invoked for by-value parameters, and maybe
something I've forgotten.
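A minimal sketch of the two placements; the class `Counter` is hypothetical. In both cases `inline` is only a hint for code generation (the compiler still decides whether to expand a given call). What the keyword does guarantee is that the same definition may appear in a header included by several translation units without breaking the one-definition rule.

```cpp
#include <cassert>

// Hypothetical class used only to illustrate where the definition lives.
class Counter {
public:
    // Defined inside the class body: implicitly inline.
    int get() const { return value_; }

    // Only declared here; defined outside the class below.
    void add(int delta);

private:
    int value_ = 0;
};

// Defined outside the class body: needs an explicit `inline` if this
// definition sits in a header that several .cpp files include.
inline void Counter::add(int delta) { value_ += delta; }
```

Usage: `Counter c; c.add(3); c.add(4);` leaves `c.get()` equal to 7.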
Note that inlining all the code has the same side effect
as unrolling loops: the code grows and will thrash the instruction cache! And memory
is pretty slow these days.
If you are short on speed, consider using this loop form:
int i = iterationCount + 1;
while (--i) {
    /* loop body: runs exactly iterationCount times */
}
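A small sketch that checks the iteration count of the countdown form; the function name `countdownIterations` is hypothetical. The usual motivation is that on x86 the decrement already sets the zero flag, so the loop needs no separate compare against a bound. Whether this helps in practice depends on the compiler, which often performs the same transformation on an ordinary counting loop.

```cpp
#include <cassert>

// Returns how many times the body of the pre-decrement loop runs.
// i takes the values iterationCount ... 1 inside the body, so the body
// executes exactly iterationCount times.
// Assumes iterationCount >= 0: a negative count never reaches zero and the
// decrement would eventually overflow (undefined behavior for int).
int countdownIterations(int iterationCount) {
    int runs = 0;
    int i = iterationCount + 1;
    while (--i) {
        ++runs;  // the real loop body goes here
    }
    return runs;
}
```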
And these tips:
http://www.devx.com/amd/Article/21545
Don't forget the extreme unrolling technique (Duff's device) when the index
itself is not important and only the iteration count matters, as
with iterators:
int n = (count + 7) / 8; /* number of passes; count > 0 assumed */
switch (count % 8)   /* every case falls through on purpose */
{
case 0: do { operation;
case 7:      operation;
case 6:      operation;
case 5:      operation;
case 4:      operation;
case 3:      operation;
case 2:      operation;
case 1:      operation;
        } while (--n > 0);
}
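A runnable version of the fragment above, with `operation` taken to be copying one `int` from a source buffer to a destination buffer; the helper name `duffCopy` is hypothetical. The switch jumps into the middle of the do-while so the first pass handles the `count % 8` leftover elements, and every later pass handles eight. Modern compilers often unroll a plain loop on their own, so it is worth measuring before preferring this form.

```cpp
#include <cassert>
#include <cstddef>

// Copies `count` ints from `from` to `to`, eight per loop pass.
// All cases fall through intentionally.
// Requires count > 0: with count == 0, n starts at 0, the first pass still
// copies 8 elements, and --n wraps around, running off the end of the buffers.
void duffCopy(int* to, const int* from, std::size_t count) {
    std::size_t n = (count + 7) / 8;  // number of passes through the loop
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```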
Also: always write the slow, simple algorithm first, then switch to the fast one, and
DON'T throw away the old one: use it for debugging
(run both in real usage, not just in tests, and throw
an exception if the results differ).
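A minimal sketch of that pattern, with summing a vector standing in for the real algorithm; all function names are hypothetical. The "fast" version uses four independent accumulators, an example of a rewrite that is easy to get wrong at the leftover elements, which is exactly the kind of bug the cross-check catches.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Slow but obviously correct reference implementation.
long long sumSlow(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t k = 0; k < v.size(); ++k) total += v[k];
    return total;
}

// "Fast" version: four independent accumulators, then the leftovers.
long long sumFast(const std::vector<int>& v) {
    long long a = 0, b = 0, c = 0, d = 0;
    std::size_t k = 0;
    for (; k + 4 <= v.size(); k += 4) {
        a += v[k]; b += v[k + 1]; c += v[k + 2]; d += v[k + 3];
    }
    for (; k < v.size(); ++k) a += v[k];  // remaining 0..3 elements
    return a + b + c + d;
}

// Runs both on real input and throws if they disagree.
long long sumChecked(const std::vector<int>& v) {
    long long fast = sumFast(v);
    long long slow = sumSlow(v);
    if (fast != slow) throw std::logic_error("sumFast disagrees with sumSlow");
    return fast;
}
```

Once the fast version has survived enough real runs, the check can be switched off with a build flag, but keeping `sumSlow` in the code base means it is still there the next time `sumFast` is changed.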