Speed comparisons

munch · Post by **munch** » 18 Dec 2005, 00:03

Two unrelated topics here

1. Pipelining. x86 has a lot of pipelining in it - that is to say the CPU starts work on the next instruction before it has finished working out the previous one. In fact modern pipelines are quite long (i.e. more than a couple of instructions). If you have to branch then the whole pipeline gets binned and the CPU has to start from scratch. You can save a lot of time by avoiding branching. For example you may be tempted to do this:

Code: Select all

{
   do some stuff;
   if (rare_case)
   {
      // skip this instruction when not needed to save time
      simple instructions;
   }
   do some more stuff;
}

In fact it's quicker to try to write the code like this:

Code: Select all

{
   do some stuff;
   simple instructions;
   do some more stuff;
}

because your pipeline stays intact. This can be quite tricky to do, but typical tricks usually involve adding zero or multiplying by one etc. in the bit of code you'd skip when not needed.
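To make the "add zero or multiply by one" trick concrete, here is a minimal C++ sketch. The function names and the damage/bonus scenario are made up for illustration and are not from the post. Whether the branch-free form is actually faster depends on the compiler and the CPU, so treat it as something to measure rather than a guaranteed win.

```cpp
// Branching version: the bonus is only added in the rare case.
int damage_branchy(int base, int bonus, bool rare_case) {
    int total = base;
    if (rare_case) {
        total += bonus;
    }
    return total;
}

// Branch-free version: always do the add, but multiply the bonus by
// 0 or 1 so it has no effect when rare_case is false.
int damage_branchless(int base, int bonus, bool rare_case) {
    return base + bonus * static_cast<int>(rare_case);
}
```

Both functions return the same value for every input; the second one simply does the "simple instructions" unconditionally.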

2. Portability. Bear in mind that Spring is designed to work on a wide range of processors. If your AI requires shedloads of CPU to work, you're really cutting down the number of platforms it can work on. You may be happy with that, but you should be aware that you are creating a portability problem with older machines if you require too many CPU cycles for the AI.

Hope this helps.

Munch

cain · Post by **cain** » 18 Dec 2005, 09:32

actually, the latest processors will tag the jump as rare and take the other branch without penalty; the latest athlon doesn't even pipeline jumps, it marks instructions as conditional.

older processors have short pipelines and don't need all of this care.

a great technique for ai is to spread cycles across update() calls, for example:

Code: Select all


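// slow loop that hogs the cpu: every unit is handled in one update()

void update() {
   for (int i = 0; i < unitcount; i++) {
      // blah f(i);
   }
}

// spread version: one unit per update() call

int i = 0;
bool oncicle = false;

void update() {
   if (!oncicle) {
      // start a new pass over the units
      i = 0;
      oncicle = true;
   }
   if (i < unitcount) {
      // blah f(i);
      i++;   // advance after the work so unit 0 is not skipped
   }
   if (i >= unitcount) {
      // pass finished; the next update() starts over
      oncicle = false;
   }
}

// could obviously be optimized. this is written for readability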

renrutal · Post by **renrutal** » 18 Dec 2005, 10:41

I really don't get what that code could do...

Do you mean that the more units there are in the game, the more time passes between the AI instructions?

Like, with 100 units in the game it should give orders every 1 second, and with 200 units every 2 seconds?

cain · Post by **cain** » 18 Dec 2005, 11:06

you can split a big non-critical loop over several game frames. if you want, you can tune it to do more than one operation per update, while still spreading the longer computation over the frames.

for example, you can spread terrain analysis over time, allowing for a finer-grained computation.

to stay with the unit example:

it does one operation per frame, so 30 ops/second at 1x speed. you give orders every frame, but only one order per frame. that doesn't seem very useful, but imagine "giving an order" replaced with a long, heavy computation that has to be done iteratively.

renrutal · Post by **renrutal** » 18 Dec 2005, 23:36

Isn't doing this stuff concurrently in threads better and less costly?

Tobi · Post by **Tobi** » 19 Dec 2005, 00:05

If the user has a dual processor / dual core system, yes it is better.

It may already be better to use threads on a hyper threading cpu, but I'm just guessing there.

On older stuff it (threading) must be worse than non-threading, though only by a small bit.

renrutal · Post by **renrutal** » 19 Dec 2005, 03:32

Threads are not hardware dependent. They're a very effective way of coding things. Superscalar processors just make better use of them, but they're good anywhere, especially when you're waiting for a result. Instead of putting the processor in a wait state/loop, you just tell a thread to beep you when it's ready, then you go do other things.

And I've seen people here telling others to put their processors in a loop.

Since '95 most of the major OSes have had full threading support, so I doubt it can be considered "worse" than non-threading in any important case.

SoftNum · Post by **SoftNum** » 19 Dec 2005, 04:59

Threads are basically an easier way of doing what you mention on a single-processor system. The thread can be spawned in Update(), and then on each update you can check whether the thread has completed, until it's time to spawn a new thread.

In fact, this is exactly what my AI does.

The problem with threads is that you need to be careful about more than one thread trying to access the same thing at the same time. But there are plenty of ways around this.
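A minimal sketch of the spawn-then-poll pattern described above, written with C++11 `std::async` and `std::future` (which did not exist when this thread was written, so this is not SoftNum's actual code). `long_computation` and `BackgroundJob` are hypothetical names used only for illustration.

```cpp
#include <chrono>
#include <future>

// Hypothetical stand-in for an expensive AI computation.
int long_computation(int input) {
    int sum = 0;
    for (int i = 0; i <= input; ++i) sum += i;
    return sum;
}

class BackgroundJob {
public:
    // Start the work on another thread if none is in flight.
    void start(int input) {
        if (!result_.valid()) {
            result_ = std::async(std::launch::async, long_computation, input);
        }
    }

    // Call from update(): returns true and stores the answer once the
    // worker has finished; returns false immediately otherwise.
    bool poll(int& out) {
        if (!result_.valid()) return false;
        if (result_.wait_for(std::chrono::seconds(0)) !=
            std::future_status::ready) {
            return false;
        }
        out = result_.get();  // get() invalidates the future, so start() can run again
        return true;
    }

private:
    std::future<int> result_;
};
```

The game thread never blocks in `poll()`; it only asks whether the result is ready. The worker must not touch shared game state directly, which is the access problem mentioned above.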

And as a note to Tobi, your Windows system is running lots of processes at once to begin with. NT (which includes 2000, XP, Vista) is very good at threading and multi-processing.

cain · Post by **cain** » 19 Dec 2005, 19:24

threads are a quick and dirty way to use idle cycles of the cpu for doing things.

better coding and ordering are usually a better choice; threads are hardware dependent (each cpu has its own context switch microcode) and are costly, even on multicore/dual processor.

x86 are not superscalar: on windows, per-cpu threading will actually slow every thread, as the scheduling has to be done twice AND in a multicore lock.

read this for some cpu info and history. it also contains dual-cpu benchmarks (on page two)

http://www.emulators.com/docs/pentium_1.htm

windows is quite good at using a multi-thread context, but on dual processors resource locking actually slows things down. consider that you have a big hardware lock every screen refresh.

also, threading is never easy. stl containers don't have proper synchronization, c++ doesn't support native locking (you have to use a mutex instead of synchronized), and there are a lot of other nasty things. if you want to thread some operation, be sure that you don't have access problems; any other use will introduce problems. for example, you could give a unit an order outside of the update function, and nobody knows if this is accepted behaviour or if it will crash the game. you have to put the order onto a queue, lock the thread, pick up the order from within an update, give the order and then restart the thread. and so on.

SoftNum · Post by **SoftNum** » 20 Dec 2005, 01:01

So, how exactly do you think your Windows XP machine does all the 10-20 odd things it does at once? Threads aren't 'quick and dirty', they're a good idea for scalar programming.

Dual core or dual CPU computers don't actually benefit people unless the coder breaks the tasks up into threads that can be done simultaneously. Most all games out today are written in a single thread, so they don't run appreciably better on a dual core or dual processor machine. They do a little bit, because Windows can dump background tasks onto the other thread.

Now, as to your other point, yes, threads aren't exactly _easy_ to program in, really. But if you use proper encapsulation, it's not that hard to lock down, spit out the orders queue, then go back to work. But just because something is hard to do doesn't make it evil.

cain · Post by **cain** » 20 Dec 2005, 09:04

well.. because windows xp runs on the shiny new 3Gig P4 with a gig or so of ram.

office runs great. abiword runs great ALSO on a 486.

never said that threading is evil. it's unnecessary, in my point of view. consider that as a java programmer I'm fighting a daily battle with the awt main thread that locks up at every event. my last program was a whirling mass of concurrent processes. yeah, it's quick, does what it does, and has a nice and well structured platform to run in. shm/pthread/mutex are really a different mode of operation. doing threaded things is not that bad. it's doing it properly that will produce a lot of overhead.

Still, I'm converting the game interface to an event interface, to have a proper framework to develop with. making an interface-level thread access synchronization could be the way.

However, remember that xp is NOT a preemptive system, and that the foremost thread gets all the cpu: with this trick the main app seems to run speedy and the user has the perception of the system being fast, while the background tasks run comparatively slowly.
