Some graphics testing results

Argh · Post by **Argh** » 03 Nov 2009, 01:11

OK, I went and tested some things. I wanted to find out where Spring's bottlenecking atm. Here are my results:

1. Spring's not bottlenecked on polycount, on a typical scene. Period.

2. Spring's not fillrate limited. Did the classic "run windowed, keep making it smaller" test, and there's no real difference, when viewing a complex scene.

3. Did a test where I limited the amount of texture throughput. I saw speed changes, and they were large enough that I'd say this could be optimized a bit more. This suggests that Spring would run better if Units were sorted into batches by tex1 rather than by UnitID, for better batching and direct support for projects that want to use atlases.

4. Did a test of Piece transforms (i.e., built something with a ridiculous number of named Pieces with a single triangle apiece). Eek! This really, really, really matters.

One question comes immediately to mind: if Pieces aren't currently in motion, and the Unit isn't in motion, are we updating their position and vector? If so... we should stop doing that, it's wasting a lot of CPU. Store it in RAM, only fetch it or change it if COB / Lua requests a change.

Moreover, why update the positions and do all the transforms of Unit Pieces we can't currently see (which it appears to be doing)? We already can derive their positions from their COB states, and extrapolate forwards from all COB events (i.e., if it started moving forward X speed Y period ago, then it's at Z, and the only thing that can alter that is... COB). We don't need the Piece transforms, for off-screen things, we just need a record of the COB state, with the sole exception of weapons (i.e., anything within an AimWeaponX loop), where they need to be tracked explicitly, because they're under AI control. But with the rest, we just need to track the Unit's position and vector and watch for changes to the COB state only.
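The extrapolation idea above can be sketched roughly as follows. This is a minimal Python illustration, not Spring code: `PieceMove`, `position_at`, and the frame-based units are hypothetical names invented for the example. It assumes a COB-style move command that drives a piece along one axis at constant speed until it reaches its target, so the position at any later frame can be computed on demand instead of being updated every frame.

```python
from dataclasses import dataclass


@dataclass
class PieceMove:
    """Record of one COB-style 'move piece to target at speed' command (hypothetical)."""
    start_pos: float    # position along the axis when the command was issued
    target_pos: float   # position the piece is moving toward
    speed: float        # units per simulation frame, assumed > 0
    start_frame: int    # frame on which the command was issued


def position_at(move: PieceMove, frame: int) -> float:
    """Derive the piece position at `frame` without any per-frame updates."""
    elapsed = max(0, frame - move.start_frame)
    distance = abs(move.target_pos - move.start_pos)
    travelled = min(distance, move.speed * elapsed)  # clamp once the target is reached
    direction = 1.0 if move.target_pos >= move.start_pos else -1.0
    return move.start_pos + direction * travelled


move = PieceMove(start_pos=0.0, target_pos=10.0, speed=2.0, start_frame=100)
print(position_at(move, 103))  # three frames after the command
print(position_at(move, 200))  # long after arrival, clamped at the target
```

A lookup like this only helps if every consumer of the position can tolerate on-demand evaluation; anything that reads the position in synced code would still need the same answer on every client.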

Anyhow, these are just my thoughts, after doing some tests. If you want the test setup for test 4, let me know, but it can be replicated easily enough. I thought it was interesting, though, that when I actually tested to find the bottleneck for World Builder scenes (no fighting going on, no AIs, not much happening in COB), it's almost entirely CPU side, not GPU at all, other than gaining back some FPS from better batching and atlases.

aegis · Post by **aegis** » 03 Nov 2009, 01:23

depends on your video card.

Argh · Post by **Argh** » 03 Nov 2009, 01:25

I am on a GeForce 7800GS on AGP.

I don't think it's fair to include stuff like Intel chipsets, that can't run shaders, in terms of where the bottleneck is- I have no problems with Spring continuing to give them some support, but they're not really in the ballpark for results.

On a new (gaming) card, you're just going to see higher numbers and the same issues, I predict, just like we did with Saktoth's polycount test last month. There, performance changes were linear, in terms of single-object Units. I tested the multiple-Piece theory because I was expecting to see fillrate issues... and when that turned out to be largely irrelevant, it and texture throughput were all that was left. Texture throughput is non-trivial, but as I said, we could batch specifically to support atlases, and give people that option.
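To illustrate why draw order matters for texture throughput, here is a minimal Python sketch. It is not Spring's renderer: the draw list, the texture names, and the helper functions are all made up for the example, and it assumes the simplest possible cost model, where a texture bind is needed whenever the texture changes between consecutive draws.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical draw list of (unit_id, texture_name) pairs; the names are made up.
units = [
    (1, "tank.dds"),
    (2, "bot.dds"),
    (3, "tank.dds"),
    (4, "bot.dds"),
    (5, "tank.dds"),
]


def count_texture_binds(draw_order):
    """Count binds, assuming one bind each time the texture changes."""
    binds = 0
    current = None
    for _unit_id, texture in draw_order:
        if texture != current:
            binds += 1
            current = texture
    return binds


def batch_by_texture(draw_list):
    """Group unit IDs that share a texture so each texture is bound once."""
    ordered = sorted(draw_list, key=itemgetter(1))
    return {tex: [uid for uid, _ in group]
            for tex, group in groupby(ordered, key=itemgetter(1))}


by_id = sorted(units, key=itemgetter(0))
by_texture = sorted(units, key=itemgetter(1))
print(count_texture_binds(by_id))       # 5 binds when drawn in UnitID order
print(count_texture_binds(by_texture))  # 2 binds when drawn in texture order
print(batch_by_texture(units))
```

With an atlas shared by many Units, the texture-ordered list collapses toward a single batch, which is the case the batching proposal is meant to support.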

BrainDamage · Post by **BrainDamage** » 03 Nov 2009, 01:30

post benchmark code and your number results

Argh · Post by **Argh** » 03 Nov 2009, 01:41

I'll build a demo for number 4, the rest of them anybody can easily replicate.

The fillrate test is simple enough to perform- just keep shrinking a windowed Spring, observe the framerate.
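One way to turn the shrinking-window test into a number is sketched below in Python. The function name and the sample values are hypothetical; the values are placeholders, not measurements from this thread. The idea is to compare how much the frame time drops with how much the pixel count drops: a ratio near 1 means frame time scales with pixels (fillrate-bound), while a ratio near 0 means window size barely matters.

```python
def fillrate_sensitivity(samples):
    """samples: list of (width, height, fps) taken on the same scene.

    Returns (fractional drop in frame time) / (fractional drop in pixel count)
    between the largest and the smallest window.
    """
    by_pixels = sorted(samples, key=lambda s: s[0] * s[1])
    (w_s, h_s, fps_s), (w_b, h_b, fps_b) = by_pixels[0], by_pixels[-1]
    pixels_small, pixels_big = w_s * h_s, w_b * h_b
    if pixels_small == pixels_big:
        raise ValueError("need at least two different window sizes")
    t_small, t_big = 1000.0 / fps_s, 1000.0 / fps_b  # frame times in ms
    pixel_drop = (pixels_big - pixels_small) / pixels_big
    time_drop = (t_big - t_small) / t_big
    return time_drop / pixel_drop


# Placeholder numbers for illustration only, not real results.
samples = [(1600, 1200, 40.0), (800, 600, 42.0)]
print(f"fillrate sensitivity: {fillrate_sensitivity(samples):.2f}")
```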

The texture test is simple also- test the difference in a S3O game (CA, S'44, P.U.R.E., Gundam) when you do give all, vs. giving the same number of Units. Don't use BA, obviously, as it's already taking full advantage of an atlas.

Polycount tests have been done multiple times, and the observed results are always similar.

aegis · Post by **aegis** » 03 Nov 2009, 01:43

clipping?

BrainDamage · Post by **BrainDamage** » 03 Nov 2009, 01:49

Argh wrote: The texture test is simple also- test the difference in a S3O game (CA, S'44, P.U.R.E., Gundam) when you do give all, vs. giving the same number of Units. Don't use BA, obviously, as it's already taking full advantage of an atlas.

not a valid test, the mod differences (scripts, etc.) can heavily affect the result.
Again, I'd actually like to see the numbers you got; "not limited" is not a good enough answer for a benchmark in my dictionary.

Argh · Post by **Argh** » 03 Nov 2009, 01:59

Clipping is an issue, absolutely, I've seen that issue come up a lot in particle systems. But that's so situational that I don't think we can control for it, and, due to the shadowmaps working the way they do, we can't use scissor very easily. I tried some experiments with that, with horrible results. This may just be me not knowing how to use it properly, ofc.

BrainDamage wrote: not a valid test, the mod differences (scripts, etc.) can heavily affect the result.

Well, download the World Builder archive then, because I'm using World Builder maps, which are very tight on CPU use per static Unit as my baseline. They use more CPU than Features, but they're still not exactly doing much in COB. The World Builder Editor is included, and if you'll excuse the crudeness of that old version, since you won't be using it anyway, you can observe exactly what I'm seeing.

Very few of them do anything, and the few that do are just using COB as a clock for infrequent events, generally, unless they're destroyed. What I see there, in World Builder (not P.U.R.E.) is that scenes that add very little to the geometry load and not much texture usage run at 20-50 FPS, whereas the empty map runs at 90-ish (with shadows and reflectivity on, mind).
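FPS figures are easier to compare as frame times. The short Python sketch below converts the numbers quoted above (roughly 90 FPS for the empty map, 20-50 FPS for the populated scenes) into milliseconds per frame; treating "90-ish" as exactly 90 is an assumption made for the arithmetic.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


empty_map = frame_time_ms(90)
scene_fast, scene_slow = frame_time_ms(50), frame_time_ms(20)
print(f"empty map: {empty_map:.1f} ms")
print(f"scene: {scene_fast:.1f} to {scene_slow:.1f} ms")
print(f"added cost: {scene_fast - empty_map:.1f} to {scene_slow - empty_map:.1f} ms")
```

On those figures the scenes add roughly 9 to 39 ms per frame on top of the empty map.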

Anyhow... I'll get the Piece demo built and post it.

If necessary, I'll provide the latest World Builder and a World Builder map in a package, for testing, so that my claims about CPU use can be verified (although I'm pretty sure screenshots showing how much CPU rendering is using from 'b' mode should be sufficient).

However, it's huge (we're talking maybe 100MB for map + WB), so I'm reluctant to provide it, unless people are actually willing to test with it.

AF · Post by **AF** » 03 Nov 2009, 02:05

Argh, what I would like to see is a set of Lua widgets and scripts to automate these tests and output real hard figures we can draw graphs with, or at least generate A number.

If you can do that, then you'll find people bend to your will much more readily.
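As a rough idea of what "hard figures" could look like, here is a minimal Python sketch that reduces a list of per-frame times to a few summary numbers. How the frame times get recorded is left open; the function name and the sample log are hypothetical and the values are placeholders, not measurements.

```python
import math
import statistics


def summarize_frame_times(frame_times_ms):
    """Reduce per-frame times (in ms) to a handful of comparable figures."""
    ordered = sorted(frame_times_ms)
    mean_ms = statistics.fmean(ordered)
    # Nearest-rank 99th percentile: the frame time that 99% of frames stay under.
    rank = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return {
        "frames": len(ordered),
        "mean_fps": 1000.0 / mean_ms,
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[rank],
        "worst_ms": ordered[-1],
    }


# Placeholder log: nine smooth frames and one spike.
log = [10.0] * 9 + [100.0]
for key, value in summarize_frame_times(log).items():
    print(f"{key}: {value}")
```

Reporting the median and the worst frames alongside the mean keeps a "spiky" load visible instead of averaging it away.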

Argh · Post by **Argh** » 03 Nov 2009, 02:22

The problem with that, AF, is that then we're testing the wrong things.

Lua Widgets can't show us CPU load due to Piece transforms. The graphics bottleneck is not entirely on the CPU side, but it's pretty close.

OK... until I get the demo done, here are screenshots to chew on. These are shrunk; I run at 1600x1200. Note that the CPU use is "spiky"- I assume that's the Piece updates, because it certainly isn't anything that's running in COB; you can see what 'b' thinks is the load. Note that 'b' only shows CPU usage; it may show big "draw use" due to GPU choke, but other tests have repeatedly indicated that that's not what's happening. If it were, then the fillrate test should have resulted in enough scene culling to show a dramatic improvement... and it does not.

Shadows ON:

Shadows OFF:

Anyway, the intent isn't to browbeat anybody atm.

If I wanted to do that, I'd talk seriously about getting rid of as much fixed function pipeline stuff as we can, because that's the best way forward, period. If people want to talk about that, we can talk about it, but frankly I think it's a waste of time atm- until I get or write a model parser, I can't prove that certain things can be done.

Argh · Post by **Argh** » 03 Nov 2009, 02:43

Lastly, here's a simple one. 100 World Builder "commanders", at > 4000 triangles apiece, using Kloot's normalmap shader with my changes and three 512 textures.

Far more polycount than we were processing in that scene- that obviously isn't the issue. Far more GLSL to run through my GPU- that ain't it. So, that leaves CPU-side stuff, or pipeline from CPU to GPU as the bottleneck. It should be obvious, but I guess it's not.

Kloot · Post by **Kloot** » 03 Nov 2009, 16:48

Argh wrote: Moreover, why update the positions and do all the transforms of Unit Pieces we can't currently see (which it appears to be doing)?

Because piece transforms are synced data.

Argh wrote: if Pieces aren't currently in motion, and the Unit isn't in motion, are we updating their position and vector?

No.

Argh wrote: two screenshots ... I assume that's the Piece updates

Why would you even begin to make such an assumption when 1) the colors of the spiked graphs *only* match the Shadows / Reflect and Draw World boxes in the legend directly above them (always the only two entries of significance when there is no other simulation going on in the background), both of which have *nothing* at all to do with piece updates yet *everything* with repeated terrain rendering, and 2) the total Scripts load is 0.18%?

Argh · Post by **Argh** » 04 Nov 2009, 04:07

Well, why are Piece transforms synced data, other than events that need to be, like weapons? I mean, seriously... why use CPU on events we can't see, for things that don't matter? Why not treat them like a particle system, or simply treat their state as indeterminate until we need to actually draw them?

Anyhow... I'm offline until tomorrow, and this isn't my box, so I'll keep it short. I'm just looking for the bottleneck- that's all. Not an argument.

My contention is, and remains, that the bottleneck is on the CPU side, not the GPU. The bit with the 100 guys pretty much demonstrates that my GPU can handle a lot more triangles / depthtests / etc., than the WB scene. I predict that if I double or even triple the tricount, the WB scene will render at about the same rate, because that's not the bottleneck.

aegis · Post by **aegis** » 04 Nov 2009, 04:39

particle effects can't affect the game, piece positions can

yuritch · Post by **yuritch** » 04 Nov 2009, 07:58

Argh, piece positions are used for things that affect sync. Let's see, weapon emit points, transport pickup/drop points, script uses for whatever purpose (get PIECE_XZ, etc., and unit scripts are synced). Piece coords have to be consistent across the simulation, so they have to be recalculated constantly, independently of drawing.

Argh · Post by **Argh** » 05 Nov 2009, 21:24

Yeah, I know that. I'm just saying... when you look at each of those things, it's not COB, it's just some Piece positions in particular.

We could have, for example, a callout that requested sync tracking of a given Piece- otherwise it's extrapolated only when visible. That could save a lot of CPU.
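The callout idea can be sketched as follows. This is a hypothetical Python illustration, not an existing Spring API: `PieceUpdater`, `request_sync_tracking`, and `pieces_to_update` are invented names. It assumes a piece is updated every frame only if a script explicitly asked for sync tracking, and otherwise only while its Unit is visible.

```python
class PieceUpdater:
    """Hypothetical update policy: opt-in sync tracking per Piece."""

    def __init__(self):
        self._tracked = set()  # (unit_id, piece_name) pairs that must stay synced

    def request_sync_tracking(self, unit_id, piece_name):
        """The proposed callout: mark one Piece as needing per-frame updates."""
        self._tracked.add((unit_id, piece_name))

    def pieces_to_update(self, pieces, visible_units):
        """Return the Pieces that need a transform update this frame."""
        return [
            (unit_id, piece)
            for unit_id, piece in pieces
            if (unit_id, piece) in self._tracked or unit_id in visible_units
        ]


updater = PieceUpdater()
updater.request_sync_tracking(1, "turret")
updater.request_sync_tracking(2, "turret")

pieces = [(1, "turret"), (1, "wheel"), (2, "turret"), (2, "wheel")]
print(updater.pieces_to_update(pieces, visible_units={2}))
```

Here the off-screen Unit 1 only pays for its turret. Whether this saves CPU in practice is untested, and every Piece that synced code can read (weapon emit points, transport points, script queries) would have to be registered.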

That said, Kloot's objections are worth investigating.

I thought that the "draw world" included the CPU time actually performing the Unit transforms- the COB measurement is not the same thing in the profiler at all, and tells us nothing about where Spring's doing the fixed-function stuff that I am almost certain is a major source of slowdown in a big complex scene. Is that true or not?

Argh · Post by **Argh** » 06 Nov 2009, 21:27

Sorry about the delay posting the demo, I have had IRL stuff going on this week and I've been AFK most of it. Maybe Sunday. Again, sorry about the delay, it's coming, I've just been busy.

Beherith · Post by **Beherith** » 07 Nov 2009, 00:34

Argh, what I would highly recommend you do for CPU load testing (it helped me LOADS when testing my stuff) is to build a standard release build of Spring in Visual Studio (extremely simple, shouldn't take more than 30 mins with downloading everything too).
Then launch the release build with your preferred settings, and run a profiler called Very Sleepy (free).
http://www.codersnotes.com/sleepy

It just hooks into the exe and samples the code at specific intervals. It auto decodes symbols to source, and even lets you browse source code and see how much cpu time each line is taking. You will absolutely love the simplicity and usability.
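For anyone unfamiliar with sampling profilers, the general technique can be sketched in a few lines of Python. This is only an illustration of the idea and says nothing about how Very Sleepy is implemented; `busy_work` and `profile` are made-up names, and `sys._current_frames()` is a CPython-specific call.

```python
import collections
import sys
import threading
import time


def sample_thread(target_ident, interval_s, stop_event, counts):
    """Every `interval_s` seconds, record which function the target thread is in."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_ident)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval_s)


def busy_work(duration_s):
    """Stand-in for a hot function: spin until the deadline."""
    deadline = time.perf_counter() + duration_s
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


def profile(func, *args, interval_s=0.005):
    """Run `func` while a background thread samples the calling thread."""
    counts = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_thread,
        args=(threading.get_ident(), interval_s, stop, counts),
    )
    sampler.start()
    try:
        func(*args)
    finally:
        stop.set()
        sampler.join()
    return counts


counts = profile(busy_work, 0.3)
total = sum(counts.values())
for name, hits in counts.most_common():
    print(f"{name}: {hits / total:.0%} of samples")
```

The share of samples landing in a function approximates the share of time spent there, which is why this kind of tool needs no changes to the code being measured.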

AF · Post by **AF** » 07 Nov 2009, 02:30

WOOOO NTai profiling time ^_^ thanks beherith!

Argh · Post by **Argh** » 07 Nov 2009, 03:20

Sweet, it would be nice to have something I can use for this. I will take a look at that when I am back to work on things on Sunday if I can squeeze it in.
