Spring RTS Engine

Post by **Argh** » 16 Dec 2009, 11:49

OK, tried Sleepy out. I don't get the same display as you do, and I don't know how I'm supposed to interpret the output, either. That said, assume I don't know how it works and that your number's useful. Just because the groundDrawer function is taking up a lot of time does not necessarily mean it's the GPU; it's not like that function just says, "here's our display list and the shader, go".

It just means groundDrawer is taking a long time to return, doesn't it?

To accurately profile how long the actual draw operation is taking, we'd need to build a function that just says, "go" and time how long it takes for that to return, over a long enough sample period. Otherwise we're just guessing, and I really have a very hard time believing it's GPU, considering all the stuff I've seen with shaders over the last few months. Not ruling it out, mind you... it's just that I'd be very, very surprised, now that I've seen the Unpleasantville test.
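A minimal sketch of that kind of measurement, assuming a hypothetical `fake_draw` stand-in for the real draw call. It only shows the averaging over a sample period. A real OpenGL test would also have to wait for the GPU to finish (for example with `glFinish`) before stopping the clock, since GL calls can return before the work is done.

```python
import time

def average_call_time(fn, samples=1000):
    """Return the mean wall-clock seconds fn() takes to return."""
    start = time.perf_counter()
    for _ in range(samples):
        fn()
    return (time.perf_counter() - start) / samples

def fake_draw():
    # Hypothetical stand-in for the real "go" draw call.
    sum(range(1000))

mean_seconds = average_call_time(fake_draw, samples=200)
print(f"mean time per call: {mean_seconds * 1e6:.1f} us")
```

Averaging over many calls smooths out scheduler noise, which is the point of the "long enough sample period".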

Post by **Beherith** » 16 Dec 2009, 12:03

Holy crap, seems like something IS afoot.

When zoomed out I get 120 fps on your posted map, Argh. When zoomed in all the way I get 51 fps, and sm3 drawing is taking up 34% CPU!
With /wiremap you can see the insane amount of wasted tris on highest detail. But this still does not explain why 5 layers with bump is 18 fps no matter what the detail level is.

Baczek:
http://www.codersnotes.com/sleepy Use Very Sleepy, not just Sleepy, as it has an awesome UI.

It supports any native Windows app, if it has standard PDB debugging information. No recompilation is necessary - it can just attach to any app as it's running.

I don't know if MinGW outputs standard PDB.

Argh:
Here's how you interpret the results: inclusive time is the time spent in the function and in all functions called by that function. Exclusive time is the time spent purely in that function.
If a() calls b(), and a() is 50% inclusive and 40% exclusive, that means that b() is 10% inclusive (assuming b() is the only function a() calls).
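The a()/b() arithmetic can be sketched as below. The call tree and percentages are the hypothetical ones from the example, and the sketch assumes no recursion and a single caller per function.

```python
# Hypothetical call tree: each function's exclusive share of total samples (%),
# plus the functions it calls.
exclusive = {"a": 40.0, "b": 10.0}
calls = {"a": ["b"], "b": []}

def inclusive(name):
    """Exclusive time of name plus the inclusive time of everything it calls."""
    return exclusive[name] + sum(inclusive(callee) for callee in calls[name])

print(inclusive("a"))  # 50.0
print(inclusive("b"))  # 10.0
```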

Post by **Argh** » 16 Dec 2009, 12:08

Good, I'm not crazy, then.

If my thoughts on this are accurate... then when you turn the detail level all the way down, it will not make a great deal of difference. A few percent at most. It ain't the geometry, or the shader, or even the texture loads. OK, it *might* be the shader, but I really am doubting that atm, I've read that, and it's pretty bare-bones.

One thing about the shader really bothers me, though- why isn't it handling all the layers in one shader pass, with unused layers simply not used? Is it literally redrawing everything multiple times? If so... meh... it's truly time for the "build-a-map" concept, imo, because that's just never going to work well, period.

I'd have to read the groundDrawer code to even make a wild stab... my first guess would be the tessellation code (really doubt it, it looks like stock old-skool SMF code just from the types of quads it builds, and jc probably just ported his previous work there, but get confirmation since I have no idea wtf I'm talking about)... but my real gut feeling is that somehow those blend layers aren't being written in a static way, and are getting redone on a regular basis, which they shouldn't be. Just a hunch, mind you- but the weird "skipping" I see, where the game state just plain halts for a few milliseconds when I pan around a lot... tells me something very odd's afoot.

Post by **Beherith** » 16 Dec 2009, 12:26

Argh's test map zoomed in, high detail. 45 fps.

If you notice, 63% is for DRAW alone! And 34% of total CPU time is sm3 draw!

Post by **Argh** » 16 Dec 2009, 12:29

Running empty, or with the WB stuff? I am assuming "empty".

Post by **Beherith** » 16 Dec 2009, 12:31

Meaning? I just loaded your map. I don't have Pure installed.

Post by **Argh** » 16 Dec 2009, 12:32

OK, so no whole giant city making the numbers suspect, etc. That is disturbing, tbh. Lemme repeat my test over here, with the WB stuff all removed.

I got: 60FPS, zoomed in, detail 12.

45-48FPS, zoomed out to mid-level (about 900-1000 high). That must be polycount at that point. It really is pretty insane, at detail level 12.

In "b", I see almost 11% CPU time is being used on "ground update"... but this is a voidwater, notDeformable map. Very odd.

Post by **Beherith** » 16 Dec 2009, 12:36

Does anyone want my built msvc executable with PDB?

Post by **jcnossen** » 16 Dec 2009, 23:19

The bottleneck is definitely all GPU.
All vertex and index data are stored in static vertex buffers, and change only when you move the camera.

Post by **Forboding Angel** » 16 Dec 2009, 23:30

Just a note. I run sm3 at .32 (that's the view radius I normally use for smf) because in top down mode it looks fine. At .32 sm3 runs smooth as butter (and better than smf).

Doesn't it seem a little odd to be doing all this testing at 12? How many of you actually use 1200 viewradius in smf (assuming you could actually ever get it that high)? The highest I can get at smf is 400ish (haven't tried higher, but my fps by that point is in the teens). I can view 12 in sm3 and still have about 30-40 fps and it's as picture perfect as smf@400 (if not more so). That right there tells you that sm3 performance is great.

Someone run a long BA bot battle on sm3 .32 with tons of units and stuff and see how it performs vs smf. That would be a lot more telling, wouldn't it?

Post by **Argh** » 17 Dec 2009, 03:47

jcnossen wrote: All vertex and index data are stored in static vertex buffers, and change only when you move the camera.

But in a real game, you're constantly moving the camera. So, is that why it nearly halts the gamestate, if I pan really fast?

If so, why not use a static-mesh strategy- divide the mesh into sectors, pre-subdivided according to detail levels, keep it all static unless we need to adjust map geometry?

Post by **Beherith** » 17 Dec 2009, 04:33

Argh wrote: If so, why not use a static-mesh strategy- divide the mesh into sectors, pre-subdivided according to detail levels, keep it all static unless we need to adjust map geometry?

I've tried pre-subdivided detail levels. Stitching them back together with no tears is nearly impossible.

Post by **Argh** » 17 Dec 2009, 04:50

What if they're meshed at runtime, and don't use LOD? Just geometry chunks?

Post by **jcnossen** » 17 Dec 2009, 16:42

Argh wrote: If so, why not use a static-mesh strategy- divide the mesh into sectors, pre-subdivided according to detail levels, keep it all static unless we need to adjust map geometry?

That is too much data, I tried that once.

Post by **Master-Athmos** » 18 Dec 2009, 00:35

Why should it be too much data? Once again I want to point at the chunked terrain + e.g. quadtree approach I linked. It's a widespread method used to actually make big and detailed terrain possible. With RTS games usually having a top-down view some algorithms even could be simplified removing some logic...

It probably increases the data amounts (but not so much it would burst all memory) while giving LOD and great performance...
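A minimal sketch of quadtree chunk selection, for illustration only. The split rule (split a square while the camera is closer than `split_factor` times its size) and all the names are assumptions, not what sm3 or the linked approach actually does. It also ignores the cracks between neighbouring chunks of different detail, which is the hard part raised earlier in the thread.

```python
import math

def select_chunks(x, y, size, camera, min_size, split_factor=2.0):
    """Return (x, y, size) squares to draw; squares near the camera are split."""
    cx, cy = x + size / 2, y + size / 2
    distance = math.hypot(camera[0] - cx, camera[1] - cy)
    if size <= min_size or distance > size * split_factor:
        # Far away, or already the smallest chunk: draw it as one piece.
        return [(x, y, size)]
    half = size / 2
    chunks = []
    for ox in (0, half):
        for oy in (0, half):
            chunks += select_chunks(x + ox, y + oy, half,
                                    camera, min_size, split_factor)
    return chunks

chunks = select_chunks(0, 0, 1024, camera=(100, 100), min_size=64)
print(len(chunks), "chunks selected")
```

Each chunk's mesh can stay static; only the list of which chunks to draw changes with the camera.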

Post by **Argh** » 18 Dec 2009, 07:01

jcnossen wrote: That is too much data, I tried that once.

Really? Hrmm. I was thinking, the only big issue is the view-distance. So basically, it's really just cheaper to ROAM.

Sooo... how the hell can it be GPU, though?

Meh. Going to have to write a test, see what's up. It's obvious that I don't understand very much yet.

Won't have time for this for at least a week or two.

Post by **jcnossen** » 18 Dec 2009, 10:37

Master-Athmos wrote: Why should it be too much data? Once again I want to point at the chunked terrain + e.g. quadtree approach I linked. It's a widespread method used to actually make big and detailed terrain possible. With RTS games usually having a top-down view some algorithms even could be simplified removing some logic...

I don't see any link, but it's already using quadtree and 'chunked' terrain. It already does a big and detailed terrain; Spring is the limiting factor when it comes to terrain size, because of all the AI related maps that are stored.

It's actually only too much data when you go for a really large map (>1025x1025), but it does quickly add up. Position, normal, binormal, tangent make up 4*3*4 = 48 bytes per vertex. You could trim that down by using chars though, so I guess it can be done... But it doesn't really change the main problem of sm3, which is related to texturing. What should be done is storing blending data in the vertex instead of in texture map.
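A rough back-of-the-envelope sketch of those numbers. It counts a single full-resolution vertex grid only, and the "packed" layout (float position, three direction vectors as one-byte components) is an assumption about what "using chars" would look like. Pre-subdivided detail levels would add to the total.

```python
FLOAT_BYTES = 4
ATTRIBUTES = ("position", "normal", "binormal", "tangent")

# Four attributes, three float components each: 4 * 3 * 4 = 48 bytes.
bytes_per_vertex = len(ATTRIBUTES) * 3 * FLOAT_BYTES

# Assumed packed layout: float position, three vectors as 1-byte components.
packed_bytes_per_vertex = 3 * FLOAT_BYTES + 3 * 3 * 1   # 12 + 9 = 21

def vertex_buffer_mib(side, per_vertex):
    """Size in MiB of one full-resolution side x side vertex grid."""
    return side * side * per_vertex / (1024 * 1024)

for side in (513, 1025, 2049):
    print(side,
          round(vertex_buffer_mib(side, bytes_per_vertex), 1),
          round(vertex_buffer_mib(side, packed_bytes_per_vertex), 1))
```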

Post by **Argh** » 18 Dec 2009, 13:19

jcnossen wrote: What should be done is storing blending data in the vertex instead of in texture map

Oh, so basically rebuild the vertex at runtime, store that in multitexcoords? That makes sense, and it would speed stuff up a lot.

Post by **jK** » 18 Dec 2009, 14:27

Argh wrote:
jcnossen wrote: What should be done is storing blending data in the vertex instead of in texture map
Oh, so basically rebuild the vertex at runtime, store that in multitexcoords? That makes sense, and it would speed stuff up a lot.

To make this work you would need a static grid size -> more polygons + more vertex data -> you would just shift the load to the vertex shader ...

Also, sending the tangent & bitangent to the GPU is redundant; you can reconstruct them easily in the shader.
And btw the data alignment in the VBO can be a huge performance hit, because the GPU prefers to read n*32 bytes at once, so 4*3*4 = 48 != 32 or 64. And yeah, even the datatypes used have an impact on performance:
http://www.sci.utah.edu/~bavoil/opengl/vbo/data_types/
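A small sketch of the stride arithmetic, taking the 32-byte read size above as given rather than as a measured fact. The "slim" layout (position and normal only, with tangent and bitangent rebuilt in the shader) is an assumed illustration of the first point.

```python
import struct

def padded_stride(stride, block=32):
    """Round a vertex stride up to the next multiple of block bytes."""
    return -(-stride // block) * block

full = struct.calcsize("<12f")   # position, normal, tangent, bitangent as floats
slim = struct.calcsize("<6f")    # position and normal only (assumed layout)

for name, stride in (("full", full), ("slim", slim)):
    print(name, stride, "bytes ->", padded_stride(stride), "bytes padded")
```

Under that assumption the 48-byte vertex needs padding to 64 bytes to stay aligned, while the 24-byte one fits in 32.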

Post by **Argh** » 18 Dec 2009, 15:21

jK wrote: To make this work you would need a static grid size -> more polygons + more vertex data -> you would just shift the load to the vertex shader ...

OK, I can see that. That takes us right back to the pre-built LOD tearing problem, too.

We're going in circles here.

It really does appear to me that the best way to resolve this is by doing the build-a-map concept. There just aren't any other good ways to achieve this, other than a purely static mesh... which would mean the end of free-cam, for all practical purposes.

To have ROAM and decent performance, we need to lose the blend stages. The best way to do that without destroying the overall concept is by doing all the blending exactly one time, and then storing the data in a static way.

I think I can write the FBO stuff, to build the blended maps reasonably quickly with a shader. I'd have to build it with Lua, but porting it should be reasonably easy. I don't know enough C++ to submit a patch, but I am willing to put time into this, if it would be helpful- this is basically just a different use of the FBO stuff in P.O.P.S., more or less.
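A CPU-side sketch of the "blend once, store the result" idea, just to show the data flow. The real thing would be a shader pass rendering into an FBO. The layer and mask values, and the simple linear blend, are assumptions for illustration.

```python
def bake_blend(layers, masks):
    """Blend layers bottom-up once; masks[i] is the 0..1 weight of layers[i + 1]."""
    baked = list(layers[0])
    for layer, mask in zip(layers[1:], masks):
        baked = [b * (1.0 - m) + t * m for b, t, m in zip(baked, layer, mask)]
    return baked

# Three hypothetical 4-texel greyscale layers and their two blend masks.
layers = [[0.2, 0.2, 0.2, 0.2], [0.8, 0.8, 0.8, 0.8], [1.0, 1.0, 1.0, 1.0]]
masks = [[0.0, 0.5, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]]

baked_map = bake_blend(layers, masks)   # done once, e.g. at map load
print(baked_map)
```

After baking, drawing only needs to sample `baked_map`, however many layers went into it.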

SM3