SM3 - Page 10

Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

OK, tried Sleepy out. I don't get the same display as you do, and I don't know how I'm supposed to interpret the output, either. So assume I don't know how it works, and that your number's useful. Even so, just because the groundDrawer function is taking up a lot of time does not necessarily mean it's the GPU; it's not like that function just says, "here's our display list and the shader, go".

It just means groundDrawer is taking a long time to return, doesn't it?

To accurately profile how long the actual draw operation is taking, we'd need to build a function that just says, "go" and time how long it takes for that to return, over a long enough sample period. Otherwise we're just guessing, and I really have a very hard time believing it's GPU, considering all the stuff I've seen with shaders over the last few months. Not ruling it out, mind you... it's just that I'd be very, very surprised, now that I've seen the Unpleasantville test.
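A minimal sketch of the kind of timing harness described above: call one isolated function many times and average the wall-clock cost per call. `fake_draw` is a hypothetical stand-in, not engine code. Note also that OpenGL calls are queued, so a real draw call can return before the GPU has finished; a real test would have to wait for completion (for example with `glFinish`) before stopping the clock.

```python
import time


def average_call_time(fn, samples=1000):
    """Return the mean wall-clock seconds per call of fn over `samples` calls."""
    start = time.perf_counter()
    for _ in range(samples):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / samples


def fake_draw():
    # Hypothetical stand-in for the isolated "go" call; it just burns some CPU.
    sum(range(1000))


mean_seconds = average_call_time(fake_draw, samples=200)
print(f"mean per call: {mean_seconds * 1e6:.1f} microseconds")
```

Averaging over many calls smooths out timer resolution and scheduling noise, which is the point of the "long enough sample period".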
Attachments
ProcCounts.txt
(24.93 KiB) Downloaded 118 times
Beherith
Moderator
Posts: 5040
Joined: 26 Oct 2007, 16:21

Re: SM3

Post by Beherith »

Holy crap, seems like something IS afoot.

When zoomed out I get 120 fps on your posted map, Argh. When zoomed in all the way I get 51 fps, and sm3 drawing is taking up 34% CPU!
With /wiremap you can see the insane amount of wasted tris on highest detail. But this still does not explain why 5 layers with bump is 18 fps no matter what the detail level is.

Baczek:
http://www.codersnotes.com/sleepy Use Very Sleepy, not just Sleepy, as it has an awesome UI.

It supports any native Windows app, if it has standard PDB debugging information. No recompilation is necessary - it can just attach to any app as it's running.

I don't know if MinGW outputs standard PDB.

Argh:
Here's how you interpret the results: inclusive time is the time spent in the function and in all functions called by it. Exclusive time is the time spent purely in that function.
If a() calls only b(), and a() is 50% inclusive and 40% exclusive, that means that b() is 10% inclusive.
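The a()/b() arithmetic can be checked with a small sketch. It assumes a simple call tree with no recursion. The function name `inclusive_times` and the data layout are made up for illustration and are not Very Sleepy's format.

```python
def inclusive_times(exclusive, calls, root):
    """Compute inclusive time per function from exclusive times and a call tree.

    exclusive: dict mapping function name -> exclusive percent
    calls:     dict mapping function name -> list of callee names (no recursion)
    """
    result = {}

    def visit(name):
        # Inclusive time = own exclusive time + inclusive time of every callee.
        total = exclusive[name] + sum(visit(c) for c in calls.get(name, []))
        result[name] = total
        return total

    visit(root)
    return result


exclusive = {"a": 40.0, "b": 10.0}
calls = {"a": ["b"]}
times = inclusive_times(exclusive, calls, "a")
print(times)
```

Here b() calls nothing, so its inclusive and exclusive times are the same 10%, and a() comes out at 50% inclusive.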
Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

Good, I'm not crazy, then.

If my thoughts on this are accurate... then when you turn the detail level all the way down, it will not make a great deal of difference. A few percent at most. It ain't the geometry, or the shader, or even the texture loads. OK, it *might* be the shader, but I really am doubting that atm, I've read that, and it's pretty bare-bones.

One thing about the shader really bothers me, though- why isn't it handling all the layers in one shader pass, with unused layers simply not used? Is it literally redrawing everything multiple times? If so... meh... it's truly time for the "build-a-map" concept, imo, because that's just never going to work well, period.

I'd have to read the groundDrawer code to even make a wild stab. It could be the tessellation code (really doubt it; it looks like stock old-skool SMF code just from the types of quads it builds, and jc probably just ported his previous work there, but get confirmation since I have no idea wtf I'm talking about)... but my real gut feeling is that somehow those blend layers aren't being written in a static way, and are getting redone on a regular basis, which they shouldn't be. Just a hunch, mind you, but the weird "skipping" I see, where the game state just plain halts for a few milliseconds when I pan around a lot, tells me something very odd's afoot.
Last edited by Argh on 16 Dec 2009, 12:27, edited 1 time in total.
Beherith
Moderator
Posts: 5040
Joined: 26 Oct 2007, 16:21

Re: SM3

Post by Beherith »

Image


Argh's test map zoomed in, high detail. 45 fps.

Notice that 63% is for drawing alone, while 34% of total CPU time is the sm3 draw!
Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

Running empty, or with the WB stuff? I am assuming "empty".
Last edited by Argh on 16 Dec 2009, 12:31, edited 1 time in total.
Beherith
Moderator
Posts: 5040
Joined: 26 Oct 2007, 16:21

Re: SM3

Post by Beherith »

Meaning? I just loaded your map. I don't have pure installed.
Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

OK, so no whole giant city making the numbers suspect, etc. That is disturbing, tbh. Lemme repeat my test over here, with the WB stuff all removed.

I got: 60FPS, zoomed in, detail 12.

45-48FPS, zoomed out to mid-level (about 900-1000 high). That must be polycount at that point. It really is pretty insane, at detail level 12 :P

In "b", I see almost 11% CPU time is being used on "ground update"... but this is a voidwater, notDeformable map. Very odd.
Last edited by Argh on 16 Dec 2009, 12:41, edited 2 times in total.
Beherith
Moderator
Posts: 5040
Joined: 26 Oct 2007, 16:21

Re: SM3

Post by Beherith »

Does anyone want my built msvc executable with PDB?
jcnossen
Former Engine Dev
Posts: 2440
Joined: 05 Jun 2005, 19:13

Re: SM3

Post by jcnossen »

The bottleneck is definitely all GPU.
All vertex and index data are stored in static vertex buffers, and change only when you move the camera.
Forboding Angel
Evolution RTS Developer
Posts: 14657
Joined: 17 Nov 2005, 02:43

Re: SM3

Post by Forboding Angel »

Just a note. I run sm3 at .32 (that's the view radius I normally use for smf) because in top down mode it looks fine. At .32 sm3 runs smooth as butter (and better than smf).

Doesn't it seem a little odd to be doing all this testing at 12? How many of you actually use 1200 viewradius in smf (assuming you could actually ever get it that high)? The highest I can get at smf is 400ish (haven't tried higher, but my fps by that point is in the teens). I can view 12 in sm3 and still have about 30-40 fps, and it's as picture perfect as smf@400 (if not more so). That right there tells you that sm3 performance is great.

Someone run a long BA bot battle on sm3 .32 with tons of units and stuff and see how it performs vs smf. That would be a lot more telling wouldn't it?
Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

jcnossen wrote: All vertex and index data are stored in static vertex buffers, and change only when you move the camera.
But in a real game, you're constantly moving the camera. So, is that why it nearly halts the gamestate, if I pan really fast?

If so, why not use a static-mesh strategy- divide the mesh into sectors, pre-subdivided according to detail levels, keep it all static unless we need to adjust map geometry?
Beherith
Moderator
Posts: 5040
Joined: 26 Oct 2007, 16:21

Re: SM3

Post by Beherith »

Argh wrote: If so, why not use a static-mesh strategy- divide the mesh into sectors, pre-subdivided according to detail levels, keep it all static unless we need to adjust map geometry?
I've tried pre-subdivided detail levels. Stitching them back together with no tears is nearly impossible.
Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

What if they're meshed at runtime, and don't use LOD? Just geometry chunks?
jcnossen
Former Engine Dev
Posts: 2440
Joined: 05 Jun 2005, 19:13

Re: SM3

Post by jcnossen »

Argh wrote: If so, why not use a static-mesh strategy- divide the mesh into sectors, pre-subdivided according to detail levels, keep it all static unless we need to adjust map geometry?
That is too much data, I tried that once.
Master-Athmos
Posts: 912
Joined: 27 Jun 2009, 01:32

Re: SM3

Post by Master-Athmos »

Why should it be too much data? Once again I want to point at the chunked terrain + e.g. quadtree approach I linked. It's a widespread method used to actually make big and detailed terrain possible. With RTS games usually having a top-down view some algorithms even could be simplified removing some logic...

It probably increases the data amounts (but not so much it would burst all memory) while giving LOD and great performance...
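A rough sketch of the quadtree side of a chunked-terrain approach, under assumptions: a square patch is split into four children while the camera is close relative to the patch size. The `Node` class, the `detail` factor and the split rule are illustrative choices, not taken from SM3 or from the linked method, and the sketch does nothing about cracks between neighbouring patches of different depth.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    x: float      # centre of the square patch
    z: float
    size: float   # edge length of the patch
    depth: int    # 0 at the root, increasing with detail


def select_patches(node, cam_x, cam_z, max_depth, detail=2.0):
    """Return the leaf patches to draw for a camera at (cam_x, cam_z)."""
    dist = math.hypot(node.x - cam_x, node.z - cam_z)
    # Stop splitting when the patch is far away or the finest level is reached.
    if node.depth >= max_depth or dist > detail * node.size:
        return [node]
    half = node.size / 2.0
    quarter = node.size / 4.0
    patches = []
    for dx in (-quarter, quarter):
        for dz in (-quarter, quarter):
            child = Node(node.x + dx, node.z + dz, half, node.depth + 1)
            patches.extend(select_patches(child, cam_x, cam_z, max_depth, detail))
    return patches


root = Node(512.0, 512.0, 1024.0, 0)
near_corner = select_patches(root, 0.0, 0.0, max_depth=4)
print(len(near_corner), sorted({p.depth for p in near_corner}))
```

Patches near the camera end up at the finest depth while distant ones stay coarse, which is where the LOD saving would come from.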
Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

jcnossen wrote: That is too much data, I tried that once.
Really? Hrmm. I was thinking, the only big issue is the view-distance. So basically, it's really just cheaper to ROAM.

Sooo... how the hell can it be GPU, though?

Meh. Going to have to write a test, see what's up. It's obvious that I don't understand very much yet.

Won't have time for this for at least a week or two.
jcnossen
Former Engine Dev
Posts: 2440
Joined: 05 Jun 2005, 19:13

Re: SM3

Post by jcnossen »

Master-Athmos wrote: Why should it be too much data? Once again I want to point at the chunked terrain + e.g. quadtree approach I linked. It's a widespread method used to actually make big and detailed terrain possible. With RTS games usually having a top-down view some algorithms even could be simplified removing some logic...
I don't see any link, but it's already using a quadtree and 'chunked' terrain. It already does big and detailed terrain; Spring is the limiting factor when it comes to terrain size, because of all the AI-related maps that are stored.

It's actually only too much data when you go for a really large map (>1025x1025), but it does quickly add up. Position, normal, binormal and tangent make up 4*3*4 = 48 bytes per vertex. You could trim that down by using chars, though, so I guess it can be done... But it doesn't really change the main problem of sm3, which is related to texturing. What should be done is storing blending data in the vertex instead of in a texture map.
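To put rough numbers on the per-vertex figure, here is a back-of-the-envelope sketch. It assumes one vertex per heightmap point at a single full-resolution level (pre-subdivided detail levels would add to this), and it reads "using chars" as keeping position in floats while packing the three direction vectors into one byte per component. That packing is an assumption for illustration.

```python
FLOAT_BYTES = 4
COMPONENTS = 3   # x, y, z
ATTRIBUTES = 4   # position, normal, binormal, tangent


def vertex_bytes(attributes=ATTRIBUTES, components=COMPONENTS,
                 component_bytes=FLOAT_BYTES):
    """Size of one vertex when every attribute uses the same component type."""
    return attributes * components * component_bytes


def mesh_bytes(width, height, per_vertex):
    """Total vertex data for a width x height grid of vertices."""
    return width * height * per_vertex


full = mesh_bytes(1025, 1025, vertex_bytes())

# Assumed packing: float position, byte-sized normal/binormal/tangent.
packed_vertex = COMPONENTS * FLOAT_BYTES + 3 * COMPONENTS * 1
packed = mesh_bytes(1025, 1025, packed_vertex)

print(f"all floats: {full / 2**20:.1f} MiB, packed: {packed / 2**20:.1f} MiB")
```

Under these assumptions a 1025x1025 grid needs about 48 MiB of vertex data in floats and about 21 MiB with the packed layout.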
Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

jcnossen wrote: What should be done is storing blending data in the vertex instead of in a texture map
Oh, so basically rebuild the vertex at runtime, store that in multitexcoords? That makes sense, and it would speed stuff up a lot.
jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: SM3

Post by jK »

Argh wrote:
jcnossen wrote: What should be done is storing blending data in the vertex instead of in a texture map
Oh, so basically rebuild the vertex at runtime, store that in multitexcoords? That makes sense, and it would speed stuff up a lot.
To make this work you would need a static grid size -> more polygons + more vertex data -> you would just shift the load to the vertex shader ...

Also, sending the tangent & bitangent to the GPU is redundant; you can reconstruct them easily in the shader.
And btw, the data alignment in the VBO can be a huge performance hit, because the GPU prefers to read n*32 bytes at once, and 4*3*4 = 48 is not a multiple of 32 (it is neither 32 nor 64). And yeah, even the data types used have an impact on performance:
http://www.sci.utah.edu/~bavoil/opengl/vbo/data_types/
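Taking the 32-byte read granularity above at face value, a small sketch shows what padding the vertex stride to a multiple of 32 would look like. `padded_stride` is a made-up helper, and the trimmed layout (float position and normal only, with tangent and bitangent rebuilt in the shader) is an assumed example rather than SM3's actual format.

```python
def padded_stride(vertex_size, alignment=32):
    """Round vertex_size up to the next multiple of alignment."""
    return -(-vertex_size // alignment) * alignment


# Current layout: position, normal, binormal, tangent as 3 floats each.
stride_full = padded_stride(4 * 3 * 4)

# Assumed trimmed layout: position and normal only, 3 floats each.
stride_trimmed = padded_stride(2 * 3 * 4)

print(stride_full, stride_trimmed)
```

With these assumptions the 48-byte vertex pads out to 64 bytes, while the 24-byte trimmed vertex fits in a single 32-byte read.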
Argh
Posts: 10920
Joined: 21 Feb 2005, 03:38

Re: SM3

Post by Argh »

jK wrote: To make this work you would need a static grid size -> more polygons + more vertex data -> you would just shift the load to the vertex shader ...
OK, I can see that. That takes us right back to the pre-built LOD tearing problem, too :P

We're going in circles here.

It really does appear to me that the best way to resolve this is by doing the build-a-map concept. There just aren't any other good ways to achieve this, other than a purely static mesh... which would mean the end of free-cam, for all practical purposes.

To have ROAM and decent performance, we need to lose the blend stages. The best way to do that without destroying the overall concept is by doing all the blending exactly one time, and then storing the data in a static way.

I think I can write the FBO stuff, to build the blended maps reasonably quickly with a shader. I'd have to build it with Lua, but porting it should be reasonably easy. I don't know enough C++ to submit a patch, but I am willing to put time into this, if it would be helpful- this is basically just a different use of the FBO stuff in P.O.P.S., more or less.
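The "blend exactly one time, then store it" idea can be sketched on the CPU as follows. This is only an illustration: the proposal above is to do the blending with a shader rendering into an FBO, and the grayscale layers, weight maps and the name `bake_blended_map` here are hypothetical.

```python
def bake_blended_map(layers, weights):
    """Blend all layers into one map, texel by texel, using per-layer weights.

    layers:  list of 2D grids (lists of rows) of grayscale values
    weights: list of 2D grids of blend weights, one grid per layer
    """
    height = len(layers[0])
    width = len(layers[0][0])
    baked = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(w[y][x] for w in weights)
            if total == 0.0:
                continue  # no layer covers this texel; leave it at 0.0
            blended = sum(l[y][x] * w[y][x] for l, w in zip(layers, weights))
            baked[y][x] = blended / total  # normalise so weights act as a mix
    return baked


grass = [[0.2, 0.2], [0.2, 0.2]]
rock = [[0.8, 0.8], [0.8, 0.8]]
grass_weight = [[1.0, 0.5], [0.0, 0.0]]
rock_weight = [[0.0, 0.5], [1.0, 0.0]]

# Done once at load time; drawing would then sample only the baked result.
baked = bake_blended_map([grass, rock], [grass_weight, rock_weight])
print(baked)
```

After the one-off bake, each frame would read a single stored map instead of re-blending every layer, which is the saving the build-a-map concept is after.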