Profile Build
Moderator: Moderators
Profile Build
The Gadget Profiler doesn't work outside CA, but I've made a fixed version that doesn't require CA's various internal dependencies. Feel free to try it out- it doesn't have any controls atm, but it's drop-in-and-rock otherwise.
What's bad is that now I've seen what it's saying- and it has no explanation about why things are so slow atm. So much for the "blame the game developers" theory.
The most expensive single piece of gadget code in my game, the UnitRendering application, is still eating less than 13%. Total time for everything (including POPS, which has a fair amount of fixed overhead) is less than 20%. And that's a worst-case scenario, where we're looking at most of the map and there are hundreds of Units on display.
I was expecting something massively obvious, but there's nothing at all in my Gadgets that explains the massive drop in performance I'm seeing over here.
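A rough way to see why those percentages can't account for the slowdown, as a sketch only: it assumes the profiler's figures are shares of total frame time (my reading of the numbers, not something the profiler output confirms), and the function name is made up for illustration. If a component takes a given share of the frame, making it free can speed things up by at most 1 / (1 - share).

```python
def max_speedup_if_removed(share: float) -> float:
    """Best-case speedup if a component using `share` of frame time cost nothing."""
    if not 0.0 <= share < 1.0:
        raise ValueError("share must be in [0, 1)")
    return 1.0 / (1.0 - share)


if __name__ == "__main__":
    # Upper bounds quoted in the post: under 13% for UnitRendering,
    # under 20% for all Gadget code combined.
    for label, share in [("UnitRendering", 0.13), ("all Gadgets", 0.20)]:
        print(f"{label}: at most {max_speedup_if_removed(share):.2f}x faster")
```

Under that assumption, deleting every Gadget would buy at most a 1.25x gain, which is nowhere near the size of the drop being described.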
I don't have any real answers, but I can tell you guys for certain that unitRendering is very expensive, regardless of how complex the shader you send is. I cut the normalmap shader to the absolute bone, and it's still twice as expensive as the ARB rendering. This would be fine, though, if that was it.
The total drop in performance outweighs it by a giant amount. It's just a drop in the bucket o' suck. Just having a few hundred Units sitting around doing nothing at all is incredibly expensive, even if most of them aren't animating, aren't calling Lua, aren't doing anything and aren't on the screen.
And none of it appears to be related to Gadgets or Widgets.
I tried the Piece theory, and the speedup from taking the most common Unit with multiple Pieces (my trees, which were 4 Pieces because of a workaround I no longer need to get transparency without order-of-operations problems) down to one Piece was so small I can't even tell it's there.
I tried the map theory, too. jK's right, it doesn't matter much.
I tried LOS. It only really matters with aircraft. Which, of course, are the things we want the biggest LOS with, and where having a small LOS can be a big problem.
I'm not testing pathfinding yet- if the engine's sluggish without pathfinding happening, then it's kind of beside the point- at worst, it's just another peak-load event.
Lastly, SSMF can be very expensive in certain (unlikely) situations. But they're unlikely use cases and they are definitely shader-dependent issues (i.e., they can be removed for safety's sake) so I don't think we need to worry about them.
So... to return to what I said earlier... can we get a profiler built in that tells us what we actually need to know? Or at least a real focus on the cost per Unit? Something is seriously wrong here.
A Unit that's off-screen, doesn't move (a building), is Gaia (i.e., should not be tracking resources), and is not animating should cost practically zero CPU time, but it doesn't.
Tests earlier, where I deliberately made sure no Units were in the frustum, resulted in an FPS gain, to be sure- but it was 20 FPS. 20. Considering that I get 500+ when there aren't any Units, if the issue was Unit shaders, and I'm looking outside the map and there are no Units in the frustum... no, that doesn't explain things at all.
Anyhow, that's it for now. The stuff I've looked at thus far all points in one direction- at the engine, fixed costs per Unit and perhaps some issues in UnitRendering- but doesn't give me anything more definite than that.
Attachments: dbg_profiler.lua (12.13 KiB)
Re: Spring's performance MKII
Argh, I made the profile build for you. TRY IT.
Re: Spring's performance MKII
Will do, soon as I get done with various RL stuff 

Re: Spring's performance MKII
<pokes buildbot>
Er, sorry about the double-post, but which build am I looking for? I'm not seeing what I'm supposed to deploy.
And none of the builds are from later than June. Did I miss the memo or something?
Anyhow, I'll be able to do stuff in a few hours.
Re: Profile Build
split from other thread.
Although your thoughts were partially relevant and may have been well-intended, they were a potential derail and went against the OP's request.
Beherith wrote: If anyone wants to test spring with a profiler, I built the latest GIT revision in msvc, so it has a .pdb file next to it. This allows my favorite free profiler, Very Sleepy, to translate all addresses and show the source with performance counters next to it.
Link to built spring:
http://beherith.eat-peet.net/stuff/sprtest.7z
Re: Profile Build
I am doing good science here, as best as I can. Questioning the validity of testing results and presenting evidence that predictions aren't matching expectations is not a sin.
In short... I'm not sure that this was justified.
Nowhere in our policy are people allowed to just arbitrarily exclude people from science discussion in Dev, if they aren't being deliberately asinine or disruptive, and none of that post was in any way meant to be either.
Anyhow, let's move on here, I'm not going to waste all day on something like this unless people hand me hemlock.
I need to deal with RL stuff right now. I'll test Beherith's rig in a bit and see what it tells us, and who knows, maybe it'll show me a magic way to resolve the crisis I'm facing.
And for goodness' sake, move it back where it belongs, unless you want a full-dress discussion about moderation policy or something, which was not the intent of my protest.
<out>
- Forboding Angel
- Evolution RTS Developer
- Posts: 14673
- Joined: 17 Nov 2005, 02:43
Re: Profile Build
What is this performance drop that I keep hearing about? As far as I can tell, Evo is actually running better than it ever has.
I have a sneaking suspicion that this performance drop that you and smoth are speaking of might be hardware related in some way, but of course I don't have the foggiest idea if that's true or not.
Re: Profile Build
By all means blame my dual 3.5 GHz and 280 for being too slow.

Re: Profile Build
I'm still stuck dealing with RL crap for a while.
When I get free, I am going to test Beherith's thingy (presuming I can get the profiler set up) to give us some hardcore numbers, and I'm going to make a short video to show what's happening, for those of us who aren't CS majors and want to know why I'm hot and bothered about it.
Basically, long and short is that P.U.R.E. is not just really ugly on an ATi (no Unit shadows by fiat, because they couldn't get them working with earlier drivers) but it's a lot slower, on this new box with a quad-core and 2GB of DDR3, than it was on the Athlon XP with a 7800GS on AGP.
Whatever is going on is not new news, either. I alerted the devs about the engine getting a lot slower six months ago, and provided a World Builder map set up so they could see it for themselves, and so far as I know, they never even bothered. So basically I've come back to finish what I said I'd do, and I'm not sure I can realistically do it any more, because the game's almost slide-show slow before I even have anything game-related happening.
Whatever it is, it's not Gadgets, it's not Widgets, and it's not GPU. I was hoping MT would be a little closer to useful and we could skip around the issue with raw horsepower, but it's not there yet, so I'm kinda screwed atm.
Re: Profile Build
Do you have it locked to one core?
I still get much better FPS locking spring.exe to a single core (with a Q6600).
Re: Profile Build
OK, I've tried testing Beherith's build. It crashes when the first Unit is built, every time, so that's going to have to wait.
I've done some more tests. Here are my conclusions:
I had an AHA moment, and developed a good proof that it's not rendering snafus. And what do ya know... the resulting FPS is nearly the same as when I have normalmaps on everything you (can't) see here (which, if you've never played P.U.R.E., is a wooded area with some houses, a tank-farm and various other crap).
Here is a screenshot which proves my case.
Every Unit in this screenshot has had Spring.SetUnitNoDraw(unitID,true) set.
Note the FPS. And ofc, you can read the Gadget profiler, showing what I said earlier- nothing there is killing the engine speed-wise.
Case closed.

Whatever is happening here is engine-side. It's not my fancy shaders, it's not unitRendering, it's not anything related to drawing the geometry. I know there are a lot of ground objects (streets, etc.), but I've already tested that- the current code is plenty fast and setting that off barely moves the numbers.
The CPU is maxed out, and it's not Gadgets or Widgets. If I can see a build that will run and give a profile, we can dig deeper and maybe start to narrow it down.
If you guys aren't satisfied that that picture says something significant, then nothing is going to work here. These are trees and buildings. There isn't any pathfinding going on, minimal activity on POPS, no CEGs, hardly anything is happening. There's no "game" here- it's just scenery. Invisible scenery.
When we know why that's so slow, then we're getting somewhere. IDK whether that will also resolve Smoth's issues, but my guess is that it's all the same stuff.
Re: Profile Build
your cpu is maxed out for 66 fps?
Re: Profile Build
I get 500 FPS with an empty map.
So 66 is what's left when the CPU is chugging through c. 300 Units (if you want an exact count, I can get that for ya) that aren't moving, have LOS of zero, are Gaia, aren't animating, aren't calling Lua, and are being touched by Lua only very lightly (an if/then check that comes back false for the vast majority of things, as the profiler shows), etc.
That's ridiculous. A Unit doing so little should be practically free. We should be able to have thousands of them before it starts to hurt.
The only time they should be a significant performance problem is if we're using a long view and can see a gazillion triangles' worth of geometry / texture loads / geometry transforms.
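To make the "practically free" argument concrete, here is a back-of-the-envelope sketch using only the figures quoted in these posts (about 500 FPS on an empty map, 66 FPS with roughly 300 Units). It assumes the extra frame time scales linearly with Unit count, which nothing in the thread has confirmed yet, and the helper names are made up for illustration.

```python
def frame_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def per_unit_cost_ms(fps_empty: float, fps_loaded: float, unit_count: int) -> float:
    """Average extra frame time per Unit, assuming cost grows linearly with count."""
    return (frame_ms(fps_loaded) - frame_ms(fps_empty)) / unit_count


if __name__ == "__main__":
    # Figures from the posts: ~500 FPS empty, 66 FPS with roughly 300 Units.
    extra = frame_ms(66) - frame_ms(500)
    print(f"extra frame time: {extra:.2f} ms")
    print(f"per Unit: {per_unit_cost_ms(500, 66, 300) * 1000:.1f} microseconds")
```

That works out to about 13 ms of extra work per frame, or roughly 44 microseconds per idle, invisible Unit on every frame, if the linear assumption holds.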
Re: Profile Build
300 is an arbitrary unscientific number - measure it with a range of unit counts, see where the fps starts to drop, where the fps gets unplayable, and everything between
and we need profiling
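One way the suggested sweep could be analysed once measurements exist, as a sketch only: fit a straight line to frame time against Unit count, then read off where it crosses a "playable" threshold. The function names and the 30 FPS threshold are assumptions for illustration. The only real data points in this thread so far are (0 Units, ~500 FPS) and (~300 Units, 66 FPS), so a proper sweep would need many more samples before the fit means much.

```python
from typing import List, Tuple


def fit_frame_time(samples: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Least-squares line of ms-per-frame against unit count.

    `samples` holds (unit_count, fps) pairs; returns (base_ms, ms_per_unit).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [float(n) for n, _ in samples]
    ys = [1000.0 / fps for _, fps in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("unit counts must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def units_at_fps(base_ms: float, ms_per_unit: float, target_fps: float) -> float:
    """Unit count at which the fitted line reaches `target_fps`."""
    if ms_per_unit <= 0:
        raise ValueError("fit shows no per-unit cost")
    return (1000.0 / target_fps - base_ms) / ms_per_unit


if __name__ == "__main__":
    # The two figures quoted in the thread; a real sweep would add more points.
    base_ms, ms_per_unit = fit_frame_time([(0, 500.0), (300, 66.0)])
    # 30 FPS as the "playable" floor is an assumed threshold, not from the thread.
    print(f"base {base_ms:.2f} ms, {ms_per_unit * 1000:.1f} us per Unit")
    print(f"30 FPS reached near {units_at_fps(base_ms, ms_per_unit, 30.0):.0f} Units")
```

If the measured points bend away from the fitted line, that in itself says something: the per-Unit cost isn't constant, and the counts where it bends are where to point a profiler.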

Re: Profile Build
"300 is an arbitrary unscientific number - measure it with a range of unit counts, see where the fps starts to drop, where the fps gets unplayable, and everything between"
Like I said, I can give you an exact number for any given map, and we can look at performance case by case if we want- there's everything from maps with only a couple of dozen things to probably darn near 1000 for the two massive cities (which are totally unplayable at all atm). I'll code a Gadget when I've had sleep, we can even do breakdowns if that ends up being useful. But I think we need to know what we're actually looking for first, otherwise it may just be a goose-chase.
"we need profiling"
I agree. I was hoping it'd be point-and-shoot, so I could give us a breakdown. The obvious alternative is to build another testbed, so people who already have profiling working properly and know what the numbers are can just see what happens when they run it.
If their situation differed markedly, that would point to some sort of OpenGL issue causing stuff to run in software mode or something equally awful, which may be very difficult to get resolved; if they saw the same thing, then it's mainly a matter of looking at the numbers and trying to decide what to look at.
I am not under any illusions that whatever is wrong is "obvious", or we wouldn't be having this discussion in the first place. There are a lot of places where Units can be very expensive if they're not exempted, and it's fruitless to check anything out until there's a clear pointer.
Anyhow, if people are willing to test a scenario, let me know and I'll build a demo sometime in the next 48 hours. I need to sleep now, but I'll be available afternoon-ish.
Re: Profile Build
If you can tell me where my test build is crashing, I'll try to debug, as it doesn't crash for me.
Look at your task manager when running spring. Are you sure the process is not smeared over all of your cores? This was a massive issue for me, as all cores taxed to 25% didn't even kick in my EIST.
Re: Profile Build
No, it's definitely locked to one core here (I'm still using XP Pro, so applications don't pull that unless they're explicitly multicore or core 0 is totally occupied, in my experience).
I get a minor boost if I set processor affinity to one of the idle cores, but it's not worth writing home about, and I have to do it every time I start the engine up. I wish there was a simple fix for this, but there just isn't.
Anyhow, still busy with RL stuff for a while, but I'll be able to give you some feedback about what's up with that build when I get back. It crashes when the first Unit is built, every time. Map loads, rendering loads, etc.
That, and it's missing some DLLs (DevIL, etc.).
Re: Profile Build
imagecfg -a 0x8 Spring.exe
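For anyone wondering what the 0x8 means: the value is an affinity bitmask, where bit n stands for logical core n. A small sketch of how to read or build such a mask (the helper names are made up for illustration, and which core is actually idle on a given machine is something to check in Task Manager rather than assume):

```python
def cores_in_mask(mask: int) -> list:
    """Return the zero-based core indices whose bits are set in an affinity mask."""
    if mask < 0:
        raise ValueError("mask must be non-negative")
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def mask_for_cores(cores) -> int:
    """Build an affinity mask from zero-based core indices."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


if __name__ == "__main__":
    print(cores_in_mask(0x8))        # cores selected by the mask in the command above
    print(hex(mask_for_cores([0])))  # mask that would select only the first core
```

So 0x8 is binary 1000, which selects only the fourth core (index 3) of a quad-core; a mask of 0x1 would select the first core instead.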
Re: Profile Build
Copy over the dlls from an existing spring install, without overwriting existing ones.
Re: Profile Build
Okie doke. After messing around with things a bit to get testing working, I've fixed what caused the crash, and here's a profile result from Very Sleepy.
It's not 100% the same as what I've been testing (for example, only Commanders are supported in this profile build) but the results in terms of performance are close enough that it's probably nearly identical.
I'll leave it to people who know what this means to say what it reveals.
The fourth-highest inclusive is really interesting, though. Nothing on that test map has a Weapon defined at all.
Attachments: capture.zip (40.56 KiB)