Graphical Performance & Spring
- Stealth870
- Posts: 166
- Joined: 13 Sep 2004, 00:25
I was chatting to [UoY]ElCapitano earlier about my new rig and how Spring's performance is diabolical compared to other games. He suggested I create a program profile in the NVIDIA control panel to turn off threaded optimisation for Spring, so I went ahead and tried it out.
Suddenly, instead of 8 fps regardless of settings, I have 140-150 fps with ALL SETTINGS MAXED on the 8800 GTX. I hosted myself a .cheat game and spammed a mix of Peewees and Mavericks totalling 1600 units, and had them all on constant move orders on Delta Dry.
Even at full zoom, minimum FPS was around 30 with this number of units, with a whole tonne of them firing and doing other random stuff just to try to make it more realistic. Seems like a sure fix for the 8800 GTX on the most up-to-date drivers.
If anyone else hasn't tried this yet, I seriously suggest you do, and see if it also helps your Spring performance.
I've tested Spring with an ATI 9500-series card, an SLI-rigged pair of nVidia GeForce 7600GTs, and my 7800GTS.
In order of framerate hit, here are the things I've seen, just rendering an empty section of map with nothing else going on:
1. Dynamic Water. Even with no ripple events, it eats GPU quite heavily. There may be opportunities for optimization there.
2. Terrain Detail. Yup, on SMD maps, terrain detail is the second-highest FPS-lowering factor. I really didn't think this would be the case, but it appears to be, after multiple tests. Something odd is going on there; this didn't use to be a crippling factor.
3. Shadowmap rendering. Having read a few more articles about this arcane topic, I think we might get far better performance, with very few visual distractions, by forcing blob shadows for all non-map objects when the camera is more than (some user-controlled distance) from the map surface (see the sketch after this list). It'd cull huge numbers of polygons from that rendering pass.
4. Grass. This needs to get culled more with camera distance, frankly, or be subject to user controls. Huge waste of polygons.
5. Reflective / Refractive water.
6. Groundscars. As has been previously noted, these are surprisingly heavy FPS drains. Is it because of the way they are being "projected" onto the map? It seems that large numbers of groundscars on a big flat area cause a lot less FPS drop than the same number on varied terrain...
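For concreteness, here's a minimal sketch of the blob-shadow idea from item 3. Every type and helper here is a hypothetical stand-in, not actual engine code:

```cpp
#include <vector>

// Hypothetical stand-ins; NOT Spring's actual shadow-pass code.
struct Camera { float x, y, z; };
struct Object { /* model, position, etc. */ };

float GroundHeightAt(float x, float z);   // assumed terrain-height query
void  DrawBlobShadow(const Object& o);    // cheap textured quad on the ground
void  RenderToShadowMap(const Object& o); // full-geometry shadow-map pass

// Above a user-controlled camera height, skip the real shadow geometry for
// all non-map objects and draw blob shadows instead, culling that whole
// polygon load from the shadow pass.
void DrawObjectShadows(const Camera& cam, const std::vector<Object>& objects,
                       float blobShadowHeight /* user setting */) {
    const bool useBlobs =
        (cam.y - GroundHeightAt(cam.x, cam.z)) > blobShadowHeight;

    for (const Object& o : objects) {
        if (useBlobs)
            DrawBlobShadow(o);
        else
            RenderToShadowMap(o);
    }
}
```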
Episodic, game-engine areas where slowdowns are most apparent in my tests, many of which involve events that are at least partially CPU-dependent:
1. Large pathfinding situations involving many areas of collision. Short of somebody really sitting down and revamping K-Man's steering code, which is where the main problem lies (and which I'm sure is very difficult stuff), we're stuck with this. Even a steering-code change that forced Units due to collide within 5 frames to "negotiate" with each other over which one gets precedence, while an expensive search, would probably work better than what we have now, where collisions are handled only as they happen instead of being prevented in the first place.
2. Large numbers of CustomExplosionGenerators that all occur within the same sync frame. I've stated before, in Feature Requests, that I think these need to be formally de-synced, since they're just a visual effect, and put on millisecond timing outside the sync loop. I still think that would be a very good idea, because pileups of these events in a given frame are a big killer. It's not just that the sheer number of particles, all obeying complex rules, is a massive computational load. The first problem, that these massive computations all run in frame-by-frame lockstep, is preventable; the second is not, and is for game designers to address.
3. Sound bottlenecks. Evolving the sound code (did that proposed patch ever get done? I lost track) to allow more "private" or "selectively public" sounds would probably help a lot by filtering the number of sounds down, but you can choke Spring very easily by writing a BOS that plays a 50 KB sound every tick.
4. LOS checks involving rapidly-moving objects and a lot of changes of reference frame: mainly, large numbers of aircraft moving over large enemy bases. However, this is also part of why Spring lags the most when large numbers of Units can see each other, period.
I don't have any magical ideas about how to address this. LOS checks are one of those n-squared things, right? But my feeling is that moving to a model of LOS checks where we could lower the actual resolution whilst keeping the size would help a lot; I've already explained why, in detail, elsewhere.
Oh, and I solved part of the mystery of why units in PURE are shooting at each other before they are even being drawn. Imbaczek's change to the aiming code means that, by default, Units seem to be aiming at the "front edge" of each other's collision spheres. Since they can apparently detect collision spheres BEFORE THEY ACQUIRE LOS, which I personally think is both a bug and a potential exploit, due to the way the aiming code and LOS code are essentially separate... they're shooting slightly ahead of where they can see. Moreover, this puts larger units at a distinct disadvantage. I addressed it by changing their aiming behaviors, but I think this is a bug of the first order, and I'll put it into Mantis.
5. Pathfinder checks due to map changes after deformation. This is one of those things that is either very minor or catastrophic, depending on how much of it is going on; at high frequency, this game feature rapidly becomes a major drain on CPU and GPU, due to the rapid changes it seems to cause in the actual map geometry.
6. The way the FALL COB event is handled is still massively inefficient. The code basically turns the object into a special new game entity, uses the previous parent origin as the centroid, and builds a collision sphere on the fly for checking purposes. Instead of doing that on the fly every time an object is told to FALL... why not have Spring go through every 3DO / S3O at load time, identify all of the Pieces, build the collision spheres, etc., and store that data for later reference? I know it would add to loading times, but I strongly suspect it would greatly speed up combat when this is used. Oh, and it would be one quick step towards multi-sphere collision models for more complex collision detection in general, if that is ever deemed useful enough to pay the price.
7. Pathfinder choke due to Wreckage needs work. Right now, Wreckage is just treated like a non-moving Unit with a collision square. Are there better ways to go about this? Could Wreckage "send" a "warning" to alter steering behavior before collisions cause a chain reaction of bad paths? Again, this is related to the steering-code problems in general.
8. COB events can cause significant CPU usage, due to their millisecond timing model; it is very easy to create a COB that will flood Spring to the point of effectively halting the sim. This is a game-designer problem, but one thing that could be done on the engine end is telling us animators exactly how many "ticks" == one "sim frame". I know it's about 30, but is it 33? 35? That sounds minor, but anything that runs faster than that is bad COB code and can cause several problems. It needs to be determined and documented; a rough calculation follows below.
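On item 8: my understanding of the rough numbers, worth verifying against the engine source, is that the sim runs at 30 frames per second while COB Sleep durations are in milliseconds, which gives:

```cpp
// Assumed values; verify against the engine source.
constexpr int   SIM_FRAMES_PER_SECOND = 30;
constexpr float MS_PER_SIM_FRAME = 1000.0f / SIM_FRAMES_PER_SECOND; // ~33.3 ms

// Consequence: a COB loop that sleeps for less than ~33 ms wakes up more
// than once per sim frame and burns CPU with no visible benefit.
```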
There are other areas where there can be major FPS drains, but I think these are the worst ones I see all the time. Some of the FPS drain is due to CPU bottlenecking, where the CPU simply cannot post to the GPU in time.
Other than the (relatively) simple problems of a slightly modified shadowmap concept, optimizing FALL events by precalculating all Pieces and storing that information for fast retrieval instead of calculating on the fly, and maybe some simple bug in the LOD behavior of SMD (it's probably not simple, however, and I don't even pretend to understand JC's code, not even slightly)... most of these problems are non-trivial fixes.
I think that LOS checks using lower-resolution grids, though, might be a major way to improve performance in many situations.
I wish I had a good answer for the steering-code problem, other than my weird concept of a grayscale heightmap to store and transmit probability weights to an actor (which is probably a terrible idea, but it's the only one I have had that was slightly original; if you're bored, look it up on the CE forum).
At any rate, those are the things I see in my tests. Reflectivity / glow was only a problem on the ATI card, and even then it was very minor compared to the reflectivity of the water.
With CEGs, the issue is more about how many CEGs are being created / updated during a given frame, and less about sheer particle count. While particle count matters, and is non-trivial, it matters less than I thought it would.
With Lua, it's mainly a design area, as with BOS. Only, unlike BOS, it's not as critical: so long as the Lua isn't incredibly complex, it doesn't cause major problems. BOS is more sensitive, due to its millisecond timing model, but that's a design issue and needs to remain under the control of the animators, imo; forcing an arbitrary lower bound on "sleep" commands would be a very bad idea.
Argh, you do not have an nVidia 8x00-class GPU, and your suggested fixes are not GPU performance boosts, they're CPU performance boosts, and that's not what the issue is.
On my 8800, switching from basic water to reflective water takes off 0-1 fps. A switch from reflective to reflective-and-refractive gives no framerate drop, and switching from reflec+refrac to dynamic is a similar no-show in the framerate-drop department. The 8800-class GPUs handle these water renderers very well performance-wise; the only exceptions are increased driver instability and the flickering bug in dynamic water.
Pathfinding has nothing to do with the GPU, nor does aiming; these are CPU things.
8800-class cards and Core 2 Duos handle masses of grass and trees with no ill effects or noticeable lag.
On an 8800-class GPU in debug mode, there are 3 items which take up the vast majority of the time; all the other items sit at minuscule 0.1%-1% type values:
- shadows and reflections
- drawing the ground
- and most of all, drawing the interface. This is the biggest by far.
Of course, as time goes on, the simulation load rises, and so on.
Also
Moving the camera around causes a lot of lag in gDEBugger. However, zooming out and then zooming in on another location isn't as laggy, even though it does the same thing. Pressing Tab gives a huge lag spike, then around 10-15 fps.
AF wrote: Moving the camera around causes a lot of lag in gDEBugger. However, zooming out and then zooming in on another location isn't as laggy, even though it does the same thing. Pressing Tab gives a huge lag spike, then around 10-15 fps.
That's because Spring moves sections of the map texture from RAM to VRAM whenever you move the camera... it works fine normally, but the debugger captures VRAM, slowing the process down considerably.
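To illustrate, a minimal sketch of the kind of incremental RAM-to-VRAM refresh being described, assuming a glTexSubImage2D-style update (the real code path is more involved):

```cpp
#include <GL/gl.h>

// Re-upload one visible rectangle of the big map texture from system RAM.
// A debugger that captures VRAM traffic intercepts this on every camera
// move, which is why panning lags so badly under gDEBugger.
void UploadVisibleRegion(GLuint tex, int x, int y, int w, int h,
                         const unsigned char* rgbPixels /* system RAM */) {
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h,
                    GL_RGB, GL_UNSIGNED_BYTE, rgbPixels);
}
```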
Why not store a general value for each tile representing its average colour, display that, and then update it to the full texture once the texture is loaded?
That way, when zoomed out, you don't need to load the texture at all, and you get a speed boost when showing the entire map, since you don't need to load all the textures on the map.
The effect would be unnoticeable unless you suffer from this problem, in which case you'd get a minor graphical artefact and a nicer frame rate.
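A sketch of the averaging step this suggests; a hypothetical helper, not engine code, assuming 32x32-texel tiles:

```cpp
#include <cstdint>

struct RGB { std::uint8_t r, g, b; };

// Collapse one tile to a single average colour that can stand in for the
// tile until its real texture has been streamed into VRAM.
RGB AverageTileColour(const RGB* texels, int tileSize /* e.g. 32 */) {
    unsigned long r = 0, g = 0, b = 0;
    const int n = tileSize * tileSize;
    for (int i = 0; i < n; ++i) {
        r += texels[i].r;
        g += texels[i].g;
        b += texels[i].b;
    }
    return RGB{ static_cast<std::uint8_t>(r / n),
                static_cast<std::uint8_t>(g / n),
                static_cast<std::uint8_t>(b / n) };
}
```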
@AF:
I was fairly careful to list the things I think are purely GPU-related separately from the things that are entirely CPU-bound or a combination of both (CEGs are obviously one of those: my proposed solution would remove some of the CPU bottlenecking, but would have zero effect on the GPU problems).
That's about all I have to say; those are just the areas I've observed while testing PURE, so I may be over-emphasizing some things and under-rating others, frankly.
jK wrote: 8192x8192x3B = 192MB. So why not compress it? With 1:7 it is only 27 MB, small enough to fit in the gfx RAM.
It is DXT1 compressed. I believe that's a 1:6 compression ratio.
EDIT: oops, the ratio is 1:6 for DXT1, not 1:3!
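Spelling out the arithmetic behind the 1:6 figure:

```cpp
// Uncompressed 24-bit RGB vs. DXT1 for an 8192x8192 map texture:
constexpr unsigned long long kUncompressed = 8192ULL * 8192 * 3;     // 192 MB
// DXT1 packs every 4x4-texel block into 8 bytes = 0.5 bytes/texel,
// i.e. 1:6 against 3 bytes/texel:
constexpr unsigned long long kDXT1 = (8192ULL / 4) * (8192 / 4) * 8; // 32 MB
```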
A low-resolution image of the map, where 1 pixel represents a tile rather than 64 pixels, could be kept in video RAM for the whole game. A setup where each pixel represents the average of a 16x16 tile would be far, far smaller than a DXT1-compressed image of the entire map, and much more efficient.
What's more, we already have such an image stored in memory: the minimap!
Actually, the map's texture is probably already in memory at any (power-of-two) size you want: the mipmaps!
Seriously though, most of that is working already. Maybe the only useful addition to the map rendering code would be checks for the amount of video RAM and whether all textures are persistent => drop unused map textures from video RAM, but keep the entire map texture in video RAM if it fits.
Maybe there is some way to specify texture priority in OpenGL (I don't remember; see the sketch below); then it may already help to just load all textures into the driver and set the map textures at a lower priority than other textures.
EDIT:
A 16x16 map at the lowest mipmap level (4x4 pixels per tile, 8 bytes per tile) takes up only 1 MB of texture memory. (Note that this is only twice as much as a 1-pixel-per-tile uncompressed RGBA texture.)
So maybe just forcefully keeping at least the lowest mipmap level in video RAM already solves it, though IMO it would still be nice if everything were kept persistent in video RAM whenever it fits.
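The half-remembered OpenGL mechanism does exist: glPrioritizeTextures, from OpenGL 1.1. It is only a hint, and many drivers ignore it, but a hypothetical usage would look like this, assuming mapTex holds the per-tile map texture names:

```cpp
#include <vector>
#include <GL/gl.h>

// Mark the map textures as lower-priority residents than everything else,
// so the driver evicts them first when VRAM runs short. Only a hint; some
// drivers ignore texture priorities entirely.
void DeprioritizeMapTextures(const GLuint* mapTex, GLsizei count) {
    std::vector<GLclampf> prio(count, 0.25f); // below the default of 1.0
    glPrioritizeTextures(count, mapTex, prio.data());
}
```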
I wouldn't mind the option to turn off detail textures, plus the option to use something like what I described all the time, regardless of distance from the map (i.e. not loading map textures at all beyond generating the minimap, using the minimap instead, with nothing beyond coloured vertices).
It would help at the lower end of the spectrum.
Ressing threads is bad, noize, you should know better.
j/k. I wonder if this has something to do with how the units are drawn, as the FPS on an empty map seems appropriately high for the 8-series cards, but it seems to drop much more precipitously than on the 7-series cards, for example. This is even with ground decals turned to low/off.