Notes about performance, 105.0 vs 106.0 vs bar105 engines
Notes about performance, 105.0 vs 106.0 vs bar105 engines
I've done some basic testing to compare the performance of 105.0 with the latest official engine release (106.0) and the latest BAR105 engine release.
MF v1.80
comet catcher redux v2
disable luaui, disable luarules
(use the Tab hotkey for the zoomed-out top view)
vsync off
--- spring_105.0_windows-64
spawn 5000 aven_magnum a bit north from the center (they take up most of the map)
zoomed out top view (icons) : ~100 fps, ~0.9 speed
zoomed in a bit until the models appear (but with as many models visible as possible) : 4 fps, ~0.3 speed
zoom out top view (icons) then order them to move south : freezes for a second then drops to ~10 fps for a few seconds then to ~90 fps, ~0.15 speed
~3.05 GB ram usage
--- BAR105.105.1.1-784_windows-64
spawn 5000 aven_magnum a bit north from the center (they take up most of the map)
zoomed out top view (icons) : ~100 fps, 1.0 speed
zoomed in a bit until the models appear (but with as many models visible as possible) : 22 fps, 1.0 speed
zoom out top view (icons) then order them to move south : freezes for a second then drops to ~10 fps for a few seconds then to ~90 fps, ~0.15 speed
~3.75 GB ram usage
NOTE: a few days ago I tested BAR105.105.1.1-769 and it had only about 2/3 as much FPS when zoomed out and showing icons; Ivand has since fixed that issue, and it now gets about the same FPS on my PC.
--- spring_106.0_windows-64
spawn 5000 aven_magnum a bit north from the center (they take up most of the map)
zoomed out top view (icons) : ~160 fps, ~0.9 speed
zoomed in a bit until the models appear (but with as many models visible as possible) : 30 fps, ~0.82 speed
zoom out top view (icons) then order them to move south : freezes for a second then drops to ~10 fps for a few seconds then to ~120 fps, ~0.15 speed
~3.35 GB ram usage
NOTE: I wrote "freeze" for what happens when I try to move the units, but it's more like a ~1s delay between issuing the order and the units starting to move.
There are some differences: 106.0 seems faster than BAR105, but it breaks compatibility. The performance differences between BAR105 and 106.0 may be due to feature differences.
A key difference is that I get 5x FPS or more from either 106.0 or BAR105 relative to 105.0 when just looking at a scene with hundreds of units/features, nice!
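To check the "5x FPS or more" figure against the numbers above, the ratios can be computed directly. This is plain arithmetic on the approximate readings from this post (the "~" values are treated as exact), not a benchmarking tool:

```python
# FPS reported above for the "zoomed in, models visible" step (5000 aven_magnum)
ZOOMED_IN_FPS = {"105.0": 4, "BAR105-784": 22, "106.0": 30}
# FPS reported above for the zoomed-out icon view
ZOOMED_OUT_FPS = {"105.0": 100, "BAR105-784": 100, "106.0": 160}


def fps_ratio(readings, engine, baseline="105.0"):
    """How many times the baseline engine's FPS the given engine reached."""
    return readings[engine] / readings[baseline]


for engine in ("BAR105-784", "106.0"):
    print(f"{engine}: zoomed in {fps_ratio(ZOOMED_IN_FPS, engine):.1f}x, "
          f"zoomed out {fps_ratio(ZOOMED_OUT_FPS, engine):.1f}x vs 105.0")
```

With models on screen the newer engines land at roughly 5.5x (BAR105) and 7.5x (106.0); in the icon view only 106.0 pulls ahead, at about 1.6x.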
Last edited by raaar on 19 Jan 2022, 22:45, edited 1 time in total.
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
Did you run it with /luaui disable, out of curiosity?
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
Yes, otherwise I'd be unable to compare with 106.0, as it breaks my Lua code.
disable luaui, disable luarules
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
I ran a similar test.
As an extra note, in 106 I ran /grounddetail 140, since MF sets that automatically via a widget on initialization.
--- spring_105.0_windows-64
zoomed out tab view (icons) : 45 fps
zoomed in until models just filled the screen: 3 fps
zoom out tab view (icons) then order them to move south : 1s freeze, a few seconds of ~1 fps, then ~10 fps
--- spring_bar_.BAR105.105.1.1-769-g56f63fc_windows-64
zoomed out tab view (icons) : 32 fps
zoomed in until models just filled the screen: 15 fps
zoom out tab view (icons) then order them to move south : ~10 fps (not jumpy like 105)
--- spring_106.0_windows-64
zoomed out tab view (icons) : 43 fps
zoomed in until models just filled the screen: ~23 fps
zoom out tab view (icons) then order them to move south : ~9 fps (not jumpy like 105)
Performance looks similar, but BAR105 and 106 lack the movement hitching that raaar experienced.
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
Another test, checking ram usage
MF v1.80
comet catcher redux v2
disable luaui, disable luarules
--- 105.0
2.75 GB
--- BAR105-769
3.42 GB
--- 106.0
3.06 GB
The new engines seem to consume more RAM than 105.0, at least when the game starts, and BAR105 consumes the most.
EDIT: another player tested and got different results, but they still showed higher memory usage on the newer engines:
[19:20:07] <Shruggoth> ima check what my RAM does
[19:20:58] <Shruggoth> 2.5GiB on CCR, spring 105
[19:21:21] <Shruggoth> how much RAM do you have, ximes?
[19:21:37] <Shruggoth> ooh
[19:21:39] <Shruggoth> 3.1GiB on 769
[19:21:50] <Shruggoth> so it demands an extra 600MiB for me
[19:22:35] <Shruggoth> Somehow
[19:22:41] <Shruggoth> 106 uses 4.2GiB
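To put the two sets of readings side by side, the extra memory over 105.0 can be computed per tester. Note that raaar reported GB and Shruggoth GiB, so the deltas are only comparable within each tester's own numbers; this is plain arithmetic on the reported values:

```python
# Start-of-game RAM readings reported in this post
RAAAR_GB = {"105.0": 2.75, "BAR105-769": 3.42, "106.0": 3.06}      # units as reported: GB
SHRUGGOTH_GIB = {"105.0": 2.5, "BAR105-769": 3.1, "106.0": 4.2}    # units as reported: GiB


def extra_over_baseline(readings, baseline="105.0"):
    """Extra memory each engine uses relative to the baseline, in the input's own unit."""
    base = readings[baseline]
    return {engine: round(value - base, 2)
            for engine, value in readings.items() if engine != baseline}


print(extra_over_baseline(RAAAR_GB))       # {'BAR105-769': 0.67, '106.0': 0.31}
print(extra_over_baseline(SHRUGGOTH_GIB))  # {'BAR105-769': 0.6, '106.0': 1.7}
```

Shruggoth's 0.6 GiB for 769 is about 614 MiB, in line with his "extra 600MiB"; the two testers disagree on which new engine uses the most.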
Last edited by raaar on 16 Jan 2022, 21:06, edited 2 times in total.
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
The new engines are x64, which naturally consumes more RAM :)
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
Hmm, but isn't 105.0 x64 as well?
Another thing I failed to note in my first test is how much Spring slowed down the game: I recorded FPS, but not game speed.
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
Thanks for doing this and describing your results!
Given what you describe, at least for me it'll probably be things other than performance that I'd mostly be considering (as I've already said to some people in private) if I need to make the choice (for example compatibility issues, as you mentioned, or my guesstimate of support and future development, etc., though that's obviously a whole different can of worms).
Still, definitely nice to have some semi-objective numbers out there :).
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
Thanks, these are basic tests though. It'd be nice to do proper benchmarking.
I've updated the tests to use "vsync off", to show the speed multipliers and RAM usage, and to use the latest BAR105 release, which fixes a performance issue with rendering icons.
-
- Posts: 9
- Joined: 13 Jun 2022, 17:39
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
It has been about a year since you ran the tests, raaar. I'd be interested to see how the engines are doing now according to your performance tests.
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
I doubt rendering performance has changed that much. BAR105 is probably bottlenecked by terrain rendering, which no one has touched since the 105 days. This can be looked at, but we are not in the "highest FPS wins" business:
No one (hopefully) runs a game without LuaUI and additional effects. Once these are in use, BAR105 should see a much smaller impact on FPS than its competitors, simply because of the way we architected things: engine model buffers and transformation matrices are always exposed on the GPU, available for use from Lua shaders and basically "free" to use; all UI geometry is uploaded once and reused afterwards.
Just have a look at the number of effects BAR as a game has and the FPS it shows with all these effects in place. They are all made in Lua and cost peanuts to execute. For example, unit headlights https://youtu.be/GEQmm4UhLAg?t=21 choose the lights' position and direction solely inside a shader. There's no need to run expensive GetUnitPiecePosition/GetUnitPieceDirection/GetUnitPosition/etc. calls per unit.
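The headlight approach boils down to transforming a fixed offset and direction by the unit's model matrix, which the shader can already read. Below is a CPU-side Python illustration of that math only; the matrix layout (row-major, translation in the last column) and the offset values are assumptions for the example, not BAR's actual shader code:

```python
# Assumed layout: 4x4 row-major model matrix, rotation in the upper-left 3x3,
# translation in the last column (the same math a vertex shader would do).

def transform_point(m, p):
    """Apply rotation and translation to a local-space point."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def transform_direction(m, d):
    """Apply rotation only (no translation) to a local-space direction."""
    x, y, z = d
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z for r in range(3))


def headlight(model_matrix, local_offset=(0.0, 10.0, 20.0), local_dir=(0.0, 0.0, 1.0)):
    """World-space light position and direction; the default offsets are hypothetical."""
    return (transform_point(model_matrix, local_offset),
            transform_direction(model_matrix, local_dir))


# A unit at (100, 0, 50) turned 90 degrees around the vertical (y) axis
unit_matrix = ((0, 0, 1, 100),
               (0, 1, 0, 0),
               (-1, 0, 0, 50),
               (0, 0, 0, 1))
print(headlight(unit_matrix))
```

Since the matrix is already resident on the GPU, this per-unit work happens in the shader instead of through Lua calls per unit per frame.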
Because we stress the PCIe bus by uploading all matrix data to the GPU, we may never reach the "1800 FPS" level of 106.0, but we will always be better once a game grows a good amount of Lua "meat" around it. This assumes, of course, that the game dev puts effort into using the modern GL4 API rather than sticking to old ways of doing things (which still exist, but perform on the same level as on the 105.0 engine).
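For a feel of how that per-frame upload scales, here is a back-of-the-envelope estimate. It assumes (this is not stated in the thread) that each transform is one 4x4 float32 matrix and that the full set is re-uploaded every frame; `matrices_per_unit` is a hypothetical knob for per-piece matrices, and BAR105's real data layout may differ:

```python
FLOAT32_BYTES = 4
MATRIX_FLOATS = 16  # one 4x4 transformation matrix


def upload_bytes_per_frame(units, matrices_per_unit=1):
    """Bytes sent to the GPU each frame if every matrix is re-uploaded (assumption)."""
    return units * matrices_per_unit * MATRIX_FLOATS * FLOAT32_BYTES


def upload_mb_per_second(units, fps, matrices_per_unit=1):
    """Same traffic expressed in MB/s at a given frame rate."""
    return upload_bytes_per_frame(units, matrices_per_unit) * fps / 1e6


print(upload_bytes_per_frame(5000))                          # 320000
print(upload_mb_per_second(5000, 100))                       # 32.0
print(upload_mb_per_second(5000, 100, matrices_per_unit=10))  # 320.0
```

Under these assumptions the traffic grows linearly with unit count, matrices per unit and frame rate, which is why it matters more the higher the FPS target.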
As far as the sim is concerned, BAR105 should blow anything else out of the water. Multi-threaded pathfinding is new in BAR105, as is multi-threaded collision handling, and these two items had the highest computational cost in late-game scenarios.
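The idea behind multi-threaded pathfinding can be sketched as solving independent path requests concurrently against a shared, read-only map. This is a toy illustration, not BAR105's implementation: it uses plain breadth-first search on a hypothetical grid, and in CPython the GIL keeps pure-Python BFS from running truly in parallel, so it only shows the structure (independent requests, no shared mutable state):

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def bfs_path_length(grid, start, goal):
    """Steps of the shortest 4-connected path on a 0/1 grid (1 = blocked), or -1."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return -1


def solve_requests(grid, requests, workers=4):
    """Solve (start, goal) requests on a worker pool; results keep request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda req: bfs_path_length(grid, *req), requests))


walled = [[0, 1, 0],
          [0, 1, 0],
          [0, 0, 0]]
print(solve_requests(walled, [((0, 0), (0, 2)), ((0, 0), (2, 0))]))  # [6, 2]
```

Because each search only reads the map, requests need no locking; an engine would do the same in native threads, where the work actually runs in parallel.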
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
How representative of engine performance is it to just spawn in a bunch of units and look at them vs running a replay of big battles?
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
There's nothing to look at, in the sense that a 106 game doesn't exist (at least not something I would call a game). And even if one existed, it wouldn't work on BAR105 or 105.
So the lowest common denominator is to spawn a bunch of units with LuaUI and LuaRules disabled. If their unit scripts are in COB/BOS, they can even move and shoot.
This is best described as baseline performance, but it won't give a single clue about how fast performance will deteriorate once gfx and UI widgets/gadgets are added. That's as far as rendering is concerned.
Sim should be better with BAR105 because two performance-critical pieces of code were multi-threaded.
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
It's just a bit of fun. Relative figures are always nice to see. Who knows? It may show a regression, which would be good to know about.
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
I've done some testing after setting up Spring on a nearby desktop PC:
Ubuntu 22.04
gpu nvidia geforce gt 1030 (GL 4.3 compatibility)
8 gb ram
AMD A10-7800 Radeon R7 CPU
It's kind of a toaster by today's standards. At 1920x1080 fullscreen, spawning 100 aven_magnum on MF v2.00 on DSD, I get 15-20 fps on all of 105.0, BAR105 1478 and 106.0, even with details set to low and LuaUI disabled.
In MF terms, "low" settings mean:
Spring.SendCommands("disticon 130")
Spring.SendCommands("water 0")
Spring.SendCommands("shadows 0")
Spring.SetConfigInt("ShadowMapSize",0,false)
Spring.SendCommands("softparticles 0")
Spring.SetConfigInt("MaxParticles",20000,false)
Spring.SetConfigInt("MaxNanoParticles",10000,false)
Spring.SendCommands("grounddetail 100")
Spring.SetConfigInt("GroundDecals",0,false)
Spring.SetConfigInt("GroundScarAlphaFade",0,false)
Spring.SetConfigFloat("snd_airAbsorption",0.0,false)
Spring.SetConfigInt("UseSDLAudio",1,false)
Spring.SetConfigInt("UseEFX",0,false)
Spring.SetConfigInt("DynamicSky",0,false)
Spring.SetConfigInt("GrassDetail",0,false)
Spring.SetConfigInt("3DTrees",0,false)
Spring.SetConfigInt("AdvMapShading",0,false)
Spring.SetConfigInt("AdvUnitShading",0,false)
Spring.SetConfigInt("CompressTextures",1,false)
Spring.SetConfigInt("HighResInfoTexture",0,false)
Spring.SetConfigInt("LuaShaders",1,false)
Spring.SetConfigInt("ROAM",2,false)
Spring.SetConfigInt("MSAALevel",0,false)
It seems the new versions don't do much to help this toaster :/
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
I've had people tell me the new version runs slow on linux.
Ximes (old laptop with 2 CPU cores (4 logical), 4 GB RAM and Gentoo Linux) built BAR105 1478 locally and is able to play, although it sometimes runs out of memory.
He noticed a strange issue:
[.....:32] <ximes> <raaar> so what settings did you use when you got the first one, https://imgur.com/a/WNzSBp3 <<< here I didn't modify any setCoreAffinity setting
[17:17:18] <ximes> then
[17:17:21] <ximes> <ximes> https://imgur.com/a/7933MIy <<< after I set the mask manually with the command "taskset -a -p 0f 14944"
[17:18:26] <ximes> the only settings that may be interfering with it are:
[17:18:26] <ximes> PathingThreadCount = 4
[17:18:27] <ximes> WorkerThreadCount = 4
[17:18:35] <ximes> I would have to test it again to be sure
[17:18:54] <ximes> 1 min
[17:20:42] <ximes> [14:18:27] <ximes> PathingThreadCount = 4
[17:20:42] <ximes> [14:18:27] <ximes> WorkerThreadCount = 4
[17:21:04] <ximes> <<< nope, I get the same problem.... mask set to 0b1000.... all threads running on cpu #3
[17:22:41] <ximes> tbh... nowadays no app sets core affinity without a very strong reason... spring engine may have done it in the past, but it got somehow broken now
Regardless of whether or not he set the "setCoreAffinity" setting, all threads would end up running on one CPU. He said that after changing the CPU affinity to use the other cores, performance improved a bit (roughly +10-50% fps in a sandbox watching 150 units).
I couldn't reproduce the issue on the Ubuntu computer I mentioned in the previous post; there I get affinity masks 1, 2, 4 and f on the Spring threads, mostly f.
EDIT: it seems the issue had been discussed on the BAR Discord, and there was an attempted fix on 8 Jan (https://github.com/beyond-all-reason/spring/issues/575)
EDIT: after some testing, it seems SpringLobby is to blame: I get a wider variety of affinity masks across threads when running either the bar105 or the 105 binary directly, and twice got all "8" and one "f" when running from SpringLobby, but it's inconsistent.
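taskset prints affinity masks in hex, where bit n set means the thread may run on CPU n. Decoding the masks mentioned in this thread shows why "8" confines a thread to CPU #3 (matching ximes' "0b1000") while "f" allows all four logical CPUs of a 4-thread machine. The helper name is made up for this example:

```python
def mask_to_cpus(mask):
    """List the CPU indices whose bits are set in an affinity mask (bit n = CPU n)."""
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


# Hex masks as printed by taskset in the posts above
for text in ("1", "2", "4", "8", "c", "f"):
    print(text, mask_to_cpus(int(text, 16)))
```

So "c" means CPUs 2 and 3, and a process whose threads are almost all "8" is effectively single-core, whatever PathingThreadCount or WorkerThreadCount say.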
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
If you send me a copy of an infolog where all the threads were on one core, I'll take a look and see if there's anything obvious on the engine side.
Re: Notes about performance, 105.0 vs 106.0 vs bar105 engines
I tested starting Spring 105 and BAR105 1478 from SkyLobby, from SpringLobby and directly, 5 times each, and checked thread info using:
ps --pid PID -O tid,lwp,nlwp,%cpu,psr -L
taskset --all-tasks -p PID
mf v2.02, lonely oasis v1.1 map
----- skylobby 0.9.26
skylobby itself has mask "f" and its threads get assigned to cpus 0 to 3
- 105.0
affinity masks 1,2,c,f (mostly f) threads assigned to various cores (1 core at 90%)
- bar105 1478
affinity masks 1,2,4,f (mostly f) threads assigned to various cores (1 core at 50%)
----- springlobby 0.274
springlobby's threads have masks 8 and f and are assigned cpus 0 to 3
apparently when I run Spring, it adds a thread with mask 8
- 105.0
affinity masks 1,2,c,f (mostly f) threads assigned to various cores (1 core at 90%)
- bar105 1478
affinity masks all 8 except f on the 3rd thread, all threads on core 3
----- running the spring binary directly
- 105.0
affinity masks 1,2,c,f threads assigned to various cores
- bar105 1478
affinity masks 1,2,4,f (mostly f) threads assigned to various cores
So indeed it seems that starting bar105 1478 from springlobby has the "all in 1 core" issue but 105.0 doesn't.
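The ps/taskset checks above can also be scripted. A minimal Linux-only sketch using Python's standard library: it lists /proc/PID/task, queries each thread's allowed CPUs with os.sched_getaffinity (roughly what `taskset --all-tasks -p PID` reports), then counts how many threads share each CPU set so an "all threads on core 3" case stands out. The function names are made up for this example:

```python
import os
from collections import Counter


def thread_affinities(pid):
    """Map each thread ID of `pid` to the set of CPUs it may run on (Linux only)."""
    result = {}
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            result[tid] = os.sched_getaffinity(tid)
        except ProcessLookupError:
            pass  # the thread exited between listing and querying it
    return result


def mask_histogram(affinities):
    """Count threads per allowed-CPU set, e.g. {(3,): 20, (0, 1, 2, 3): 1}."""
    return Counter(tuple(sorted(cpus)) for cpus in affinities.values())


# Example: pass the PID of the running spring process (14944 was ximes' PID)
# print(mask_histogram(thread_affinities(14944)))
```

A histogram dominated by `(3,)` corresponds to the "all 8" case seen when starting from SpringLobby; a mostly `(0, 1, 2, 3)` one matches the healthy "mostly f" runs.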
infolog in attachment
Attachment: infolog_starting_from_springlobby.txt (57.67 KiB)