2018-12-12 07:13 CET

View Issue Details Jump to Notes ]
IDProjectCategoryView StatusLast Update
0005951Spring engineGeneralpublic2018-11-21 19:14
ReporterGoogle_Frog 
Assigned ToKloot 
PrioritynormalSeveritymajorReproducibilityalways
StatusresolvedResolutionfixed 
Product Version104.0 +git 
Target VersionFixed in Version104.0 +git 
Summary0005951: Significant performance regression from 104.0.1-286-gd47727e maintenance to 104.0.1-309-ga8b5ffc maintenance
DescriptionI bisected a performance regression. Look at the screenshot and watch this video: https://www.youtube.com/watch?v=_GIhxik29z4

I can rarely run 104.0.1-309-ga8b5ffc maintenance faster than 3x speed at the very start of the game. This matches the reports from the last week of people falling behind the server late game. The performance of 104.0.1-309-ga8b5ffc maintenance is also significantly worse at around 600 units. The bisection is not tighter because the builds are missing or have a UI-killing problem with something related to teamRulesParams and player custom keys.

The bottom of the profiler is cut off. I hope nothing important is down there.
TagsNo tags attached.
Attached Files

-Relationships
+Relationships

-Notes

~0018971

Google_Frog (reporter)

I built 104.0.1-308-g17ff6c0 maintenance and it works. Therefore this commit causes the performance regression: https://github.com/spring/spring/commit/a8b5ffc86351680c6e0e8d7e8db161e63dbb912e

Video: https://youtu.be/6ev_HXlZCKg

~0018972

Google_Frog (reporter)

Would an even lower GC frequency result in even better performance?

~0018974

Kloot (developer)

Last edited: 2018-04-04 11:54

View 2 revisions

Obviously, and the OOM error that would soon follow (and take one or more Lua states down with it) would free up even more CPU time. Garbage collection always has to be just slightly ahead of the garbage accumulation rate or the game will break, especially while catching up.


"I can rarely run 104.0.1-309-ga8b5ffc maintenance faster than 3x speed at the very start of the game"

For me the speed cap hovers at 5x, which jumps to 8x by disabling all widgets and disappears after a /luaui disable. This is specific to ZK, BA easily reaches 100x actual sim-speed (tested with /setmaxspeed 100) despite a8b5ffc. I even backported the gc commit into 151-g11de57d (used by BA) and noticed no degradation when watching large team game demos.

The only possible conclusion is that ZK code causes an excessive amount of allocation pressure which should be fixed game-side. Note also that the old frequency was exactly the same (30Hz) as it is now so long as the game stays at 1x speed.

~0018975

Google_Frog (reporter)

Performance is much worse at 1x speed as well so the commit clearly has an effect at 1x speed. There were an insignificant number of OOM errors compared to the cost in performance.

~0018976

Kloot (developer)

https://springrts.com/phpbb/viewtopic.php?f=11&t=36779

mass rejoin desyncs caused by OOM's are not insignificant under *any* definition.

the first thing you should be doing is to find out why /luaui disable enables a 20-fold increase in sim-speed in ZK.

~0018977

Google_Frog (reporter)

That thread could be caused by anything. There are no logs, a wide range of things could have gone wrong.

~0018978

Google_Frog (reporter)

I did /luaui disable, what am I meant to be seeing? I do not see anything near a 20-fold increase in sim-speed (is that really what you meant?) or FPS.
 * https://youtu.be/iVY220eglus
 * https://youtu.be/c9eaeyzCh9Q

~0018979

Kloot (developer)

Last edited: 2018-04-04 16:54

View 2 revisions

What exactly is ambiguous or unclear about

[quote]
For me the speed cap hovers at 5x, which jumps to 8x by disabling all widgets and disappears after a /luaui disable. This is specific to ZK, BA easily reaches 100x actual sim-speed (tested with /setmaxspeed 100) despite a8b5ffc."
[/quote]

?

To be absolutely 100 percent sure I get the point across: if I issue /setmaxspeed 100 and then increase the game speed, ZK maxes out at 5x (as shown by the {Current,Wanted}SimSpeedMult line in /debug, with just two commanders on the map) *before* /luaui disable but will go up to 100x *after* disabling LuaUI and LuaRules. BA has no such problems.

~0018981

Google_Frog (reporter)

The ambiguity is that the situation has changed from 6 AIs on CCR with around 550 units to a pair of immobile commanders. Anyway the change in game speed for luaui disable is about 10 to 15 for me. A potential difference is that I have LuaMenu running.
 * https://youtu.be/L_Q4LMVXCc4
 * https://youtu.be/3yODwPeXjWY (non-water)

~0018982

Google_Frog (reporter)

I don't see that the situation of two commanders jumping from 5x to 100x is particularly important. It will depend on how much time is spent rendering the map. Are you using a very simple map? Otherwise we've likely got some platform specific difference.

I ran the test again with no lua states vs only lua UI. The game speed went from around 11 to around 17. Video: https://youtu.be/Z0a_TLOv7rA

~0018983

Google_Frog (reporter)

I did the lone commander test on 104.0.1-308-g17ff6c0 maintenance and found that a8b5ffc causes a 40x reduction in maximum game speed with all lua disabled*. I was able to reach a sim speed of around 700x with no lua states and a speed hovering around 400 to 500 with luaui. Video: https://youtu.be/4yRxwzg5NOw

* Technically, luamenu just has its widgets disabled as there is no nice way to disable the state. However, it shows 0.00% usage in the debug view so I count it as disabled.

~0018984

Kloot (developer)

Last edited: 2018-04-04 17:41

View 2 revisions

I was testing on CCRv3, but map rendering time is *entirely irrelevant* here and you can put the camera down in some corner to eliminate it.


"I don't see that the situation of two commanders jumping from 5x to 100x is particularly important."

Do you really not understand the importance of having reserves?

If the game is limited to 5x speed *right at the start*, that means the available performance buffer is already *minimal*. This would normally be consumed as the game state gets larger, but in ZK's case there is no reserve *left*.


"I did the lone commander test on 104.0.1-308-g17ff6c0 maintenance and found that a8b5ffc causes a 40x reduction in maximum game speed with all lua disabled*"

The collector still does work even when all Lua states are gone, multiplied by the sim speed. There is no point to optimize for that situation.

~0018985

Google_Frog (reporter)

You are ignoring my data. I find nothing to back up these statements:
 * "To be absolutely 100 percent sure I get the point across: if I issue /setmaxspeed 100 and then increase the game speed, ZK maxes out at 5x (as shown by the {Current,Wanted}SimSpeedMult line in /debug, with just two commanders on the map) *before* /luaui disable but will go up to 100x *after* disabling LuaUI and LuaRules. BA has no such problems."
 * "...but in ZK's case there is no reserve *left*."
Watch this: https://youtu.be/Z0a_TLOv7rA

When I zoom to an empty patch of ground the difference is 10x -> 31x with both luarules and luaui toggled. Your max gamespeed test is an ok benchmark but it will exaggerate the effect of things that occur on each gameframe as opposed to each update. In realistic situations the update and drawing may be the primary load.

Do you have any data about the frequency of lua OOM related crashes and desyncs?

~0018986

Kloot (developer)

Last edited: 2018-04-04 18:37

View 3 revisions

1) I can't read text in a 360p video
2) I can test ZK's performance right here
3) any frequency > 0 warrants action, and I don't like the implicit suggestion that provable events below some threshold should be ignored
4) check your own crash report history:

https://github.com/ZeroK-RTS/CrashReports/issues/3420
https://github.com/ZeroK-RTS/CrashReports/issues/3293
https://github.com/ZeroK-RTS/CrashReports/issues/3678
https://github.com/ZeroK-RTS/CrashReports/issues/3744

(not the only cases, I don't feel like digging them all up)

5) the simulation ("things that occur on each gameframe") dominates update and draw cost in bigger games, and has done so since forever
6) "I find nothing to back up these statements"

do you really expect me to create videos too, or are you implying yours are the only valid results?

~0018987

Google_Frog (reporter)

You don't have to go so far as videos, but we are getting very different effects so I'd like to know more about how to reproduce. Perhaps there is some settings or version mismatch. The videos become readable if you wait a while, as Youtube takes a bit of time to process the higher resolutions.

I checked my crash report history earlier and, without the knowledge to process each of the alloc errors, I can't see a trend towards their reduction (look at this filter https://github.com/ZeroK-RTS/CrashReports/issues?page=1&q=is%3Aissue+LUA_ERRMEM&utf8=%E2%9C%93). It is not so clear to me that the crashes are proven (or important) because they are rare and possibly have other causes. There has to be some sufficiently low crash frequency for which it is not worth significantly impacting the performance of every game. The openGL update seems to support this approach.

The alloc errors could be due to other bugs or just computers which are too weak. For example, look at this game:
 * https://github.com/ZeroK-RTS/CrashReports/issues/3420
 * http://zero-k.info/Battles/Detail/486188
That game was played with 100x resource multiplier and 0.1x terraform cost. The alloc issue is most likely due to some ridiculous terraforming, which is tricky to fix and basically never comes up. My lua memory usage never exceeded 100MB in that game so perhaps the OOM was on a single frame. I don't see how the player used 10x more memory in synced luarules than me or how more frequent GC is going to solve this completely.

If the issue is catch-up OOM then my stance is that a rare catch-up bug is a very good price to pay for a performance improvement. The vast majority of catch-ups are spectators and, compared to players, I care a lot less about their one-off bugs. Making catch-up more reliable caused some players to fall behind the game so here I'd say speed is more important for reliability.

https://github.com/spring/spring/commit/eafe606a587c4e691d46f46135caa4b315d5a523 seems a bit better, but not as good as before, especially for catch-up speed. I'll apply and it see what people think. Also, the switch to GC every simframe seemed very drastic. It has no proven benefits compared to a smaller change while comping with clearly proven downsides. Why not GC every five simframes and see if it works?

~0018988

silentwings (reporter)

> It is not so clear to me that the crashes are proven (or important)

I can cofirm that BA recently saw problematic lua OOM-killed issues, for both specs and players and catch-uppers, from a rate of occurence of zero before. I am not sure of what/where changed to cause it.

~0018989

Kloot (developer)

Last edited: 2018-04-05 11:50

View 2 revisions

"Also, the switch to GC every simframe seemed very drastic."

To reiterate once again: the original gc rate was fixed at 30 times *per second* (read the code), a8b5ffc changed it to a variable (30 x speedfactor) times per second. Therefore the only possible impact is on clients running *faster* than 1x speed, for example during catchup where an increased gc level is required.

If you claim that 30Hz is somehow magically not equal to 30*1Hz ("Performance is much worse at 1x speed as well"), I expect to see beyond extraordinary evidence.

~0019056

Kloot (developer)

it seems actual benchmark data will not be forthcoming and ZK has once again married itself to an old build so we're done here.

~0019480

Google_Frog (reporter)

I'm back and interested in using latest maintenance. I made a simple benchmarker that players can run with a single button.

The benchmark runs these three replays:
https://github.com/ZeroK-RTS/Chobby/tree/master/LuaMenu/configs/gameConfig/zk/defaultSettings/demos

The benchmark collects and write data using widgets in this mutator:
https://github.com/ZeroK-RTS/Chobby/blob/master/LuaMenu/configs/gameConfig/zk/defaultSettings/games/zerokBenchmark.sdz

This chobby widget coordinates the running of the benchmarks:
https://github.com/ZeroK-RTS/Chobby/blob/master/LuaMenu/widgets/gui_benchmark_handler.lua

The settings for the benchmark are set here:
https://github.com/ZeroK-RTS/Chobby/blob/master/LuaMenu/configs/gameConfig/zk/defaultSettings/benchmarks/config.lua

The benchmarks are run in a random order and in a standalone Spring instance. Maintenance 308 and 309 sync, so exactly the same thing occurs in each replay. Maintenance does not sync, so ignore all the data collected for 287.

70 people have run the single run version of the benchmark. The performance of Spring on identical replays seems to vary quite a bit, with individual results ranging from saying that 309 is 38% slower than 308, to 308 being 27% slower than 309. The results average out to 309 being 0.7% slower than 308. The data from one player reported dt = -10.182842 was sent by Widget:Update(dt), so there must be a bug in that callin.

The results from the repeated benchmark seem much more reliable. The duplicate benchmark runs each test case 20 times and was run by 8 people (besides me). The performance impact of 309 ranges from 1.1% to 11.2% in this sample, with an overall average of 3.7%. I ran this benchmark three times, with an average performance impact of 2%.

The performance impact of 309 is real, and I am sure I could generate a larger effect by having the benchmark interact with the UI. Furthermore, I consider the much reduced catchup performance to be a problematic regression, given that I have seen no evidence of any benefits for Zero-K. The OOM crash reports are infrequent and were occurring with the approximately the same frequency during the period in which ZK was using an engine version after 309. We have been running 287 for many months and I have heard zero complaints about crashes or desyncs due to rejoining. Over such a time period, this sort of bug is something I definitely would have heard about (i.e. had yelled at me by players). A perusal of the ZK crash reports suggests that most of the OOM errors could be solved by removing the metalmult, energymult, and terraformcostmult modoptions.

If BA requires the GC change in order to not desync on rejoin, then make a modrule for it.

~0019484

Kloot (developer)

Last edited: 2018-11-07 17:08

View 2 revisions

I already added a command (/luagccontrol, can be Spring.SendCommand'ed on startup unlike a primitive modrule) to switch between the new and old GC modes dynamically sometime after you decided you were willing to risk destroying engine development even in the absence of hard data and went on your continental walkabout. Issuing this voids your warranty regarding future OOM reports, though considering all the other serious bugs you knowingly and unknowingly chose to live with by reverting I doubt that will sway you.

Since you are back, let me also say the following in case it was not crystal-clear before: I simply do not care to sustain any further interaction based on repeated no-confidence votes or second-guessing changes you deem unnecessary *for ZK* at every turn. My time is not some worthless commodity to be traded at your whim or taken for granted, and life really is too short to let anything Spring-related dominate it.

~0019489

Google_Frog (reporter)

Thanks, that is better than a modrule. Its existence could have been posted here though.

~0019568

Kloot (developer)

I mentioned it in 0005992 which was opened not long after this report.
+Notes

-Issue History
Date Modified Username Field Change
2018-04-04 08:04 Google_Frog New Issue
2018-04-04 08:05 Google_Frog File Added: performance.jpg
2018-04-04 09:30 Google_Frog Note Added: 0018971
2018-04-04 09:43 Google_Frog Note Added: 0018972
2018-04-04 11:41 Kloot Note Added: 0018974
2018-04-04 11:54 Kloot Note Edited: 0018974 View Revisions
2018-04-04 15:27 Google_Frog Note Added: 0018975
2018-04-04 15:38 Kloot Note Added: 0018976
2018-04-04 16:20 Google_Frog Note Added: 0018977
2018-04-04 16:37 Google_Frog Note Added: 0018978
2018-04-04 16:48 Kloot Note Added: 0018979
2018-04-04 16:54 Kloot Note Edited: 0018979 View Revisions
2018-04-04 17:06 Google_Frog Note Added: 0018981
2018-04-04 17:13 Google_Frog Note Added: 0018982
2018-04-04 17:22 Google_Frog Note Added: 0018983
2018-04-04 17:34 Kloot Note Added: 0018984
2018-04-04 17:41 Kloot Note Edited: 0018984 View Revisions
2018-04-04 18:00 Google_Frog Note Added: 0018985
2018-04-04 18:34 Kloot Note Added: 0018986
2018-04-04 18:36 Kloot Note Edited: 0018986 View Revisions
2018-04-04 18:37 Kloot Note Edited: 0018986 View Revisions
2018-04-05 05:45 Google_Frog File Added: OOMDesync report 3420.jpg
2018-04-05 05:45 Google_Frog Note Added: 0018987
2018-04-05 09:26 silentwings Note Added: 0018988
2018-04-05 11:46 Kloot Note Added: 0018989
2018-04-05 11:50 Kloot Note Edited: 0018989 View Revisions
2018-04-27 18:56 Kloot Assigned To => Kloot
2018-04-27 18:56 Kloot Status new => closed
2018-04-27 18:56 Kloot Resolution open => fixed
2018-04-27 18:56 Kloot Note Added: 0019056
2018-11-07 07:40 Google_Frog Status closed => feedback
2018-11-07 07:40 Google_Frog Resolution fixed => reopened
2018-11-07 07:40 Google_Frog Note Added: 0019480
2018-11-07 07:40 Google_Frog File Added: benchmarkDuplicate.ods
2018-11-07 07:40 Google_Frog File Added: benchmarkSingleCollection.ods
2018-11-07 07:41 Google_Frog File Added: benchmarkGooglefrog.ods
2018-11-07 17:03 Kloot Note Added: 0019484
2018-11-07 17:08 Kloot Note Edited: 0019484 View Revisions
2018-11-08 01:23 Google_Frog Note Added: 0019489
2018-11-08 01:23 Google_Frog Status feedback => assigned
2018-11-21 19:14 Kloot Status assigned => resolved
2018-11-21 19:14 Kloot Resolution reopened => fixed
2018-11-21 19:14 Kloot Fixed in Version => 104.0 +git
2018-11-21 19:14 Kloot Note Added: 0019568
+Issue History