2025-08-06 09:06 CEST

View Issue Details Jump to Notes ]
IDProjectCategoryView StatusLast Update
0005398Spring engineGeneralpublic2016-11-22 19:28
ReporterPepeAmpere 
Assigned Tohokomoko 
PrioritynormalSeverityminorReproducibilitysometimes
StatusclosedResolutionno change required 
Product Version103.0 
Target VersionFixed in Version 
Summary0005398: Desync after taking units
DescriptionWatch replay attached. Observe that there was desync once one player tried to take units of other player.

more info and replay file: https://trello.com/c/HeAvDetv/

[f=0038455] <kmar> Allies: I took the abandoned units.
[f=0038455] kmar unpaused the game
[f=0039772] Sync error for fail0r in frame 39763 (got da36fef5, correct is 5f412aaf)
[f=0039789] Error: [DESYNC WARNING] checksum d75b7c35 from demo player 0 (kmar) does not match our checksum 84b0ad4c for frame-number 39780
Steps To Reproduce1) Watch replay
2) Observe desync (only in infolog)
TagsNo tags attached.
Checked infolog.txt for Errors
Attached Files

-Relationships
+Relationships

-Notes

~0016883

hokomoko (developer)

There's a known desync in 103.0 for win vs. linux.
If that's the case here it's already fixed

~0016884

PepeAmpere (reporter)

All players playing that match were on windows machines and also our host is using spring-dedicated.exe (windows machine)

~0016885

hokomoko (developer)

kmar uses linux as far as I know

~0016886

hokomoko (developer)

also, infologs are missing from the report

~0016887

PepeAmpere (reporter)

Really, all players are running windows instances.

Attaching another replay + infolog

~0016888

PepeAmpere (reporter)

```
[f=0032158] <pow> this hil?
```

and

```
[f=0032163] Sync error for kmar in frame 32159 (got 7b5f9d00, correct is b5388941)
```

So desync happens one frame after unpause so "catching-up" was not successfull

~0016889

PepeAmpere (reporter)

```
[f=0032158] pow unpaused the game
```

and

```
[f=0032163] Sync error for kmar in frame 32159 (got 7b5f9d00, correct is b5388941)
```

sorry, bad cite

~0016890

hokomoko (developer)

catching up is irrelevant.

~0016891

PepeAmpere (reporter)

second infolog from the same game from unsynced player: https://trello-attachments.s3.amazonaws.com/54061fc793bf591aee00f4d2/582ee7e9037ec07512358d03/0b6485ca14ac3914efd8a8120529541f/infolog.txt

~0016892

hokomoko (developer)

it is full of lua errors

~0016893

hokomoko (developer)

Last edited: 2016-11-21 20:34

View 3 revisions

Please check all infologs in advance before reporting a desync as this could save me from compiling a debug build locally. (my folder is usually on latest spring and not on 103.0)
However, since I did compile said build, I can report the following errors:
[f=-000001] Chili-Error in `Chili Nota UI`:progressbar7 : [string "LuaUI/Widgets/chili/Headers/skinutils.lua"]:584: bad argument 0000005 to 'ClipPlane' (number expected, got NAN (check your code for div0))
[f=-000001] stacktrace:
    [C]: in ClipPlane
    [string "LuaUI/Widgets/chili/Headers/skinutils.lua"]:584
    (tail call): in [?]
    [string "LuaUI/Widgets/chili/Controls/control.lua"]:677: in _UpdateOwnDList
    ...
    [string "-------------------------------------------..."]:79
    [string "LuaUI/Widgets/api_chili.lua"]:89


[f=-000001] [widgets.lua] Error: false
[f=-000001] [widgets.lua] Error: Error in SelectionChanged(): [string "LuaUI/Widgets/gui_select_army_only.lua"]:85: attempt to index local 'ud' (a nil value)
[f=-000001] [widgets.lua] Error: Removed widget: Select army only


(note that these are not the ones present in the infolog you have posted above)

~0016894

PepeAmpere (reporter)

Hokomoko - I do not understand your response.

1) This issue was reported as issue of the 103.0 product version (as you can see in header)
2) I do not understand your resolution - if I understand the system correctly, no widget error should cause desync of the simulation because it is unsynced stuff.

Btw, my theory was this:

I observe some errors on kmars machine which are not on my machine. I guess it is connected with perf-warrning messages, which appeared only on kmars machine, because he simulates orders of other players as usual + using some own micro-widgets which may spam commands requests (read or write queue). In result only on his machine some commands are dropped or not sent to other players which results in desync.

Anyway i think this is worth of solving on side of the engine. Engine should report same error in all synced scripts on all machines.

~0016895

hokomoko (developer)

Last edited: 2016-11-21 20:55

View 2 revisions

[f=0028857] Error: LuaRules::RunCallIn: error = 2, GameFrame, [Internal Lua error: Call failure] [string "LuaRules/Gadgets/unit_reverse.lua"]:201: [GetUnitCommands] called too often without a 2nd argument to define maxNumCmds returned in the table, please check your code!
Especially when you only read the first cmd or want to check if the queue is non-empty, this can be a huge performance leak!

stack traceback:
    [C]: in function 'spGetUnitCommands'
    [string "LuaRules/Gadgets/unit_reverse.lua"]:201: in function 'GameFrame'
    [string "LuaRules/gadgets.lua"]:931: in function <[string "LuaRules/gadgets.lua"]:929>
    (tail call): ?


LuaRules => Synced
Fix your code please, the message is very informative.
Also there are NaN errors.

It's nice that you say widgets shouldn't affect sync but that is not the situation.
If you give faulty input to the engine (anywhere) it can cause desyncs.
I will not investigate any desync which had lua errors in the infolog.

P.S. your theory is incorrect.

~0016896

StickyStains (reporter)

The problem with checking all input for desync-correctness is performance?
Or codebase inflation?
Just curious.

~0016897

hokomoko (developer)

Last edited: 2016-11-21 22:13

View 2 revisions

performance

EDIT: and also a bit of principle. I can't treat all lua devs like bumbling fools who can't make sure what they send to the engine is correct. Content devs have responsibility too.

~0016898

Forboding Angel (reporter)

The specific error that you're speaking of, hoko, is a widget, which as you know is unsynced.

Why would a widget crashing cause a sync error unless a poorly contrived gadget (In other words, lua created without error catching, which seems to be a common theme among spring lua devs) were depending upon info sent across unsynced<->synced?

~0016899

Forboding Angel (reporter)

In case it isn't clear, None of the above is sarcasm. It is an actual question and I would very much like to know the answer because it's illuminating wrt how the engine functions.

~0016900

hokomoko (developer)

look 3 messages above you.
It contains the word 'gadgets' 4 times and the word 'LuaRules' 5 times. So it's definitely not a widget.

How can unsynced affect synced? Let's say you give an order for a unit to move to {NaN, NaN, NaN}, this order may go straight to the network and then distributed between all players, entering the synced part of the engine.

~0016901

PepeAmpere (reporter)

Calm down hoko.

Read your comments above. In first comment you were citing only Widget error, not Gadget error, that why I was curious.

I agree that gadget error may be reason to refuse to investigate on your side, but the problem is that error happens only on the unsynced machine, noone else have this in infolog.

And I believe engine should be robust enough so no widget will kill the sync of game. That's my user understanding of thing and I cannot imagine we could agree on something else.

Your example with {NaN, NaN, NaN} - this should not been accepted by anyone in network, i do not think so this is something which is technically impossible to archive and if there is such problem, it is regular bug.

~0016902

hokomoko (developer)

1) It's a shame you didn't read the comment you're referring to. I first said there are errors in the infolog, and only then reported you another error.
2) It's fine that that's what you believe, I disagree.

~0016903

hokomoko (developer)

Let me state this clearly:
Having lua errors in the infolog voids your desync warranty for that specific match.

And before anyone has silly ideas:
Removing lua errors from an infolog before reporting voids your desync warranty for life.

~0016904

PepeAmpere (reporter)

> Having lua errors in the infolog voids your desync warranty for that specific match.

understand

> Removing lua errors from an infolog before reporting voids your desync warranty for life.

no idea what you talk about :) I have maybe 5 translations of this sentence in my mind and Im still not sure which to choose :)

Are you sarkastic? Threatening? Talking about technical rule which generally breaks sync? Is my reporters quality questioned? Or is my personal quality questioned? Are you telling me how error reporting and desync reporting works?

To make it clear from my side: I was no removing any error from infolog, I was just committed replay, infologs from two computers which were part of the game and Im wondering how is possible that sync-errors are not the same. Thats all.

> 2) It's fine that that's what you believe, I disagree.

So for you it makes sense that one player with shity widget can break game for other 20 players in game? Whats the point of having unsynced stuff then?

~0016905

PepeAmpere (reporter)

* not sync errors => LuaRules errors are not the same

~0016906

hokomoko (developer)

It wasn't aimed at you specifically at all, I'm sorry if this was implied.
Obviously you didn't remove them from your infolog because we know they were only visible this time on kmar's machine.

When I read "Having lua errors in the infolog voids your desync warranty" I immediately feared that someone in the future who's aware of this rule will have the idea to "just remove this single line from infolog.txt because it doesn't really matter" which will waste hours of my life.

So to deter said someone from acting on this idea, I've detailed the consequences.

> So for you it makes sense that one player with shity widget can break game for other 20 players in game? Whats the point of having unsynced stuff then?

If I were a game developer I'd make it clear that user widgets will void all warranty for the player.

If a player with a sub-par computer can't run the game in passable speed and always drops out once unit count is above 100, he will break every game he plays in for the other 20 player as well.

Also, if a player exits games prematurely, teamkills or does loads of other stupid crap the obvious solution is to not play with him unless he plays like a normal human being.

IMO, playing with faulty widgets is similar - players should know that if they ruin the game for everyone, no one will play with them.


Regarding how this could happen:
There's a counter of how many times you called GetUnitCommands with no second argument. If it passes a certain number, it results in an error.
The counter can be different on different machines because it counts calls from both unsynced and synced.
Your game has at least one place in the code with this issue (where the LuaRules error happened), and so did kmar's widgets most probably, which made it happen earlier on his machine than it did on anyone else's.

~0016907

PepeAmpere (reporter)

Ok, thx for clarification.

> Regarding how this could happen:

Theory you describe now seems to be same as the one i described above in 12th comment. So its gadget spamming + kmar desync more is probable because hes also spamming via widgets.

So it invalidates:

> P.S. your theory is incorrect.

Ok, clean for me now,

I will remove the gadget and use engine reverse and we will see if the desync will be reproduced again.

This can be closed until we provide repro without errors.

I think the limits you mention should be documented somewhere on the wiki next to API related to commands.

~0016908

hokomoko (developer)

"spamcommands requests (read or write queue). In result only on his machine some commands are dropped or not sent to other players which results in desync."
You'll have to squint really hard to make this mean what the issue really is, but that's not very important.


> I think the limits you mention should be documented somewhere on the wiki next to API related to commands.

It's right there:
https://springrts.com/wiki/Lua_SyncedRead#CommandQueues
+Notes

-Issue History
Date Modified Username Field Change
2016-11-20 03:11 PepeAmpere New Issue
2016-11-20 03:11 PepeAmpere File Added: 20161119_213505_Archers_Valley_v6_103.sdfz
2016-11-21 15:01 hokomoko Assigned To => hokomoko
2016-11-21 15:01 hokomoko Status new => feedback
2016-11-21 15:01 hokomoko Note Added: 0016883
2016-11-21 17:48 PepeAmpere Note Added: 0016884
2016-11-21 17:48 PepeAmpere Status feedback => assigned
2016-11-21 17:51 hokomoko Note Added: 0016885
2016-11-21 17:53 hokomoko Note Added: 0016886
2016-11-21 19:54 PepeAmpere File Added: 20161121_191425_Tumult_103.sdfz
2016-11-21 19:58 PepeAmpere File Added: infolog.txt
2016-11-21 19:58 PepeAmpere Note Added: 0016887
2016-11-21 20:15 PepeAmpere Note Added: 0016888
2016-11-21 20:15 PepeAmpere Note Added: 0016889
2016-11-21 20:15 hokomoko Note Added: 0016890
2016-11-21 20:17 PepeAmpere Note Added: 0016891
2016-11-21 20:29 hokomoko Note Added: 0016892
2016-11-21 20:33 hokomoko Status assigned => closed
2016-11-21 20:33 hokomoko Resolution open => no change required
2016-11-21 20:33 hokomoko Note Added: 0016893
2016-11-21 20:33 hokomoko Note Edited: 0016893 View Revisions
2016-11-21 20:34 hokomoko Note Edited: 0016893 View Revisions
2016-11-21 20:49 PepeAmpere Status closed => feedback
2016-11-21 20:49 PepeAmpere Resolution no change required => reopened
2016-11-21 20:49 PepeAmpere Note Added: 0016894
2016-11-21 20:53 hokomoko Note Added: 0016895
2016-11-21 20:55 hokomoko Note Edited: 0016895 View Revisions
2016-11-21 22:10 StickyStains Note Added: 0016896
2016-11-21 22:11 hokomoko Note Added: 0016897
2016-11-21 22:13 hokomoko Note Edited: 0016897 View Revisions
2016-11-22 03:17 Forboding Angel Note Added: 0016898
2016-11-22 03:19 Forboding Angel Note Added: 0016899
2016-11-22 03:24 hokomoko Note Added: 0016900
2016-11-22 14:14 PepeAmpere Note Added: 0016901
2016-11-22 14:14 PepeAmpere Status feedback => assigned
2016-11-22 14:17 hokomoko Note Added: 0016902
2016-11-22 15:01 hokomoko Note Added: 0016903
2016-11-22 18:27 PepeAmpere Note Added: 0016904
2016-11-22 18:28 PepeAmpere Note Added: 0016905
2016-11-22 19:06 hokomoko Note Added: 0016906
2016-11-22 19:23 PepeAmpere Note Added: 0016907
2016-11-22 19:28 hokomoko Note Added: 0016908
2016-11-22 19:28 hokomoko Status assigned => closed
2016-11-22 19:28 hokomoko Resolution reopened => no change required
+Issue History