View Issue Details

IDProjectCategoryView StatusLast Update
0005398Spring engineGeneralpublic2016-11-22 19:28
ReporterPepeAmpere Assigned Tohokomoko  
PrioritynormalSeverityminorReproducibilitysometimes
Status closedResolutionno change required 
Product Version103.0 
Summary0005398: Desync after taking units
DescriptionWatch replay attached. Observe that there was desync once one player tried to take units of other player.

more info and replay file: https://trello.com/c/HeAvDetv/

[f=0038455] <kmar> Allies: I took the abandoned units.
[f=0038455] kmar unpaused the game
[f=0039772] Sync error for fail0r in frame 39763 (got da36fef5, correct is 5f412aaf)
[f=0039789] Error: [DESYNC WARNING] checksum d75b7c35 from demo player 0 (kmar) does not match our checksum 84b0ad4c for frame-number 39780
Steps To Reproduce1) Watch replay
2) Observe desync (only in infolog)
TagsNo tags attached.
Attached Files
20161119_213505_Archers_Valley_v6_103.sdfz (Attachment missing)
20161121_191425_Tumult_103.sdfz (Attachment missing)
infolog.txt (Attachment missing)
Checked infolog.txt for Errors

Activities

hokomoko

2016-11-21 15:01

developer   ~0016883

There's a known desync in 103.0 for win vs. linux.
If that's the case here it's already fixed

PepeAmpere

2016-11-21 17:48

reporter   ~0016884

All players playing that match were on windows machines and also our host is using spring-dedicated.exe (windows machine)

hokomoko

2016-11-21 17:51

developer   ~0016885

kmar uses linux as far as I know

hokomoko

2016-11-21 17:53

developer   ~0016886

also, infologs are missing from the report

PepeAmpere

2016-11-21 19:58

reporter   ~0016887

Really, all players are running windows instances.

Attaching another replay + infolog

PepeAmpere

2016-11-21 20:15

reporter   ~0016888

```
[f=0032158] <pow> this hil?
```

and

```
[f=0032163] Sync error for kmar in frame 32159 (got 7b5f9d00, correct is b5388941)
```

So desync happens one frame after unpause so "catching-up" was not successfull

PepeAmpere

2016-11-21 20:15

reporter   ~0016889

```
[f=0032158] pow unpaused the game
```

and

```
[f=0032163] Sync error for kmar in frame 32159 (got 7b5f9d00, correct is b5388941)
```

sorry, bad cite

hokomoko

2016-11-21 20:15

developer   ~0016890

catching up is irrelevant.

PepeAmpere

2016-11-21 20:17

reporter   ~0016891

second infolog from the same game from unsynced player: https://trello-attachments.s3.amazonaws.com/54061fc793bf591aee00f4d2/582ee7e9037ec07512358d03/0b6485ca14ac3914efd8a8120529541f/infolog.txt

hokomoko

2016-11-21 20:29

developer   ~0016892

it is full of lua errors

hokomoko

2016-11-21 20:33

developer   ~0016893

Last edited: 2016-11-21 20:34

Please check all infologs in advance before reporting a desync as this could save me from compiling a debug build locally. (my folder is usually on latest spring and not on 103.0)
However, since I did compile said build, I can report the following errors:
[f=-000001] Chili-Error in `Chili Nota UI`:progressbar7 : [string "LuaUI/Widgets/chili/Headers/skinutils.lua"]:584: bad argument 0000005 to 'ClipPlane' (number expected, got NAN (check your code for div0))
[f=-000001] stacktrace:
    [C]: in ClipPlane
    [string "LuaUI/Widgets/chili/Headers/skinutils.lua"]:584
    (tail call): in [?]
    [string "LuaUI/Widgets/chili/Controls/control.lua"]:677: in _UpdateOwnDList
    ...
    [string "-------------------------------------------..."]:79
    [string "LuaUI/Widgets/api_chili.lua"]:89


[f=-000001] [widgets.lua] Error: false
[f=-000001] [widgets.lua] Error: Error in SelectionChanged(): [string "LuaUI/Widgets/gui_select_army_only.lua"]:85: attempt to index local 'ud' (a nil value)
[f=-000001] [widgets.lua] Error: Removed widget: Select army only


(note that these are not the ones present in the infolog you have posted above)

PepeAmpere

2016-11-21 20:49

reporter   ~0016894

Hokomoko - I do not understand your response.

1) This issue was reported as issue of the 103.0 product version (as you can see in header)
2) I do not understand your resolution - if I understand the system correctly, no widget error should cause desync of the simulation because it is unsynced stuff.

Btw, my theory was this:

I observe some errors on kmars machine which are not on my machine. I guess it is connected with perf-warrning messages, which appeared only on kmars machine, because he simulates orders of other players as usual + using some own micro-widgets which may spam commands requests (read or write queue). In result only on his machine some commands are dropped or not sent to other players which results in desync.

Anyway i think this is worth of solving on side of the engine. Engine should report same error in all synced scripts on all machines.

hokomoko

2016-11-21 20:53

developer   ~0016895

Last edited: 2016-11-21 20:55

[f=0028857] Error: LuaRules::RunCallIn: error = 2, GameFrame, [Internal Lua error: Call failure] [string "LuaRules/Gadgets/unit_reverse.lua"]:201: [GetUnitCommands] called too often without a 2nd argument to define maxNumCmds returned in the table, please check your code!
Especially when you only read the first cmd or want to check if the queue is non-empty, this can be a huge performance leak!

stack traceback:
    [C]: in function 'spGetUnitCommands'
    [string "LuaRules/Gadgets/unit_reverse.lua"]:201: in function 'GameFrame'
    [string "LuaRules/gadgets.lua"]:931: in function <[string "LuaRules/gadgets.lua"]:929>
    (tail call): ?


LuaRules => Synced
Fix your code please, the message is very informative.
Also there are NaN errors.

It's nice that you say widgets shouldn't affect sync but that is not the situation.
If you give faulty input to the engine (anywhere) it can cause desyncs.
I will not investigate any desync which had lua errors in the infolog.

P.S. your theory is incorrect.

StickyStains

2016-11-21 22:10

reporter   ~0016896

The problem with checking all input for desync-correctness is performance?
Or codebase inflation?
Just curious.

hokomoko

2016-11-21 22:11

developer   ~0016897

Last edited: 2016-11-21 22:13

performance

EDIT: and also a bit of principle. I can't treat all lua devs like bumbling fools who can't make sure what they send to the engine is correct. Content devs have responsibility too.

Forboding Angel

2016-11-22 03:17

reporter   ~0016898

The specific error that you're speaking of, hoko, is a widget, which as you know is unsynced.

Why would a widget crashing cause a sync error unless a poorly contrived gadget (In other words, lua created without error catching, which seems to be a common theme among spring lua devs) were depending upon info sent across unsynced<->synced?

Forboding Angel

2016-11-22 03:19

reporter   ~0016899

In case it isn't clear, None of the above is sarcasm. It is an actual question and I would very much like to know the answer because it's illuminating wrt how the engine functions.

hokomoko

2016-11-22 03:24

developer   ~0016900

look 3 messages above you.
It contains the word 'gadgets' 4 times and the word 'LuaRules' 5 times. So it's definitely not a widget.

How can unsynced affect synced? Let's say you give an order for a unit to move to {NaN, NaN, NaN}, this order may go straight to the network and then distributed between all players, entering the synced part of the engine.

PepeAmpere

2016-11-22 14:14

reporter   ~0016901

Calm down hoko.

Read your comments above. In first comment you were citing only Widget error, not Gadget error, that why I was curious.

I agree that gadget error may be reason to refuse to investigate on your side, but the problem is that error happens only on the unsynced machine, noone else have this in infolog.

And I believe engine should be robust enough so no widget will kill the sync of game. That's my user understanding of thing and I cannot imagine we could agree on something else.

Your example with {NaN, NaN, NaN} - this should not been accepted by anyone in network, i do not think so this is something which is technically impossible to archive and if there is such problem, it is regular bug.

hokomoko

2016-11-22 14:17

developer   ~0016902

1) It's a shame you didn't read the comment you're referring to. I first said there are errors in the infolog, and only then reported you another error.
2) It's fine that that's what you believe, I disagree.

hokomoko

2016-11-22 15:01

developer   ~0016903

Let me state this clearly:
Having lua errors in the infolog voids your desync warranty for that specific match.

And before anyone has silly ideas:
Removing lua errors from an infolog before reporting voids your desync warranty for life.

PepeAmpere

2016-11-22 18:27

reporter   ~0016904

> Having lua errors in the infolog voids your desync warranty for that specific match.

understand

> Removing lua errors from an infolog before reporting voids your desync warranty for life.

no idea what you talk about :) I have maybe 5 translations of this sentence in my mind and Im still not sure which to choose :)

Are you sarkastic? Threatening? Talking about technical rule which generally breaks sync? Is my reporters quality questioned? Or is my personal quality questioned? Are you telling me how error reporting and desync reporting works?

To make it clear from my side: I was no removing any error from infolog, I was just committed replay, infologs from two computers which were part of the game and Im wondering how is possible that sync-errors are not the same. Thats all.

> 2) It's fine that that's what you believe, I disagree.

So for you it makes sense that one player with shity widget can break game for other 20 players in game? Whats the point of having unsynced stuff then?

PepeAmpere

2016-11-22 18:28

reporter   ~0016905

* not sync errors => LuaRules errors are not the same

hokomoko

2016-11-22 19:06

developer   ~0016906

It wasn't aimed at you specifically at all, I'm sorry if this was implied.
Obviously you didn't remove them from your infolog because we know they were only visible this time on kmar's machine.

When I read "Having lua errors in the infolog voids your desync warranty" I immediately feared that someone in the future who's aware of this rule will have the idea to "just remove this single line from infolog.txt because it doesn't really matter" which will waste hours of my life.

So to deter said someone from acting on this idea, I've detailed the consequences.

> So for you it makes sense that one player with shity widget can break game for other 20 players in game? Whats the point of having unsynced stuff then?

If I were a game developer I'd make it clear that user widgets will void all warranty for the player.

If a player with a sub-par computer can't run the game in passable speed and always drops out once unit count is above 100, he will break every game he plays in for the other 20 player as well.

Also, if a player exits games prematurely, teamkills or does loads of other stupid crap the obvious solution is to not play with him unless he plays like a normal human being.

IMO, playing with faulty widgets is similar - players should know that if they ruin the game for everyone, no one will play with them.


Regarding how this could happen:
There's a counter of how many times you called GetUnitCommands with no second argument. If it passes a certain number, it results in an error.
The counter can be different on different machines because it counts calls from both unsynced and synced.
Your game has at least one place in the code with this issue (where the LuaRules error happened), and so did kmar's widgets most probably, which made it happen earlier on his machine than it did on anyone else's.

PepeAmpere

2016-11-22 19:23

reporter   ~0016907

Ok, thx for clarification.

> Regarding how this could happen:

Theory you describe now seems to be same as the one i described above in 12th comment. So its gadget spamming + kmar desync more is probable because hes also spamming via widgets.

So it invalidates:

> P.S. your theory is incorrect.

Ok, clean for me now,

I will remove the gadget and use engine reverse and we will see if the desync will be reproduced again.

This can be closed until we provide repro without errors.

I think the limits you mention should be documented somewhere on the wiki next to API related to commands.

hokomoko

2016-11-22 19:28

developer   ~0016908

"spamcommands requests (read or write queue). In result only on his machine some commands are dropped or not sent to other players which results in desync."
You'll have to squint really hard to make this mean what the issue really is, but that's not very important.


> I think the limits you mention should be documented somewhere on the wiki next to API related to commands.

It's right there:
https://springrts.com/wiki/Lua_SyncedRead#CommandQueues

Issue History

Date Modified Username Field Change
2016-11-20 03:11 PepeAmpere New Issue
2016-11-20 03:11 PepeAmpere File Added: 20161119_213505_Archers_Valley_v6_103.sdfz
2016-11-21 15:01 hokomoko Assigned To => hokomoko
2016-11-21 15:01 hokomoko Status new => feedback
2016-11-21 15:01 hokomoko Note Added: 0016883
2016-11-21 17:48 PepeAmpere Note Added: 0016884
2016-11-21 17:48 PepeAmpere Status feedback => assigned
2016-11-21 17:51 hokomoko Note Added: 0016885
2016-11-21 17:53 hokomoko Note Added: 0016886
2016-11-21 19:54 PepeAmpere File Added: 20161121_191425_Tumult_103.sdfz
2016-11-21 19:58 PepeAmpere File Added: infolog.txt
2016-11-21 19:58 PepeAmpere Note Added: 0016887
2016-11-21 20:15 PepeAmpere Note Added: 0016888
2016-11-21 20:15 PepeAmpere Note Added: 0016889
2016-11-21 20:15 hokomoko Note Added: 0016890
2016-11-21 20:17 PepeAmpere Note Added: 0016891
2016-11-21 20:29 hokomoko Note Added: 0016892
2016-11-21 20:33 hokomoko Status assigned => closed
2016-11-21 20:33 hokomoko Resolution open => no change required
2016-11-21 20:33 hokomoko Note Added: 0016893
2016-11-21 20:33 hokomoko Note Edited: 0016893
2016-11-21 20:34 hokomoko Note Edited: 0016893
2016-11-21 20:49 PepeAmpere Status closed => feedback
2016-11-21 20:49 PepeAmpere Resolution no change required => reopened
2016-11-21 20:49 PepeAmpere Note Added: 0016894
2016-11-21 20:53 hokomoko Note Added: 0016895
2016-11-21 20:55 hokomoko Note Edited: 0016895
2016-11-21 22:10 StickyStains Note Added: 0016896
2016-11-21 22:11 hokomoko Note Added: 0016897
2016-11-21 22:13 hokomoko Note Edited: 0016897
2016-11-22 03:17 Forboding Angel Note Added: 0016898
2016-11-22 03:19 Forboding Angel Note Added: 0016899
2016-11-22 03:24 hokomoko Note Added: 0016900
2016-11-22 14:14 PepeAmpere Note Added: 0016901
2016-11-22 14:14 PepeAmpere Status feedback => assigned
2016-11-22 14:17 hokomoko Note Added: 0016902
2016-11-22 15:01 hokomoko Note Added: 0016903
2016-11-22 18:27 PepeAmpere Note Added: 0016904
2016-11-22 18:28 PepeAmpere Note Added: 0016905
2016-11-22 19:06 hokomoko Note Added: 0016906
2016-11-22 19:23 PepeAmpere Note Added: 0016907
2016-11-22 19:28 hokomoko Note Added: 0016908
2016-11-22 19:28 hokomoko Status assigned => closed
2016-11-22 19:28 hokomoko Resolution reopened => no change required