View Issue Details [ Jump to Notes ] | [ Issue History ] [ Print ] | ||||||||
ID | Project | Category | View Status | Date Submitted | Last Update | ||||
---|---|---|---|---|---|---|---|---|---|
0005398 | Spring engine | General | public | 2016-11-20 03:11 | 2016-11-22 19:28 | ||||
Reporter | PepeAmpere | ||||||||
Assigned To | hokomoko | ||||||||
Priority | normal | Severity | minor | Reproducibility | sometimes | ||||
Status | closed | Resolution | no change required | ||||||
Product Version | 103.0 | ||||||||
Target Version | Fixed in Version | ||||||||
Summary | 0005398: Desync after taking units | ||||||||
Description | Watch replay attached. Observe that there was desync once one player tried to take units of other player. more info and replay file: https://trello.com/c/HeAvDetv/ [f=0038455] <kmar> Allies: I took the abandoned units. [f=0038455] kmar unpaused the game [f=0039772] Sync error for fail0r in frame 39763 (got da36fef5, correct is 5f412aaf) [f=0039789] Error: [DESYNC WARNING] checksum d75b7c35 from demo player 0 (kmar) does not match our checksum 84b0ad4c for frame-number 39780 | ||||||||
Steps To Reproduce | 1) Watch replay 2) Observe desync (only in infolog) | ||||||||
Tags | No tags attached. | ||||||||
Checked infolog.txt for Errors | |||||||||
Attached Files |
|
![]() |
|
hokomoko (developer) 2016-11-21 15:01 |
There's a known desync in 103.0 for win vs. linux. If that's the case here it's already fixed |
PepeAmpere (reporter) 2016-11-21 17:48 |
All players playing that match were on windows machines and also our host is using spring-dedicated.exe (windows machine) |
hokomoko (developer) 2016-11-21 17:51 |
kmar uses linux as far as I know |
hokomoko (developer) 2016-11-21 17:53 |
also, infologs are missing from the report |
PepeAmpere (reporter) 2016-11-21 19:58 |
Really, all players are running windows instances. Attaching another replay + infolog |
PepeAmpere (reporter) 2016-11-21 20:15 |
``` [f=0032158] <pow> this hil? ``` and ``` [f=0032163] Sync error for kmar in frame 32159 (got 7b5f9d00, correct is b5388941) ``` So desync happens one frame after unpause so "catching-up" was not successfull |
PepeAmpere (reporter) 2016-11-21 20:15 |
``` [f=0032158] pow unpaused the game ``` and ``` [f=0032163] Sync error for kmar in frame 32159 (got 7b5f9d00, correct is b5388941) ``` sorry, bad cite |
hokomoko (developer) 2016-11-21 20:15 |
catching up is irrelevant. |
PepeAmpere (reporter) 2016-11-21 20:17 |
second infolog from the same game from unsynced player: https://trello-attachments.s3.amazonaws.com/54061fc793bf591aee00f4d2/582ee7e9037ec07512358d03/0b6485ca14ac3914efd8a8120529541f/infolog.txt |
hokomoko (developer) 2016-11-21 20:29 |
it is full of lua errors |
hokomoko (developer) 2016-11-21 20:33 Last edited: 2016-11-21 20:34 |
Please check all infologs in advance before reporting a desync as this could save me from compiling a debug build locally. (my folder is usually on latest spring and not on 103.0) However, since I did compile said build, I can report the following errors: [f=-000001] Chili-Error in `Chili Nota UI`:progressbar7 : [string "LuaUI/Widgets/chili/Headers/skinutils.lua"]:584: bad argument 0000005 to 'ClipPlane' (number expected, got NAN (check your code for div0)) [f=-000001] stacktrace: [C]: in ClipPlane [string "LuaUI/Widgets/chili/Headers/skinutils.lua"]:584 (tail call): in [?] [string "LuaUI/Widgets/chili/Controls/control.lua"]:677: in _UpdateOwnDList ... [string "-------------------------------------------..."]:79 [string "LuaUI/Widgets/api_chili.lua"]:89 [f=-000001] [widgets.lua] Error: false [f=-000001] [widgets.lua] Error: Error in SelectionChanged(): [string "LuaUI/Widgets/gui_select_army_only.lua"]:85: attempt to index local 'ud' (a nil value) [f=-000001] [widgets.lua] Error: Removed widget: Select army only (note that these are not the ones present in the infolog you have posted above) |
PepeAmpere (reporter) 2016-11-21 20:49 |
Hokomoko - I do not understand your response. 1) This issue was reported as issue of the 103.0 product version (as you can see in header) 2) I do not understand your resolution - if I understand the system correctly, no widget error should cause desync of the simulation because it is unsynced stuff. Btw, my theory was this: I observe some errors on kmars machine which are not on my machine. I guess it is connected with perf-warrning messages, which appeared only on kmars machine, because he simulates orders of other players as usual + using some own micro-widgets which may spam commands requests (read or write queue). In result only on his machine some commands are dropped or not sent to other players which results in desync. Anyway i think this is worth of solving on side of the engine. Engine should report same error in all synced scripts on all machines. |
hokomoko (developer) 2016-11-21 20:53 Last edited: 2016-11-21 20:55 |
[f=0028857] Error: LuaRules::RunCallIn: error = 2, GameFrame, [Internal Lua error: Call failure] [string "LuaRules/Gadgets/unit_reverse.lua"]:201: [GetUnitCommands] called too often without a 2nd argument to define maxNumCmds returned in the table, please check your code! Especially when you only read the first cmd or want to check if the queue is non-empty, this can be a huge performance leak! stack traceback: [C]: in function 'spGetUnitCommands' [string "LuaRules/Gadgets/unit_reverse.lua"]:201: in function 'GameFrame' [string "LuaRules/gadgets.lua"]:931: in function <[string "LuaRules/gadgets.lua"]:929> (tail call): ? LuaRules => Synced Fix your code please, the message is very informative. Also there are NaN errors. It's nice that you say widgets shouldn't affect sync but that is not the situation. If you give faulty input to the engine (anywhere) it can cause desyncs. I will not investigate any desync which had lua errors in the infolog. P.S. your theory is incorrect. |
StickyStains (reporter) 2016-11-21 22:10 |
The problem with checking all input for desync-correctness is performance? Or codebase inflation? Just curious. |
hokomoko (developer) 2016-11-21 22:11 Last edited: 2016-11-21 22:13 |
performance EDIT: and also a bit of principle. I can't treat all lua devs like bumbling fools who can't make sure what they send to the engine is correct. Content devs have responsibility too. |
Forboding Angel (reporter) 2016-11-22 03:17 |
The specific error that you're speaking of, hoko, is a widget, which as you know is unsynced. Why would a widget crashing cause a sync error unless a poorly contrived gadget (In other words, lua created without error catching, which seems to be a common theme among spring lua devs) were depending upon info sent across unsynced<->synced? |
Forboding Angel (reporter) 2016-11-22 03:19 |
In case it isn't clear, None of the above is sarcasm. It is an actual question and I would very much like to know the answer because it's illuminating wrt how the engine functions. |
hokomoko (developer) 2016-11-22 03:24 |
look 3 messages above you. It contains the word 'gadgets' 4 times and the word 'LuaRules' 5 times. So it's definitely not a widget. How can unsynced affect synced? Let's say you give an order for a unit to move to {NaN, NaN, NaN}, this order may go straight to the network and then distributed between all players, entering the synced part of the engine. |
PepeAmpere (reporter) 2016-11-22 14:14 |
Calm down hoko. Read your comments above. In first comment you were citing only Widget error, not Gadget error, that why I was curious. I agree that gadget error may be reason to refuse to investigate on your side, but the problem is that error happens only on the unsynced machine, noone else have this in infolog. And I believe engine should be robust enough so no widget will kill the sync of game. That's my user understanding of thing and I cannot imagine we could agree on something else. Your example with {NaN, NaN, NaN} - this should not been accepted by anyone in network, i do not think so this is something which is technically impossible to archive and if there is such problem, it is regular bug. |
hokomoko (developer) 2016-11-22 14:17 |
1) It's a shame you didn't read the comment you're referring to. I first said there are errors in the infolog, and only then reported you another error. 2) It's fine that that's what you believe, I disagree. |
hokomoko (developer) 2016-11-22 15:01 |
Let me state this clearly: Having lua errors in the infolog voids your desync warranty for that specific match. And before anyone has silly ideas: Removing lua errors from an infolog before reporting voids your desync warranty for life. |
PepeAmpere (reporter) 2016-11-22 18:27 |
> Having lua errors in the infolog voids your desync warranty for that specific match. understand > Removing lua errors from an infolog before reporting voids your desync warranty for life. no idea what you talk about :) I have maybe 5 translations of this sentence in my mind and Im still not sure which to choose :) Are you sarkastic? Threatening? Talking about technical rule which generally breaks sync? Is my reporters quality questioned? Or is my personal quality questioned? Are you telling me how error reporting and desync reporting works? To make it clear from my side: I was no removing any error from infolog, I was just committed replay, infologs from two computers which were part of the game and Im wondering how is possible that sync-errors are not the same. Thats all. > 2) It's fine that that's what you believe, I disagree. So for you it makes sense that one player with shity widget can break game for other 20 players in game? Whats the point of having unsynced stuff then? |
PepeAmpere (reporter) 2016-11-22 18:28 |
* not sync errors => LuaRules errors are not the same |
hokomoko (developer) 2016-11-22 19:06 |
It wasn't aimed at you specifically at all, I'm sorry if this was implied. Obviously you didn't remove them from your infolog because we know they were only visible this time on kmar's machine. When I read "Having lua errors in the infolog voids your desync warranty" I immediately feared that someone in the future who's aware of this rule will have the idea to "just remove this single line from infolog.txt because it doesn't really matter" which will waste hours of my life. So to deter said someone from acting on this idea, I've detailed the consequences. > So for you it makes sense that one player with shity widget can break game for other 20 players in game? Whats the point of having unsynced stuff then? If I were a game developer I'd make it clear that user widgets will void all warranty for the player. If a player with a sub-par computer can't run the game in passable speed and always drops out once unit count is above 100, he will break every game he plays in for the other 20 player as well. Also, if a player exits games prematurely, teamkills or does loads of other stupid crap the obvious solution is to not play with him unless he plays like a normal human being. IMO, playing with faulty widgets is similar - players should know that if they ruin the game for everyone, no one will play with them. Regarding how this could happen: There's a counter of how many times you called GetUnitCommands with no second argument. If it passes a certain number, it results in an error. The counter can be different on different machines because it counts calls from both unsynced and synced. Your game has at least one place in the code with this issue (where the LuaRules error happened), and so did kmar's widgets most probably, which made it happen earlier on his machine than it did on anyone else's. |
PepeAmpere (reporter) 2016-11-22 19:23 |
Ok, thx for clarification. > Regarding how this could happen: Theory you describe now seems to be same as the one i described above in 12th comment. So its gadget spamming + kmar desync more is probable because hes also spamming via widgets. So it invalidates: > P.S. your theory is incorrect. Ok, clean for me now, I will remove the gadget and use engine reverse and we will see if the desync will be reproduced again. This can be closed until we provide repro without errors. I think the limits you mention should be documented somewhere on the wiki next to API related to commands. |
hokomoko (developer) 2016-11-22 19:28 |
"spamcommands requests (read or write queue). In result only on his machine some commands are dropped or not sent to other players which results in desync." You'll have to squint really hard to make this mean what the issue really is, but that's not very important. > I think the limits you mention should be documented somewhere on the wiki next to API related to commands. It's right there: https://springrts.com/wiki/Lua_SyncedRead#CommandQueues |
![]() |
|||
Date Modified | Username | Field | Change |
---|---|---|---|
2016-11-20 03:11 | PepeAmpere | New Issue | |
2016-11-20 03:11 | PepeAmpere | File Added: 20161119_213505_Archers_Valley_v6_103.sdfz | |
2016-11-21 15:01 | hokomoko | Assigned To | => hokomoko |
2016-11-21 15:01 | hokomoko | Status | new => feedback |
2016-11-21 15:01 | hokomoko | Note Added: 0016883 | |
2016-11-21 17:48 | PepeAmpere | Note Added: 0016884 | |
2016-11-21 17:48 | PepeAmpere | Status | feedback => assigned |
2016-11-21 17:51 | hokomoko | Note Added: 0016885 | |
2016-11-21 17:53 | hokomoko | Note Added: 0016886 | |
2016-11-21 19:54 | PepeAmpere | File Added: 20161121_191425_Tumult_103.sdfz | |
2016-11-21 19:58 | PepeAmpere | File Added: infolog.txt | |
2016-11-21 19:58 | PepeAmpere | Note Added: 0016887 | |
2016-11-21 20:15 | PepeAmpere | Note Added: 0016888 | |
2016-11-21 20:15 | PepeAmpere | Note Added: 0016889 | |
2016-11-21 20:15 | hokomoko | Note Added: 0016890 | |
2016-11-21 20:17 | PepeAmpere | Note Added: 0016891 | |
2016-11-21 20:29 | hokomoko | Note Added: 0016892 | |
2016-11-21 20:33 | hokomoko | Status | assigned => closed |
2016-11-21 20:33 | hokomoko | Resolution | open => no change required |
2016-11-21 20:33 | hokomoko | Note Added: 0016893 | |
2016-11-21 20:33 | hokomoko | Note Edited: 0016893 | View Revisions |
2016-11-21 20:34 | hokomoko | Note Edited: 0016893 | View Revisions |
2016-11-21 20:49 | PepeAmpere | Status | closed => feedback |
2016-11-21 20:49 | PepeAmpere | Resolution | no change required => reopened |
2016-11-21 20:49 | PepeAmpere | Note Added: 0016894 | |
2016-11-21 20:53 | hokomoko | Note Added: 0016895 | |
2016-11-21 20:55 | hokomoko | Note Edited: 0016895 | View Revisions |
2016-11-21 22:10 | StickyStains | Note Added: 0016896 | |
2016-11-21 22:11 | hokomoko | Note Added: 0016897 | |
2016-11-21 22:13 | hokomoko | Note Edited: 0016897 | View Revisions |
2016-11-22 03:17 | Forboding Angel | Note Added: 0016898 | |
2016-11-22 03:19 | Forboding Angel | Note Added: 0016899 | |
2016-11-22 03:24 | hokomoko | Note Added: 0016900 | |
2016-11-22 14:14 | PepeAmpere | Note Added: 0016901 | |
2016-11-22 14:14 | PepeAmpere | Status | feedback => assigned |
2016-11-22 14:17 | hokomoko | Note Added: 0016902 | |
2016-11-22 15:01 | hokomoko | Note Added: 0016903 | |
2016-11-22 18:27 | PepeAmpere | Note Added: 0016904 | |
2016-11-22 18:28 | PepeAmpere | Note Added: 0016905 | |
2016-11-22 19:06 | hokomoko | Note Added: 0016906 | |
2016-11-22 19:23 | PepeAmpere | Note Added: 0016907 | |
2016-11-22 19:28 | hokomoko | Note Added: 0016908 | |
2016-11-22 19:28 | hokomoko | Status | assigned => closed |
2016-11-22 19:28 | hokomoko | Resolution | reopened => no change required |