View Issue Details

IDProjectCategoryView StatusLast Update
0004833Spring engineGeneralpublic2015-06-25 17:05
ReporterGoogle_Frog Assigned Tohokomoko  
PriorityhighSeveritymajorReproducibilityrandom
Status resolvedResolutionfixed 
Product Version98.0.1+git 
Target Version99.0 
Summary0004833: 98.0.1-868 Desync
DescriptionA desync occured in a 1v1 with ZK v1.3.6.5. I have the replays and infologs of both players.
TagsNo tags attached.
Attached Files
Desync_Svatopluk_Hide_and_Seek_v03_98.0.1-868-g6128051 develop.sdf (Attachment missing)
Desync_GoogleFrog_Hide_and_Seek_v03_98.0.1-868-g6128051 develop.sdf (Attachment missing)
desync_svatopluk.txt (Attachment missing)
desync_googlefrog.txt (Attachment missing)
Desync ZK_1.3.6.6_DeltaSiegeDry Deluxe V3_98.0.1-876-g6fbb06e develop.sdf (Attachment missing)
desync_98.0.1-876_infolog.txt (Attachment missing)
desync_debug_infolog.txt (Attachment missing)
Checked infolog.txt for Errors

Activities

abma

2015-06-22 18:19

administrator   ~0014680

Last edited: 2015-06-22 18:19

very likely the lua gadget error:
Error: LuaRules::RunCallIn: error = 2, UnitLoaded, [Internal Lua error: Call failure] [string "LuaRules/Gadgets/unit_conditional_transport..."]:88: GiveOrderToUnit() recursion is not permitted
stack traceback:

(imo)

abma

2015-06-22 18:36

administrator   ~0014683

Last edited: 2015-06-22 18:37

not sure if useful:

[f=0048956] [~CPathCache(28x28)] cacheHits=0 hitPercentage=0% numHashColls=0 maxCacheSize=0
[f=0048956] [~CPathCache(28x28)] cacheHits=2413 hitPercentage=39% numHashColls=0 maxCacheSize=74
[f=0048956] [~CPathCache(112x112)] cacheHits=0 hitPercentage=0% numHashColls=0 maxCacheSize=0
[f=0048956] [~CPathCache(112x112)] cacheHits=4267 hitPercentage=0% numHashColls=0 maxCacheSize=201
[f=0048956] Statistics for RectangleOptimizer: 2%
[f=0048956] Statistics for RectangleOptimizer: 2%

vs

[f=0048956] [~CPathCache(28x28)] cacheHits=2402 hitPercentage=39% numHashColls=0 maxCacheSize=74
[f=0048956] [~CPathCache(112x112)] cacheHits=0 hitPercentage=0% numHashColls=0 maxCacheSize=0
[f=0048956] [~CPathCache(112x112)] cacheHits=4025 hitPercentage=0% numHashColls=0 maxCacheSize=201
[f=0048956] Statistics for RectangleOptimizer: 4%
[f=0048956] Statistics for RectangleOptimizer: 4%


exit was at same frame, so shouldn't be the lines identical? (if not desynced)

else i don't see any unusal in infolog.txt


@GoogleFrog:

where do these lines in your infolog.txt come from?

[f=0000000] Loaded local team color config.
[f=0000000] Loading last game win data
[f=0000000] Resetting win data
[f=0000000] Last game player count: 2, This game player count: 2
[f=0000000] Last game allyTeam count: 3, This game allyTeam count: 3
[f=0000000] Player and team counts match, continuing
[f=0000000] Testing last game's team 1 mapped to this game's team 0
[f=0000000] Testing last game's team 0 mapped to this game's team 1
[f=0000000] All players and teams match from last game, using last game's scores as base


maybe this additional widget triggered the desync? they are not in svatopluk's infolog.txt

Google_Frog

2015-06-22 19:25

reporter   ~0014684

How would a gadget error such as that cause desync? The error handlers within the lua interface should deal with that sort of thing without breaking.

I don't know whether we were desynced with each other. There was a lot of spectators too.

Those lines are just the win counter widget. It's a widget.

abma

2015-06-22 19:33

administrator   ~0014685

"should"... fixing a gadget vs catching a real desync.

abma

2015-06-22 19:36

administrator   ~0014686

Last edited: 2015-06-22 19:37

afaik its not confirmed that gadget (lua) errors can cause desyncs, but gadgets can cause desyncs when written badly.

it should not... but i don't want to find out as it would take a lot of time imo.

abma

2015-06-23 00:02

administrator   ~0014687

[f=0022813] [unit_script.lua] Error: [string "scripts/dante.lua"]:384: bad argument 0000004 to 'SetUnitWeaponState' (number expected, got NAN (check your code for div0))
[f=0022813] [UnitScript] Error: LuaRules::RunCallIn: error = 1, CLuaUnitScript::FireWeapon, [Internal Lua error: Call failure] [string "LuaRules/Gadgets/unit_script.lua"]:260: attempt to index global 'debug' (a nil value)
[f=0022866] Error: [DESYNC_WARNING] checksum 48502bdf from player 2 (Chesti) does not match our checksum db03d215 for frame-number 22860


uhm... fix dante's lua code?!

abma

2015-06-23 00:05

administrator   ~0014688

i don't see how to continue here with sync debugging without fixing the lua code first.

@jk: other ideas left?

hokomoko

2015-06-23 01:56

developer   ~0014689

They have the line:
local speedmult = 1/(Spring.GetUnitRulesParam(unitID,"slowState") or 1)

Apparently slowState can be 0?

abma

2015-06-23 02:11

administrator   ~0014690

Last edited: 2015-06-23 02:19

summary:

thats where the line where it crashes:

https://github.com/ZeroK-RTS/Zero-K/blob/master/scripts/dante.lua#L384

so reloadTime or speedmult is very large.

about slowState, no clue where it comes from (and yes, it looks like it is the candidate which causes the NAN).

last commit which changed GetUnitRulesParam:
https://github.com/spring/spring/commit/92c54d5b15775c39a8c67b203c28fe9b9db51cea

and the current function:
https://github.com/spring/spring/blob/92c54d5b15775c39a8c67b203c28fe9b9db51cea/rts/Lua/LuaSyncedRead.cpp#L4246

hokomoko

2015-06-23 10:15

developer   ~0014692

Last edited: 2015-06-23 10:16

https://github.com/ZeroK-RTS/Zero-K/pull/871

I've also tested, and got speedMult to be 'inf'

Google_Frog

2015-06-23 13:33

reporter   ~0014693

Last edited: 2015-06-23 13:34

That Dante error looks likely. But where did you get this from abma? It is not in either infolog.

abma

2015-06-23 18:42

administrator   ~0014695

https://github.com/spring/spring/blob/develop/rts/lib/lua/include/LuaInclude.h#L87

atm it requires a DEBUG build. not sure if it makes sense to add a dedicated compile option/setting for that as it seems important for testing.

always enabling imo is bad, as this very likely impacts performance.

Google_Frog

2015-06-24 14:13

reporter   ~0014698

Last edited: 2015-06-24 14:14

The Dante and conditionally transportability bugs were fixed. I had a desync in the first game of testing 98.0.1-876. Weirdly it was after the game was over.

abma

2015-06-24 23:02

administrator   ~0014699

Last edited: 2015-06-25 00:26

in the replay Desync ZK_1.3.6.6_DeltaSiegeDry Deluxe V3_98.0.1-876-g6fbb06e develop.sdf i don't get a error msg about NAN in lua code.

abma

2015-06-25 00:47

administrator   ~0014700

also i don't desync to the demo. as the demo is quiet long, the usual steps to debug this (valgrind) are not practical: it takes way to long, uses a lot of memory + cpu and maybe it finds nothing.

ideally we have a lot shorter demo which instantly desyncs.

we can't exclude stuff like memory-corruption & network errors. zk sends a lot of data at the end of a game, maybe this causes some very rare bug in the disconnect code / desync.

this is why i see the desync in "DeltaSiegeDry Deluxe" as minor atm:

please test more multiplayer games and see if this desyncs as well, if so please report!

maybe hokomoko/jk have an idea, but i'll give up at this point without more information.

hokomoko

2015-06-25 17:05

developer   ~0014701

Fix ed20e0c68d8720b352f90e158a3e1cd426c88203 committed to develop branch: Fix 0004833
deleting the synced projectile was originally based on when there are no
more unsynced subparticles, which led to desyncs if they weren't created
due to hitting the limit
This was changed to deleting the projectile after ttl+ParticleTTL, repo: spring changeset id: 5211

Issue History

Date Modified Username Field Change
2015-06-22 15:11 Google_Frog New Issue
2015-06-22 15:12 Google_Frog File Added: Desync_Svatopluk_Hide_and_Seek_v03_98.0.1-868-g6128051 develop.sdf
2015-06-22 15:12 Google_Frog File Added: Desync_GoogleFrog_Hide_and_Seek_v03_98.0.1-868-g6128051 develop.sdf
2015-06-22 15:12 Google_Frog File Added: desync_svatopluk.txt
2015-06-22 15:12 Google_Frog File Added: desync_googlefrog.txt
2015-06-22 18:19 abma Note Added: 0014680
2015-06-22 18:19 abma Note Edited: 0014680
2015-06-22 18:24 abma Target Version => 99.0
2015-06-22 18:29 abma Relationship added related to 0004835
2015-06-22 18:36 abma Note Added: 0014683
2015-06-22 18:36 abma Status new => feedback
2015-06-22 18:37 abma Note Edited: 0014683
2015-06-22 19:25 Google_Frog Note Added: 0014684
2015-06-22 19:25 Google_Frog Status feedback => new
2015-06-22 19:33 abma Note Added: 0014685
2015-06-22 19:36 abma Note Added: 0014686
2015-06-22 19:37 abma Note Edited: 0014686
2015-06-22 19:38 abma Relationship deleted related to 0004835
2015-06-23 00:02 abma Note Added: 0014687
2015-06-23 00:05 abma Note Added: 0014688
2015-06-23 00:05 abma Assigned To => jK
2015-06-23 00:05 abma Status new => feedback
2015-06-23 01:56 hokomoko Note Added: 0014689
2015-06-23 02:11 abma Note Added: 0014690
2015-06-23 02:13 abma Note Edited: 0014690
2015-06-23 02:19 abma Note Edited: 0014690
2015-06-23 10:15 hokomoko Note Added: 0014692
2015-06-23 10:16 hokomoko Note Edited: 0014692
2015-06-23 13:33 Google_Frog Note Added: 0014693
2015-06-23 13:33 Google_Frog Status feedback => assigned
2015-06-23 13:34 Google_Frog Note Edited: 0014693
2015-06-23 18:42 abma Note Added: 0014695
2015-06-24 14:13 Google_Frog Note Added: 0014698
2015-06-24 14:13 Google_Frog File Added: Desync ZK_1.3.6.6_DeltaSiegeDry Deluxe V3_98.0.1-876-g6fbb06e develop.sdf
2015-06-24 14:14 Google_Frog Note Edited: 0014698
2015-06-24 14:15 Google_Frog File Added: desync_98.0.1-876_infolog.txt
2015-06-24 14:47 hokomoko File Added: desync_debug_infolog.txt
2015-06-24 23:02 abma Note Added: 0014699
2015-06-25 00:26 abma Note Edited: 0014699
2015-06-25 00:47 abma Note Added: 0014700
2015-06-25 17:05 hokomoko Changeset attached => spring develop ed20e0c6
2015-06-25 17:05 hokomoko Note Added: 0014701
2015-06-25 17:05 hokomoko Assigned To jK => hokomoko
2015-06-25 17:05 hokomoko Status assigned => resolved
2015-06-25 17:05 hokomoko Resolution open => fixed