View Issue Details | ||||||||
ID | Project | Category | View Status | Date Submitted | Last Update | ||||
---|---|---|---|---|---|---|---|---|---|
0004688 | Spring engine | General | public | 2015-02-28 02:54 | 2016-01-07 00:23 | ||||
Reporter | user744 | ||||||||
Assigned To | hokomoko | ||||||||
Priority | normal | Severity | minor | Reproducibility | N/A | ||||
Status | resolved | Resolution | fixed | ||||||
Product Version | 98.0 | ||||||||
Target Version | | Fixed in Version | 100.0+git | ||||||
Summary | 0004688: Desync with XTA test-934 | ||||||||
Description | This game desynced at about 12 minutes. http://replays.springrts.com/replay/22fff0540a8e46c48fba34bef11828f2/ | ||||||||
Additional Information | infolog jools http://pastebin.com/3M2xNj1K | ||||||||
Tags | No tags attached. | ||||||||
Checked infolog.txt for Errors | |||||||||
Attached Files |
2015-02-28 02:56 |
xta_fleabowl_desync_raumfuellung_infolog.txt = my infolog |
2015-02-28 03:05 |
Maybe worth noting: this was the first game with this mod version, and for me the mod download via SpringLobby may have failed in some way. SL showed the download as synced and completed, but on start Spring crashed (I don't have the infolog). There were some errors about unitdefs/unitdef_post.lua that should not happen. It also said pool-archive "xyxyxxyxyxyxyx.spd already exists." I closed the lobby, deleted the pool, rejoined, and Spring loaded normally. Player Thorgasm noted: "you dont match the host" and pasted this error:
[00:40:08] <Thorgasm> [f=0000000] Warning: [DESYNC WARNING] path-checksum 48828f53 for player 2 (Raumfuellung) does not match local checksum f75f66fb; stale PathEstimator-cache?
[00:40:17] <Thorgasm> [f=0000000] Warning: [DESYNC WARNING] path-checksum 48828f53 for player 4 (6u0w) does not match local checksum f75f66fb; stale PathEstimator-cache?
Note that there are two players with a checksum error. The game was normal until 12 min. |
2015-02-28 03:09 |
Another game on the SAME map, in which just one player desynced: http://replays.springrts.com/replay/9314f154f144840c21336de9f6f925c4/ This game did not desync: http://replays.springrts.com/replay/3b06f154d1e73506e69975a6e94ebf20/ Random guess: is it possible that the failed download/game start has somehow broken the pathcache for this map? (I have not deleted the pathcache yet, in case further tests are needed.) |
abma (administrator) 2015-02-28 14:09 |
Is the desync reproducible? Does deleting the pathcache fix it? |
Jools (reporter) 2015-02-28 17:02 Last edited: 2015-02-28 17:04 |
I should also say that XTA test-934 again includes a long-lost fleabowl gadget that has been disabled for many years, and it could be that the reason it was disabled was simply that it caused desyncs. Or it could be because of that path cache. XTA test-935 has rewritten the flea stuff so that the gadget only has a synced part and instead communicates with widgets via GameRulesParams. It could be that this fixes the desyncs, but we have not tested it yet. |
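The split described above (a synced-only gadget that publishes state, with widgets only reading it) might look roughly like the following. This is a sketch, not the XTA code: the parameter name `flea_count` and both helper functions are hypothetical, and a small stub stands in for the engine's `Spring` table so the snippet runs in a plain Lua interpreter. It is modeled on the engine calls `Spring.SetGameRulesParam` and `Spring.GetGameRulesParam`.

```lua
-- Stub for running outside the engine (assumption: inside Spring the real
-- Spring.SetGameRulesParam / Spring.GetGameRulesParam are used instead).
local Spring = Spring or (function()
  local params = {}
  return {
    SetGameRulesParam = function(name, value) params[name] = value end,
    GetGameRulesParam = function(name) return params[name] end,
  }
end)()

-- Synced gadget side (hypothetical helper): publish state as a rules param
-- instead of calling into unsynced code.
local function publishFleaCount(count)
  Spring.SetGameRulesParam("flea_count", count)
end

-- Widget side (hypothetical helper): read the value; nothing is written back
-- into the simulation.
local function readFleaCount()
  return Spring.GetGameRulesParam("flea_count") or 0
end

publishFleaCount(12)
print(readFleaCount())
```

The point of this layout is that the widget side only reads, so it has no path by which to change synced state.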
2015-02-28 19:37 |
> is the desync reproduceable? Except the 2 posted games on "Red Comet" I do not know of other desyncs. > does deleting pathcache fix it? I figured it is better to not delete it yet, and "hopefully" provoke a few more desyncs? related? https://springrts.com/mantis/view.php?id=3809 XTA svn is https://code.google.com/p/xta-springrts/source/list |
Jools (reporter) 2015-03-01 18:10 |
The desync is definitely reproducible with version test-935 as well. We had desyncs on another map too; my infolog is attached (infolog_jools2.txt) if that helps. The replay: http://replays.springrts.com/replay/8842f354a46683b1db513c1b8dab427d/ |
Jools (reporter) 2015-03-01 18:13 |
I think the next step is to try to rule out whether the desync is related to fleas: in this XTA version LUPS was also updated, with quite a few changes. Note that XTA hasn't had any desyncs since version 9.728; that's 17 releases or about 2 years, I think. I think it's related to XTA and not the engine, but it could still be interesting to find out why. |
2015-03-12 17:44 |
With regard to the original desynced game ( http://replays.springrts.com/replay/22fff0540a8e46c48fba34bef11828f2 ), I looked at the infologs:

path-checksum f75f66fb for player 0 ([PRO]Jools)
path-checksum f75f66fb for player 1 (Thorgasm)
path-checksum 48828f53 for player 2 (Raumfuellung)
path-checksum 48828f53 for player 3 (NTG)
path-checksum 48828f53 for player 4 (6u0w)

2 players had checksum f75f66fb
3 players had checksum 48828f53

So 2 (or 3) players had the same wrong checksum? It seems unlikely that this would happen by sheer randomness. |
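The tally above can be reproduced mechanically. A minimal Lua sketch, assuming the five path-checksum lines have already been parsed into a table; the helper name `groupByChecksum` is hypothetical and not part of the engine:

```lua
-- Path checksums per player, copied from the infolog lines quoted above.
local reported = {
  { player = 0, name = "[PRO]Jools",   checksum = "f75f66fb" },
  { player = 1, name = "Thorgasm",     checksum = "f75f66fb" },
  { player = 2, name = "Raumfuellung", checksum = "48828f53" },
  { player = 3, name = "NTG",          checksum = "48828f53" },
  { player = 4, name = "6u0w",         checksum = "48828f53" },
}

-- Hypothetical helper: group player names by checksum value.
local function groupByChecksum(entries)
  local groups = {}
  for _, e in ipairs(entries) do
    groups[e.checksum] = groups[e.checksum] or {}
    table.insert(groups[e.checksum], e.name)
  end
  return groups
end

local groups = groupByChecksum(reported)
for checksum, names in pairs(groups) do
  print(checksum, #names, table.concat(names, ", "))
end
```

Two groups rather than five distinct values would fit the guess that the clients hold two different cache states, not randomly corrupted ones.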
Jools (reporter) 2015-03-12 18:03 |
I think we can rule out the new factions as the cause of the desync: the archer's valley game has a desync and it only contains arm and core units. And fleas. I don't know if it's related, but there are two maps called red comet, I mean two filenames, but they point to the same smf. From my archivecache.lua:

#1:
{
    name = "red_comet.sd7",
    path = [[U:\bin\Spring\Data\maps\]],
    modified = "1425136685",
    checksum = "3608457918",
    archivedata = {
        mapfile = "maps/Red Comet.smf",
        modtype = 3,
        name = "Red Comet",
        name_pure = "Red Comet",
    },
},

#2:
{
    name = "RedComet.sd7",
    path = [[U:\bin\Spring\Data\maps\]],
    modified = "1416945622",
    checksum = "3608457918",
    archivedata = {
        mapfile = "maps/Red Comet.smf",
        modtype = 3,
        name = "Red Comet",
        name_pure = "Red Comet",
    },
}, |
Jools (reporter) 2015-03-30 02:59 Last edited: 2015-03-30 03:10 |
It happened again with XTA test-950 on TitalDuel: http://replays.springrts.com/replay/7f8a1855f177c75ee6a0112f45991bb2/ The desync happened at 5:49, about 3 seconds after a /give all command. I'm not sure how much this helps, but it might narrow things down. There were some widget and gadget errors because of getting too many unit commands, which could be related in this case. |
Jools (reporter) 2015-03-30 16:55 |
Here is the stacktrace:

[f=0010443] [unit_script.lua] Error: [string "scripts/arm_easter_egg.lua"]:23: [GetUnitCommands] called too often without a 2nd argument to define maxNumCmds returned in the table, please check your code! Especially when you only read the first cmd or want to check if the queue is non-empty, this can be a huge performance leak!
[f=0010443] Error: LuaRules::RunCallIn: error = 2, GameFrame, [Internal Lua error: Call failure] [string "LuaGadgets/Gadgets/unit_script.lua"]:259: attempt to index global 'debug' (a nil value)
stack traceback:
    [C]: in function 'sp_CallAsUnit'
    [string "LuaGadgets/Gadgets/unit_script.lua"]:820: in function 'GameFrame'
    [string "LuaRules/gadgets.lua"]:834: in function <[string "LuaRules/gadgets.lua"]:832>
    (tail call): ?

The sp_CallAsUnit call is in the basecontent unit_script.lua, where it wakes up units: sp_CallAsUnit(unitID, WakeUp, sleeper). Can the fact that the "debug" table is nil cause a desync? Also, could this be related to https://springrts.com/mantis/view.php?id=1050, iterating through a table with keys that are coroutines? This is the structure of the loop in unit_script:

    function gadget:GameFrame()
        local n = sp_GetGameFrame()
        local zzz = sleepers[n]

        if zzz then
            sleepers[n] = nil

            -- Wake up the lazy bastards for this frame (in reverse order).
            -- NOTE:
            --   1. during WakeUp() a thread t1 might Signal (kill) another thread t2
            --   2. t2 might also be registered in sleepers[n] and not yet woken up
            --   3. if so, t1's signal would cause t2 to be removed from sleepers[n]
            --      via Signal --> RemoveTableElement
            --   4. therefore we cannot use the "for i = 1, #zzz" pattern since the
            --      container size/contents might change while we are iterating over
            --      it (and a Lua for-loop range expression is only evaluated once)
            while (#zzz > 0) do
                local sleeper = zzz[#zzz]
                local unitID = sleeper.unitID

                zzz[#zzz] = nil

                PushActiveUnitID(unitID)
                sp_CallAsUnit(unitID, WakeUp, sleeper)
                PopActiveUnitID()
            end
        end
    end

And this is the WakeUp function:

    local function WakeUp(thread, ...)
        thread.container = nil
        local co = thread.thread
        local good, err = co_resume(co, ...)
        if (not good) then
            Spring.Log(section, LOG.ERROR, err)
            Spring.Log(section, LOG.ERROR, debug.traceback(co))
            RunOnError(thread)
        end
    end

To me it seems the loop has a function that invokes a coroutine, but I just followed https://springrts.com/wiki/Debugging_sync_errors and don't know much more about this. |
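To make the NOTE in the quoted loop concrete, here is a standalone sketch that contrasts the draining `while (#zzz > 0)` pattern with the fixed-range `for i = 1, #zzz` pattern the comment warns against. It does not use the engine: `kill`, `wakeUp`, `makeSleepers`, `drain` and `fixedRange` are hypothetical stand-ins, and plain tables take the place of the coroutine threads.

```lua
-- Standalone sketch: runs in a plain Lua interpreter, no Spring engine needed.

-- Stand-in for Signal --> RemoveTableElement: remove a sleeper by name.
local function kill(list, name)
  for i = #list, 1, -1 do
    if list[i].name == name then
      table.remove(list, i)
    end
  end
end

-- Stand-in for WakeUp: record the wake-up, then optionally kill another sleeper.
local function wakeUp(list, sleeper, woken)
  woken[#woken + 1] = sleeper.name
  if sleeper.kills then
    kill(list, sleeper.kills)
  end
end

-- Three sleepers for one frame; "b" signals (kills) "a" when it wakes up.
local function makeSleepers()
  return { { name = "a" }, { name = "b", kills = "a" }, { name = "c" } }
end

-- Pattern used in the quoted loop: pop from the end, re-reading #zzz each time.
local function drain(zzz)
  local woken = {}
  while #zzz > 0 do
    local sleeper = zzz[#zzz]
    zzz[#zzz] = nil
    wakeUp(zzz, sleeper, woken)
  end
  return table.concat(woken, ",")
end

-- Pattern the comment warns against: the range #zzz is evaluated only once.
local function fixedRange(zzz)
  local woken = {}
  for i = 1, #zzz do
    local sleeper = zzz[i]
    if sleeper then  -- entries may have shifted or vanished by now
      wakeUp(zzz, sleeper, woken)
    end
  end
  return table.concat(woken, ",")
end

print(drain(makeSleepers()))
print(fixedRange(makeSleepers()))
```

In this scenario the draining loop wakes "c" and "b" and never wakes the killed "a", while the fixed-range loop skips "c" because the removal shifted it out from under the loop index. Both variants walk the table by integer index, so neither depends on the iteration order of a table keyed by coroutines.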
hokomoko (developer) 2016-01-07 00:23 |
path checksums were remade, should be fixed. |
Date Modified | Username | Field | Change |
---|---|---|---|
2015-02-28 02:54 | | New Issue | |
2015-02-28 02:54 | | File Added: 20150228_003745_Red Comet_98.sdf | |
2015-02-28 02:56 | | Note Added: 0014093 | |
2015-02-28 03:05 | | Note Added: 0014094 | |
2015-02-28 03:09 | | Note Added: 0014095 | |
2015-02-28 03:10 | | File Added: xta_fleabowl_desync_raumfuellung_infolog.txt | |
2015-02-28 14:09 | abma | Note Added: 0014096 | |
2015-02-28 14:09 | abma | Status | new => feedback |
2015-02-28 17:02 | Jools | Note Added: 0014097 | |
2015-02-28 17:04 | Jools | Note Edited: 0014097 | View Revisions |
2015-02-28 19:37 | | Note Added: 0014098 | |
2015-02-28 19:37 | | Status | feedback => new |
2015-03-01 18:10 | Jools | Note Added: 0014102 | |
2015-03-01 18:10 | Jools | File Added: infolog_jools2.txt | |
2015-03-01 18:13 | Jools | Note Added: 0014103 | |
2015-03-12 17:44 | | Note Added: 0014150 | |
2015-03-12 18:03 | Jools | Note Added: 0014151 | |
2015-03-30 02:59 | Jools | Note Added: 0014253 | |
2015-03-30 03:00 | Jools | File Added: infolog_jools_desync_3.txt | |
2015-03-30 03:04 | Jools | Note Edited: 0014253 | View Revisions |
2015-03-30 03:10 | Jools | Note Edited: 0014253 | View Revisions |
2015-03-30 16:55 | Jools | Note Added: 0014254 | |
2016-01-07 00:23 | hokomoko | Note Added: 0015456 | |
2016-01-07 00:23 | hokomoko | Status | new => resolved |
2016-01-07 00:23 | hokomoko | Fixed in Version | => 100.0+git |
2016-01-07 00:23 | hokomoko | Resolution | open => fixed |
2016-01-07 00:23 | hokomoko | Assigned To | => hokomoko |