2025-06-14 05:00 CEST

ID: 0004688
Project: Spring engine
Category: General
View Status: public
Last Update: 2016-01-07 00:23
Reporter: user744
Assigned To: hokomoko
Priority: normal
Severity: minor
Reproducibility: N/A
Status: resolved
Resolution: fixed
Product Version: 98.0
Target Version:
Fixed in Version: 100.0+git
Summary: 0004688: Desync with XTA test-934
Description: This game desynced at about 12 minutes.
http://replays.springrts.com/replay/22fff0540a8e46c48fba34bef11828f2/
Additional Information: infolog jools http://pastebin.com/3M2xNj1K
Tags: No tags attached.
Checked infolog.txt for Errors:

Notes

~0014093

user744

xta_fleabowl_desync_raumfuellung_infolog.txt = my infolog

~0014094

user744

Maybe worth noting:
That was the first game with this mod version, and for me the mod download via SpringLobby may have failed in some way:

SL showed the download as synced and completed, but on start Spring crashed.
(I don't have the infolog.)
There were some errors about unitdefs/unitdef_post.lua that should not happen. It also said pool-archive "xyxyxxyxyxyxyx.spd already exists."
I closed the lobby, deleted the pool, and rejoined; Spring then loaded normally.


Player Thorgasm noted: "you dont match the host" and pasted this error:

[00:40:08] <Thorgasm> [f=0000000] Warning: [DESYNC WARNING] path-checksum 48828f53 for player 2 (Raumfuellung) does not match local checksum f75f66fb; stale PathEstimator-cache?
[00:40:17] <Thorgasm> [f=0000000] Warning: [DESYNC WARNING] path-checksum 48828f53 for player 4 (6u0w) does not match local checksum f75f66fb; stale PathEstimator-cache?

Note that there are two players with a checksum error.

The game was normal until 12 min.

~0014095

user744

Another game with just one player on the SAME map that desynced:
http://replays.springrts.com/replay/9314f154f144840c21336de9f6f925c4/

This game did not desync:
http://replays.springrts.com/replay/3b06f154d1e73506e69975a6e94ebf20/


Random guess: is it possible that the failed download/game start somehow broke the path cache for this map?
(I have not deleted the path cache yet, in case of further tests.)

~0014096

abma (administrator)

Is the desync reproducible?


Does deleting the path cache fix it?

~0014097

Jools (reporter)

Last edited: 2015-02-28 17:04

I should also say that XTA test-934 again includes a long lost fleabowl gadget that has been disabled for many years, and it could be that the reason it was disabled was simply because it caused desyncs.

Or it could be because of that path cache.

XTA test-935 has rewritten the flea stuff so that the gadget only has a synced part and instead communicates with widgets via GameRulesParams. It could be that this fixes the desyncs, but we have not tested it yet.
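
As a rough illustration of that approach (a synced-only gadget publishing state that widgets then read), here is a minimal sketch. `Spring.SetGameRulesParam` and `Spring.GetGameRulesParam` are engine calls; the parameter name `flea_count`, the two helper functions, and the stand-in `Spring` table (there only so the snippet runs outside the engine) are assumptions for illustration and not the actual XTA code.

```lua
-- Stand-in for the engine's Spring table so the sketch runs outside Spring.
-- Inside the engine the real table exists and this fallback is skipped.
local Spring = Spring
if not Spring then
    local params = {}
    Spring = {
        SetGameRulesParam = function(name, value) params[name] = value end,
        GetGameRulesParam = function(name) return params[name] end,
    }
end

-- Synced side (gadget): publish state instead of messaging widgets directly.
-- "flea_count" is a hypothetical parameter name.
local function PublishFleaCount(count)
    Spring.SetGameRulesParam("flea_count", count)
end

-- Unsynced side (widget): read the published value, defaulting to 0.
local function ReadFleaCount()
    return Spring.GetGameRulesParam("flea_count") or 0
end

PublishFleaCount(12)
print(ReadFleaCount())
```

The point of the design is that the widget only reads; nothing the widget does can feed back into synced state.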

~0014098

user744

> Is the desync reproducible?
Except for the 2 posted games on "Red Comet", I do not know of any other desyncs.

> Does deleting the path cache fix it?
I figured it is better not to delete it yet, and "hopefully" provoke a few more desyncs?


related? https://springrts.com/mantis/view.php?id=3809


XTA svn is https://code.google.com/p/xta-springrts/source/list

~0014102

Jools (reporter)

The desync is definitely also reproducible with version test-935. We had desyncs on another map as well; here is my infolog if that helps (attached as infolog_jools2.txt):

The replay: http://replays.springrts.com/replay/8842f354a46683b1db513c1b8dab427d/

~0014103

Jools (reporter)

I think the next step is to try to rule out whether the desync is related to fleas: in this XTA version LUPS was also updated, with quite a few changes.

Note that XTA hasn't had any desyncs since version 9.728; that's 17 releases or about 2 years, I think. I think it's related to XTA and not the engine, but it could still be interesting to find out why.

~0014150

user744

With respect to the original desynced game ( http://replays.springrts.com/replay/22fff0540a8e46c48fba34bef11828f2 ), I looked at the infologs:
path-checksum f75f66fb for player 0 ([PRO]Jools)
path-checksum f75f66fb for player 1 (Thorgasm)
path-checksum 48828f53 for player 2 (Raumfuellung)
path-checksum 48828f53 for player 3 (NTG)
path-checksum 48828f53 for player 4 (6u0w)

2 players had checksum f75f66fb
3 players had checksum 48828f53

So 2 (or 3) players had the same wrong checksum? It seems unlikely that this would happen by sheer randomness?
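
For comparing more infologs the same way, a small script can do the grouping. This is only a sketch: the pattern is written against the line format quoted above (it should also match the `[DESYNC WARNING]` lines, since it only looks for the `path-checksum ... for player N (name)` part), and the function name is made up.

```lua
-- Lines as quoted in this note.
local lines = {
    "path-checksum f75f66fb for player 0 ([PRO]Jools)",
    "path-checksum f75f66fb for player 1 (Thorgasm)",
    "path-checksum 48828f53 for player 2 (Raumfuellung)",
    "path-checksum 48828f53 for player 3 (NTG)",
    "path-checksum 48828f53 for player 4 (6u0w)",
}

-- Group player names by the path checksum they reported.
local function GroupByChecksum(logLines)
    local groups = {}
    for _, line in ipairs(logLines) do
        local sum, name = line:match("path%-checksum (%x+) for player %d+ %((.-)%)")
        if sum then
            groups[sum] = groups[sum] or {}
            table.insert(groups[sum], name)
        end
    end
    return groups
end

local groups = GroupByChecksum(lines)

-- Sort the checksums so the printed order is stable.
local sums = {}
for sum in pairs(groups) do sums[#sums + 1] = sum end
table.sort(sums)
for _, sum in ipairs(sums) do
    print(sum, #groups[sum], table.concat(groups[sum], ", "))
end
```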

~0014151

Jools (reporter)

I think we can rule out the new factions as the cause of the desync: the archer's valley game has a desync and it only contains arm and core units. And fleas.

I don't know if it's related, but there are two maps called Red Comet (I mean two filenames), and they point to the same smf. From my archivecache.lua:


#1:
{
    name = "red_comet.sd7",
    path = [[U:\bin\Spring\Data\maps\]],
    modified = "1425136685",
    checksum = "3608457918",
    archivedata = {
        mapfile = "maps/Red Comet.smf",
        modtype = 3,
        name = "Red Comet",
        name_pure = "Red Comet",
    },
},

#2:
{
    name = "RedComet.sd7",
    path = [[U:\bin\Spring\Data\maps\]],
    modified = "1416945622",
    checksum = "3608457918",
    archivedata = {
        mapfile = "maps/Red Comet.smf",
        modtype = 3,
        name = "Red Comet",
        name_pure = "Red Comet",
    },
},
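
To check whether other archives are duplicated like this, entries can be grouped by checksum. A minimal sketch, assuming the entries have already been loaded into a Lua list; the flattened `mapfile` field and the function name are simplifications for illustration, not the real archivecache.lua layout.

```lua
-- The two entries from the excerpt above, trimmed to the relevant fields.
local archives = {
    { name = "red_comet.sd7", checksum = "3608457918", mapfile = "maps/Red Comet.smf" },
    { name = "RedComet.sd7",  checksum = "3608457918", mapfile = "maps/Red Comet.smf" },
}

-- Return a map of checksum -> archive names, keeping only checksums that
-- are shared by more than one archive.
local function FindDuplicateArchives(entries)
    local byChecksum = {}
    for _, entry in ipairs(entries) do
        local names = byChecksum[entry.checksum] or {}
        names[#names + 1] = entry.name
        byChecksum[entry.checksum] = names
    end

    local duplicates = {}
    for checksum, names in pairs(byChecksum) do
        if #names > 1 then
            duplicates[checksum] = names
        end
    end
    return duplicates
end

local dups = FindDuplicateArchives(archives)
for checksum, names in pairs(dups) do
    print(checksum, table.concat(names, ", "))
end
```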

~0014253

Jools (reporter)

Last edited: 2015-03-30 03:10

Happened again with XTA test-950 on TitalDuel: http://replays.springrts.com/replay/7f8a1855f177c75ee6a0112f45991bb2/

Happened at time 5:49, ~3 seconds after a /give all command. Not sure this helps much, but I thought it might help narrow it down.

There were some widget and gadget errors because of getting too many unit commands, could be related to those in this case.

~0014254

Jools (reporter)

Here is the stacktrace:

[f=0010443] [unit_script.lua] Error: [string "scripts/arm_easter_egg.lua"]:23: [GetUnitCommands] called too often without a 2nd argument to define maxNumCmds returned in the table, please check your code!
Especially when you only read the first cmd or want to check if the queue is non-empty, this can be a huge performance leak!

[f=0010443] Error: LuaRules::RunCallIn: error = 2, GameFrame, [Internal Lua error: Call failure] [string "LuaGadgets/Gadgets/unit_script.lua"]:259: attempt to index global 'debug' (a nil value)

stack traceback:
    [C]: in function 'sp_CallAsUnit'
    [string "LuaGadgets/Gadgets/unit_script.lua"]:820: in function 'GameFrame'
    [string "LuaRules/gadgets.lua"]:834: in function <[string "LuaRules/gadgets.lua"]:832>
    (tail call): ?
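
The first error above asks for a second argument to GetUnitCommands. A minimal sketch of what that looks like when only the front of the queue matters; the stand-in `Spring` table and its fake queue exist only so the snippet runs outside the engine, and `HasQueuedCommand` is a made-up helper, not the code in arm_easter_egg.lua.

```lua
-- Stand-in so the sketch runs outside the engine; in Spring the real
-- Spring.GetUnitCommands is used. The fake queue contents are made up.
local Spring = Spring
if not Spring then
    local fakeQueue = { { id = 10 }, { id = 20 }, { id = 30 } }
    Spring = {
        GetUnitCommands = function(unitID, count)
            local limit = #fakeQueue
            if count and count >= 0 then
                limit = math.min(count, #fakeQueue)
            end
            local result = {}
            for i = 1, limit do
                result[i] = fakeQueue[i]
            end
            return result
        end,
    }
end

-- Only a non-empty check is needed, so ask for at most one command
-- instead of copying the whole queue.
local function HasQueuedCommand(unitID)
    local cmds = Spring.GetUnitCommands(unitID, 1)
    return cmds ~= nil and #cmds > 0
end

print(HasQueuedCommand(1))
```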

The sp_CallAsUnit call is in the basecontent unit_script.lua, in the code that wakes up units: sp_CallAsUnit(unitID, WakeUp, sleeper). Can the fact that the "debug" table is nil cause a desync?

Also could this be related to https://springrts.com/mantis/view.php?id=1050, iterating through a table with keys that are coroutines?
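
Whether that is what happens here is not established, but the general concern is that Lua does not specify the order in which pairs() visits keys, and for keys like coroutines or tables it typically depends on memory addresses, which differ between clients. A common workaround is to keep the threads in an array and iterate that instead. A minimal sketch with made-up names:

```lua
-- Registry that remembers insertion order, so iteration does not depend on
-- how Lua hashes coroutine keys.
local function NewOrderedThreads()
    local list, index = {}, {}
    local registry = {}

    function registry.add(co)
        if not index[co] then
            list[#list + 1] = co
            index[co] = #list
        end
    end

    function registry.remove(co)
        local i = index[co]
        if i then
            table.remove(list, i)
            index[co] = nil
            -- Re-number the entries that shifted down.
            for j = i, #list do
                index[list[j]] = j
            end
        end
    end

    -- Iterate in insertion order; index is only used for lookups.
    function registry.each()
        return ipairs(list)
    end

    return registry
end

local threads = NewOrderedThreads()
local a = coroutine.create(function() end)
local b = coroutine.create(function() end)
local c = coroutine.create(function() end)
threads.add(a); threads.add(b); threads.add(c)
threads.remove(b)

local order = {}
for _, co in threads.each() do
    order[#order + 1] = (co == a and "a") or (co == c and "c") or "?"
end
print(table.concat(order, " "))
```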

This is the structure of the loop in unit_script:

function gadget:GameFrame()
    local n = sp_GetGameFrame()
    local zzz = sleepers[n]

    if zzz then
        sleepers[n] = nil

        -- Wake up the lazy bastards for this frame (in reverse order).
        -- NOTE:
        -- 1. during WakeUp() a thread t1 might Signal (kill) another thread t2
        -- 2. t2 might also be registered in sleepers[n] and not yet woken up
        -- 3. if so, t1's signal would cause t2 to be removed from sleepers[n]
        -- via Signal --> RemoveTableElement
        -- 4. therefore we cannot use the "for i = 1, #zzz" pattern since the
        -- container size/contents might change while we are iterating over
        -- it (and a Lua for-loop range expression is only evaluated once)
        while (#zzz > 0) do
            local sleeper = zzz[#zzz]
            local unitID = sleeper.unitID

            zzz[#zzz] = nil

            PushActiveUnitID(unitID)
            sp_CallAsUnit(unitID, WakeUp, sleeper)
            PopActiveUnitID()
        end
    end
end

And this is the WakeUp function:

local function WakeUp(thread, ...)
    thread.container = nil
    local co = thread.thread
    local good, err = co_resume(co, ...)
    if (not good) then
        Spring.Log(section, LOG.ERROR, err)
        Spring.Log(section, LOG.ERROR, debug.traceback(co))
        RunOnError(thread)
    end
end
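
The "attempt to index global 'debug'" error comes from the error-reporting path itself. A guarded traceback would at least keep the reporting from raising a second error. This is only a sketch with hypothetical helper names, not a patch to basecontent, and it does not answer whether the missing table affects sync:

```lua
-- Use debug.traceback when the debug library is available; otherwise fall
-- back to a placeholder so error reporting cannot itself raise an error.
local function SafeTraceback(co)
    if type(debug) == "table" and type(debug.traceback) == "function" then
        return debug.traceback(co)
    end
    return "(no traceback: debug library unavailable)"
end

-- Simplified, hypothetical variant of a resume-and-report helper.
-- "report" is any function that accepts a message string.
local function ResumeAndReport(co, report, ...)
    local good, err = coroutine.resume(co, ...)
    if not good then
        report(tostring(err))
        report(SafeTraceback(co))
    end
    return good
end

local messages = {}
local failing = coroutine.create(function() error("boom") end)
local ok = ResumeAndReport(failing, function(msg)
    messages[#messages + 1] = msg
end)
print(ok, #messages)
```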

To me it seems the loop has a function that invokes a coroutine, but I just followed https://springrts.com/wiki/Debugging_sync_errors and don't know much more about this.
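
For reference, the NOTE in the GameFrame loop can be reproduced in plain Lua. In the sketch below (all names made up), handling "t1" removes the still-pending "t2". Popping from the end while re-checking the length copes with that, whereas a `for i = 1, #items` loop would keep its original upper bound and run into a nil entry.

```lua
-- Process items from the end, re-checking the length on every pass, so a
-- handler may remove other pending items while we iterate.
local function DrainReverse(items, handler)
    local processed = {}
    while #items > 0 do
        local item = items[#items]
        items[#items] = nil
        processed[#processed + 1] = item
        handler(item, items)
    end
    return processed
end

-- Hypothetical handler: processing "t1" cancels the pending "t2".
local function CancelT2(item, pending)
    if item == "t1" then
        for i = #pending, 1, -1 do
            if pending[i] == "t2" then
                table.remove(pending, i)
            end
        end
    end
end

local processed = DrainReverse({ "t3", "t2", "t1" }, CancelT2)
print(table.concat(processed, " "))
```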

~0015456

hokomoko (developer)

Path checksums were remade; this should be fixed.

Issue History
Date Modified | Username | Field | Change
2015-02-28 02:54 | user744 | New Issue |
2015-02-28 02:54 | user744 | File Added: 20150228_003745_Red Comet_98.sdf |
2015-02-28 02:56 | user744 | Note Added: 0014093 |
2015-02-28 03:05 | user744 | Note Added: 0014094 |
2015-02-28 03:09 | user744 | Note Added: 0014095 |
2015-02-28 03:10 | user744 | File Added: xta_fleabowl_desync_raumfuellung_infolog.txt |
2015-02-28 14:09 | abma | Note Added: 0014096 |
2015-02-28 14:09 | abma | Status | new => feedback
2015-02-28 17:02 | Jools | Note Added: 0014097 |
2015-02-28 17:04 | Jools | Note Edited: 0014097 |
2015-02-28 19:37 | user744 | Note Added: 0014098 |
2015-02-28 19:37 | user744 | Status | feedback => new
2015-03-01 18:10 | Jools | Note Added: 0014102 |
2015-03-01 18:10 | Jools | File Added: infolog_jools2.txt |
2015-03-01 18:13 | Jools | Note Added: 0014103 |
2015-03-12 17:44 | user744 | Note Added: 0014150 |
2015-03-12 18:03 | Jools | Note Added: 0014151 |
2015-03-30 02:59 | Jools | Note Added: 0014253 |
2015-03-30 03:00 | Jools | File Added: infolog_jools_desync_3.txt |
2015-03-30 03:04 | Jools | Note Edited: 0014253 |
2015-03-30 03:10 | Jools | Note Edited: 0014253 |
2015-03-30 16:55 | Jools | Note Added: 0014254 |
2016-01-07 00:23 | hokomoko | Note Added: 0015456 |
2016-01-07 00:23 | hokomoko | Status | new => resolved
2016-01-07 00:23 | hokomoko | Fixed in Version | => 100.0+git
2016-01-07 00:23 | hokomoko | Resolution | open => fixed
2016-01-07 00:23 | hokomoko | Assigned To | => hokomoko