2019-08-21 22:23 CEST

ID: 0005955
Project: Spring engine
Category: General
View Status: public
Last Update: 2018-05-12 17:44
Reporter: Floris
Assigned To: Kloot
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: resolved
Resolution: fixed
Product Version: 104.0 +git
Target Version:
Fixed in Version:
Summary: 0005955: desyncing
Description: maintenance ...-358 desyncs

http://replays.springrts.com/replay/52cbcb5a084e4c31da54045cbafd7536/
Tags: No tags attached.
Checked infolog.txt for lua Errors
Attached Files


Notes

~0018999

Floris (reporter)

[23:53:59] <[President]Trump> i saw replay of desync game
[23:54:11] <[PinK]triton> ok?
[23:54:23] <[PinK]triton> flow said he reported issue already
[23:54:30] <[PinK]triton> you think you found the reason?
[23:54:34] <[President]Trump> noticed my nukes and screaners didnt stockpile in replay, while they did in the game
[23:55:01] <[PinK]triton> I can share this information to flow
[23:55:05] <[PinK]triton> but nothing else much
[23:55:11] <[President]Trump> other player's nukes and mercury did stockpile
[23:55:16] <[PinK]triton> ok

~0019000

Kloot (developer)

Last edited: 2018-04-10 20:40


running the demo "only" turned up a random memory corruption bug (also present in 151-g11de57d), nothing related to weapon stockpiling which is ancient code.

since ZK has so far been desync-free using 358-gb58f0b, it would help to know if this can be reproduced with 378-g4eeb848.

~0019031

Kloot (developer)

I can not find anything suspect in 378+.

NB if you're not aware of it already: the "restore dead units" (?) trick you used as a spectator in another DSD game a few days ago does desync demos.

~0019032

Floris (reporter)

Last edited: 2018-04-17 22:51


at ~ 28 mins in BanDolf starts desyncing on maintenance -409

http://replays.springrts.com/replay/3053d65ab14f41ee0185b0f9660f129a/

~0019033

Floris (reporter)

and another game full of desyncs:
http://replays.springrts.com/replay/b068d65af6de2b2d64c3274c74181d11/

~0019035

Floris (reporter)

recorded moment of BanDolf desyncing:
https://www.youtube.com/watch?v=RMYooZ8dIXs

how can we help?

~0019036

Kloot (developer)

afaics in both 3053d65ab14f41ee0185b0f9660f129a and b068d65af6de2b2d64c3274c74181d11 only one player diverged while the rest kept a consistent state (BanDolf even came back into sync before going out of it again), which suggests an unsynced origin and may be harder to reproduce.

right now I have no idea about the cause: addrsan and signan builds show no errors, and the local checksum when replaying is always constant (tested over 10 runs).

the best way to help would be to get as many people trying as many "unusual" things as early as possible (plus whatever BanDolf was doing) to narrow down what triggers it. assuming the source was introduced between 151 and 358 another option is to bisect, but that will take more time.

~0019037

Floris (reporter)

we have desyncs on 151 as well, ...maybe slightly less often though

~0019038

Doo (reporter)

As far as I can remember, we've had some (a few) desyncs on spring 103 as well.

The information I can share is very thin, but I guess it's still worth reporting.
In one of my desyncs, I spotted that a gadget had completely stopped working:
unit_stomp.lua -- it prevents the krogoth stomp weapon (which fires on each footstep, causing damage all around the foot) from damaging units besides peewees/aks/scouts
The issue Trump mentioned seems to be related to unit_mercscr_stockpile_limit.lua, which handles the stockpiling of screamers and mercuries (it allows/disallows stockpiling to enforce a limit of 5 stockpiled missiles)

I can't tell whether synced gadgets stopping working is just an effect of being desynced, or whether a synced gadget/LUS or a global Lua instability is the actual cause of the desyncs.

Is it possible a gadget or unit script causes desyncs for players ? What kind of code would be tricky / would possibly cause desyncs (as in, are there things that I should just avoid playing with)?
That would help us check our gadgetry again to make sure nothing sticks out here.

Is the use of math.random in synced code safe?

One of the unit scripts uses this:
        for count, piece in pairs(piecetable) do
            randomnumber = math.random(1,2)
            [...]
        end
Is this safe? Or is there a chance that the math.random() result for piece[count] differs from one player to another?

~0019039

Kloot (developer)

Last edited: 2018-04-19 13:37


@Floris I only knew of the "Lua OOM while catching up" desync in 151, this is news to me.


"Is it possible a gadget or unit script causes desyncs for players ?"

yes, for example if the game archive is corrupted (which happened more than once while ZK was using sdp) a gadget can crash or fail to load on one machine. another common mistake is to use tables as table keys in synced Lua, which will cause iteration order to diverge.
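
A minimal sketch of the table-key mistake described above, written in stock Lua with no engine calls. The unit records, their IDs, and the helper name sortedKeys are hypothetical. In stock Lua a table used as a key is hashed by its address, so pairs() gives no order that can be relied on across machines; keying by a plain integer and walking the keys in sorted order is one common way to avoid that.

```lua
-- Hypothetical stand-ins for unit records; a real gadget would get these
-- from the engine, which is not modelled here.
local unitA = { id = 101 }
local unitB = { id = 57 }
local unitC = { id = 203 }

-- Fragile: tables used as keys. The traversal order of pairs() over this
-- table is unspecified and should not be relied on in synced code.
local watchedByRef = { [unitA] = true, [unitB] = true, [unitC] = true }

-- Safer: key by a plain integer ID instead of by table reference.
local watchedByID = { [unitA.id] = true, [unitB.id] = true, [unitC.id] = true }

-- Collect the keys and sort them so the visiting order is explicit.
local function sortedKeys(t)
    local keys = {}
    for k in pairs(t) do
        keys[#keys + 1] = k
    end
    table.sort(keys)
    return keys
end

local order = sortedKeys(watchedByID)
for _, id in ipairs(order) do
    print(id)  -- prints 57, then 101, then 203
end
```

Sorting costs a little per traversal; keeping the IDs in an array and iterating it with ipairs avoids the sort when entries are only ever appended.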


"Is the use of math.random in synced code safe?"

yes.
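
A generic stock-Lua illustration of the property this relies on, not the engine's actual mechanism: two runs that start from the same seed and make the same calls in the same order get the same results. The seed value, the piece names, and the function rollForPieces are all hypothetical, and the engine's own synced random number generator is not modelled.

```lua
-- Hypothetical piece names, stored as an array so ipairs visits them in a
-- fixed order.
local pieces = { "head", "torso", "lleg", "rleg" }

-- Draw one value per piece, starting from a given seed.
local function rollForPieces(seed)
    math.randomseed(seed)
    local rolls = {}
    for i, piece in ipairs(pieces) do
        rolls[i] = math.random(1, 2)
    end
    return rolls
end

-- Two "clients" starting from the same (assumed) seed.
local clientA = rollForPieces(12345)
local clientB = rollForPieces(12345)

local identical = true
for i = 1, #pieces do
    if clientA[i] ~= clientB[i] then
        identical = false
    end
end
print(identical)  -- prints true
```

The guarantee only covers the sequence of draws; which piece receives which draw still depends on iterating the pieces in the same order on every client.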


"In one of my desyncs, i spotted that a gadget had completly stopped working: unit_stomp.lua"

did your infolog mention anything special about that gadget?

~0019040

Doo (reporter)

Balanced Annihilation is usually downloaded through SpringLobby as sdp, and rarely as an sd7 from springfiles or whatever direct download link. Especially for the test versions.
Should I consider this as a possible source of this issue?
(But then i'd ask why now, why not when we were playing BA 9.46 on spring 103?)

~0019041

Kloot (developer)

Last edited: 2018-04-19 14:05


"Balanced Annihilation is usually downloaded through SpringLobby as sdp ... Should I consider this as a possible source of this issue?"

sdp download corruption issues were common with ZK last year (just search mantis), so I would strongly recommend staying away from pool archives.


"But then i'd ask why now, why not when we were playing BA 9.46 on spring 103?"

I don't have statistics, but regular hosting of test versions distributed via sdp seems to be more popular now. it's also possible (but speculation) the downloader implementation used by springlobby was broken after 103, or engine filesystem changes might have snuck in a bug.

~0019042

Google_Frog (reporter)

Zero-K is not necessarily desync-free on 358-gb58f0b. See these reports https://github.com/ZeroK-RTS/CrashReports/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+desync.

I have not been paying too much attention to the desync logs recently. I recall glancing at each as it appeared and deciding either that it was some other issue (such as a bad download) or that there was little to do at the time.

~0019046

Kloot (developer)

all of the 358 ZK desyncs can be traced to Lua OOMs or broken content.

there might of course still be a "genuine" case lurking in engine code (especially after the recent refactors), but until the BA community rules out pool corruption by switching to sdz for tests this will go onto the UTR pile.

~0019052

Floris (reporter)

At first the spec RPaulson desyncs for a while, and then (much) later player spike_spb desyncs (together with some others)

My 2nd PC (Floris) desynced briefly as well; it has a manually downloaded and installed BA archive.

http://replays.springrts.com/replay/7211e25a88e25d9942c23e368a21b655/

will attach spike_spb's replay

~0019053

Floris (reporter)

added spike_spb's infolog as well

~0019054

Kloot (developer)

RPaulson didn't desync after rejoining, so it's a non-deterministic bug with an unknown (but small) probability of being triggered.

unfortunately 7211e25a88e25d9942c23e368a21b655 didn't reveal anything new yet.

~0019105

Kloot (developer)

nuked

Issue History
Date Modified Username Field Change
2018-04-09 23:20 Floris New Issue
2018-04-09 23:20 Floris File Added: infolog.txt
2018-04-09 23:55 Floris Note Added: 0018999
2018-04-10 20:38 Kloot Note Added: 0019000
2018-04-10 20:40 Kloot Note Edited: 0019000
2018-04-16 17:33 Kloot Status new => closed
2018-04-16 17:33 Kloot Resolution open => unable to reproduce
2018-04-16 17:33 Kloot Note Added: 0019031
2018-04-17 22:43 Floris Status closed => feedback
2018-04-17 22:43 Floris Resolution unable to reproduce => reopened
2018-04-17 22:43 Floris Note Added: 0019032
2018-04-17 22:51 abma Note Edited: 0019032
2018-04-18 00:30 Floris Note Added: 0019033
2018-04-18 00:30 Floris Status feedback => new
2018-04-19 12:24 Floris Note Added: 0019035
2018-04-19 12:53 Kloot Note Added: 0019036
2018-04-19 12:55 Floris Note Added: 0019037
2018-04-19 13:21 Doo Note Added: 0019038
2018-04-19 13:32 Kloot Note Added: 0019039
2018-04-19 13:33 Kloot Note Edited: 0019039
2018-04-19 13:36 Doo Note Added: 0019040
2018-04-19 13:37 Kloot Note Edited: 0019039
2018-04-19 13:51 Kloot Note Added: 0019041
2018-04-19 14:00 Kloot Note Edited: 0019041
2018-04-19 14:05 Kloot Note Edited: 0019041
2018-04-19 17:41 Google_Frog Note Added: 0019042
2018-04-22 17:45 Kloot Status new => closed
2018-04-22 17:45 Kloot Note Added: 0019046
2018-04-26 21:29 Floris Status closed => feedback
2018-04-26 21:29 Floris Note Added: 0019052
2018-04-26 21:29 Floris File Added: 20180426_205048_DeltaSiegePrime_Ultimate_104.0.1-413-gd902a7b_maintenance.sdfz
2018-04-26 21:34 Floris File Added: infolog_spike_spb.txt
2018-04-26 21:34 Floris Note Added: 0019053
2018-04-26 21:34 Floris Status feedback => new
2018-04-27 18:42 Kloot Note Added: 0019054
2018-05-12 15:53 Kloot Assigned To => Kloot
2018-05-12 15:53 Kloot Status new => assigned
2018-05-12 17:44 Kloot Status assigned => resolved
2018-05-12 17:44 Kloot Resolution reopened => fixed
2018-05-12 17:44 Kloot Note Added: 0019105