View Issue Details

ID: 0005955
Project: Spring engine
Category: General
View Status: public
Last Update: 2018-05-12 17:44
Reporter: Floris
Assigned To: Kloot
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: resolved
Resolution: fixed
Product Version: 104.0 +git
Summary: 0005955: desyncing
Description: maintenance ...-358 desyncs

http://replays.springrts.com/replay/52cbcb5a084e4c31da54045cbafd7536/
Tags: No tags attached.
Attached Files
infolog.txt (Attachment missing)
20180426_205048_DeltaSiegePrime_Ultimate_104.0.1-413-gd902a7b_maintenance.sdfz (Attachment missing)
Checked infolog.txt for Errors

Activities

Floris

2018-04-09 23:55

reporter   ~0018999

[23:53:59] <[President]Trump> i saw replay of desync game
[23:54:11] <[PinK]triton> ok?
[23:54:23] <[PinK]triton> flow said he reported issue already
[23:54:30] <[PinK]triton> you think you found the reason?
[23:54:34] <[President]Trump> noticed my nukes and screaners didnt stockpile in replay, while they did in the game
[23:55:01] <[PinK]triton> I can share this information to flow
[23:55:05] <[PinK]triton> but nothing else much
[23:55:11] <[President]Trump> other player's nukes and mercury did stockpile
[23:55:16] <[PinK]triton> ok

Kloot

2018-04-10 20:38

developer   ~0019000

Last edited: 2018-04-10 20:40

running the demo "only" turned up a random memory corruption bug (also present in 151-g11de57d), nothing related to weapon stockpiling which is ancient code.

since ZK has so far been desync-free using 358-gb58f0b, it would help to know if this can be reproduced with 378-g4eeb848.

Kloot

2018-04-16 17:33

developer   ~0019031

I can not find anything suspect in 378+.

NB if you're not aware of it already: the "restore dead units" (?) trick you used as a spectator in another DSD game a few days ago does desync demos.

Floris

2018-04-17 22:43

reporter   ~0019032

Last edited: 2018-04-17 22:51

at ~ 28 mins in BanDolf starts desyncing on maintenance -409

http://replays.springrts.com/replay/3053d65ab14f41ee0185b0f9660f129a/

Floris

2018-04-18 00:30

reporter   ~0019033

and another game full of desyncs:
http://replays.springrts.com/replay/b068d65af6de2b2d64c3274c74181d11/

Floris

2018-04-19 12:24

reporter   ~0019035

recorded moment of BanDolf desyncing:
https://www.youtube.com/watch?v=RMYooZ8dIXs

how can we help?

Kloot

2018-04-19 12:53

developer   ~0019036

afaics in both 3053d65ab14f41ee0185b0f9660f129a and b068d65af6de2b2d64c3274c74181d11 only one player diverged while the rest kept a consistent state (BanDolf even came back into sync before going out of it again), which suggests an unsynced origin and may be harder to reproduce.

right now I have no idea about the cause, addrsan and signan builds show no errors and local checksum when replaying is always (tested 10 runs) constant.
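
The comparison described above (one client diverging while the rest agree) can be sketched in a few lines of Lua. This is only an illustration of the idea, not the engine's sync checker: the function name `findDivergent`, the client names, and the checksum strings are all made up for the example.

```lua
-- Hypothetical sketch: given the checksum each client reported for one frame,
-- find the majority checksum and list the clients that disagree with it.
local function findDivergent(checksums)
    -- count how many clients reported each checksum
    local counts = {}
    for _, sum in pairs(checksums) do
        counts[sum] = (counts[sum] or 0) + 1
    end

    -- pick the most common checksum (ties broken by the smaller string)
    local majority, best = nil, 0
    for sum, n in pairs(counts) do
        if n > best or (n == best and sum < majority) then
            majority, best = sum, n
        end
    end

    -- collect every client whose checksum differs from the majority
    local divergent = {}
    for name, sum in pairs(checksums) do
        if sum ~= majority then
            divergent[#divergent + 1] = name
        end
    end
    table.sort(divergent) -- stable output regardless of traversal order
    return majority, divergent
end

-- made-up data: three clients agree, one does not
local majority, divergent = findDivergent({
    clientA = "a1b2", clientB = "a1b2", clientC = "ffff", clientD = "a1b2",
})
```

If the same replay yields the same checksum on every local run, as reported here, the divergence is unlikely to come from the replayed synced state itself.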

the best way to help would be to get as many people trying as many "unusual" things as early as possible (plus whatever BanDolf was doing) to narrow down what triggers it. assuming the source was introduced between 151 and 358 another option is to bisect, but that will take more time.

Floris

2018-04-19 12:55

reporter   ~0019037

we have desyncs on 151 as well, ...maybe slightly less often though

Doo

2018-04-19 13:21

reporter   ~0019038

As far as I can remember, we've had some (a few) desyncs on spring 103 as well.

The information I can share is very thin, but I guess it's still worth reporting.
In one of my desyncs, I spotted that a gadget had completely stopped working:
unit_stomp.lua -- It prevents the Krogoth stomp weapon (which fires on each footstep, causing damage all around the foot) from damaging units other than peewees/AKs/scouts
The issue Trump mentioned seems to be related to unit_mercscr_stockpile_limit.lua, which handles the stockpiling of screamers and mercuries (allows/disallows stockpiling to enforce a limit of 5 stockpiled missiles)

I can't tell whether synced gadgets stopping is just an effect of being desynced, or whether a synced gadget/LUS or a global Lua instability is the actual cause of the desyncs.

Is it possible a gadget or unit script causes desyncs for players ? What kind of code would be tricky / could possibly cause desyncs (as in, are there things that I should just avoid playing with)?
That would help us check our gadgetry again to make sure nothing sticks out here.

Is the use of math.random in synced code safe?

One of the unit scripts uses this:
        for count, piece in pairs(piecetable) do
            randomnumber = math.random(1, 2)
            [...]
        end
Is this safe? Or is there a chance that the math.random() result for piece[count] differs from one player to another?

Kloot

2018-04-19 13:32

developer   ~0019039

Last edited: 2018-04-19 13:37

@Floris I only knew of the "Lua OOM while catching up" desync in 151, this is news to me.


"Is it possible a gadget or unit script causes desyncs for players ?"

yes, for example if the game archive is corrupted (which happened more than once while ZK was using sdp) a gadget can crash or fail to load on one machine. another common mistake is to use tables as table keys in synced Lua, which will cause iteration order to diverge.
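
To illustrate the table-key pitfall: in stock Lua, a table used as a key is hashed by its identity (in practice its memory address), so `pairs()` can visit such keys in a different order on different machines. A minimal sketch of one way around it, assuming each object has a stable integer id, is to key by that id and iterate over a sorted key list. The names `sortedIds`, `totalInOrder` and `damageById` are hypothetical and not taken from any BA or ZK gadget.

```lua
-- Risky pattern (shown only as a comment): tables as keys.
--   local damageByUnit = { [unitA] = 10, [unitB] = 20 }
--   for unit, dmg in pairs(damageByUnit) do ... end  -- order may differ per machine

-- Safer pattern: key by a stable integer id and iterate in sorted order.
local function sortedIds(byId)
    local ids = {}
    for id in pairs(byId) do   -- traversal order does not matter here...
        ids[#ids + 1] = id
    end
    table.sort(ids)            -- ...because the ids are sorted before use
    return ids
end

-- Visit every entry in ascending id order; return the visiting order and the sum.
local function totalInOrder(byId)
    local order, total = {}, 0
    for _, id in ipairs(sortedIds(byId)) do
        order[#order + 1] = id
        total = total + byId[id]
    end
    return order, total
end

-- made-up data: unit id -> damage
local damageById = { [42] = 10, [7] = 20, [19] = 5 }
local order, total = totalInOrder(damageById)
```

Sorting costs a little per iteration, but any side effects inside the loop then happen in the same order on every client.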


"Is the use of math.random in synced code safe?"

yes.
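
One way to see why this can be safe: a generator that starts from the same state and is called in the same order gives every caller the same sequence. The sketch below shows this in plain Lua, using `math.randomseed` as a stand-in for whatever seeding the engine does in synced code (an assumption, not a description of the engine). The function name `rollForPieces` and the piece names are hypothetical. It also walks the pieces with `ipairs`, so each roll is tied to a fixed position in the list.

```lua
-- Hypothetical sketch: roll 1 or 2 for each piece, in a fixed order,
-- starting from a known seed.
local function rollForPieces(seed, pieces)
    math.randomseed(seed)                -- stand-in for a shared generator state
    local rolls = {}
    for i, piece in ipairs(pieces) do    -- ipairs: index order 1..n
        rolls[i] = { piece = piece, roll = math.random(1, 2) }
    end
    return rolls
end

-- two "clients" with the same seed and the same piece list
local pieces = { "lfoot", "rfoot", "head" }
local clientA = rollForPieces(1234, pieces)
local clientB = rollForPieces(1234, pieces)
```

Both result lists match entry for entry. The remaining risk is the order of the calls, not the generator itself.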


"In one of my desyncs, i spotted that a gadget had completly stopped working: unit_stomp.lua"

did your infolog mention anything special about that gadget?

Doo

2018-04-19 13:36

reporter   ~0019040

Balanced Annihilation is usually downloaded through SpringLobby as sdp, and rarely as an sd7 from springfiles or whatever direct download link. Especially for the test versions.
Should I consider this as a possible source of this issue?
(But then i'd ask why now, why not when we were playing BA 9.46 on spring 103?)

Kloot

2018-04-19 13:51

developer   ~0019041

Last edited: 2018-04-19 14:05

"Balanced Annihilation is usually downloaded through SpringLobby as sdp ... Should I consider this as a possible source of this issue?"

sdp download corruption issues were common with ZK last year (just search mantis), so I would strongly recommend staying away from pool archives.


"But then i'd ask why now, why not when we were playing BA 9.46 on spring 103?"

I don't have statistics, but regular hosting of test versions distributed via sdp seems to be more popular now. it's also possible (but speculation) the downloader implementation used by springlobby was broken after 103, or engine filesystem changes might have snuck in a bug.

Google_Frog

2018-04-19 17:41

reporter   ~0019042

Zero-K is not necessarily desync-free on 358-gb58f0b. See these reports https://github.com/ZeroK-RTS/CrashReports/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+desync.

I have not been paying too much attention to the desync logs recently. I recall glancing at each as it appeared and deciding either that it was some other issue (such as a bad download) or that there was little to do at the time.

Kloot

2018-04-22 17:45

developer   ~0019046

all of the 358 ZK desyncs can be traced to Lua OOM's or broken content.

there might of course still be a "genuine" case lurking in engine code (especially after the recent refactors), but until the BA community rules out pool corruption by switching to sdz for tests this will go onto the UTR pile.

Floris

2018-04-26 21:29

reporter   ~0019052

At first the spectator RPaulson desyncs for a while, and then (much) later the player spike_spb desyncs (together with some others).

My 2nd PC (Floris) desynced briefly as well; it has a manually downloaded and installed BA archive.

http://replays.springrts.com/replay/7211e25a88e25d9942c23e368a21b655/

will attach spike_spb's replay

Floris

2018-04-26 21:34

reporter   ~0019053

added spike_spb's infolog as well
infolog_spike_spb.txt (Attachment missing)

Kloot

2018-04-27 18:42

developer   ~0019054

RPaulson didn't desync after rejoining, so it's a non-deterministic bug with an unknown (but small) probability of being triggered.

unfortunately 7211e25a88e25d9942c23e368a21b655 didn't reveal anything new yet.

Kloot

2018-05-12 17:44

developer   ~0019105

nuked

Issue History

Date Modified Username Field Change
2018-04-09 23:20 Floris New Issue
2018-04-09 23:20 Floris File Added: infolog.txt
2018-04-09 23:55 Floris Note Added: 0018999
2018-04-10 20:38 Kloot Note Added: 0019000
2018-04-10 20:40 Kloot Note Edited: 0019000
2018-04-16 17:33 Kloot Status new => closed
2018-04-16 17:33 Kloot Resolution open => unable to reproduce
2018-04-16 17:33 Kloot Note Added: 0019031
2018-04-17 22:43 Floris Status closed => feedback
2018-04-17 22:43 Floris Resolution unable to reproduce => reopened
2018-04-17 22:43 Floris Note Added: 0019032
2018-04-17 22:51 abma Note Edited: 0019032
2018-04-18 00:30 Floris Note Added: 0019033
2018-04-18 00:30 Floris Status feedback => new
2018-04-19 12:24 Floris Note Added: 0019035
2018-04-19 12:53 Kloot Note Added: 0019036
2018-04-19 12:55 Floris Note Added: 0019037
2018-04-19 13:21 Doo Note Added: 0019038
2018-04-19 13:32 Kloot Note Added: 0019039
2018-04-19 13:33 Kloot Note Edited: 0019039
2018-04-19 13:36 Doo Note Added: 0019040
2018-04-19 13:37 Kloot Note Edited: 0019039
2018-04-19 13:51 Kloot Note Added: 0019041
2018-04-19 14:00 Kloot Note Edited: 0019041
2018-04-19 14:05 Kloot Note Edited: 0019041
2018-04-19 17:41 Google_Frog Note Added: 0019042
2018-04-22 17:45 Kloot Status new => closed
2018-04-22 17:45 Kloot Note Added: 0019046
2018-04-26 21:29 Floris Status closed => feedback
2018-04-26 21:29 Floris Note Added: 0019052
2018-04-26 21:29 Floris File Added: 20180426_205048_DeltaSiegePrime_Ultimate_104.0.1-413-gd902a7b_maintenance.sdfz
2018-04-26 21:34 Floris File Added: infolog_spike_spb.txt
2018-04-26 21:34 Floris Note Added: 0019053
2018-04-26 21:34 Floris Status feedback => new
2018-04-27 18:42 Kloot Note Added: 0019054
2018-05-12 15:53 Kloot Assigned To => Kloot
2018-05-12 15:53 Kloot Status new => assigned
2018-05-12 17:44 Kloot Status assigned => resolved
2018-05-12 17:44 Kloot Resolution reopened => fixed
2018-05-12 17:44 Kloot Note Added: 0019105