I've spent a few hours today trying to understand how synchronization between the server and clients is checked, and whether anything could be done to implement resync relatively easily. I'm not a C++/programming expert, so feel free to correct me if I'm wrong. Here are my thoughts anyway.
If SYNCCHECK is defined, the sync check appears to be based on hashing every change made to variables of the Synced* types, plus a few ASSERT_SYNCED() calls. The hash of each change is accumulated into an unsigned int variable (g_checksum). This checksum is sent to the server every frame, and if it doesn't match the majority-consensus checksum, the client that sent the erroneous checksum is declared desynced. Game over.
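To make sure I read the current scheme correctly, here is a minimal sketch of how I understand it. This is not the engine's actual code: SyncedValue, foldIntoChecksum and the FNV-1a style mixing step are my own stand-ins, and only the name g_checksum comes from the source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for the engine's running checksum (assumed to be a plain unsigned int).
static std::uint32_t g_checksum = 0;

// Fold the raw bytes of a value into the running checksum.
// FNV-1a style step; the hash the engine really uses may differ.
template <typename T>
void foldIntoChecksum(const T& value) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    for (unsigned char b : bytes) {
        g_checksum ^= b;
        g_checksum *= 16777619u;
    }
}

// Hypothetical wrapper in the spirit of the Synced* types:
// every assignment is hashed, which is where the per-write cost comes from.
template <typename T>
class SyncedValue {
public:
    explicit SyncedValue(T v = T()) : value(v) {}  // construction is not hashed in this sketch
    SyncedValue& operator=(T v) {
        value = v;
        foldIntoChecksum(value);
        return *this;
    }
    operator T() const { return value; }
private:
    T value;
};

// Usage: replaying the same write from the same starting checksum reproduces it.
void demoRunningChecksum() {
    g_checksum = 0;
    SyncedValue<int> health(100);
    health = 90;
    const std::uint32_t first = g_checksum;
    g_checksum = 0;
    health = 90;
    assert(g_checksum == first);
}
```

The point of the sketch is that the checksum only says "something differed somewhere this frame"; it carries no information about which variable or entity it was.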
In my view the current scheme (if I got it right) is fine as far as detecting a desync goes, but it has a few flaws:
1. It's impossible to tell exactly which synchronized component caused a desync. (Yes, there seems to be a SYNCDEBUG tracer, which would likely point out where the desync happened, but it requires a special Spring build.)
2. It's impossible to send a partial state update, since the root cause of the desync remains unknown. Resync is doable, but requires a full synchronized state transfer.
3. Every change to a synced variable incurs hashing overhead. Since a synced variable may be modified several times during simulation, the overhead adds up.
What I'd like to discuss is an alternative approach to sync detection and resync. What if every class holding synced primitives had a checksum function of its own? In other words, the hashing/checksum mechanism is applied to the synchronized class entity rather than to individual variables. Hashes could then be "summed up" to represent a homogeneous group of entities, and finally combined into the "frame state hash", which is the "sum" of the aforementioned groups.
Code:
UnitX.crc=hash of synced variables
activeUnits.crc=sum of UnitX.crc
....
same for map, projectiles, features and other entities that must be in sync.
....
frame.crc=sum of map.crc, activeUnits.crc, projectiles.crc, features.crc, etc...
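The pseudocode above could look roughly like this in C++. It is only a sketch: the Unit and Projectile fields, the mix() helper and the Frame struct are made up for illustration and do not match the engine's real classes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// FNV-1a style mixing of one 32-bit field into a hash (illustrative choice of hash).
inline std::uint32_t mix(std::uint32_t h, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        h ^= (v >> (8 * i)) & 0xFFu;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical synced state of a unit; crc() covers only the synced fields.
struct Unit {
    std::uint32_t id;
    std::int32_t x, y;
    std::int32_t health;
    std::uint32_t crc() const {
        std::uint32_t h = 2166136261u;
        h = mix(h, id);
        h = mix(h, static_cast<std::uint32_t>(x));
        h = mix(h, static_cast<std::uint32_t>(y));
        h = mix(h, static_cast<std::uint32_t>(health));
        return h;
    }
};

// Hypothetical synced state of a projectile.
struct Projectile {
    std::uint32_t id;
    std::int32_t x, y;
    std::uint32_t crc() const {
        std::uint32_t h = 2166136261u;
        h = mix(h, id);
        h = mix(h, static_cast<std::uint32_t>(x));
        h = mix(h, static_cast<std::uint32_t>(y));
        return h;
    }
};

// Group checksum: wrapping sum of the entity checksums, so iteration order does not matter.
template <typename Entity>
std::uint32_t groupCrc(const std::vector<Entity>& group) {
    std::uint32_t sum = 0;
    for (const Entity& e : group) sum += e.crc();
    return sum;
}

// Frame checksum: sum of the group checksums (map and features omitted for brevity).
struct Frame {
    std::vector<Unit> activeUnits;
    std::vector<Projectile> projectiles;
    std::uint32_t crc() const { return groupCrc(activeUnits) + groupCrc(projectiles); }
};

// Usage: the same entities in a different order give the same frame checksum.
void demoFrameCrc() {
    Unit a{1, 10, 20, 100};
    Unit b{2, 30, 40, 50};
    Frame f1; f1.activeUnits = {a, b};
    Frame f2; f2.activeUnits = {b, a};
    assert(f1.crc() == f2.crc());
}
```

A plain sum is the simplest way to combine hashes and is order-independent, but it is weak: two differing entities can in principle cancel each other out. A stronger combiner could be swapped in without changing the overall structure.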
This would address each of the flaws listed above:
1. It's possible to identify the synced entity that caused the desync once the resync has completed. This is precise down to an individual entity, and it's also possible to dump both the correct and the erroneous state of that entity to the log file in order to narrow it down to an individual synced variable.
2. It's possible to resynchronize a desynced client using only a subset of the global sync state. Upon receiving a wrong frame.crc from a client, the resynchronization protocol would first request the CRCs of the type groups (map.crc, activeUnits.crc, projectiles.crc, features.crc), then work out exactly which entity caused the issue, and finally request a correct copy of it.
3. CRC calculation could be done on demand, rather than on every change to an individual synced variable.
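The drill-down in point 2 can be sketched as follows. This is a rough illustration under my own assumptions: both sides' checksums are held in memory here, whereas a real protocol would fetch the group and entity CRCs over the network in separate round trips. The names EntityCrcs, StateCrcs, Mismatch and findDesynced are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using EntityCrcs = std::map<std::uint32_t, std::uint32_t>;  // entity id -> entity crc
using StateCrcs = std::map<std::string, EntityCrcs>;         // group name -> its entities

std::uint32_t groupCrc(const EntityCrcs& group) {
    std::uint32_t sum = 0;
    for (const auto& entity : group) sum += entity.second;
    return sum;
}

std::uint32_t frameCrc(const StateCrcs& state) {
    std::uint32_t sum = 0;
    for (const auto& group : state) sum += groupCrc(group.second);
    return sum;
}

struct Mismatch {
    std::string group;
    std::uint32_t entityId;
};

// Walk frame -> group -> entity and collect the entities whose checksum differs
// from the server's copy. Entities that exist only on the client are not reported
// in this sketch, and sum collisions would hide a mismatch.
std::vector<Mismatch> findDesynced(const StateCrcs& server, const StateCrcs& client) {
    std::vector<Mismatch> out;
    if (frameCrc(server) == frameCrc(client)) return out;  // step 0: frame.crc matches
    static const EntityCrcs empty;
    for (const auto& group : server) {
        const auto it = client.find(group.first);
        const EntityCrcs& clientGroup = (it != client.end()) ? it->second : empty;
        if (groupCrc(group.second) == groupCrc(clientGroup)) continue;  // step 1: group crc
        for (const auto& entity : group.second) {                       // step 2: entity crc
            const auto e = clientGroup.find(entity.first);
            if (e == clientGroup.end() || e->second != entity.second) {
                out.push_back(Mismatch{group.first, entity.first});
            }
        }
    }
    return out;
}

// Usage: one unit differs on the client, so only that unit needs to be re-sent.
void demoFindDesynced() {
    StateCrcs server;
    server["activeUnits"][1] = 10;
    server["activeUnits"][2] = 20;
    server["features"][7] = 5;
    StateCrcs client = server;
    client["activeUnits"][2] = 21;
    const std::vector<Mismatch> bad = findDesynced(server, client);
    assert(bad.size() == 1 && bad[0].group == "activeUnits" && bad[0].entityId == 2);
}
```

Only the entities returned by findDesynced would need a correct copy from the server, which is what makes a partial resync possible.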
However, I'm not sure about a few things:
1. What is the best way to introduce a crc() function to synced entities?
1a. How to define the list of such entities?
1b. How to implement crc() in a D.R.Y. way?
2. The synced part of Lua seems to have its own state, which is not necessarily reflected in the regular C++ synced entities.
2a. Can the Lua stack be CRC'ed and uploaded/downloaded?
2b. Should mods/maps explicitly declare whether they support resync (if 2a is not possible)?
This is a bit incoherent and likely flawed, but I still hope it can ignite a healthy discussion.