Page 1 of 2
Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 10:40
by abma
update: pairs with (long) strings seems sync safe, the cause for the desync was a conversion of numbers into strings! (sprintf)
(again) a desync was found which was caused by the usage of pairs() to iterate over a table in a synced script:
https://springrts.com/mantis/view.php?id=3436
where should a warning about this placed in the wiki?
its not that pairs always cause desync, just in many cases in synced code: ideally don't use it at all in synced code!
There exists some warning:
https://springrts.com/wiki/Debugging_sy ... rs#Running but very likely no gamedev will see this ever.
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 11:04
by hokomoko
Random Idea: It may be possible to add a warning about hash collision in the lua code which is only active compiling with the debug_lua flag (like the signans) and in synced.
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 11:17
by Silentwings
To make clear - this only occurs if the table has functions or suchlike as its key values. Iteration with "normal" data types as keys is perfectly safe. Avoiding pairs() in synced code is not a good way to solve this imo, its too useful.
I actually did check that page, the one time I ever caused a desync (as it turned out, the cause different). I would put it as a question Q11 "Is there anything in 'normal' lua that I cannot do in Spring?" on
https://springrts.com/wiki/Lua_Beginners_FAQ and mention also that sometimes parts the standard lua libs (io, os, etc) are removed for safety.
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 11:31
by gajop
I think it's right where it should be: I would search for it only when I started getting sync errors.
Pairs are usually fine as Silentwings said, having tables/functions/userdata as keys is usually a bug or sign of bad design imo.
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 11:33
by abma
Silentwings wrote:To make clear - this only occurs if the table has functions or suchlike as its key values. Iteration with "normal" data types as keys is perfectly safe. Avoiding pairs() in synced code is not a good way to solve this imo, its too useful.
i'm not sure about this as i didn't test any further, but in this case it seems that iterating over a table with strings as keys seems to cause a desync as the order isn't the same on all computers.
afaik the problem is, that the order of a table is unspecific:
http://stackoverflow.com/questions/3097 ... ts-written and results to different order on different platforms. but maybe jk / hokomoko can explain better what is the cause, i don't want to investigate this any further.
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 12:07
by jK
Silentwings wrote:Iteration with "normal" data types as keys is perfectly safe. Avoiding pairs() in synced code is not a good way to solve this imo, its too useful.
1. Spring already prints warnings for the 100% sure unsyncing types (tables, userdata, ...)
2. the code that causes the desync described in mantis ticket uses _strings_ as keys (and those were thought to be sync-safe till now)
3. cause the desync is limited to a few maps, it has to be a problem of the set of strings
4. so it seems that hash collisions of strings can cause desyncs, too
5. so only numbers as keys can be assumed as 100% safe
6. leaving even less safe use cases of pairs() in synced code
PS: no idea how much performance it would cost to add these hash collision cases in the current synced pairs() warning code. Problem is that there is no exposed function to get the hash of a lua object (the hash is kept for purely internal usage).
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 12:27
by Kloot
I assume the problem here is related to use of size_t in the LVM (lstring/ltable hash calculation or collision handling) since all Windows builds are 32-bit and the desyncs only occurred across platforms with x64 linux clients involved.
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 12:49
by gajop
If it really isn't working correctly with strings then it should be replaced with a custom implementation. Strings are probably the most common associative mapping type. The point is that the pairs() function shouldn't be made obsolete regardless.
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 13:38
by Kloot
jK's deleted post wrote:Kloot wrote:...
Oh, you have eagle eyes!
Do you have something to say to me, Dear Leader...?
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 13:58
by Silentwings
Afaik strings as keys are "normally" fine, I think I would personally have caused about 10e10 desyncs over the years if they weren't. If it's hash collisions... ugh. Have to agree with gajop, if strings are not safe then if at all possible they should be made safe, doing without pairs() for strings is crippling.
Re: Desync with table iterating using pairs() in synced code
Posted: 28 Sep 2015, 14:02
by hokomoko
Kloot wrote:I assume the problem here is related to use of size_t in the LVM (lstring/ltable hash calculation or collision handling) since all Windows builds are 32-bit and the desyncs only occurred across platforms with x64 linux clients involved.
https://github.com/spring/spring/blob/d ... ng.cpp#L76
The size_t use here doesn't seem to be able to affect the result.
EDIT:
I think that when tables are enlarged the new sizes are different depending on the OS (because the struct memory sizes may be different), and that's what may cause collisions to happen differently.
Re: Desync with table iterating using pairs() in synced code
Posted: 03 Oct 2015, 03:38
by Google_Frog
I think the desync is contained to cases with very long strings as keys. Otherwise we would have noticed it a lot earlier. Perhaps this string length could be determined and disallowed, or at least prominently warned against on the wiki (perhaps a page on desync causes for lua?).
For example this is a key for the example we have of a table which desyncs:
[code]14964888300011432244680303211512152848561544492015604920157649361592496816084856162448721640485610447121656487216724904168848881704490417204888173648721752485617684760178447761800474418164696183247121848474418644728188047281896472812046801912472819284712194447121960474419764760199247765044792202448082040477652047765364792136474455247765684760584477660047601524744616476063248246484808664485616847446804824696482471248407284840184474430161149629841143274448564244760295211432296477676048722920113364084856288849847764872280476028564952344480879248402004712282449361512485680848402792493627764952276049528244856360484027284968456474484047442696490439248242664495285647442164728328476026324904872476026447282600480812724856888474424724808125648882440477690447445647122408477644047769204760232471223764808376480893648562344484025684792231248569524840124049042280484022644808968487222484840472476022164792984487224847124047122184490410004888200847762152490420564760101648562120480820884808103248242072480810484808210448241064482421364856108048082168488810964808220048241112484022324808112848247247121144479222964840116047762328485611764776236048401192482423924824120848722424479212244872245647923124760248847922504474425204744253647442552476012884856258448241304484026164792132048722648493613364872268049041352484027124968136848402744495213844904884696140048402808492014164808284049201432485628724968144848402904506414644824293611384148048402968114484884760 14961160830001159224116563032115281528115761544115601560115761576115921592115921608115441624115281640115601041165616561156016721152816881152817041152817201152817361146417521140017681152817841149618001151218161152818321151218481160818641156018801156018961149612011672191211560192811576194411608196011624197611624199211672504116242024116242040116245201162453611592136116085521159256811608584115766001159215211640616116086321160864811608664115921681162468011592696116087121157672811496184116243016116082984115127441148042411640295211544296116247601149629201152840811592288811528776115442801160828561152834411560792115602001157628241151215121157680811528279211576277611560276011544824115763601160827281156045611624840115602696116243921154426641162485611592216115923281157626321160887211560264116242600115921272114648881152824721159212561154424401159290411560561165624081162444011640920115922321160823761170437611592936115922344117042568115762312115129521156012401156022801146422641154496811592224811560472116402216116249841157624811608401167221841159210001156020081168821521160820561157610161154421201159220881157610321152820721156010481144821041159210641151221361159210801149621681157610961151222001159211121152822321162411281152872116721144115442296115121160115442328115281176115602360116881192115762392116561208115922424116081224115602456116083121154424881160825041154425201157625361159225521157612881157625841167213041162426161159213201167226481160813361168826801165613521175227121152813681176827441160813841178488116561400117682808114961416117522840503214321167228721151214481168829041152814641160829361154414801157629681154448811656 24 3032 7.6996262499926e+020[/code]
It is over 3000 characters long.
Re: Desync with table iterating using pairs() in synced code
Posted: 03 Oct 2015, 08:46
by hokomoko
EDIT: see jKs post
Re: Desync with table iterating using pairs() in synced code
Posted: 03 Oct 2015, 11:18
by jK
hokomoko wrote:because only the start is used for calculating the hash
not the start, it will max use 512 chars for the hash, and when a string is longer it will just use each n'th char.
Re: Desync with table iterating using pairs() in synced code
Posted: 06 Oct 2015, 08:37
by smoth
isn't pairs syntactical sugar for traversing a table?
Re: Desync with table iterating using pairs() in synced code
Posted: 06 Oct 2015, 09:45
by hokomoko
smoth wrote:isn't pairs syntactical sugar for traversing a table?
It doesn't matter, the order of items in the table may change between machines.
Re: Desync with table iterating using pairs() in synced code
Posted: 06 Oct 2015, 10:18
by Silentwings
What are the possible solutions to this? It looks like a nasty problem to me.
Re: Desync with table iterating using pairs() in synced code
Posted: 06 Oct 2015, 10:57
by abma
Silentwings wrote:What are the possible solutions to this? It looks like a nasty problem to me.
1. don't use pairs
2. only use tables/pairs() with strings < 512 chars or numbers as keys (?)
3. change engine: make pairs() deterministic with string as keys
4. do nothing until you have desyncs
Re: Desync with table iterating using pairs() in synced code
Posted: 06 Oct 2015, 11:20
by gajop
3 and 2 with regards to small keys
Whats the alternative for iterating a table really? Synced pairs or sth ? Never used that
Re: Desync with table iterating using pairs() in synced code
Posted: 06 Oct 2015, 11:58
by Silentwings
I'm with 2, maybe a warning can be printed for crazy long strings. I can't judge 3, obviously 1 and 4 are bad.