Desync with table iterating using pairs() in synced code
Moderator: Moderators
Desync with table iterating using pairs() in synced code
update: pairs with (long) strings seems sync safe, the cause for the desync was a conversion of numbers into strings! (sprintf)
(again) a desync was found which was caused by the usage of pairs() to iterate over a table in a synced script: https://springrts.com/mantis/view.php?id=3436
where should a warning about this placed in the wiki?
its not that pairs always cause desync, just in many cases in synced code: ideally don't use it at all in synced code!
There exists some warning: https://springrts.com/wiki/Debugging_sy ... rs#Running but very likely no gamedev will see this ever.
(again) a desync was found which was caused by the usage of pairs() to iterate over a table in a synced script: https://springrts.com/mantis/view.php?id=3436
where should a warning about this placed in the wiki?
its not that pairs always cause desync, just in many cases in synced code: ideally don't use it at all in synced code!
There exists some warning: https://springrts.com/wiki/Debugging_sy ... rs#Running but very likely no gamedev will see this ever.
Last edited by abma on 08 Oct 2015, 22:30, edited 1 time in total.
Reason: added note about the cause of the desync
Reason: added note about the cause of the desync
Re: Desync with table iterating using pairs() in synced code
Random Idea: It may be possible to add a warning about hash collision in the lua code which is only active compiling with the debug_lua flag (like the signans) and in synced.
- Silentwings
- Posts: 3720
- Joined: 25 Oct 2008, 00:23
Re: Desync with table iterating using pairs() in synced code
To make clear - this only occurs if the table has functions or suchlike as its key values. Iteration with "normal" data types as keys is perfectly safe. Avoiding pairs() in synced code is not a good way to solve this imo, its too useful.
I actually did check that page, the one time I ever caused a desync (as it turned out, the cause different). I would put it as a question Q11 "Is there anything in 'normal' lua that I cannot do in Spring?" on https://springrts.com/wiki/Lua_Beginners_FAQ and mention also that sometimes parts the standard lua libs (io, os, etc) are removed for safety.
I actually did check that page, the one time I ever caused a desync (as it turned out, the cause different). I would put it as a question Q11 "Is there anything in 'normal' lua that I cannot do in Spring?" on https://springrts.com/wiki/Lua_Beginners_FAQ and mention also that sometimes parts the standard lua libs (io, os, etc) are removed for safety.
Re: Desync with table iterating using pairs() in synced code
I think it's right where it should be: I would search for it only when I started getting sync errors.
Pairs are usually fine as Silentwings said, having tables/functions/userdata as keys is usually a bug or sign of bad design imo.
Pairs are usually fine as Silentwings said, having tables/functions/userdata as keys is usually a bug or sign of bad design imo.
Re: Desync with table iterating using pairs() in synced code
i'm not sure about this as i didn't test any further, but in this case it seems that iterating over a table with strings as keys seems to cause a desync as the order isn't the same on all computers.Silentwings wrote:To make clear - this only occurs if the table has functions or suchlike as its key values. Iteration with "normal" data types as keys is perfectly safe. Avoiding pairs() in synced code is not a good way to solve this imo, its too useful.
afaik the problem is, that the order of a table is unspecific: http://stackoverflow.com/questions/3097 ... ts-written and results to different order on different platforms. but maybe jk / hokomoko can explain better what is the cause, i don't want to investigate this any further.
Re: Desync with table iterating using pairs() in synced code
1. Spring already prints warnings for the 100% sure unsyncing types (tables, userdata, ...)Silentwings wrote:Iteration with "normal" data types as keys is perfectly safe. Avoiding pairs() in synced code is not a good way to solve this imo, its too useful.
2. the code that causes the desync described in mantis ticket uses _strings_ as keys (and those were thought to be sync-safe till now)
3. cause the desync is limited to a few maps, it has to be a problem of the set of strings
4. so it seems that hash collisions of strings can cause desyncs, too
5. so only numbers as keys can be assumed as 100% safe
6. leaving even less safe use cases of pairs() in synced code
PS: no idea how much performance it would cost to add these hash collision cases in the current synced pairs() warning code. Problem is that there is no exposed function to get the hash of a lua object (the hash is kept for purely internal usage).
Re: Desync with table iterating using pairs() in synced code
I assume the problem here is related to use of size_t in the LVM (lstring/ltable hash calculation or collision handling) since all Windows builds are 32-bit and the desyncs only occurred across platforms with x64 linux clients involved.
Re: Desync with table iterating using pairs() in synced code
If it really isn't working correctly with strings then it should be replaced with a custom implementation. Strings are probably the most common associative mapping type. The point is that the pairs() function shouldn't be made obsolete regardless.
Re: Desync with table iterating using pairs() in synced code
Do you have something to say to me, Dear Leader...?jK's deleted post wrote:Oh, you have eagle eyes!Kloot wrote:...
- Silentwings
- Posts: 3720
- Joined: 25 Oct 2008, 00:23
Re: Desync with table iterating using pairs() in synced code
Afaik strings as keys are "normally" fine, I think I would personally have caused about 10e10 desyncs over the years if they weren't. If it's hash collisions... ugh. Have to agree with gajop, if strings are not safe then if at all possible they should be made safe, doing without pairs() for strings is crippling.
Re: Desync with table iterating using pairs() in synced code
https://github.com/spring/spring/blob/d ... ng.cpp#L76Kloot wrote:I assume the problem here is related to use of size_t in the LVM (lstring/ltable hash calculation or collision handling) since all Windows builds are 32-bit and the desyncs only occurred across platforms with x64 linux clients involved.
The size_t use here doesn't seem to be able to affect the result.
EDIT:
I think that when tables are enlarged the new sizes are different depending on the OS (because the struct memory sizes may be different), and that's what may cause collisions to happen differently.
-
- Moderator
- Posts: 2464
- Joined: 12 Oct 2007, 09:24
Re: Desync with table iterating using pairs() in synced code
I think the desync is contained to cases with very long strings as keys. Otherwise we would have noticed it a lot earlier. Perhaps this string length could be determined and disallowed, or at least prominently warned against on the wiki (perhaps a page on desync causes for lua?).
For example this is a key for the example we have of a table which desyncs:
[code]14964888300011432244680303211512152848561544492015604920157649361592496816084856162448721640485610447121656487216724904168848881704490417204888173648721752485617684760178447761800474418164696183247121848474418644728188047281896472812046801912472819284712194447121960474419764760199247765044792202448082040477652047765364792136474455247765684760584477660047601524744616476063248246484808664485616847446804824696482471248407284840184474430161149629841143274448564244760295211432296477676048722920113364084856288849847764872280476028564952344480879248402004712282449361512485680848402792493627764952276049528244856360484027284968456474484047442696490439248242664495285647442164728328476026324904872476026447282600480812724856888474424724808125648882440477690447445647122408477644047769204760232471223764808376480893648562344484025684792231248569524840124049042280484022644808968487222484840472476022164792984487224847124047122184490410004888200847762152490420564760101648562120480820884808103248242072480810484808210448241064482421364856108048082168488810964808220048241112484022324808112848247247121144479222964840116047762328485611764776236048401192482423924824120848722424479212244872245647923124760248847922504474425204744253647442552476012884856258448241304484026164792132048722648493613364872268049041352484027124968136848402744495213844904884696140048402808492014164808284049201432485628724968144848402904506414644824293611384148048402968114484884760 14961160830001159224116563032115281528115761544115601560115761576115921592115921608115441624115281640115601041165616561156016721152816881152817041152817201152817361146417521140017681152817841149618001151218161152818321151218481160818641156018801156018961149612011672191211560192811576194411608196011624197611624199211672504116242024116242040116245201162453611592136116085521159256811608584115766001159215211640616116086321160864811608664115921681162468011592696116087121157672811496184116243016116082984115127441148042411640295211544296116247601149629201152840811592288811528776115442801160828561152834411560792115602001157628241151215121157680811528279211576277611560276011544824115763601160827281156045611624840115602696116243921154426641162485611592216115923281157626321160887211560264116242600115921272114648881152824721159212561154424401159290411560561165624081162444011640920115922321160823761170437611592936115922344117042568115762312115129521156012401156022801146422641154496811592224811560472116402216116249841157624811608401167221841159210001156020081168821521160820561157610161154421201159220881157610321152820721156010481144821041159210641151221361159210801149621681157610961151222001159211121152822321162411281152872116721144115442296115121160115442328115281176115602360116881192115762392116561208115922424116081224115602456116083121154424881160825041154425201157625361159225521157612881157625841167213041162426161159213201167226481160813361168826801165613521175227121152813681176827441160813841178488116561400117682808114961416117522840503214321167228721151214481168829041152814641160829361154414801157629681154448811656 24 3032 7.6996262499926e+020[/code]
It is over 3000 characters long.
For example this is a key for the example we have of a table which desyncs:
[code]14964888300011432244680303211512152848561544492015604920157649361592496816084856162448721640485610447121656487216724904168848881704490417204888173648721752485617684760178447761800474418164696183247121848474418644728188047281896472812046801912472819284712194447121960474419764760199247765044792202448082040477652047765364792136474455247765684760584477660047601524744616476063248246484808664485616847446804824696482471248407284840184474430161149629841143274448564244760295211432296477676048722920113364084856288849847764872280476028564952344480879248402004712282449361512485680848402792493627764952276049528244856360484027284968456474484047442696490439248242664495285647442164728328476026324904872476026447282600480812724856888474424724808125648882440477690447445647122408477644047769204760232471223764808376480893648562344484025684792231248569524840124049042280484022644808968487222484840472476022164792984487224847124047122184490410004888200847762152490420564760101648562120480820884808103248242072480810484808210448241064482421364856108048082168488810964808220048241112484022324808112848247247121144479222964840116047762328485611764776236048401192482423924824120848722424479212244872245647923124760248847922504474425204744253647442552476012884856258448241304484026164792132048722648493613364872268049041352484027124968136848402744495213844904884696140048402808492014164808284049201432485628724968144848402904506414644824293611384148048402968114484884760 14961160830001159224116563032115281528115761544115601560115761576115921592115921608115441624115281640115601041165616561156016721152816881152817041152817201152817361146417521140017681152817841149618001151218161152818321151218481160818641156018801156018961149612011672191211560192811576194411608196011624197611624199211672504116242024116242040116245201162453611592136116085521159256811608584115766001159215211640616116086321160864811608664115921681162468011592696116087121157672811496184116243016116082984115127441148042411640295211544296116247601149629201152840811592288811528776115442801160828561152834411560792115602001157628241151215121157680811528279211576277611560276011544824115763601160827281156045611624840115602696116243921154426641162485611592216115923281157626321160887211560264116242600115921272114648881152824721159212561154424401159290411560561165624081162444011640920115922321160823761170437611592936115922344117042568115762312115129521156012401156022801146422641154496811592224811560472116402216116249841157624811608401167221841159210001156020081168821521160820561157610161154421201159220881157610321152820721156010481144821041159210641151221361159210801149621681157610961151222001159211121152822321162411281152872116721144115442296115121160115442328115281176115602360116881192115762392116561208115922424116081224115602456116083121154424881160825041154425201157625361159225521157612881157625841167213041162426161159213201167226481160813361168826801165613521175227121152813681176827441160813841178488116561400117682808114961416117522840503214321167228721151214481168829041152814641160829361154414801157629681154448811656 24 3032 7.6996262499926e+020[/code]
It is over 3000 characters long.
Re: Desync with table iterating using pairs() in synced code
EDIT: see jKs post
Last edited by hokomoko on 03 Oct 2015, 11:36, edited 1 time in total.
Re: Desync with table iterating using pairs() in synced code
not the start, it will max use 512 chars for the hash, and when a string is longer it will just use each n'th char.hokomoko wrote:because only the start is used for calculating the hash
Re: Desync with table iterating using pairs() in synced code
isn't pairs syntactical sugar for traversing a table?
Re: Desync with table iterating using pairs() in synced code
It doesn't matter, the order of items in the table may change between machines.smoth wrote:isn't pairs syntactical sugar for traversing a table?
- Silentwings
- Posts: 3720
- Joined: 25 Oct 2008, 00:23
Re: Desync with table iterating using pairs() in synced code
What are the possible solutions to this? It looks like a nasty problem to me.
Re: Desync with table iterating using pairs() in synced code
1. don't use pairsSilentwings wrote:What are the possible solutions to this? It looks like a nasty problem to me.
2. only use tables/pairs() with strings < 512 chars or numbers as keys (?)
3. change engine: make pairs() deterministic with string as keys
4. do nothing until you have desyncs
Re: Desync with table iterating using pairs() in synced code
3 and 2 with regards to small keys
Whats the alternative for iterating a table really? Synced pairs or sth ? Never used that
Whats the alternative for iterating a table really? Synced pairs or sth ? Never used that
- Silentwings
- Posts: 3720
- Joined: 25 Oct 2008, 00:23
Re: Desync with table iterating using pairs() in synced code
I'm with 2, maybe a warning can be printed for crazy long strings. I can't judge 3, obviously 1 and 4 are bad.