The responsible function is the following and has another issue:
Code: Select all
static inline lua_Hash calchash(const char *str, size_t l) {
  lua_Hash h = cast(unsigned int, l);  /* seed */
  size_t step = (l>>5)+1;  /* if string is too long, don't hash all its chars */
  size_t l1;
  for (l1=l; l1>=step; l1-=step) {  /* compute hash */
    h = h ^ ((h<<5)+(h>>2)+cast(unsigned char, str[l1-1]));
  }
  return h;
}
I also added a time profiler and didn't notice any significant time spent in it, so this seems to be an edge case that Lua tries to optimize with its strange "only use every x-th char for the hash" scheme.
abma wrote:
1. don't use pairs
2. only use tables/pairs() with strings < 512 chars or numbers as keys (?)
3. change engine: make pairs() deterministic with string as keys
4. do nothing until you have desyncs
@1. not an option
@3. not easy, and likely no one wants to investigate the error in depth
@4. this desync hasn't happened often so far, so the question is whether it's really worth spending x+ hours on fixing Lua itself
@2. I can easily add an error at the 64-char limit, but that would mislead users into thinking strings < 64 chars are safe; hash collisions can happen there too.
My proposed solutions would look like this:
- Switch the Lua hash function to HsiehHash, which is already used elsewhere in the engine
- Write a unit test that checks how many hash collisions HsiehHash produces in the < 256-char range (with human-readable ASCII chars; Unicode would be overkill)
- Based on the results of that unit test, add an error when pairs() is used with string keys longer than N chars