View Issue Details
ID: 0003339
Project: Spring engine
Category: General
View Status: public
Last Update: 2012-11-29 20:22
Reporter: abma
Assigned To: abma
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 91.0.1+git
Target Version:
Fixed in Version:
Summary: 0003339: desync in QTPFS with validation client sharing cache when no cache exists
Description: [f=0018960] Sync error for ValidationClient in frame 18645 (got 68a860d7, correct is 42180536)

http://buildbot.springrts.com/builders/validationtests/builds/1956/steps/validation%20test_2/logs/stdio
Tags: No tags attached.
Checked infolog.txt for Errors
Attached Files
  • grep.log.gz (232,215 bytes) 2012-11-27 14:32
  • QTPFS-DesyncFix.diff (797 bytes) 2012-11-29 17:43
    diff --git a/rts/Sim/Path/QTPFS/PathManager.cpp b/rts/Sim/Path/QTPFS/PathManager.cpp
    index 22fc268..da1faec 100644
    --- a/rts/Sim/Path/QTPFS/PathManager.cpp
    +++ b/rts/Sim/Path/QTPFS/PathManager.cpp
    @@ -188,6 +188,13 @@ void QTPFS::PathManager::Load() {
     		pfsCheckSum = mapCheckSum ^ modCheckSum;
     
     		for (unsigned int layerNum = 0; layerNum < nodeLayers.size(); layerNum++) {
    +			#ifndef QTPFS_CONSERVATIVE_NEIGHBOR_CACHE_UPDATES
    +			if (haveCacheDir) {
    +				// if cache-dir exists, must set node relations after de-serializing its trees
    +				nodeLayers[layerNum].ExecNodeNeighborCacheUpdates(MAP_RECTANGLE, numTerrainChanges);
    +			}
    +			#endif
    +
     			pfsCheckSum ^= nodeTrees[layerNum]->GetCheckSum();
     			maxNumLeafNodes = std::max(nodeLayers[layerNum].GetNumLeafNodes(), maxNumLeafNodes);
     		}

Relationships
related to 0003071 (closed): Desync, maybe QTPFS?

Notes

~0009368

abma (administrator)

valgrind run with RAI needed i guess...

~0009396

abma (administrator)

Last edited: 2012-11-27 14:34

ok, grepped through the output:

first desync i found was run 1556
the second is 1880

http://buildbot.springrts.com/builders/validationtests/builds/1556
https://github.com/spring/spring/commit/bf0cbf898b03219e8e9f8d730ab4eafae2710e3e

http://buildbot.springrts.com/builders/validationtests/builds/1880
https://github.com/spring/spring/commit/c14af2e31c3322fbed43e266ce934bd1a929b21f

it doesn't seem to desync every time, so the cause doesn't have to be one of these commits... (it could be this one or some commit before it...)

~0009397

abma (administrator)

added the grep created with:
for i in $(ls |sort -h -r); do echo $i; ( if [ -n "$(echo $i | grep bz2)" ]; then bzcat $i; else cat $i; fi )| grep "Sync error"; done >/tmp/grep.lo
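
The lines collected by this grep have the form shown in the issue description. To compare runs it can help to pull out the frame numbers and checksums. The following is a minimal sketch that assumes the log format stays exactly as quoted above; `SyncError` and `ParseSyncError` are hypothetical names, not part of Spring or the buildbot setup.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Fields of one "Sync error" log line, as quoted in the issue description.
struct SyncError {
    int logFrame;          // frame in the "[f=...]" prefix
    std::string client;    // e.g. "ValidationClient"
    int frame;             // frame the checksum belongs to
    unsigned int got;      // checksum reported by the client
    unsigned int correct;  // checksum that was expected
};

// Parses one log line; returns false if the line does not match the format.
// Hypothetical helper for illustration only.
bool ParseSyncError(const std::string& line, SyncError* out) {
    char client[64] = {0};
    const int n = std::sscanf(line.c_str(),
        "[f=%d] Sync error for %63s in frame %d (got %x, correct is %x)",
        &out->logFrame, client, &out->frame, &out->got, &out->correct);
    if (n != 5) {
        return false; // not a sync-error line
    }
    out->client = client;
    return true;
}
```

Feeding each grepped line through `ParseSyncError` would give the first desynced frame per build, which is the value that was compared by hand in the notes above.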

~0009398

abma (administrator)

hm, 1556 is pre 91.0... so that was the already fixed desync i guess.


since 1880 it seems every validation run produced a desync.

~0009399

abma (administrator)

Last edited: 2012-11-27 16:52

the demo files from this run:
http://buildbot.springrts.com/builders/validationtests/builds/1981/steps/validation%20test_2/logs/stdio

http://abma.de/tmp/20121127_155054_Altair_Crossing-V1_91.0.1-489-g3266337_develop.sdf
http://abma.de/tmp/20121127_155056_Altair_Crossing-V1_91.0.1-489-g3266337_develop.sdf

~0009400

abma (administrator)

Last edited: 2012-11-27 17:05

hmm, qtpfs is used...

that's one commit before 1556:
https://github.com/spring/spring/commit/64f869da82cae7a7b2d1c21cf462de7fff35f0ff

~0009401

abma (administrator)

ok, valgrinded 20121127_155054 .. no error found

~0009405

abma (administrator)

fixed one cause:
https://github.com/spring/spring/commit/7e3e5d05a8b1b3f73d4a792a6e713ce98c4746c9

seems there are still other causes left...

~0009406

abma (administrator)

possible desync causes:

std::vector & std::map need a custom compare

both are used in QTPFS
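
For context on why a custom compare matters for sync: a `std::map` keyed by pointer iterates in address order, and addresses can differ between clients. The sketch below only illustrates that general concern. `Node`, `NodeIdLess` and `DeterministicOrder` are hypothetical names, not QTPFS code, and nothing in this thread confirms that QTPFS had this problem.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical node type for illustration; not the QTPFS node class.
struct Node {
    unsigned int id; // stable identifier, assumed identical on every client
};

// Orders node pointers by their stable id instead of by address.
struct NodeIdLess {
    bool operator()(const Node* a, const Node* b) const {
        return a->id < b->id;
    }
};

// Returns the node ids in the iteration order of an id-ordered map.
// The result depends only on the ids, never on where the nodes live in memory.
std::vector<unsigned int> DeterministicOrder(const std::vector<Node*>& nodes) {
    std::map<const Node*, int, NodeIdLess> costs;
    for (const Node* n : nodes) {
        costs[n] = 0;
    }
    std::vector<unsigned int> order;
    for (const auto& kv : costs) {
        order.push_back(kv.first->id);
    }
    return order;
}
```

With the default `std::less<const Node*>` the same loop would visit nodes in address order, which is the kind of client-dependent ordering a synced simulation has to avoid.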

~0009407

abma (administrator)

Last edited: 2012-11-28 12:19

vectors / maps seem to be fine, but:

boost::threads pulls cmath in... that could be the cause...

QTPFS seems to be always multithreaded... possible cause, too

~0009409

abma (administrator)

to reproduce desync:

/cheat
/give 100 armflea
move units

desync

~0009410

abma (administrator)

Last edited: 2012-11-28 23:55

syncdebug leads to:

Server: #0 Assert<float> [/var/tmp/home/dev/spring/develop/rts/System/Sync/SyncedPrimitiveBase.h:48]
Server: #1 SyncedPrimitive<float>::Sync(char const*) [/var/tmp/home/dev/spring/develop/rts/System/Sync/SyncedPrimitive.h:45]
Server: #2 SyncedPrimitive<float>::SyncedPrimitive(float) [/var/tmp/home/dev/spring/develop/rts/System/Sync/SyncedPrimitive.h:86]
Server: #3 SyncedFloat3::SyncedFloat3(float3 const&) [/var/tmp/home/dev/spring/develop/rts/System/Sync/SyncedFloat3.h:43]
Server: #4 CGroundMoveType::GetNextWayPoint() [/var/tmp/home/dev/spring/develop/rts/Sim/MoveTypes/GroundMoveType.cpp:1323 (discriminator 1)]


https://github.com/spring/spring/blob/develop/rts/Sim/Path/QTPFS/PathManager.cpp#L924

~0009411

abma (administrator)

with the normal pathfinder and 100 armfleas it doesn't desync, so... 99.9% QTPFS i would say...

~0009412

abma (administrator)

hmm:

cached & cached seems to sync (can't be 100% sure...)
cached & uncached seems to desync: http://buildbot.springrts.com/builders/validationtests/builds/1995/steps/validation%20test_2/logs/stdio
uncached & uncached seems to desync : http://buildbot.springrts.com/builders/validationtests/builds/1993/steps/validation%20test_2/logs/stdio

~0009413

abma (administrator)

Last edited: 2012-11-29 02:32

ouch... i think i got it:

cached true:
[f=0000000] [PathManager] pfs-checksum: 4998f8a1, mem-footprint: 125MB

vs

cached false:
[f=0000000] [PathManager] pfs-checksum: 4998f8a1, mem-footprint: 126MB

this combination (cached vs. uncached) seems to mostly desync.

cached/cached seems to always sync (didn't see a single desync in ~10 runs or so, while cached/uncached desynced in maybe 80% of runs)


as uncached/uncached desyncs too, the cache generating code seems to be broken. sooo, the MT-code there is broken for sure :-)

~0009414

abma (administrator)

many more errors:

uncached:
[f=0000000] initialized node-layer 16 (6 MB, 1351 leafs, ratio 0.005154)
cached:
[f=0000000] initialized node-layer 16 (6 MB, 1 leafs, ratio 0.000004)

eieiei:

spring-headless: /home/buildbot/slave/full-linux/build/rts/Sim/Path/QTPFS/PathManager.cpp:528: void QTPFS::PathManager::Serialize(const string&): Assertion `nodeTrees[i]->IsLeaf()' failed.

http://buildbot.springrts.com/builders/validationtests/builds/2001/steps/validation%20test_1/logs/stdio

~0009415

Kloot (developer)

Last edited: 2012-11-29 17:45

sorry about this, attached patch should fix it.

(in PathDefines.hpp, better bump QTPFS_CACHE_VERSION to 5 as well)
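
One reading of the attached diff, offered as an assumption rather than something confirmed in this thread: the pfs-checksum covers the serialized trees, so a cached and an uncached client can report the same checksum (as in the log lines quoted earlier) while derived state such as node-neighbor relations still differs until it is rebuilt after loading. The sketch below models only that idea. `Layer`, `CheckSum`, `Generate` and `LoadFromCache` are hypothetical and much simpler than the real QTPFS classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a node layer; not the QTPFS::NodeLayer class.
struct Layer {
    std::vector<unsigned int> leafIds; // stored in the cache file
    std::vector<int> neighborCounts;   // derived state, not stored

    // Rebuilds the derived neighbor data from the leaf list. In this toy
    // 1D layout each leaf has one neighbor per side, except at the ends.
    void RebuildNeighbors() {
        const std::size_t n = leafIds.size();
        neighborCounts.assign(n, 0);
        for (std::size_t i = 0; i < n; i++) {
            neighborCounts[i] = (i > 0 ? 1 : 0) + (i + 1 < n ? 1 : 0);
        }
    }
};

// XOR over the leaf ids: covers only the serialized part of the state.
unsigned int CheckSum(const Layer& layer) {
    unsigned int sum = 0;
    for (std::size_t i = 0; i < layer.leafIds.size(); i++) {
        sum ^= layer.leafIds[i];
    }
    return sum;
}

// Path taken when no cache exists: build the tree, then the relations.
Layer Generate(const std::vector<unsigned int>& leafIds) {
    Layer layer;
    layer.leafIds = leafIds;
    layer.RebuildNeighbors();
    return layer;
}

// Path taken when a cache exists: restore the tree, optionally rebuild.
Layer LoadFromCache(const std::vector<unsigned int>& cachedLeafIds, bool rebuild) {
    Layer layer;
    layer.leafIds = cachedLeafIds;
    if (rebuild) {
        layer.RebuildNeighbors();
    }
    return layer;
}
```

In this model, `LoadFromCache(ids, false)` matches `Generate(ids)` on checksum but not on `neighborCounts`, and passing `true` removes the difference, which mirrors the call the diff adds behind `haveCacheDir`.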

~0009416

abma (administrator)

nothing to apologize for, thanks for your patch(es)! :)


thanks, applied:

https://github.com/spring/spring/commit/fb72ada7c118ae4ee90e3123573a1b8db0f133ad


the desync seems to be fixed, but an assertion still fails:

spring-headless: /home/buildbot/slave/full-linux/build/rts/Sim/Path/QTPFS/PathManager.cpp:535: void QTPFS::PathManager::Serialize(const string&): Assertion `nodeTrees[i]->IsLeaf()' failed.

~0009417

Kloot (developer)

can't help much there (I assume it triggers because the validation client starts before the server has fully finished generating/writing the cache files)
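
If that assumption holds, a common way to keep a reader from seeing a half-written file is to write to a temporary name and rename it into place once complete. The following is a minimal sketch of that general technique, not what Spring does; `WriteFileAtomically` and `ReadFile` are hypothetical helpers, and the atomic-replace behavior of `rename` is a POSIX guarantee that other platforms may not share.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Writes `data` to `path` via a temporary file and a rename, so that a
// concurrent reader sees either no file or the complete file.
// Hypothetical helper for illustration; not part of the Spring code base.
bool WriteFileAtomically(const std::string& path, const std::string& data) {
    const std::string tmpPath = path + ".tmp";
    bool ok = false;
    {
        std::ofstream out(tmpPath.c_str(), std::ios::binary | std::ios::trunc);
        if (out) {
            out.write(data.data(), static_cast<std::streamsize>(data.size()));
            out.flush();
            ok = out.good();
        }
    } // the stream is closed here, before the rename
    // On POSIX, rename() replaces the target atomically; other platforms
    // may fail if the target already exists.
    if (!ok || std::rename(tmpPath.c_str(), path.c_str()) != 0) {
        std::remove(tmpPath.c_str());
        return false;
    }
    return true;
}

// Reads a whole file; returns an empty string if it cannot be opened.
std::string ReadFile(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}
```

A reader that finds no file under the final name would then know the cache is not ready yet, instead of deserializing a partial tree.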

~0009418

abma (administrator)

yep, i think/thought that's the problem, too...

Issue History
Date Modified Username Field Change
2012-11-23 09:50 abma New Issue
2012-11-23 09:51 abma Product Version => 91.0.1+git
2012-11-23 09:52 abma Note Added: 0009368
2012-11-27 14:24 abma Note Added: 0009396
2012-11-27 14:24 abma Note Edited: 0009396
2012-11-27 14:32 abma File Added: grep.log.gz
2012-11-27 14:32 abma Note Added: 0009397
2012-11-27 14:34 abma Note Edited: 0009396
2012-11-27 15:14 abma Note Added: 0009398
2012-11-27 16:51 abma Note Added: 0009399
2012-11-27 16:52 abma Note Edited: 0009399
2012-11-27 16:53 abma Note Added: 0009400
2012-11-27 17:05 abma Note Edited: 0009400
2012-11-27 18:13 abma Note Added: 0009401
2012-11-27 22:49 abma Summary desync in validation client => desync in QTPFS
2012-11-27 23:06 abma Relationship added related to 0003071
2012-11-27 23:06 abma Note Added: 0009405
2012-11-27 23:11 abma Note Added: 0009406
2012-11-28 00:04 abma Note Added: 0009407
2012-11-28 12:19 abma Note Edited: 0009407
2012-11-28 23:44 abma Note Added: 0009409
2012-11-28 23:49 abma Note Added: 0009410
2012-11-28 23:52 abma Note Added: 0009411
2012-11-28 23:55 abma Note Edited: 0009410
2012-11-29 02:12 abma Note Added: 0009412
2012-11-29 02:31 abma Note Added: 0009413
2012-11-29 02:32 abma Note Edited: 0009413
2012-11-29 02:33 abma Summary desync in QTPFS => desync in QTPFS with validation client sharing cache when no cache exists
2012-11-29 04:28 abma Note Added: 0009414
2012-11-29 17:43 Kloot File Added: QTPFS-DesyncFix.diff
2012-11-29 17:44 Kloot Note Added: 0009415
2012-11-29 17:45 Kloot Note Edited: 0009415
2012-11-29 17:59 abma Note Added: 0009416
2012-11-29 20:19 Kloot Note Added: 0009417
2012-11-29 20:22 abma Note Added: 0009418
2012-11-29 20:22 abma Status new => resolved
2012-11-29 20:22 abma Resolution open => fixed
2012-11-29 20:22 abma Assigned To => abma