View Issue Details

ID: 0003339
Project: Spring engine
Category: General
View Status: public
Last Update: 2012-11-29 20:22
Reporter: abma
Assigned To: abma
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 91.0.1+git
Summary: 0003339: desync in QTPFS with validation client sharing cache when no cache exists
Description: [f=0018960] Sync error for ValidationClient in frame 18645 (got 68a860d7, correct is 42180536)

http://buildbot.springrts.com/builders/validationtests/builds/1956/steps/validation%20test_2/logs/stdio
Tags: No tags attached.
Attached Files
grep.log.gz (Attachment missing)
QTPFS-DesyncFix.diff (Attachment missing)
Checked infolog.txt for Errors

Relationships

related to 0003071 closed Desync, maybe QTPFS? 

Activities

abma

2012-11-23 09:52

administrator   ~0009368

valgrind run with RAI needed i guess...

abma

2012-11-27 14:24

administrator   ~0009396

Last edited: 2012-11-27 14:34

ok, grepped through the output:

first desync i found was run 1556
the second is 1880

http://buildbot.springrts.com/builders/validationtests/builds/1556
https://github.com/spring/spring/commit/bf0cbf898b03219e8e9f8d730ab4eafae2710e3e

http://buildbot.springrts.com/builders/validationtests/builds/1880
https://github.com/spring/spring/commit/c14af2e31c3322fbed43e266ce934bd1a929b21f

it seems not to desync every time, so the cause doesn't have to be one of these commits... (it could be this one or some commit before it...)

abma

2012-11-27 14:32

administrator   ~0009397

added the grep created with:
for i in $(ls |sort -h -r); do echo $i; ( if [ -n "$(echo $i | grep bz2)" ]; then bzcat $i; else cat $i; fi )| grep "Sync error"; done >/tmp/grep.lo

abma

2012-11-27 15:14

administrator   ~0009398

hm, 1556 is pre 91.0... so that was the already fixed desync i guess.


since 1880 it seems every validation run produced a desync.

abma

2012-11-27 16:51

administrator   ~0009399

Last edited: 2012-11-27 16:52

the demo files from this run:
http://buildbot.springrts.com/builders/validationtests/builds/1981/steps/validation%20test_2/logs/stdio

http://abma.de/tmp/20121127_155054_Altair_Crossing-V1_91.0.1-489-g3266337_develop.sdf
http://abma.de/tmp/20121127_155056_Altair_Crossing-V1_91.0.1-489-g3266337_develop.sdf

abma

2012-11-27 16:53

administrator   ~0009400

Last edited: 2012-11-27 17:05

hmm, qtpfs is used...

that's one commit before 1556:
https://github.com/spring/spring/commit/64f869da82cae7a7b2d1c21cf462de7fff35f0ff

abma

2012-11-27 18:13

administrator   ~0009401

ok, valgrinded 20121127_155054 .. no error found

abma

2012-11-27 23:06

administrator   ~0009405

fixed one cause:
https://github.com/spring/spring/commit/7e3e5d05a8b1b3f73d4a792a6e713ce98c4746c9

seems there are still other causes left...

abma

2012-11-27 23:11

administrator   ~0009406

possible desync causes:

std::vector & std::map need a custom compare

both are used in QTPFS
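
A minimal sketch of the concern above, assuming the containers in question are keyed by pointers: a std::map ordered by address can iterate in a different order on every machine, while a comparator on a stable id cannot. `Node`, `NodeIdLess` and `IterationOrder` are hypothetical names for illustration, not QTPFS code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for a pathfinder node; not the real QTPFS type.
struct Node {
    unsigned id;
};

// Orders nodes by their stable id instead of by memory address, so the
// iteration order is identical on every client.
struct NodeIdLess {
    bool operator()(const Node* a, const Node* b) const { return a->id < b->id; }
};

// Returns the node ids in the order the map iterates over them.
std::vector<unsigned> IterationOrder(const std::map<const Node*, float, NodeIdLess>& costs) {
    std::vector<unsigned> order;
    for (const auto& kv : costs) {
        assert(kv.first != nullptr);
        order.push_back(kv.first->id);
    }
    return order;
}
```

With the default std::less on the pointer, the same loop would visit nodes in address order, which depends on the allocator and is not guaranteed to match between two processes.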

abma

2012-11-28 00:04

administrator   ~0009407

Last edited: 2012-11-28 12:19

vectors / maps seem to be fine, but:

boost::threads pulls cmath in... that could be the cause...

QTPFS seems to be always multithreaded... possible cause, too
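
Why multithreading is a plausible suspect here, as a rough sketch rather than a diagnosis of this bug: float addition is not associative, so if per-thread partial results are combined in whatever order the threads finish, two clients can end up with different sums. `SumInCompletionOrder` and its parameters are hypothetical and only model the effect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds per-thread partial results in the order the (hypothetical) worker
// threads happened to finish. `order` holds indices into `partials`.
float SumInCompletionOrder(const std::vector<float>& partials,
                           const std::vector<std::size_t>& order) {
    assert(order.size() == partials.size());
    float sum = 0.0f;
    for (std::size_t idx : order) {
        assert(idx < partials.size());
        sum += partials[idx];
    }
    return sum;
}
```

With IEEE 754 single precision, the partials {1e8f, -1e8f, 1.0f} summed in order 0, 1, 2 keep the final 1.0f, while order 0, 2, 1 loses it to rounding. Combining results in a fixed order (for example by thread index) avoids this.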

abma

2012-11-28 23:44

administrator   ~0009409

to reproduce desync:

/cheat
/give 100 armflea
move units

desync

abma

2012-11-28 23:49

administrator   ~0009410

Last edited: 2012-11-28 23:55

syncdebug leads to:

Server: #0 Assert<float> [/var/tmp/home/dev/spring/develop/rts/System/Sync/SyncedPrimitiveBase.h:48]
Server: #1 SyncedPrimitive<float>::Sync(char const*) [/var/tmp/home/dev/spring/develop/rts/System/Sync/SyncedPrimitive.h:45]
Server: #2 SyncedPrimitive<float>::SyncedPrimitive(float) [/var/tmp/home/dev/spring/develop/rts/System/Sync/SyncedPrimitive.h:86]
Server: #3 SyncedFloat3::SyncedFloat3(float3 const&) [/var/tmp/home/dev/spring/develop/rts/System/Sync/SyncedFloat3.h:43]
Server: #4 CGroundMoveType::GetNextWayPoint() [/var/tmp/home/dev/spring/develop/rts/Sim/MoveTypes/GroundMoveType.cpp:1323 (discriminator 1)]


https://github.com/spring/spring/blob/develop/rts/Sim/Path/QTPFS/PathManager.cpp#L924

abma

2012-11-28 23:52

administrator   ~0009411

with the normal pathfinder, 100 armfleas don't desync, so... 99.9% QTPFS i would say...

abma

2012-11-29 02:12

administrator   ~0009412

hmm:

cached & cached seems to sync (can't be 100% sure...)
cached & uncached seems to desync: http://buildbot.springrts.com/builders/validationtests/builds/1995/steps/validation%20test_2/logs/stdio
uncached & uncached seems to desync : http://buildbot.springrts.com/builders/validationtests/builds/1993/steps/validation%20test_2/logs/stdio

abma

2012-11-29 02:31

administrator   ~0009413

Last edited: 2012-11-29 02:32

ouch... i think i got it:

cached true:
[f=0000000] [PathManager] pfs-checksum: 4998f8a1, mem-footprint: 125MB

vs

cached false:
[f=0000000] [PathManager] pfs-checksum: 4998f8a1, mem-footprint: 126MB

seems to mostly desync.

cached/cached seems to always sync (didn't see a single case in ~10 runs or so where it desyncs while other cached/uncached desynced in maybe 80%)


as uncached/uncached desyncs too, the cache generating code seems to be broken. sooo, the MT-code there is broken for sure :-)

abma

2012-11-29 04:28

administrator   ~0009414

many more errors:

uncached:
[f=0000000] initialized node-layer 16 (6 MB, 1351 leafs, ratio 0.005154)
cached:
[f=0000000] initialized node-layer 16 (6 MB, 1 leafs, ratio 0.000004)

eieiei:

spring-headless: /home/buildbot/slave/full-linux/build/rts/Sim/Path/QTPFS/PathManager.cpp:528: void QTPFS::PathManager::Serialize(const string&): Assertion `nodeTrees[i]->IsLeaf()' failed.

http://buildbot.springrts.com/builders/validationtests/builds/2001/steps/validation%20test_1/logs/stdio

Kloot

2012-11-29 17:44

developer   ~0009415

Last edited: 2012-11-29 17:45

sorry about this, attached patch should fix it.

(in PathDefines.hpp, better bump QTPFS_CACHE_VERSION to 5 as well)
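
A rough sketch of why bumping a cache version constant matters, assuming the cache file carries its version in a header: files written by the old, buggy code are then rejected and regenerated instead of being loaded. Only the name QTPFS_CACHE_VERSION comes from this thread; `kCacheVersion`, `CacheHeader` and `IsCacheUsable` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical mirror of QTPFS_CACHE_VERSION; the real constant lives in
// PathDefines.hpp and its surrounding code may look different.
constexpr std::uint32_t kCacheVersion = 5;

// Hypothetical header layout, for illustration only.
struct CacheHeader {
    std::uint32_t version;      // format version the file was written with
    std::uint32_t mapChecksum;  // checksum of the map the cache was built for
};

// A cache file is reused only when both the format version and the map
// checksum match; anything else has to be regenerated.
constexpr bool IsCacheUsable(const CacheHeader& header, std::uint32_t currentMapChecksum) {
    return header.version == kCacheVersion && header.mapChecksum == currentMapChecksum;
}
```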

abma

2012-11-29 17:59

administrator   ~0009416

no need to apologize, thanks for your patch(es)! :)


thanks, applied:

https://github.com/spring/spring/commit/fb72ada7c118ae4ee90e3123573a1b8db0f133ad


the desync seems to be fixed, but an assertion still fails:

spring-headless: /home/buildbot/slave/full-linux/build/rts/Sim/Path/QTPFS/PathManager.cpp:535: void QTPFS::PathManager::Serialize(const string&): Assertion `nodeTrees[i]->IsLeaf()' failed.

Kloot

2012-11-29 20:19

developer   ~0009417

can't help much there (I assume it triggers because the validation client starts before the server has fully finished generating/writing the cache files)
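
If that assumption holds, one common way to keep a second process from reading a half-written cache is to write to a temporary file and rename it into place once it is complete. This is a sketch of that general technique, not what Spring does; `WriteCacheFile`, `ReadCacheFile` and the ".tmp" suffix are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper, not Spring's actual cache writer.
// Writes the data to "<finalPath>.tmp" first and renames it into place only
// once it is complete, so a reader opening finalPath never sees a partial file.
bool WriteCacheFile(const std::string& finalPath, const std::string& data) {
    assert(!finalPath.empty());
    const std::string tmpPath = finalPath + ".tmp";
    {
        std::ofstream out(tmpPath, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        out.flush();
        if (!out) {
            out.close();
            std::remove(tmpPath.c_str());
            return false;
        }
    } // the stream is closed here, before the rename
    // On POSIX, rename() replaces an existing target atomically; on other
    // platforms the call may fail if finalPath already exists.
    if (std::rename(tmpPath.c_str(), finalPath.c_str()) != 0) {
        std::remove(tmpPath.c_str());
        return false;
    }
    return true;
}

// Reader side: a missing file means "not ready yet", never "corrupt".
bool ReadCacheFile(const std::string& path, std::string& data) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    data.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}
```

A reader that gets `false` can either wait and retry or generate its own data without writing it, which would also avoid serializing a tree that is still being built.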

abma

2012-11-29 20:22

administrator   ~0009418

yep, i think/thought that's the problem, too...

Issue History

Date Modified | Username | Field | Change
2012-11-23 09:50 | abma | New Issue |
2012-11-23 09:51 | abma | Product Version | => 91.0.1+git
2012-11-23 09:52 | abma | Note Added: 0009368 |
2012-11-27 14:24 | abma | Note Added: 0009396 |
2012-11-27 14:24 | abma | Note Edited: 0009396 |
2012-11-27 14:32 | abma | File Added: grep.log.gz |
2012-11-27 14:32 | abma | Note Added: 0009397 |
2012-11-27 14:34 | abma | Note Edited: 0009396 |
2012-11-27 15:14 | abma | Note Added: 0009398 |
2012-11-27 16:51 | abma | Note Added: 0009399 |
2012-11-27 16:52 | abma | Note Edited: 0009399 |
2012-11-27 16:53 | abma | Note Added: 0009400 |
2012-11-27 17:05 | abma | Note Edited: 0009400 |
2012-11-27 18:13 | abma | Note Added: 0009401 |
2012-11-27 22:49 | abma | Summary | desync in validation client => desync in QTPFS
2012-11-27 23:06 | abma | Relationship added | related to 0003071
2012-11-27 23:06 | abma | Note Added: 0009405 |
2012-11-27 23:11 | abma | Note Added: 0009406 |
2012-11-28 00:04 | abma | Note Added: 0009407 |
2012-11-28 12:19 | abma | Note Edited: 0009407 |
2012-11-28 23:44 | abma | Note Added: 0009409 |
2012-11-28 23:49 | abma | Note Added: 0009410 |
2012-11-28 23:52 | abma | Note Added: 0009411 |
2012-11-28 23:55 | abma | Note Edited: 0009410 |
2012-11-29 02:12 | abma | Note Added: 0009412 |
2012-11-29 02:31 | abma | Note Added: 0009413 |
2012-11-29 02:32 | abma | Note Edited: 0009413 |
2012-11-29 02:33 | abma | Summary | desync in QTPFS => desync in QTPFS with validation client sharing cache when no cache exists
2012-11-29 04:28 | abma | Note Added: 0009414 |
2012-11-29 17:43 | Kloot | File Added: QTPFS-DesyncFix.diff |
2012-11-29 17:44 | Kloot | Note Added: 0009415 |
2012-11-29 17:45 | Kloot | Note Edited: 0009415 |
2012-11-29 17:59 | abma | Note Added: 0009416 |
2012-11-29 20:19 | Kloot | Note Added: 0009417 |
2012-11-29 20:22 | abma | Note Added: 0009418 |
2012-11-29 20:22 | abma | Status | new => resolved
2012-11-29 20:22 | abma | Resolution | open => fixed
2012-11-29 20:22 | abma | Assigned To | => abma