2019-12-10 10:12 CET

View Issue Details Jump to Notes ]
IDProjectCategoryView StatusLast Update
0000916Spring engineGeneralpublic2008-05-09 01:42
ReporterKDR_11k 
Assigned ToKloot 
PrioritynormalSeveritymajorReproducibilitysometimes
StatusresolvedResolutionfixed 
Product Version 
Target VersionFixed in Version 
Summary0000916: [SVN 5828] Desync during combat with inaccurate weapons
DescriptionUsing THIS WIP3 ( http://kdr_11k.from-hell.net/THIS%20WIP3.zip ) we've had desyncs happen several times, they seem to be caused by the unit Dagger in combat. The Dagger uses a highly inaccurate weapon and most of its shots miss. The desyncs begin when a unit dies in such a battle. Seems to be possible to reproduce by sending groups of several daggers against each other. A similar battle using the Mace unit (has perfectly accurate beamlasers) doesn't seem to cause a desync.
Additional InformationI'd upload a replay but all the replays I have crash the moment the game would start.

This also seems to be capable of causing the /cheat crash.
TagsNo tags attached.
Checked infolog.txt for Errors
Attached Files

-Relationships
+Relationships

-Notes

~0002224

imbaczek (reporter)

upload them anyway, so the crash can be fixed.

btw you have to upload two replays that weren't in sync to find something meaningful.

~0002225

KDR_11k (reporter)

Uploaded three replays, I think all of them desynced but I can't check as they all crash.

~0002226

imbaczek (reporter)

Last edited: 2008-05-03 12:24

those are different games, right? what's needed is two replays of the same game but two different players.

btw i can't watch your replays because:
    gametype=THIS WIP3.sd7;
    modhash=-1133907484;
but my script.txt says:
    GameType=THIS WIP3.sd7;
    ModHash=-425686198;
ideas? ta content?

edit: i think i know... my springcontent is probably b0rked.

~0002227

imbaczek (reporter)

tried getting base content from http://planetspring.free.fr/spring/base/, didn't work.

~0002228

KDR_11k (reporter)

I used the installer buildserv generated at http://planetspring.free.fr/spring/installer/spring_0.76b1+svn5828.exe

Dunno, replacing base doesn't seem to be enough sometimes.

~0002229

Auswaschbar (reporter)

All demos from THIS will desync because of: http://spring.clan-sy.com/websvn/listing.php?repname=spring&path=%2F&rev=5653&sc=1

~0002230

imbaczek (reporter)

Last edited: 2008-05-03 14:17

that worked.

#0 0x0087f40d in CMiniMap::DrawForReal ()
#1 0x0077f4bf in CGame::Draw ()
#2 0x00573a74 in SpringApp::Update ()
0000003 0x0057bdfb in SpringApp::Run ()
0000004 0x005733c1 in Run ()
0000005 0x00573607 in WinMain@16 ()
#6 0x0093a6d7 in main ()

maybe your lua minimap drawer exposes a bug during replays?

ps. buildserv's binaries aren't stripped, so you can easily run them in gdb to get this output.

~0002231

imbaczek (reporter)

(gdb) bt
#0 CMiniMap::DrawForReal (this=0xd9cf298) at rts/Sim/Misc/LosHandler.h:78
#1 0x006b660e in CGame::Draw (this=0xd993a18) at rts\Game\Game.cpp:2669
#2 0x0040bb8e in SpringApp::Update (this=0x23feac)
    at rts\System\SpringApp.cpp:769
0000003 0x004103dc in SpringApp::Run (this=0x23feac, argc=2, argv=0x34558)
    at rts\System\SpringApp.cpp:952
0000004 0x00408916 in Run (argc=2, argv=0x34558) at rts\System\Main.cpp:49
0000005 0x00408a62 in WinMain@16 (hInstanceIn=0x400000, hPrevInstance=0x0,
    lpCmdLine=0x251efd "916-1.sdf", nCmdShow=10) at rts\System\Main.cpp:110
#6 0x00776190 in main ()

now i need to know whether this is caused by wrong random seed or a genuine bug.

~0002232

Kloot (developer)

Last edited: 2008-05-03 22:41

Genuine bug, the demo (0.76b1+.sdf) defines only two allyteams but the allyteam index passed to LosHandler::InLos(CWorldObject*, int) is 3 (gu->myAllyTeam) so the access is out of bounds (losMap[3] has size 0 but the gigantically long expression for the index

std::max(0, std::min(losSizeY - 1, ((int) (object->pos.z * invLosDiv)))) * losSizeX+ std::max(0, std::min(losSizeX - 1, ((int)(object->pos.x * invLosDiv))))

evaluates to 8322).

edit: changing the evaluation order in CMiniMap::DrawForReal() so that gu->spectatingFullView is tested first (and similarly in CGameHelper::GuiTraceRay()) bypasses the crash, but the demo then spams tons of RunCallIn errors.

another edit: gu->myAllyTeam should be 2 (NOT 3) when watching the demo since there are only two teams/allyteams and one spectator to begin with (KDR/0, Lurker/1, and Spear), all the callin errors disappear when forcing myAllyTeam to that value so they're unrelated to the loss-of-sync issue.

~0002233

lurker (reporter)

Uploaded my versions of the replays.

~0002239

Auswaschbar (reporter)

Replays work again. Now we can hunt the bug this report was for ;)

~0002240

Kloot (developer)

Last edited: 2008-05-06 02:29

Isolated the source of the desync between DESYNC 080503-THISspace-0.76b1+.sdf and 080503-THISspace-0.76b1+.sdf. In KDR's demo (DESYNC*), when his Dagger starts shooting at Lurker's at around 1:40, the collision detection handler registers hits at these frames:


    DetectHit(), frame 3060, ID 7, team 1, p: 0xa07d6b8, p->pos: 3614.983154 132.062973 4574.239746, p->speed: -16.402075 0.130530 11.443554
    DetectHit(), frame 3137, ID 7, team 1, p: 0x9fdf0b0, p->pos: 3429.988770 131.674438 4538.070312, p->speed: -19.674892 0.188384 3.586518
    DetectHit(), frame 3137, ID 7, team 1, p: 0xaa87b90, p->pos: 3431.535645 131.750534 4540.508789, p->speed: -19.763159 0.046791 3.068450
    DetectHit(), frame 3153, ID 7, team 1, p: 0x9de49a0, p->pos: 3389.143066 131.590225 4522.762695, p->speed: -19.883038 0.107184 2.157145
    DetectHit(), frame 3169, ID 7, team 1, p: 0xa7588b8, p->pos: 3353.739746 132.194397 4526.453613, p->speed: -19.966389 0.153268 1.148820
    DetectHit(), frame 3185, ID 7, team 1, p: 0x8da9c38, p->pos: 3313.123779 132.463943 4516.988770, p->speed: -19.987667 0.160410 0.683663
    DetectHit(), frame 3284, ID 7, team 1, p: 0x8e4ce28, p->pos: 3090.103027 132.303253 4464.639648, p->speed: -19.740204 0.133573 -3.210367
    DetectHit(), frame 3284, ID 7, team 1, p: 0xa75b700, p->pos: 3092.460693 132.373169 4464.141602, p->speed: -19.622421 0.137286 -3.865451
    DetectHit(), frame 3300, ID 7, team 1, p: 0xa032688, p->pos: 3053.374756 132.177109 4455.171387, p->speed: -19.653708 0.138076 -3.703077
    DetectHit(), frame 3301, ID 7, team 1, p: 0x9dbc638, p->pos: 3036.230225 132.740234 4451.729004, p->speed: -19.534393 0.159423 -4.287431
    DetectHit(), frame 3316, ID 7, team 1, p: 0x8ded558, p->pos: 3016.385986 132.494629 4448.111816, p->speed: -19.581224 0.145531 -4.068726
    DetectHit(), frame 3317, ID 7, team 1, p: 0xa1564c8, p->pos: 2999.844238 132.774826 4443.069336, p->speed: -19.435452 0.150635 -4.715987

    DetectHit(), frame 3332, ID 7, team 1, p: 0xa75fcd0, p->pos: 2979.766113 130.783264 4440.027832, p->speed: -19.489359 0.057514 -4.490161

    DetectHit(), frame 3333, ID 7, team 1, p: 0xa760038, p->pos: 2963.052734 128.998047 4436.692871, p->speed: -19.357143 -0.036160 -5.029877
    DetectHit(), frame 3349, ID 7, team 1, p: 0x8e86840, p->pos: 2923.644287 133.055847 4428.292480, p->speed: -19.402908 0.165704 -4.847646


While in Lurker's universe the hit on frame 3332 goes undetected:


    DetectHit(), frame 3060, ID 7, team 1, p: 0x9e3b730, p->pos: 3614.983154 132.062973 4574.239746, p->speed: -16.402075 0.130530 11.443554
    DetectHit(), frame 3137, ID 7, team 1, p: 0x9ee0ba0, p->pos: 3429.988770 131.674438 4538.070312, p->speed: -19.674892 0.188384 3.586518
    DetectHit(), frame 3137, ID 7, team 1, p: 0x8db7910, p->pos: 3431.535645 131.750534 4540.508789, p->speed: -19.763159 0.046791 3.068450
    DetectHit(), frame 3153, ID 7, team 1, p: 0x8e93350, p->pos: 3389.143066 131.590225 4522.762695, p->speed: -19.883038 0.107184 2.157145
    DetectHit(), frame 3169, ID 7, team 1, p: 0xa6a3f48, p->pos: 3353.739746 132.194397 4526.453613, p->speed: -19.966389 0.153268 1.148820
    DetectHit(), frame 3185, ID 7, team 1, p: 0x8e7ea50, p->pos: 3313.123779 132.463943 4516.988770, p->speed: -19.987667 0.160410 0.683663
    DetectHit(), frame 3284, ID 7, team 1, p: 0x9e21bf8, p->pos: 3090.103027 132.303253 4464.639648, p->speed: -19.740204 0.133573 -3.210367
    DetectHit(), frame 3284, ID 7, team 1, p: 0xa166880, p->pos: 3092.460693 132.373169 4464.141602, p->speed: -19.622421 0.137286 -3.865451
    DetectHit(), frame 3300, ID 7, team 1, p: 0xa6dc070, p->pos: 3053.374756 132.177109 4455.171387, p->speed: -19.653708 0.138076 -3.703077
    DetectHit(), frame 3301, ID 7, team 1, p: 0xa6dc2f8, p->pos: 3036.230225 132.740234 4451.729004, p->speed: -19.534393 0.159423 -4.287431
    DetectHit(), frame 3316, ID 7, team 1, p: 0xa713968, p->pos: 3016.385986 132.494629 4448.111816, p->speed: -19.581224 0.145531 -4.068726
    DetectHit(), frame 3317, ID 7, team 1, p: 0x9e34678, p->pos: 2999.844238 132.774826 4443.069336, p->speed: -19.435452 0.150635 -4.715987

    DetectHit(), frame 3333, ID 7, team 1, p: 0x9de1ba8, p->pos: 2960.276855 130.840775 4435.537598, p->speed: -19.489359 0.057514 -4.490161
    DetectHit(), frame 3349, ID 7, team 1, p: 0x8e7edf0, p->pos: 2923.644287 133.055847 4428.292480, p->speed: -19.402908 0.165704 -4.847646



This could be either my coldet routines at fault or caused by a difference in the Dagger's projectile paths due to their inaccuracy, I'm not sure yet. Note that these results seem to vary from run to run as well, the hit at frame 3316 is sometimes reported in Lurker's demo and missed other times, and occasionally it shows up on frame 3317 instead (as a double hit).

~0002241

Kloot (developer)

Last edited: 2008-05-06 14:45

Registering everything between frames 3300 and 3350 produces:



KDR:

f: 3316, H, pr: 0xa5a24f8, h(p): 1135868008, h(s): 1057723249, p: 3016.385986 132.494629 4448.111816, s: -19.581224 0.145531 -4.068726
f: 3316, M, pr: 0xa5a2860, h(p): 1135793657, h(s): 1058525874, p: 3019.279785 132.624191 4447.785156, s: -19.435452 0.150635 -4.715987
f: 3316, M, pr: 0xa55ba50, h(p): 1136696207, h(s): 1015011002, p: 3291.594238 129.863098 4511.871582, s: -19.489359 0.057514 -4.490161
f: 3316, M, pr: 0x8e98128, h(p): 1136725125, h(s): 3157144519, p: 3292.124756 129.612823 4522.199219, s: -19.357143 -0.036160 -5.029877

f: 3317, H, pr: 0xa5a2860, h(p): 1135960919, h(s): 1058525874, p: 2999.844238 132.774826 4443.069336, s: -19.435452 0.150635 -4.715987
f: 3317, M, pr: 0xa55ba50, h(p): 1136767246, h(s): 1015011002, p: 3272.104980 129.920609 4507.381348, s: -19.489359 0.057514 -4.490161
f: 3317, M, pr: 0x8e98128, h(p): 1136670387, h(s): 3157144519, p: 3272.767578 129.576660 4517.169434, s: -19.357143 -0.036160 -5.029877

...

f: 3330, M, pr: 0xa55ba50, h(p): 1135937769, h(s): 1015011002, p: 3018.744629 130.668243 4449.008301, s: -19.489359 0.057514 -4.490161
f: 3330, M, pr: 0x8e98128, h(p): 1136055545, h(s): 3157144519, p: 3021.124268 129.106537 4451.782227, s: -19.357143 -0.036160 -5.029877
f: 3331, M, pr: 0xa55ba50, h(p): 1135815160, h(s): 1015011002, p: 2999.255371 130.725754 4444.518066, s: -19.489359 0.057514 -4.490161
f: 3331, M, pr: 0x8e98128, h(p): 1135638599, h(s): 3157144519, p: 3001.767090 129.070374 4446.752441, s: -19.357143 -0.036160 -5.029877

f: 3332, H, pr: 0xa55ba50, h(p): 1135752447, h(s): 1015011002, p: 2979.766113 130.783264 4440.027832, s: -19.489359 0.057514 -4.490161
f: 3332, M, pr: 0x8e98128, h(p): 1135715205, h(s): 3157144519, p: 2982.409912 129.034210 4441.722656, s: -19.357143 -0.036160 -5.029877

f: 3333, H, pr: 0x8e98128, h(p): 1135831763, h(s): 3157144519, p: 2963.052734 128.998047 4436.692871, s: -19.357143 -0.036160 -5.029877


LURKER:

f: 3316, H, pr: 0x8ec46f8, h(p): 1135868008, h(s): 1057723249, p: 3016.385986 132.494629 4448.111816, s: -19.581224 0.145531 -4.068726
f: 3316, M, pr: 0xa031398, h(p): 1135793657, h(s): 1058525874, p: 3019.279785 132.624191 4447.785156, s: -19.435452 0.150635 -4.715987
f: 3316, M, pr: 0x8f11068, h(p): 1136696207, h(s): 1015011002, p: 3291.594238 129.863098 4511.871582, s: -19.489359 0.057514 -4.490161
f: 3316, M, pr: 0x8f5c2e8, h(p): 1136725125, h(s): 3157144519, p: 3292.124756 129.612823 4522.199219, s: -19.357143 -0.036160 -5.029877

f: 3317, H, pr: 0xa031398, h(p): 1135960919, h(s): 1058525874, p: 2999.844238 132.774826 4443.069336, s: -19.435452 0.150635 -4.715987
f: 3317, M, pr: 0x8f11068, h(p): 1136767246, h(s): 1015011002, p: 3272.104980 129.920609 4507.381348, s: -19.489359 0.057514 -4.490161
f: 3317, M, pr: 0x8f5c2e8, h(p): 1136670387, h(s): 3157144519, p: 3272.767578 129.576660 4517.169434, s: -19.357143 -0.036160 -5.029877

...

f: 3330, M, pr: 0x8f11068, h(p): 1135937769, h(s): 1015011002, p: 3018.744629 130.668243 4449.008301, s: -19.489359 0.057514 -4.490161
f: 3330, M, pr: 0x8f5c2e8, h(p): 1136055545, h(s): 3157144519, p: 3021.124268 129.106537 4451.782227, s: -19.357143 -0.036160 -5.029877
f: 3331, M, pr: 0x8f11068, h(p): 1135815160, h(s): 1015011002, p: 2999.255371 130.725754 4444.518066, s: -19.489359 0.057514 -4.490161
f: 3331, M, pr: 0x8f5c2e8, h(p): 1135638599, h(s): 3157144519, p: 3001.767090 129.070374 4446.752441, s: -19.357143 -0.036160 -5.029877

f: 3332, M, pr: 0x8f11068, h(p): 1135752447, h(s): 1015011002, p: 2979.766113 130.783264 4440.027832, s: -19.489359 0.057514 -4.490161
f: 3332, M, pr: 0x8f5c2e8, h(p): 1135715205, h(s): 3157144519, p: 2982.409912 129.034210 4441.722656, s: -19.357143 -0.036160 -5.029877

f: 3333, H, pr: 0x8f11068, h(p): 1135693598, h(s): 1015011002, p: 2960.276855 130.840775 4435.537598, s: -19.489359 0.057514 -4.490161
f: 3333, M, pr: 0x8f5c2e8, h(p): 1135831763, h(s): 3157144519, p: 2963.052734 128.998047 4436.692871, s: -19.357143 -0.036160 -5.029877



h() is a hash function for float3 objects which does this:

inline unsigned int hash(const float3& f) {
    const unsigned int hx = *((int*) &f.x);
    const unsigned int hy = *((int*) &f.y);
    const unsigned int hz = *((int*) &f.z);
    return (hx ^ hy ^ hz);
}

So it looks very much like the coldet code is borked since the hashes for both projectiles that are alive in frame 3332 match for speed and for position.

~0002242

Kloot (developer)

The desync actually occurs even earlier, I wrote a Python script to compare the outputs and got a mismatch at frame 3300:

F: 3300
        KDR: [('H', 1135722999, 1073522223), ('M', 1135683758, 1060522772), ('M', 1138702792, 1057723249), ('M', 1138707673, 1058525874)]
        LUR: [('H', 1135722999, 1073522223), ('H', 1135683758, 1060522772), ('M', 1138702792, 1057723249), ('M', 1138707673, 1058525874)]

Now how to fix this. ;)

~0002243

imbaczek (reporter)

1. those two lines look the same ;p
2. why don't just use diff -u?
3. no idea how to fix this; possible causes (list is not exhaustive)
- uninitialized variables
- buffer overrun/underrun
- dangling pointer (hey, great fun! :P)
dunno if svn has some of these, you might want to run mcheck periodically and compile with use_mmgr=on for the last two. (mcheck: http://www.delorie.com/gnu/docs/glibc/libc_33.html)

~0002250

Kloot (developer)

Last edited: 2008-05-09 01:26

baczek:

1) They are (very reproducably) not, don't trust your eyes ;)
2) I was only interested in the first pair of non-matching lines
3) Thanks, but the source is actually none of those. By adding a hash function to CMatrix44 similar to the float3 one, ie.

    unsigned int CMatrix44f::Hash() const {
        const unsigned int m00 = *((int*) &m[ 0]);
        const unsigned int m01 = *((int*) &m[ 1]);
        const unsigned int m02 = *((int*) &m[ 2]);
        const unsigned int m03 = *((int*) &m[ 3]);
        const unsigned int m04 = *((int*) &m[ 4]);
        const unsigned int m05 = *((int*) &m[ 5]);
        const unsigned int m06 = *((int*) &m[ 6]);
        const unsigned int m07 = *((int*) &m[ 7]);
        const unsigned int m08 = *((int*) &m[ 8]);
        const unsigned int m09 = *((int*) &m[ 9]);
        const unsigned int m10 = *((int*) &m[10]);
        const unsigned int m11 = *((int*) &m[11]);
        const unsigned int m12 = *((int*) &m[12]);
        const unsigned int m13 = *((int*) &m[13]);
        const unsigned int m14 = *((int*) &m[14]);
        const unsigned int m15 = *((int*) &m[15]);

        const unsigned int hA = m00 ^ m01 ^ m02 ^ m03;
        const unsigned int hB = m04 ^ m05 ^ m06 ^ m07;
        const unsigned int hC = m08 ^ m09 ^ m10 ^ m11;
        const unsigned int hD = m12 ^ m13 ^ m14 ^ m15;
        return (hA ^ hB ^ hC ^ hD);
    }

I managed to track it down to CUnit::GetTransformMatrix(), which the coldet handler calls in two places. The one in CCollisionHandler::Collision(const CUnit* u, const float3& p) is on the code path executed by both demos, and it returns a different matrix for KDR's than for Lurker's:

KDR:
    CCH::DH(u 0x9faad68, p0 1135683758)
    CCH:Coll(u 0x9faad68, p 1135683758)/1 -- diff 3226392320
    CCH:Coll(u 0x9faad68, p 1135683758)/2 -- m 1893190800

LUR:
    CCH::DH(u 0x8e47310, p0 1135683758)
    CCH:Coll(u 0x8e47310, p 1135683758)/1 -- diff 3226392320
    CCH:Coll(u 0x8e47310, p 1135683758)/2 -- m 1893195391

(note the hash inequalities in their lower-order bytes) with the relevant code segment being:

bool CCollisionHandler::Collision(const CUnit* u, const float3& p) {
    ...
    CMatrix44f m;
    u->GetTransformMatrix(m);

    printf("\tCCH:Coll(u %p, p %u)/2 -- m %u\n", u, p.hash(), m.hash());
    ...
}

This is on frame 3300.


EDIT: GetTransformMatrix reads a member of gu, that explains a lot ;)

~0002251

imbaczek (reporter)

good job! guess i'm too used to fighting dangling pointers :>

~0002252

Kloot (developer)

Fixed as of r5859.
+Notes

-Issue History
Date Modified Username Field Change
2008-05-03 09:43 KDR_11k New Issue
2008-05-03 11:20 imbaczek Note Added: 0002224
2008-05-03 11:53 KDR_11k File Added: DESYNC 080503-THISspace-0.76b1+-0.sdf
2008-05-03 11:54 KDR_11k File Added: DESYNC 080503-THISspace-0.76b1+.sdf
2008-05-03 11:55 KDR_11k File Added: DESYNC 080503-THISspace-0.76b1+-1.sdf
2008-05-03 11:56 KDR_11k Note Added: 0002225
2008-05-03 12:09 imbaczek Note Added: 0002226
2008-05-03 12:24 imbaczek Note Edited: 0002226
2008-05-03 12:44 imbaczek Note Added: 0002227
2008-05-03 13:28 KDR_11k Note Added: 0002228
2008-05-03 13:57 Auswaschbar Note Added: 0002229
2008-05-03 14:00 imbaczek Note Added: 0002230
2008-05-03 14:17 imbaczek Note Edited: 0002230
2008-05-03 14:39 imbaczek Note Added: 0002231
2008-05-03 14:55 Kloot Note Added: 0002232
2008-05-03 14:58 Kloot Note Edited: 0002232
2008-05-03 15:01 Kloot Note Edited: 0002232
2008-05-03 15:09 Kloot Note Edited: 0002232
2008-05-03 15:32 lurker File Added: 080503-THISspace-0.76b1+.sdf
2008-05-03 15:32 lurker File Added: 080503-THISspace-0.76b1+-0.sdf
2008-05-03 15:33 lurker File Added: 080503-THISspace-0.76b1+-1.sdf
2008-05-03 15:33 lurker Note Added: 0002233
2008-05-03 22:41 Kloot Note Edited: 0002232
2008-05-05 23:21 Auswaschbar Note Added: 0002239
2008-05-06 01:46 Kloot Note Added: 0002240
2008-05-06 01:47 Kloot Note Edited: 0002240
2008-05-06 02:02 Kloot Note Edited: 0002240
2008-05-06 02:03 Kloot Note Edited: 0002240
2008-05-06 02:08 Kloot Note Edited: 0002240
2008-05-06 02:15 Kloot Note Edited: 0002240
2008-05-06 02:15 Kloot Note Edited: 0002240
2008-05-06 02:17 Kloot Note Edited: 0002240
2008-05-06 02:27 Kloot Note Edited: 0002240
2008-05-06 02:29 Kloot Note Edited: 0002240
2008-05-06 14:30 Kloot Note Added: 0002241
2008-05-06 14:31 Kloot Note Edited: 0002241
2008-05-06 14:45 Kloot Note Edited: 0002241
2008-05-06 16:38 Kloot Note Added: 0002242
2008-05-06 16:47 imbaczek Note Added: 0002243
2008-05-09 01:12 Kloot Note Added: 0002250
2008-05-09 01:14 Kloot Note Edited: 0002250
2008-05-09 01:26 Kloot Note Edited: 0002250
2008-05-09 01:37 imbaczek Note Added: 0002251
2008-05-09 01:42 Kloot Status new => resolved
2008-05-09 01:42 Kloot Resolution open => fixed
2008-05-09 01:42 Kloot Assigned To => Kloot
2008-05-09 01:42 Kloot Note Added: 0002252
+Issue History