2024-04-23 11:08 CEST

View Issue Details
ID: 0003436
Project: Spring engine
Category: General
View Status: public
Last Update: 2015-10-19 11:20
Reporter: burp
Assigned To: hokomoko
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 100.0
Target Version: 101.0
Fixed in Version:
Summary: 0003436: Desync from different float printing on linux and windows
Description: When we play the map longcat32 (which has no special lua stuff) on zero-k, linux users and windows users are always split after a few seconds. Usually all linux users get the desync message because most players are on windows. This only happens on this map.
Steps To Reproduce: Play zero-k (could not try other mods yet) on longcat32 map with windows and linux clients (linux x86_64).
Tags: No tags attached.
Checked infolog.txt for Errors:
Attached Files:

Relationships
related to 0001050 (resolved, Kloot): Desync while using the jumpjets script.

Notes

~0009691

Kloot (developer)

if possible, please test if this also happens using a 92.0 test build

~0010374

abma (administrator)

no feedback, we assume it's fixed. if not, please reopen

~0015124

hokomoko (developer)

Apparently still occurs in 100.0

~0015128

abma (administrator)

infolog.txt? replay?

~0015129

burp (reporter)

Last edited: 2015-09-09 22:40

Linux:
20150906_153744_LongCat32_100.infolog.txt (from replaying demo)
20150906_153744_LongCat32_100.sdf

~0015130

Rafal99 (reporter)

Windows 7, 64-bit:
20150906_153744_LongCat32_100_Windows.sdf

~0015131

abma (administrator)

Last edited: 2015-09-10 00:14

suspicious when run with spring-headless:
[f=0000037] Calling Garbage Collector on excessive LuaUI memory usage: 102.5 MB
[f=0000332] Calling Garbage Collector on excessive LuaUI memory usage: 104.8 MB
[f=0000665] Calling Garbage Collector on excessive LuaUI memory usage: 105.1 MB
[f=0000991] Calling Garbage Collector on excessive LuaUI memory usage: 105.3 MB

~0015132

abma (administrator)

Last edited: 2015-09-10 00:16

couldn't find an issue with valgrind/signan enabled!

~0015133

hokomoko (developer)

map itself doesn't have lua, I guess it's because it's huge?

~0015134

abma (administrator)

Last edited: 2015-09-10 00:21

http://api.springfiles.com/?springname=LongCat32

no, contents are:

maps/LongCat32.smf
maps/.LongCat32.smd.swp
maps/LongCat32.smd
maps/LongCat32.smt

map is 32x6

~0015189

abma (administrator)

the usual causes seem not to be the problem (no errors with valgrind/signan), so debugging with sync-debug builds is required:

https://springrts.com/wiki/Debugging_sync_errors

~0015214

abma (administrator)

Last edited: 2015-09-23 20:49

tested locally with linux64 as server and spring-headless on windows.

instant desyncs when giving commander a mex build order.


used 100.0.1-207-g313b5e5

syncdebug-server.log:
Server: #4 SpringApp::Run() [/home/buildbot/slave/linux-static-x64/build/build/syncdebug/../../rts/System/SpringApp.cpp:978]
Server: === Backtrace 18 ===
Server: #0 void Sync::AssertDebugger<unsigned int>(unsigned int const&, char const*) [/home/buildbot/slave/linux-static-x64/build/build/syncdebug/../../rts/System/Sync/SyncedPrimitiveBase.h:31]
Server: #1 CGame::ClientReadNet() [/home/buildbot/slave/linux-static-x64/build/build/syncdebug/../../rts/Net/NetCommands.cpp:511]
Server: #2 CGame::Update() [/home/buildbot/slave/linux-static-x64/build/build/syncdebug/../../rts/Game/Game.cpp:1005]
Server: #3 SpringApp::Update() [/home/buildbot/slave/linux-static-x64/build/build/syncdebug/../../rts/System/SpringApp.cpp:942]
Server: #4 SpringApp::Run() [/home/buildbot/slave/linux-static-x64/build/build/syncdebug/../../rts/System/SpringApp.cpp:978]
Server: Done!

~0015215

abma (administrator)

another try with MAX_STACK=10:

Server: 0x42E52C94/ 1.14587067e+02 instead of 0x44CA0000/ 1.61600000e+03, frame 000381, backtrace 1 in "copyfloat"

Server: === Backtrace 1 ===
Server: #0 Sync::Assert(void const*, unsigned int, char const*) [/home/abma/dev/spring/develop/rts/System/Sync/SyncedPrimitiveBase.h:45]
Server: #1 void Sync::Assert<float>(float const&, char const*) [/home/abma/dev/spring/develop/rts/System/Sync/SyncedPrimitiveBase.h:60]
Server: #2 SyncedPrimitive<float>::Sync(char const*) [/home/abma/dev/spring/develop/rts/System/Sync/SyncedPrimitive.h:43]
Server: #3 SyncedPrimitive<float>::SyncedPrimitive(float) [/home/abma/dev/spring/develop/rts/System/Sync/SyncedPrimitive.h:84]
Server: #4 SyncedFloat3::SyncedFloat3(float3 const&) [/home/abma/dev/spring/develop/rts/System/Sync/SyncedFloat3.h:43]
Server: #5 CGroundMoveType::GetNewPath() [/home/abma/dev/spring/develop/rts/Sim/MoveTypes/GroundMoveType.cpp:1247]
Server: #6 CGroundMoveType::StartEngine(bool) [/home/abma/dev/spring/develop/rts/Sim/MoveTypes/GroundMoveType.cpp:1446]
Server: #7 CGroundMoveType::ReRequestPath(bool) [/home/abma/dev/spring/develop/rts/Sim/MoveTypes/GroundMoveType.cpp:1262]
Server: #8 CGroundMoveType::StartMoving(float3, float) [/home/abma/dev/spring/develop/rts/Sim/MoveTypes/GroundMoveType.cpp:430]
Server: #9 CMobileCAI::SetGoal(float3 const&, float3 const&, float) [/home/abma/dev/spring/develop/rts/Sim/Units/CommandAI/MobileCAI.cpp:898]
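
The mismatch line in this note pairs each raw 32-bit pattern with the float it encodes. As a minimal sketch (plain Python, not engine code; `bits_to_float` is a hypothetical helper name), the two hex values can be decoded like this to check that they really are different coordinates rather than a rounding-level difference:

```python
import struct

def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Hex values copied from the "copyfloat" mismatch line in this note.
received = bits_to_float(0x42E52C94)
expected = bits_to_float(0x44CA0000)

# Print both in the same style the sync debugger uses.
print(f"{received:.8e} instead of {expected:.8e}")
```

The two values are far apart (roughly 114.6 versus 1616), which suggests the clients were working with different positions, not the same position computed with slightly different precision.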

~0015216

abma (administrator)

Last edited: 2015-09-23 22:36

can't reproduce in BA, very likely related to the zk lua code which sets mex positions on this map. not all mex positions cause the desync to happen!

@zkdevs:

can you print all mex positions generated by the gadget (?) on this map with windows and linux please?

~0015247

Rafal99 (reporter)

I added a widget (dbg_print_metal_spots.lua) that prints position and metal value of all metal spots generated by ZK gadget, and also prints calculated income of each created mex.

The output of the widget on my system (Win 7 x64) can be seen in widget_output_win7.txt

~0015248

Rafal99 (reporter)

Last edited: 2015-09-27 02:44

Seems Lamer added the output for linux64. Mex incomes differ starting from the third one, and the metal spots in the array differ in more than just their order.

This means the issue is in ZK metal spot finder, but the question is how could the Lua code produce different results for Windows players and Linux players without giving different results for everyone?

~0015249

Google_Frog (reporter)

My suggestion is to look for a bug in Spring.GetGroundInfo.

~0015250

hokomoko (developer)

Last edited: 2015-09-27 12:14

Also check the gadget for any calculations involving NaNs, division by 0 and stuff with math.huge

~0015251

abma (administrator)

Last edited: 2015-09-27 23:40

@hokomoko: https://springrts.com/mantis/view.php?id=3436#c15132

@others:

https://github.com/ZeroK-RTS/Zero-K/blob/master/LuaRules/Gadgets/mex_spot_finder.lua#L386
https://springrts.com/wiki/Debugging_sync_errors :

"If you're a game developer, please be aware that Lua may be a source of desyncs. E.g. table iteration using pairs when you have tables, coroutines, or functions as keys is not a sync-safe operation, see mantis 0001050 for example of such a desync."

uniqueGroups is a table!

https://github.com/ZeroK-RTS/Zero-K/blob/master/LuaRules/Gadgets/mex_spot_finder.lua#L290
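
The wiki warning quoted in this note is about iteration order that depends on object identity instead of on the data. A rough Python analogy, assuming nothing about the gadget's actual Lua code (`Group` and `stable_order` are hypothetical names): a set of plain objects is hashed by identity, so its iteration order is not guaranteed to match between processes, while sorting by an explicit data-derived key is.

```python
class Group:
    """Stand-in for a table used as a key; hashed by identity by default."""

    def __init__(self, x, z):
        self.x = x
        self.z = z

def stable_order(groups):
    """Return the groups in an order that depends only on their data."""
    return sorted(groups, key=lambda g: (g.x, g.z))

groups = {Group(40, 8), Group(8, 16), Group(8, 4)}

# Iterating `groups` directly follows hash order, which is derived from
# object identity and may differ from run to run or machine to machine.
# Sorting by (x, z) gives every client the same sequence.
ordered = [(g.x, g.z) for g in stable_order(groups)]
print(ordered)
```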

~0015252

abma (administrator)

Last edited: 2015-09-28 18:12

for the reference / further discussion:

https://springrts.com/phpbb/viewtopic.php?f=23&t=33906

upstream bug report:

https://github.com/ZeroK-RTS/Zero-K/issues/1069

~0015286

hokomoko (developer)

Apparently this is not due to pairs.
https://github.com/spring/spring/blob/develop/rts/lib/lua/include/luaconf.h#L532

The conversion from a number to a string representation is based on sprintf which can produce different strings in windows and linux for the same float value (e+14 vs. e+014).
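
To illustrate why the two spellings matter: once a number has been turned into a string, "e+14" and "e+014" are different strings, so anything keyed or compared by that string diverges between clients. A minimal sketch in Python (not the engine fix; `normalize_exponent` is a hypothetical helper) that reduces both spellings to the two-digit-minimum form:

```python
import re

# Exponent marker and sign, optional padding zeros, then at least two digits.
_EXPONENT = re.compile(r"([eE][+-])0*(\d{2,})$")

def normalize_exponent(text: str) -> str:
    """Drop padding zeros from the exponent, keeping at least two digits."""
    return _EXPONENT.sub(r"\1\2", text)

three_digit = "1e+014"  # spelling with a zero-padded three-digit exponent
two_digit = "1e+14"     # spelling with a two-digit exponent

# The raw strings differ even though they describe the same number.
print(three_digit == two_digit)
# After normalization they compare equal.
print(normalize_exponent(three_digit) == normalize_exponent(two_digit))
```

This only addresses the exponent width; it does nothing about digits that differ because of rounding.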

~0015288

hokomoko (developer)

Fix 2e57d0d3c0c55abbd6ec71b6fe8b2fc0b3f1fbff committed to develop branch: Added lexical_cast for number->string conversions
Fix 0003436, repo: spring changeset id: 5684

~0015302

hokomoko (developer)

While the issue of e+014 vs. e+14 can be solved, I'm afraid other rounding issues may be a problem.

I'm starting to think this isn't safely fixable and float->string conversion should be warned against in documentation
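
One way a script could sidestep decimal formatting entirely, offered here purely as an illustrative assumption and not as something the engine or Zero-K does: build the key from the float's raw bit pattern, which involves no rounding to decimal digits. `float_key` is a hypothetical helper name.

```python
import struct

def float_key(value: float) -> str:
    """Hex string of the IEEE-754 single-precision bit pattern of value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return f"{bits:08X}"

# Two clients holding the same float32 value get the same key,
# regardless of how their C runtime prints decimals.
print(float_key(1616.0))
```

A key like this is only as consistent as the float itself, so it helps with printing differences but not with values that were already computed differently.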

Issue History
Date Modified Username Field Change
2013-02-02 21:01 burp New Issue
2013-02-02 22:55 Kloot Note Added: 0009691
2013-02-02 22:55 Kloot Assigned To => abma
2013-02-02 22:55 Kloot Status new => feedback
2013-02-02 22:55 Kloot Assigned To abma =>
2013-04-03 03:23 abma Note Added: 0010374
2013-04-03 03:23 abma Status feedback => resolved
2013-04-03 03:23 abma Resolution open => fixed
2013-04-03 03:23 abma Assigned To => abma
2015-09-07 15:26 hokomoko Assigned To abma =>
2015-09-07 15:26 hokomoko Note Added: 0015124
2015-09-07 15:26 hokomoko Status resolved => feedback
2015-09-07 15:26 hokomoko Resolution fixed => reopened
2015-09-07 15:26 hokomoko Assigned To => hokomoko
2015-09-07 15:26 hokomoko Status feedback => new
2015-09-07 15:27 hokomoko Severity block => major
2015-09-07 15:27 hokomoko Status new => assigned
2015-09-07 15:27 hokomoko Assigned To hokomoko =>
2015-09-09 20:39 abma Note Added: 0015128
2015-09-09 20:40 abma Status assigned => feedback
2015-09-09 22:34 burp File Added: 20150906_153744_LongCat32_100.infolog.txt
2015-09-09 22:35 burp File Added: 20150906_153744_LongCat32_100.sdf
2015-09-09 22:36 burp Note Added: 0015129
2015-09-09 22:36 burp Status feedback => new
2015-09-09 22:40 burp Note Edited: 0015129
2015-09-09 22:40 burp Note Edited: 0015129
2015-09-09 22:41 Rafal99 File Added: 20150906_153744_LongCat32_100_Windows.sdf
2015-09-09 22:42 Rafal99 Note Added: 0015130
2015-09-09 23:16 abma Severity major => crash
2015-09-09 23:16 abma Product Version 91.0 => 100.0
2015-09-09 23:16 abma Target Version => 101.0
2015-09-09 23:35 abma Priority high => normal
2015-09-10 00:12 abma Note Added: 0015131
2015-09-10 00:14 abma Note Edited: 0015131
2015-09-10 00:16 abma Note Added: 0015132
2015-09-10 00:16 abma Note Edited: 0015132
2015-09-10 00:16 hokomoko Note Added: 0015133
2015-09-10 00:19 abma Note Added: 0015134
2015-09-10 00:21 abma Note Edited: 0015134
2015-09-17 11:42 abma Note Added: 0015189
2015-09-17 11:46 abma Summary Desync between linux and windows players => Desync between linux and windows players on LongCat32
2015-09-23 20:48 abma Note Added: 0015214
2015-09-23 20:49 abma Note Edited: 0015214
2015-09-23 20:52 abma File Added: trace1-client.log
2015-09-23 20:52 abma File Added: trace0.log
2015-09-23 20:53 abma File Added: 20150923_204703_LongCat32_100.0.1-207-g313b5e5 develop.sdf
2015-09-23 20:53 abma File Added: syncdebug-server.log
2015-09-23 22:09 abma Note Added: 0015215
2015-09-23 22:35 abma Note Added: 0015216
2015-09-23 22:36 abma Note Edited: 0015216
2015-09-23 22:45 abma Summary Desync between linux and windows players on LongCat32 => Desync between linux and windows players on zk/LongCat32 when placing a mex on a specific spot
2015-09-24 23:35 abma Status new => feedback
2015-09-26 18:33 Rafal99 Note Added: 0015247
2015-09-26 18:34 Rafal99 File Added: dbg_print_metal_spots.lua
2015-09-26 18:34 Rafal99 File Added: widget_output_win7.txt
2015-09-27 01:31 lamer File Added: widget_output_linux64.txt
2015-09-27 02:08 Rafal99 Note Added: 0015248
2015-09-27 02:17 Rafal99 Note Edited: 0015248
2015-09-27 02:34 Rafal99 Note Edited: 0015248
2015-09-27 02:35 Rafal99 Note Edited: 0015248
2015-09-27 02:42 Rafal99 Note Edited: 0015248
2015-09-27 02:44 Rafal99 Note Edited: 0015248
2015-09-27 03:29 Google_Frog Note Added: 0015249
2015-09-27 12:14 hokomoko Note Added: 0015250
2015-09-27 12:14 hokomoko Note Edited: 0015250
2015-09-27 23:33 abma Note Added: 0015251
2015-09-27 23:33 abma Status feedback => resolved
2015-09-27 23:33 abma Resolution reopened => no change required
2015-09-27 23:33 abma Assigned To => abma
2015-09-27 23:34 abma Note Edited: 0015251
2015-09-27 23:40 abma Note Edited: 0015251
2015-09-28 01:28 abma Relationship added related to 0001050
2015-09-28 14:51 abma Note Added: 0015252
2015-09-28 18:12 abma Note Edited: 0015252
2015-10-08 17:24 hokomoko Assigned To abma => hokomoko
2015-10-08 17:24 hokomoko Note Added: 0015286
2015-10-08 17:24 hokomoko Status resolved => feedback
2015-10-08 17:24 hokomoko Resolution no change required => reopened
2015-10-08 17:24 hokomoko Status feedback => assigned
2015-10-08 19:39 hokomoko Changeset attached => spring develop 2e57d0d3
2015-10-08 19:39 hokomoko Note Added: 0015288
2015-10-08 19:39 hokomoko Status assigned => resolved
2015-10-15 15:27 hokomoko Note Added: 0015302
2015-10-15 15:27 hokomoko Status resolved => feedback
2015-10-16 18:50 hokomoko Changeset attached => spring develop 6b27c464
2015-10-16 18:51 hokomoko Summary Desync between linux and windows players on zk/LongCat32 when placing a mex on a specific spot => Desync from different float printing on linux and windows
2015-10-19 11:20 hokomoko Status feedback => resolved
2015-10-19 11:20 hokomoko Resolution reopened => fixed