View Issue Details

IDProjectCategoryView StatusLast Update
0004152Spring engineGeneralpublic2013-11-30 23:52
Reporteruser744Assigned ToKloot  
PrioritynormalSeveritycrashReproducibilityN/A
Status resolvedResolutionfixed 
Product Version95.0 
Target Version96.0Fixed in Version95.0.1+git 
Summary0004152: Crash with "Failed to allocate memory" (mod: XTA 9.731)
DescriptionAfter 30-45min of playing, spring crashed.
spring.exe closed and messagebox from windows "has stopped worked."
Have not tested yet if demo crashes too.
http://pastebin.com/t1nV0tFY
TagsNo tags attached.
Attached Files
20131113_212307_DeltaSiege_Island_8_Way_95.sdf (Attachment missing)
20131116_125131_The_Road_Final_95.sdf (Attachment missing)
infolog.txt (Attachment missing)
crash 2infolog.txt (Attachment missing)
crash 3infolog.txt (Attachment missing)
springsettings_B.cfg (Attachment missing)
springsettings_A.cfg (Attachment missing)
infolog-manoa.txt (Attachment missing)
20131126_193517_Knockout_v2_95.sdf (Attachment missing)
springsettings-nvidia.cfg (Attachment missing)
Checked infolog.txt for Errors

Activities

user744

2013-11-13 22:12

  ~0012115

looks like same crash: http://springrts.com/phpbb/viewtopic.php?f=11&t=31178

Kloot

2013-11-13 22:25

developer   ~0012118

hmz, stacktrace failed

jK

2013-11-13 22:36

developer   ~0012119

does the replay reproduce the crash for you?

user744

2013-11-14 12:14

  ~0012130

oh sorry 20131113_212307_DeltaSiege_Island_8_Way_95.sdf is actually wrong demo, from previous game. the crashed demo did not get saved..

user744

2013-11-14 12:30

  ~0012131

this is correct demo:
http://replays.springrts.com/replay/e6e083525d31ac471a8917dc79b8f899/

during playing it crashed at ~28min
demo played ok until end.
infolog:
http://pastebin.com/YzwvtTks (no crash)
spring.exe RAM usage in taskmanager steadily goes up, but maybe that is normal for game of such size.
start 530mb
2min 560mb
10min 670mb
13min 720mb
20min 815mb
23min 856mb
26min 890mb
28min 920mb

jK

2013-11-14 16:00

developer   ~0012132

Last edited: 2013-11-14 16:01

I got an endless loop in glSwapBuffers after ~ 31min

Jools

2013-11-14 16:35

reporter   ~0012133

Last edited: 2013-11-14 16:37

That's interesting: at about 31 mins adv.playerlist crashes because it tries to gl.text a name that is nil. I had one crash before also, it was on exactly this map and it seemed to be a GPU timeout. But I played 50 other games with 95 without any crash.

Kloot

2013-11-14 17:00

developer   ~0012134

in that case, see if you can reproduce it by calling gl.Text(nil) from some widget

Jools

2013-11-14 17:44

reporter   ~0012135

That only crashes the widget, but while it's not the cause to the crash, it could be a symptom of it. I watched that replay some times now and APL XTA version always does crash, but it doesn't always crash at the same time. Sometimes it crashes when knorke drops 28min, sometimes it crashes when duckoman gets killed 30min. But maybe it doesn't mean anything: BA version of APL doesn't crash in this demo.


But hey: why isn't APL removed by WH when it crashes? The test widget is.

This is the output from test widget:

[f=0000300] [widgets.lua] Error: false
[f=0000300] [widgets.lua] Error: Error in DrawScreen(): [string "LuaUI\Widgets\dbg_test.lua"]:40: bad argument #1 to 'Text' (string expected, got nil)
[f=0000300] [widgets.lua] Error: Removed widget: Test fields

This is the output from APL:

[f=0051936] Player [2up]knorke has finished resuming
[f=0051975] Error: gl.CreateList: error(2) = [string "LuaUI/Widgets/gui_advplayerslist.lua"]:1227: bad argument #1 to 'gl_Text' (string expected, got nil)
[f=0051978] > <[2up]knorke> think i was dead anyway
[f=0052005] Error: gl.CreateList: error(2) = [string "LuaUI/Widgets/gui_advplayerslist.lua"]:1227: bad argument #1 to 'gl_Text' (string expected, got nil)

Kloot

2013-11-14 18:25

developer   ~0012136

Last edited: 2013-11-14 18:27

because the APL crash happens inside a local function passed to gl.CreateList

user744

2013-11-16 13:36

  ~0012145

i do not use adv.playerlist widget, only /info list.

[f=0051936] Player [2up]knorke has finished resuming
Is that message from engine or lua? I never rejoined the game.

Also see chat:
[f=0051926] > <Thorgasm> knorke are u going to try and rejoin?
[f=0051926] > <[2up]knorke> ah no

Kloot

2013-11-16 13:50

developer   ~0012146

lua, cmd_idle_players

silentwings

2013-11-16 14:33

reporter   ~0012147

Last edited: 2013-11-16 16:37

The mesg "[f=0051936] Player [2up]knorke has finished resuming" is from cmd_idle_players, as kloot says. The fact that it appeared for you there is a bug - fixed yesterday (without having read this thread!) by http://imolarpg.dyndns.org/trac/balatest/changeset/1852. Update XTAs cmd_idle_players from the BA repo.

But I don't think that error has anything to do with this bug. For what it's worth, I also just got one of these crashes (first time ever) while watching a BA game. It was only me that crashed - no one else. The only abnormal thing I can think of about what I did was I spent much of the game alt-tabbed out and doing smth else.

Added infolog + demo. The stacktrace translator failed, but with a "unable to find detailed version info" which makes me think maybe stacktrace translator is broken?

user744

2013-11-17 00:45

  ~0012148

"crash 2infolog.txt" and "crash 3infolog.txt" is two more infologs from similiar crashs. both in FFA on somewhat large maps. After rejoin could continue to play.

silentwings

2013-11-17 01:08

reporter   ~0012149

Last edited: 2013-11-17 01:24

I have watched resource monitor in a game where this happened and it climbed slowly but a at a constant rate. Accordinding to /debug, lua mem usage didn't increase at all. After about 30 mins in game it can be using 3-4Gb of ram. It seems like this bug seems to affect one of my computers all the time and my other one never.

Kloot

2013-11-17 01:46

developer   ~0012150

Last edited: 2013-11-17 02:01

One way to test that would be to join games with both computers simultaneously.

Which settings (if any) are different on the machine that is affected compared to the one that isn't? Are they running the same operating systems / driver versions (assuming both use graphics hardware from the same vendor) / widgets?

silentwings

2013-11-17 02:16

reporter   ~0012151

Last edited: 2013-11-17 02:19

Ok, I did just that. Here's results of laptops A and B

Laptop A crashed after 20 minutes, having run out of ram, with spring.exe having a working set of 3.8Gb. It's an i7-3740QM, win7, 8Gb ram, Nvidia GeForce GT 650M, driver version 8.17.12.9601.

Laptop B made it to the end and was using <650Mb or ram all the way through. It's a i7-M620, win7, 8Gb ram, ATI Radeon HD 6500M, driver version 8.982.

The widget configs between the two are basically identical. On both machines lua mem usage (reported by /debug) never went above 10Mb. The spring configs are different, but mainly because the ATI machine needs some stuff disabled or it crashes in multiple ATI related ways. I'll attach both configs.

Kloot

2013-11-17 02:59

developer   ~0012153

Last edited: 2013-11-17 15:01

Ironically B should be the one to crash because it doesn't force AllowDeferred*Rendering to 0 and 95 contains a bug related to those parameters which ATI's do not appreciate. Otherwise the only significant difference is that A's config defines these values and B's doesn't:

  MultiThreadCount=6
  MultiThreadLua=1
  WorkerThreadCount=6

The first two are no longer relevant (because they correspond to code which is now unmaintained and non-functional) so you can just remove them. OTOH the latter controls the size of the threadpool (new in 95.0) but should not affect Spring's memory footprint. Does setting it to 0 have any measurable impact?

silentwings

2013-11-17 03:05

reporter   ~0012154

Last edited: 2013-11-17 03:07

I followed pretty much the same train of thought and tried setting WorkerThreadCount=0 and -1 on the Nvidia machine - it didn't change anything.

I think the ATI machine is forcing deferred rendering off, because its got advunitshading off (at least, that's my memory of what jK said to me a few nights ago).

Kloot

2013-11-17 03:18

developer   ~0012156

Last edited: 2013-11-17 03:24

Adv*Shading=0 does cause both deferred passes to be skipped but the crashbug existed in code that was always reached.

In any case, synced code can be safely ruled out as a factor.

What happens if you issue /luaui disable on A? (important because widgets can direct *Spring* to allocate memory, not just the Lua VM which has garbage collection)

silentwings

2013-11-17 14:14

reporter   ~0012161

Last edited: 2013-11-17 14:15

Turning off lauui seems to fix it. With luaui off the rate at which it increases memusage is cut by a factor of, at a guess, 10-20x.

Moreover, if I luaui enable, wait a few minutes (during which it quickly soaks up mem) and then /luaui disable, it looks as though the soaked up mem is instantly freed.

Kloot

2013-11-17 15:03

developer   ~0012162

Last edited: 2013-11-17 15:05

Hmm, feel like bisecting the widget list?

ie. start with the top half of all widgets enabled and the bottom half disabled, figure out if it leaks and if so split the top half into an enabled and a disabled part, etc.

nixtux

2013-11-17 15:33

reporter   ~0012163

Here is Infolog and spring setting, replay to follow as it didn't save on players pc, but it's looks like the same mem leak but this time the whole game was played with luaui disabled
http://pastebin.com/zrGSpjae http://pastebin.com/HNxeigeD

silentwings

2013-11-17 18:24

reporter   ~0012168

Im leaving for 2 weeks holiday tonight and won't have time to bisect the widget list before that, but if the bug is still around when I get back I could do it then. Sorry!

nixtux

2013-11-17 19:10

reporter   ~0012171

here replay from above game where player crashed due to mem leak
https://dl.dropboxusercontent.com/u/200970/20131117_133241_Grts_Northernmountains_009_95.sdf

silentwings

2013-11-26 15:45

reporter   ~0012212

So I tried bisecting the widget list but didn't find anything.

It seems as though the rate at which mem is leaking increases as more widgets are enabled and the responsibility is shared across all widgets.

I don't know at what rate spring is supposed to gradually increase mem usage during a game, but even with all luaui off it still increases at maybe 1Mb every 3 seconds.

monohouse

2013-11-26 18:37

reporter   ~0012218

I also got the failed to allocate memory crash but with another mod: Tech Annihilation 2.23.4

Kloot

2013-11-27 22:39

developer   ~0012228

Last edited: 2013-11-27 22:40

Currently my assumption is that the Lua garbage collector cannot keep up in larger games (because it runs for a limited amount of time per invocation in 95) so the rate at which memory is consumed exceeds the rate at which it is freed. Both the amount of time (MaxLuaGarbageCollectionTime) spent and the number of steps (MaxLuaGarbageCollectionSteps) taken per collection call are configurable in 95+, testing if a combination of params exists to stop leakage over the range of expected game sizes and lengths without slowing Spring to a crawl is my main concern now.

1MB every 3 seconds sounds excessive though, does it leak that fast from game start (with all widgets enabled) or does the rate increase over time?

silentwings

2013-11-28 02:08

reporter   ~0012235

I think the rate at which mem is leaked does increase somewhat as the game gets bigger, but the leak is quite big already at the game start (maybe 1Mb every 8 sec).

I have noticed something else though - mem is leaked at a noticeably faster rate when I am fully zoomed out and seeing the whole map than when I am fully zoomed in on some small irrelevent piece of ground. When rejoining a game at high speed, mem is also leaked faster.

user744

2013-11-28 03:33

  ~0012236

[quote]I have noticed something else though - mem is leaked at a noticeably faster rate when I am fully zoomed out and seeing the whole map than when I am fully zoomed in on some small irrelevent piece of ground[/quote]Hm funny, my crashes were when I was moving camera a lot out of boredom/for fun. Hold shift and quickly scrolling around map or zooming around.
But even from the demos I could not reproduce crash, even tried many programs in background (to use RAM) and wild camera movements.

Is there a replay that always (or more reliable) crashes?

silentwings

2013-11-28 04:22

reporter   ~0012237

Last edited: 2013-11-28 04:24

It only affects one of my two laptops but I can crash reliably with all my RAM used up by spectating any BA game that is larger in size than say, 6v6, and lasts longer than say 35 mins. (And its not a bad laptop - i7, 8GB RAM, GT 650M)

The trouble with replays is that with 95.0 they seem not to follow the same course as the game itself so I can't test easily (http://springrts.com/mantis/view.php?id=4140).

silentwings

2013-11-28 16:11

reporter   ~0012253

What should I do to test the lua garbage collection settings?

Kloot

2013-11-28 17:58

developer   ~0012256

Last edited: 2013-11-28 18:09

1) start with build 119-g70cf9b4 using the default* values
2) run a game on the affected machine, preferably at +20 speed to magnify the leak (since it leaks from the start you won't have to actually play much)
3) monitor what your OS has to say about the Spring process footprint
4) slowly increase one or both of MaxLuaGarbageCollection{Time,Steps} and go back to step 2 until the leak stabilizes (this of course assumes that Lua causes it *and* that the memory being leaked is even collectable, ie. garbage)

* MaxLuaGarbageCollectionTime is in milliseconds and defaults to 33, MaxLuaGarbageCollectionSteps defaults to 10000

silentwings

2013-11-30 16:05

reporter   ~0012276

Last edited: 2013-11-30 16:12

I haven't tried the test with the garbage collection time settings yet; BUT I found out which setting triggers the memleak. I had set WorkerThreadCount myself (because I wanted cores free for other things) and this triggers the memleak.

I know iI said above that I had tested WorkerThreadCount etc, but this was during the period where SL was (unbeknowst to me) forcing spring to use a different config file to my 'proper' one (in /my games/spring) - so I suspect that the check I did above was flawed. Sorry.

With WorkerThreadCount = -1, memleak doesn't happen. With WorkerThreadCount = 0, it memleaks.

What shall I do now to test?

Kloot

2013-11-30 20:11

developer   ~0012278

Last edited: 2013-11-30 20:32

Should be fixed as of 129-g0b1cbb8, can you (or anyone else) confirm that?

(and now you know why running spring.exe directly is often the preferred test-route)

silentwings

2013-11-30 23:00

reporter   ~0012279

Last edited: 2013-11-30 23:00

For me -129 doesn't leak with WorkerThreadCount=0.

Kloot

2013-11-30 23:52

developer   ~0012280

good enough for me, thanks

Issue History

Date Modified Username Field Change
2013-11-13 22:06 user744 New Issue
2013-11-13 22:06 user744 File Added: 20131113_212307_DeltaSiege_Island_8_Way_95.sdf
2013-11-13 22:12 user744 Note Added: 0012115
2013-11-13 22:25 Kloot Note Added: 0012118
2013-11-13 22:36 jK Note Added: 0012119
2013-11-14 12:14 user744 Note Added: 0012130
2013-11-14 12:30 user744 Note Added: 0012131
2013-11-14 16:00 jK Note Added: 0012132
2013-11-14 16:00 jK Note Edited: 0012132
2013-11-14 16:00 jK Note Edited: 0012132
2013-11-14 16:01 jK Note Edited: 0012132
2013-11-14 16:35 Jools Note Added: 0012133
2013-11-14 16:36 Jools Note Edited: 0012133
2013-11-14 16:37 Jools Note Edited: 0012133
2013-11-14 17:00 Kloot Note Added: 0012134
2013-11-14 17:44 Jools Note Added: 0012135
2013-11-14 18:25 Kloot Note Added: 0012136
2013-11-14 18:27 Kloot Note Edited: 0012136
2013-11-16 13:36 user744 Note Added: 0012145
2013-11-16 13:50 Kloot Note Added: 0012146
2013-11-16 14:33 silentwings Note Added: 0012147
2013-11-16 14:33 silentwings File Added: 20131116_125131_The_Road_Final_95.sdf
2013-11-16 14:33 silentwings File Added: infolog.txt
2013-11-16 14:35 silentwings Note Edited: 0012147
2013-11-16 14:36 silentwings Note Edited: 0012147
2013-11-16 14:47 silentwings Note Edited: 0012147
2013-11-16 16:37 silentwings Note Edited: 0012147
2013-11-17 00:43 user744 File Added: crash 2infolog.txt
2013-11-17 00:43 user744 File Added: crash 3infolog.txt
2013-11-17 00:45 user744 Note Added: 0012148
2013-11-17 01:08 silentwings Note Added: 0012149
2013-11-17 01:21 silentwings Note Edited: 0012149
2013-11-17 01:24 silentwings Note Edited: 0012149
2013-11-17 01:46 Kloot Note Added: 0012150
2013-11-17 01:51 Kloot Note Edited: 0012150
2013-11-17 02:01 Kloot Note Edited: 0012150
2013-11-17 02:16 silentwings Note Added: 0012151
2013-11-17 02:17 silentwings File Added: springsettings_B.cfg
2013-11-17 02:17 silentwings File Added: springsettings_A.cfg
2013-11-17 02:19 silentwings Note Edited: 0012151
2013-11-17 02:19 silentwings Note Edited: 0012151
2013-11-17 02:59 Kloot Note Added: 0012153
2013-11-17 03:02 Kloot Note Edited: 0012153
2013-11-17 03:03 Kloot Note Edited: 0012153
2013-11-17 03:05 silentwings Note Added: 0012154
2013-11-17 03:05 silentwings Note Edited: 0012154
2013-11-17 03:06 silentwings Note Edited: 0012154
2013-11-17 03:07 silentwings Note Edited: 0012154
2013-11-17 03:18 Kloot Note Added: 0012156
2013-11-17 03:19 Kloot Note Edited: 0012156
2013-11-17 03:21 Kloot Note Edited: 0012156
2013-11-17 03:24 Kloot Note Edited: 0012156
2013-11-17 14:14 silentwings Note Added: 0012161
2013-11-17 14:15 silentwings Note Edited: 0012161
2013-11-17 15:01 Kloot Note Edited: 0012153
2013-11-17 15:03 Kloot Note Added: 0012162
2013-11-17 15:04 Kloot Note Edited: 0012162
2013-11-17 15:05 Kloot Note Edited: 0012162
2013-11-17 15:33 nixtux Note Added: 0012163
2013-11-17 18:24 silentwings Note Added: 0012168
2013-11-17 19:10 nixtux Note Added: 0012171
2013-11-21 01:52 abma Target Version => 96.0
2013-11-26 15:45 silentwings Note Added: 0012212
2013-11-26 18:34 monohouse File Added: infolog-manoa.txt
2013-11-26 18:34 monohouse File Added: 20131126_193517_Knockout_v2_95.sdf
2013-11-26 18:37 monohouse Note Added: 0012218
2013-11-26 18:37 monohouse File Added: springsettings-nvidia.cfg
2013-11-27 22:39 Kloot Note Added: 0012228
2013-11-27 22:39 Kloot Note Edited: 0012228
2013-11-27 22:40 Kloot Note Edited: 0012228
2013-11-28 02:08 silentwings Note Added: 0012235
2013-11-28 03:33 user744 Note Added: 0012236
2013-11-28 04:22 silentwings Note Added: 0012237
2013-11-28 04:23 silentwings Note Edited: 0012237
2013-11-28 04:24 silentwings Note Edited: 0012237
2013-11-28 16:11 silentwings Note Added: 0012253
2013-11-28 17:58 Kloot Note Added: 0012256
2013-11-28 18:04 Kloot Note Edited: 0012256
2013-11-28 18:09 Kloot Note Edited: 0012256
2013-11-30 16:05 silentwings Note Added: 0012276
2013-11-30 16:06 silentwings Note Edited: 0012276
2013-11-30 16:06 silentwings Note Edited: 0012276
2013-11-30 16:08 silentwings Note Edited: 0012276
2013-11-30 16:09 silentwings Note Edited: 0012276
2013-11-30 16:11 silentwings Note Edited: 0012276
2013-11-30 16:12 silentwings Note Edited: 0012276
2013-11-30 20:11 Kloot Note Added: 0012278
2013-11-30 20:32 Kloot Note Edited: 0012278
2013-11-30 23:00 silentwings Note Added: 0012279
2013-11-30 23:00 silentwings Note Edited: 0012279
2013-11-30 23:52 Kloot Note Added: 0012280
2013-11-30 23:52 Kloot Status new => resolved
2013-11-30 23:52 Kloot Fixed in Version => 95.0.1+git
2013-11-30 23:52 Kloot Resolution open => fixed
2013-11-30 23:52 Kloot Assigned To => Kloot