0006431: memleak in unitsync? - MantisBT

2025-08-03 21:48 CEST

View Issue Details [ Jump to Notes ]

[ Issue History ] [ Print ]

ID

Project

Category

View Status

Date Submitted

Last Update

0006431

Spring engine

General

public

2021-01-03 04:26

2021-01-04 05:11

Reporter

FabriceFABS

Assigned To

Priority

normal

Severity

minor

Reproducibility

always

Status

feedback

Resolution

reopened

Product Version

105.0

Target Version

Fixed in Version

Summary

0006431: memleak in unitsync?

Description

FYI, we worked with Bibim a couple of hours on it.
All is in the title... :
Unitsync fails randomly when starting spads multi instances

After some work, I/we've found several tests :

#### TEST 1 ####
If I got my regular ./spring/maps number in it (~ 500), on first instance of spads launch, it runs, for exemple [ACE]Been, no problem.
FYI [SPADS] Spring archives loading process took 16 minutes and 53 seconds to generate its first cache.

A second instance of spads, for exemple [ACE]Mettalic, fails on this step :
******************
NOTICE - [SPADS] Initializing SPADS 0.12.21
NOTICE - [SPADS] Loading Spring archives using unitsync library version 105.0
Processus arrêté (process ended)
******************
No log available, however, bibim modified code that permits to see that it fails during the scan of the maps archives.
We had to remove bilateral.sdz (1st fail «Processus arrêté») and on Eisland.sdz (2nd fail «Processus arrêté»).
We notice that finally this instance of [ACE]Metallic can be run without those 2 maps.
I opened those 2 maps, and didn't found actually anything wrong, to be checked.
Both are in "old" format : ie > only 3 files in it.

#### TEST 2 ####
./spring/maps contains only 6 maps :
Comet Catcher Redux, DeltaSiegeDry, Red Comet, Red Comet v1.3, Throne v5 & TheRockFinal.
Nothing wrong to run several instance, no problem.
However, I found that it stops a long time on the archives, on certains hosts, it seems frozen on this step, even if there are only 5 maps.

#### TEST 3 ####
Remove of .spring[Pool/Rapid/Packages] (mv to Temp)
Launching [ACE]Been => Unbelievable... Took 10s to scan maps, surely due that the cache already done above.
Next instance of spads failed the same.
Replace back [Pool/Rapid/Packages] into ./spring
All spads server has been !quit

#### TEST 4 ####
After this, run only [ACE]Been on 105.0 > OK
Another 105.0 instance is trying to launch on [ACE]Fusion but failed around this map archive scan :
...
mapName=4B_facility
mapName=4_bases_v3
mapName=Aberdeen3v3v3
mapName=Aberdeen6v6_Fix
Processus arrêté

#### TEST 5 ####
Run only [ACE]Been on 105.0 > OK
Archive cash already done and ok, scan took ~1 sec during spads launch.
cp of [ACE]Been/cache/105rel- to [ACE]*/cache/105rel-
Result : All other hosts runs correctly and launched fine and fast like [ACE]Been.

*** CONCLUSION ***
So... I know that #### TEST 5 #### is clearly not the solution, but it works.
There's something wrong with unitsync scans archives when there's already a spads's host running fine with 105.0 engine.

Sorry for my english, don't hesitate to ask me more if I'm not understandable enough or for some extra test.

Steps To Reproduce

1. You need to run more than 1 instance of Spads to reproduce it,
2. Others instances that don't have archives in cache will fail randomly,
3. Others instance that already got the archive (#### TEST 5 ####) cache will run.

Additional Information

Very tired, time to sleep... :-O

Tags

No tags attached.

Checked infolog.txt for Errors

Irrelevant

Attached Files

Capture.jpg (375,220 bytes) 2021-01-04 00:33

Relationships

Relationships

Notes
~0020577 FabriceFABS (reporter) 2021-01-03 14:44	--- 3 new additives tests which deserves all its importance --- ### TEST 6 ### I stopped [ACE]Beep, and placed back Eisland.sd7 and bilateral.sdz in .spring/maps (In fact I forget also to make my tests with those 2x problematics maps but I do it now) When launching back Spads, it stops during map scanning @Eisland.sd7 with the floowing output : * ... mapName=EE-Deserted_Sinkhole-v02 mapName=Earatope_V3 mapName=Easter Bunny v2 Horse mapName=Eisland Processus arrêté $ * Same for bilateral.sdz. ### TEST 7 ### During the file operation above while replacing those 2 problematics maps, Spads currently running on others instances (105) detected new maps, and updates cache. This generated a crash with the following : * ... mapName=EE-Deserted_Sinkhole-v02 mapName=Earatope_V3 mapName=Easter Bunny v2 Horse mapName=Eisland ERROR - [SPADS] Failed to load archives, undefined mods data (unitsync library crash ?) NOTICE - [SPADS] AutoHost shutdown scheduled (reason: Unable to auto-reload Spring archives) Can't use an undefined value as a subroutine reference at /Spads/SimpleEvent.pm line 536. $ * Some hosts crashed, some others seems ok /Spads/SimpleEvent.pm sub _checkSimpleSockets { if(%sockets) { my @pendingSockets=IO::Select->new(keys %sockets)->can_read($conf{timeSlice}); foreach my $pendingSock (@pendingSockets) { &{$sockets{$pendingSock}}($pendingSock); <<<<<<<<< Line 536 } }else{ Time::HiRes::usleep($conf{timeSlice} * 1_000_000); } } ************************ ### TEST 8 ### Bad maps above removed, spads map's cache directory renamed. Running @ the same time [ACE]Been & [ACE]Censeur. They are both generating map cache... Cache done ~ 16 min, 2x hosts running well. So the final issue is : Some maps makes problem with Spring105 unitsync and make it crashes during the (RE)/Generation of the cache.

~0020578 abma (administrator) 2021-01-03 18:29	can you provide unitsync.log after it "crashed" please? it should be in ~/.config/spring yeah, because of the new release, the old cache isn't used any more.

~0020579 abma (administrator) 2021-01-03 18:41	for the reference: https://api.springfiles.com/?filename=bilateral.sdz https://api.springfiles.com/?filename=eisland.sd7

~0020580 abma (administrator) 2021-01-03 19:09	out of memory? apparmor? -> check the ouput of dmesg out of disk space? -> check the output of "df" where does the error message "Failed to load archives, undefined mods data (unitsync library crash ?)" come from? i can't find such a string in spads or springs git repos: https://github.com/Yaribz/SPADS/search?q=Failed+to+load+archives https://github.com/spring/spring/search?q=Failed+to+load+archives

~0020581 abma (administrator) 2021-01-03 19:17	it was OOM: [19:15:41] <[ACE]FabriceFABS> Jan 3 19:05:07 FABS-Serveur kernel: [ 2172.050413] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=perl,pid=70446,uid=1000 [19:15:41] <[ACE]FabriceFABS> Jan 3 19:05:07 FABS-Serveur kernel: [ 2172.050423] Out of memory: Killed process 70446 (perl) total-vm:354776kB, anon-rss:285496kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:660kB oom_score_adj:0

~0020582 FabriceFABS (reporter) 2021-01-04 00:26	However, I allow myself to add a note to this diagnosis. I did monitor seriously the processes, and return the following results : - An instance of Spads consumes ~55MB of RAM without any map cache generate. - An instance of Spads running unitsync for generate the map cache is not the same : During the scan, Spads with Unitsync running uses between 80MB and upto 450MB peaks, but in general it stays around the use of a range of 150MB to 250MB during this process. Once the map scan is performed with the generation of the cached map file from unitsync, Unitsync does not flush memory it used and keep using ~160MB, even after the caching work has been performed. Only way to get back memory is to kill the instance of Spads and restart it. This is not normal and may be problematic. UnitSync won't have to run just after, so it will be ok for normal memory usage.

~0020583 FabriceFABS (reporter) 2021-01-04 00:33	Have a loop of the map memory allocation once the unitsync process ended : Heap : 140MB, normal use is 50MB on any other perl instance that didn't run unitsync.

~0020584 abma (administrator) 2021-01-04 05:10	> Once the map scan is performed with the generation of the cached map file from unitsync, Unitsync does not flush memory it used and keep using ~160MB, even after the caching work has been performed. as the peak usage is much higher than permanent usage when map files are scanned this IMHO is ok. a quick lock into the source code: https://github.com/Yaribz/SPADS/blob/1ab55e3c3598c7305d8b9d0d4ed1feefc52bb80d/src/getDefaultModOptions.pl#L376 PerlUnitSync::RemoveAllArchives(); isn't called before PerlUnitSync::UnInit() for the last "availableMods": this maybe cleans up the 100MB. i had several times run unitsync compiled with -fsanitize=memory which shows memory leaks and the last time i remember had none. as long as it doesn't permanent leak memory and memory usage increases over time, i see this as a minor issue. my time is way to limited to investigate it. if you want to investigate it more: compile unitsync with CMAKE_BUILD_TYPE=DEBUG and CMAKE_CXX_FLAGS=-fsanitize=address and provide the output of "direct leaks".

Notes

Date Modified	Username	Field	Change
Issue History
2021-01-03 04:26	FabriceFABS	New Issue
2021-01-03 14:44	FabriceFABS	Note Added: 0020577
2021-01-03 18:29	abma	Note Added: 0020578
2021-01-03 18:41	abma	Note Added: 0020579
2021-01-03 19:09	abma	Assigned To	=> abma
2021-01-03 19:09	abma	Status	new => feedback
2021-01-03 19:09	abma	Note Added: 0020580
2021-01-03 19:17	abma	Status	feedback => resolved
2021-01-03 19:17	abma	Resolution	open => fixed
2021-01-03 19:17	abma	Note Added: 0020581
2021-01-04 00:26	FabriceFABS	Status	resolved => feedback
2021-01-04 00:26	FabriceFABS	Resolution	fixed => reopened
2021-01-04 00:26	FabriceFABS	Note Added: 0020582
2021-01-04 00:33	FabriceFABS	File Added: Capture.jpg
2021-01-04 00:33	FabriceFABS	Note Added: 0020583
2021-01-04 00:33	FabriceFABS	Status	feedback => assigned
2021-01-04 05:10	abma	Assigned To	abma =>
2021-01-04 05:10	abma	Status	assigned => feedback
2021-01-04 05:10	abma	Note Added: 0020584
2021-01-04 05:11	abma	Summary	Unitsync fails randomly when starting spads multi instances => memleak in unitsync?
2021-01-04 05:11	abma	Checked infolog.txt for Errors	Irrelevant => Irrelevant
2021-01-04 05:11	abma	Priority	high => normal
2021-01-04 05:11	abma	Severity	crash => minor
2021-01-04 05:11	abma	Checked infolog.txt for Errors	Irrelevant => Irrelevant

Issue History