View Issue Details [ Jump to Notes ] | [ Issue History ] [ Print ] | ||||||||||||
ID | Project | Category | View Status | Date Submitted | Last Update | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0006431 | Spring engine | General | public | 2021-01-03 04:26 | 2021-01-04 05:11 | ||||||||
Reporter | FabriceFABS | ||||||||||||
Assigned To | |||||||||||||
Priority | normal | Severity | minor | Reproducibility | always | ||||||||
Status | feedback | Resolution | reopened | ||||||||||
Product Version | 105.0 | ||||||||||||
Target Version | Fixed in Version | ||||||||||||
Summary | 0006431: memleak in unitsync? | ||||||||||||
Description | FYI, we worked with Bibim a couple of hours on it. All is in the title... : Unitsync fails randomly when starting spads multi instances After some work, I/we've found several tests : #### TEST 1 #### If I got my regular ./spring/maps number in it (~ 500), on first instance of spads launch, it runs, for exemple [ACE]Been, no problem. FYI [SPADS] Spring archives loading process took 16 minutes and 53 seconds to generate its first cache. A second instance of spads, for exemple [ACE]Mettalic, fails on this step : ****************** NOTICE - [SPADS] Initializing SPADS 0.12.21 NOTICE - [SPADS] Loading Spring archives using unitsync library version 105.0 Processus arrêté (process ended) ****************** No log available, however, bibim modified code that permits to see that it fails during the scan of the maps archives. We had to remove bilateral.sdz (1st fail «Processus arrêté») and on Eisland.sdz (2nd fail «Processus arrêté»). We notice that finally this instance of [ACE]Metallic can be run without those 2 maps. I opened those 2 maps, and didn't found actually anything wrong, to be checked. Both are in "old" format : ie > only 3 files in it. #### TEST 2 #### ./spring/maps contains only 6 maps : Comet Catcher Redux, DeltaSiegeDry, Red Comet, Red Comet v1.3, Throne v5 & TheRockFinal. Nothing wrong to run several instance, no problem. However, I found that it stops a long time on the archives, on certains hosts, it seems frozen on this step, even if there are only 5 maps. #### TEST 3 #### Remove of .spring[Pool/Rapid/Packages] (mv to Temp) Launching [ACE]Been => Unbelievable... Took 10s to scan maps, surely due that the cache already done above. Next instance of spads failed the same. Replace back [Pool/Rapid/Packages] into ./spring All spads server has been !quit #### TEST 4 #### After this, run only [ACE]Been on 105.0 > OK Another 105.0 instance is trying to launch on [ACE]Fusion but failed around this map archive scan : ... mapName=4B_facility mapName=4_bases_v3 mapName=Aberdeen3v3v3 mapName=Aberdeen6v6_Fix Processus arrêté #### TEST 5 #### Run only [ACE]Been on 105.0 > OK Archive cash already done and ok, scan took ~1 sec during spads launch. cp of [ACE]Been/cache/105rel- to [ACE]*/cache/105rel- Result : All other hosts runs correctly and launched fine and fast like [ACE]Been. *** CONCLUSION *** So... I know that #### TEST 5 #### is clearly not the solution, but it works. There's something wrong with unitsync scans archives when there's already a spads's host running fine with 105.0 engine. Sorry for my english, don't hesitate to ask me more if I'm not understandable enough or for some extra test. | ||||||||||||
Steps To Reproduce | 1. You need to run more than 1 instance of Spads to reproduce it, 2. Others instances that don't have archives in cache will fail randomly, 3. Others instance that already got the archive (#### TEST 5 ####) cache will run. | ||||||||||||
Additional Information | Very tired, time to sleep... :-O | ||||||||||||
Tags | No tags attached. | ||||||||||||
Checked infolog.txt for Errors | Irrelevant | ||||||||||||
Attached Files |
|
Notes | |
FabriceFABS (reporter) 2021-01-03 14:44 |
--- 3 new additives tests which deserves all its importance --- ### TEST 6 ### I stopped [ACE]Beep, and placed back Eisland.sd7 and bilateral.sdz in .spring/maps (In fact I forget also to make my tests with those 2x problematics maps but I do it now) When launching back Spads, it stops during map scanning @Eisland.sd7 with the floowing output : *** ... mapName=EE-Deserted_Sinkhole-v02 mapName=Earatope_V3 mapName=Easter Bunny v2 Horse mapName=Eisland Processus arrêté $ *** Same for bilateral.sdz. ### TEST 7 ### During the file operation above while replacing those 2 problematics maps, Spads currently running on others instances (105) detected new maps, and updates cache. This generated a crash with the following : *** ... mapName=EE-Deserted_Sinkhole-v02 mapName=Earatope_V3 mapName=Easter Bunny v2 Horse mapName=Eisland ERROR - [SPADS] Failed to load archives, undefined mods data (unitsync library crash ?) NOTICE - [SPADS] AutoHost shutdown scheduled (reason: Unable to auto-reload Spring archives) Can't use an undefined value as a subroutine reference at /Spads/SimpleEvent.pm line 536. $ *** Some hosts crashed, some others seems ok **/Spads/SimpleEvent.pm** sub _checkSimpleSockets { if(%sockets) { my @pendingSockets=IO::Select->new(keys %sockets)->can_read($conf{timeSlice}); foreach my $pendingSock (@pendingSockets) { &{$sockets{$pendingSock}}($pendingSock); <<<<<<<<< Line 536 } }else{ Time::HiRes::usleep($conf{timeSlice} * 1_000_000); } } ************************ ### TEST 8 ### Bad maps above removed, spads map's cache directory renamed. Running @ the same time [ACE]Been & [ACE]Censeur. They are both generating map cache... Cache done ~ 16 min, 2x hosts running well. So the final issue is : Some maps makes problem with Spring105 unitsync and make it crashes during the (RE)/Generation of the cache. |
abma (administrator) 2021-01-03 18:29 |
can you provide unitsync.log after it "crashed" please? it should be in ~/.config/spring yeah, because of the new release, the old cache isn't used any more. |
abma (administrator) 2021-01-03 18:41 |
for the reference: https://api.springfiles.com/?filename=bilateral.sdz https://api.springfiles.com/?filename=eisland.sd7 |
abma (administrator) 2021-01-03 19:09 |
out of memory? apparmor? -> check the ouput of dmesg out of disk space? -> check the output of "df" where does the error message "Failed to load archives, undefined mods data (unitsync library crash ?)" come from? i can't find such a string in spads or springs git repos: https://github.com/Yaribz/SPADS/search?q=Failed+to+load+archives https://github.com/spring/spring/search?q=Failed+to+load+archives |
abma (administrator) 2021-01-03 19:17 |
it was OOM: [19:15:41] <[ACE]FabriceFABS> Jan 3 19:05:07 FABS-Serveur kernel: [ 2172.050413] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=perl,pid=70446,uid=1000 [19:15:41] <[ACE]FabriceFABS> Jan 3 19:05:07 FABS-Serveur kernel: [ 2172.050423] Out of memory: Killed process 70446 (perl) total-vm:354776kB, anon-rss:285496kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:660kB oom_score_adj:0 |
FabriceFABS (reporter) 2021-01-04 00:26 |
However, I allow myself to add a note to this diagnosis. I did monitor seriously the processes, and return the following results : - An instance of Spads consumes ~55MB of RAM without any map cache generate. - An instance of Spads running unitsync for generate the map cache is not the same : During the scan, Spads with Unitsync running uses between 80MB and upto 450MB peaks, but in general it stays around the use of a range of 150MB to 250MB during this process. Once the map scan is performed with the generation of the cached map file from unitsync, Unitsync does not flush memory it used and keep using ~160MB, even after the caching work has been performed. Only way to get back memory is to kill the instance of Spads and restart it. This is not normal and may be problematic. UnitSync won't have to run just after, so it will be ok for normal memory usage. |
FabriceFABS (reporter) 2021-01-04 00:33 |
Have a loop of the map memory allocation once the unitsync process ended : Heap : 140MB, normal use is 50MB on any other perl instance that didn't run unitsync. |
abma (administrator) 2021-01-04 05:10 |
> Once the map scan is performed with the generation of the cached map file from unitsync, Unitsync does not flush memory it used and keep using ~160MB, even after the caching work has been performed. as the peak usage is much higher than permanent usage when map files are scanned this IMHO is ok. a quick lock into the source code: https://github.com/Yaribz/SPADS/blob/1ab55e3c3598c7305d8b9d0d4ed1feefc52bb80d/src/getDefaultModOptions.pl#L376 PerlUnitSync::RemoveAllArchives(); isn't called before PerlUnitSync::UnInit() for the last "availableMods": this maybe cleans up the 100MB. i had several times run unitsync compiled with -fsanitize=memory which shows memory leaks and the last time i remember had none. as long as it doesn't permanent leak memory and memory usage increases over time, i see this as a minor issue. my time is way to limited to investigate it. if you want to investigate it more: compile unitsync with CMAKE_BUILD_TYPE=DEBUG and CMAKE_CXX_FLAGS=-fsanitize=address and provide the output of "direct leaks". |
Issue History | |||
Date Modified | Username | Field | Change |
---|---|---|---|
2021-01-03 04:26 | FabriceFABS | New Issue | |
2021-01-03 14:44 | FabriceFABS | Note Added: 0020577 | |
2021-01-03 18:29 | abma | Note Added: 0020578 | |
2021-01-03 18:41 | abma | Note Added: 0020579 | |
2021-01-03 19:09 | abma | Assigned To | => abma |
2021-01-03 19:09 | abma | Status | new => feedback |
2021-01-03 19:09 | abma | Note Added: 0020580 | |
2021-01-03 19:17 | abma | Status | feedback => resolved |
2021-01-03 19:17 | abma | Resolution | open => fixed |
2021-01-03 19:17 | abma | Note Added: 0020581 | |
2021-01-04 00:26 | FabriceFABS | Status | resolved => feedback |
2021-01-04 00:26 | FabriceFABS | Resolution | fixed => reopened |
2021-01-04 00:26 | FabriceFABS | Note Added: 0020582 | |
2021-01-04 00:33 | FabriceFABS | File Added: Capture.jpg | |
2021-01-04 00:33 | FabriceFABS | Note Added: 0020583 | |
2021-01-04 00:33 | FabriceFABS | Status | feedback => assigned |
2021-01-04 05:10 | abma | Assigned To | abma => |
2021-01-04 05:10 | abma | Status | assigned => feedback |
2021-01-04 05:10 | abma | Note Added: 0020584 | |
2021-01-04 05:11 | abma | Summary | Unitsync fails randomly when starting spads multi instances => memleak in unitsync? |
2021-01-04 05:11 | abma | Checked infolog.txt for Errors | Irrelevant => Irrelevant |
2021-01-04 05:11 | abma | Priority | high => normal |
2021-01-04 05:11 | abma | Severity | crash => minor |
2021-01-04 05:11 | abma | Checked infolog.txt for Errors | Irrelevant => Irrelevant |