@LockFreeLua

Post by **jK** » 25 Jan 2015, 15:14

First
I assume that mutexes are now as fast on Windows as on Linux, see:
* Windows example (source here)

raw: 551 ms
boost::mutex: 4087 ms
boost::recursive_mutex: 7007 ms
critical section: 1682 ms

-> overhead ~300%
* Linux example (source here)

raw: 210 ms
spring::mutex: 600 ms
spring::recursive_mutex: 635 ms
boost::mutex: 648 ms
boost::recursive_mutex: 860 ms
std::mutex: 614 ms
std::recursive_mutex: 634 ms
futex: 469 ms

-> overhead ~300%

So, because Spring now uses critical sections on Windows, we can say that any locking results based on benchmarks done on Linux are transferable to Windows.
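For reference, the "~300%" figures can be reproduced from the timings listed above. This is a small sketch that reads "overhead" as "locked run time as a multiple of the raw run time"; that reading is an assumption, since the post does not spell out the formula. The function name `relative_to_raw` is made up for illustration.

```python
# Timings copied from the two benchmark listings above (milliseconds).
windows_ms = {
    "raw": 551,
    "boost::mutex": 4087,
    "boost::recursive_mutex": 7007,
    "critical section": 1682,
}
linux_ms = {
    "raw": 210,
    "spring::mutex": 600,
    "spring::recursive_mutex": 635,
    "boost::mutex": 648,
    "boost::recursive_mutex": 860,
    "std::mutex": 614,
    "std::recursive_mutex": 634,
    "futex": 469,
}


def relative_to_raw(timings):
    """Return each locked timing as a multiple of the unlocked ("raw") run."""
    raw = timings["raw"]
    return {name: ms / raw for name, ms in timings.items() if name != "raw"}


for platform, timings in (("windows", windows_ms), ("linux", linux_ms)):
    for name, ratio in relative_to_raw(timings).items():
        print(f"{platform:8s} {name:24s} {ratio:.2f}x raw")
```

Under that reading, the Windows critical section (about 3.05x raw) and spring::mutex on Linux (about 2.86x raw) land in the same range, which is the comparison the post relies on.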

Second
So I benchmarked the real overhead of locking by running a demo multiple times, once with locking enabled and once with it disabled. The results are:

The red ones are the LockFree runs. For the FPS this means:
1000/29ms = 34.5FPS (Locking)
1000/27ms = 37.0FPS (LockFree)
So there is a difference of 2.5FPS, hardly `pretty bad` as someone said. Still fanatics might say it would be worth it.
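The conversion above is just FPS = 1000 / frame time in ms. A short sketch of the same arithmetic, using only the 29 ms and 27 ms figures from the post (the helper name `fps_from_frame_ms` is made up for illustration):

```python
def fps_from_frame_ms(frame_ms):
    """Convert an average frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


locking_fps = fps_from_frame_ms(29)   # locking enabled
lockfree_fps = fps_from_frame_ms(27)  # LockFree

difference = lockfree_fps - locking_fps
relative_gain = lockfree_fps / locking_fps - 1.0

print(f"Locking : {locking_fps:.1f} FPS")
print(f"LockFree: {lockfree_fps:.1f} FPS")
print(f"Gain    : {difference:.2f} FPS ({relative_gain:.1%})")
```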

Third
So I ran another benchmark of other possible micro-optimizations:
* the first one is tcmalloc, Google's replacement for the default (glibc) malloc
* the second one is setting the `-march` flag, so gcc can optimize the code just for your CPU
The results are:

22.5ms versus 25ms -> nearly the same difference as LockFree

Fourth
So LockFree is not better than other micro-optimizations, and some people expect way too much from it.

Post by **hokomoko** » 25 Jan 2015, 15:30

a) The difference between 37 and 34 is indeed insignificant, I agree, but these are the values on your machine which is pretty powerful. The difference is both more significant when it's 12 instead of 15 and also may be larger on slower machines.
b) If you compare lengths of slow frames, the difference is more noticeable, and that causes small lag spikes which can't be shown in FPS resolution.
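To illustrate point b), here is a sketch with entirely HYPOTHETICAL frame times (not measured data): two runs with the same average FPS, one of which contains a single long frame.

```python
# HYPOTHETICAL frame times in milliseconds, chosen so both runs have the
# same total duration and therefore the same average FPS.
smooth_ms = [28, 28, 28, 28, 28, 28, 28, 28, 28, 28]
spiky_ms = [20, 20, 20, 20, 20, 20, 20, 20, 20, 100]


def average_fps(frame_times_ms):
    """Frames rendered divided by total elapsed time, in frames per second."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


for name, frames in (("smooth", smooth_ms), ("spiky", spiky_ms)):
    print(f"{name}: {average_fps(frames):.1f} FPS average, "
          f"slowest frame {max(frames)} ms")
```

Both runs report the same average, but only the slowest-frame value reveals the 100 ms hitch.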

Instead of going out of your way to show why it doesn't have to be removed, can you show what the benefit from it staying is?

Post by **Google_Frog** » 26 Jan 2015, 01:06

I should mention that CAI 3v3 is outdated now. We now use an East vs West 6v6 on CCR with a constant start factory.

https://github.com/ZeroK-RTS/Benchmarks ... iFight.sdd
https://github.com/ZeroK-RTS/Benchmarks ... o_plop.txt

Post by **gajop** » 26 Jan 2015, 01:26

Use the microoptimization without locks 22.5ms thing until Lua multithreading can be added?
I'd like Lua multithreading for more than just optimization reasons, but until we can have that, I'll take speed!

Post by **raaar** » 26 Jan 2015, 01:59

if you disable the locks for performance reasons, can it cause desyncs between players in the game?

if it's going to cause desyncs it's not worth it.

Post by **Super Mario** » 26 Jan 2015, 02:01

gajop wrote: Use the microoptimization without locks 22.5ms thing until Lua multithreading can be added?
I'd like Lua multithreading for more than just optimization reasons, but until we can have that, I'll take speed!

Are you referring to the JITLua or something else entirely?

Post by **Silentwings** » 26 Jan 2015, 09:05

@raaar: Nothing will ever be included in the engine if it is known to cause desyncs.

@SuperMario: No, LuaJIT is something else entirely.

Post by **Super Mario** » 26 Jan 2015, 23:29

Then what is he referring to exactly? I do not recall anything about the engine devs adding Lua multithreading.

Post by **Silentwings** » 27 Jan 2015, 00:15

Super Mario wrote: Then what is he referring to exactly?

http://luajit.org/ and https://github.com/spring/spring/commits/LuaJIT. It's what it says on the tin; unfortunately, it doesn't help Spring, which is why the branch was not merged.

gajop is referring to the locks currently in Spring's Lua, which are there for LoadingMT and (maybe, idk) as preparation for future threading.

Post by **Silentwings** » 27 Jan 2015, 00:23

So, I tested this with my recorder/replay tool (http://springrts.com/phpbb/viewtopic.ph ... 35#p563041) and 60 sec of a heavy-ish 8v8 demo on BA 8.07. Results were

98.0: 40fps
98.0.1-353 (current develop): 45fps
98.0.1-360 (current LockFreeLua): 55fps

The results were very reproducible: I ran three times and each time was within 2s of the average. I'm running win7 on a fairly high powered laptop. In all cases the simspeed was 1.00 right through. I haven't followed the discussion (too busy irl) so I won't claim anything about why/what/how the numbers happened. It's not a perfect system for perf testing, but nor is anything else.

I can run other people's tests if they provide the stuff to do so.

Post by **Google_Frog** » 27 Jan 2015, 01:28

I have run a benchmark and called upon others to do the same. Here is the page https://docs.google.com/spreadsheets/d/ ... edit#gid=0

98.0.1-353-gcae51 average FPS = 33.65865888
91.0 average FPS = 39.51242943
98.0.1-360-g0fc313a average FPS = 39.2160327

The difference is 16% (are percentage differences between different FPS ranges even useful?).
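On the question in parentheses: one way to look at it is to compare frame times alongside the percentage, since milliseconds per frame do not depend on which FPS range you are in. A sketch using the two averages quoted above (variable names are made up for illustration):

```python
# Average FPS values quoted above.
develop_fps = 33.65865888    # 98.0.1-353-gcae51
lockfree_fps = 39.2160327    # 98.0.1-360-g0fc313a

# Relative FPS gain of the LockFreeLua build over develop.
gain_percent = (lockfree_fps / develop_fps - 1.0) * 100.0

# The same comparison expressed as average time per draw frame.
develop_ms = 1000.0 / develop_fps
lockfree_ms = 1000.0 / lockfree_fps
saved_ms = develop_ms - lockfree_ms

print(f"FPS gain        : {gain_percent:.1f}%")
print(f"Frame time      : {develop_ms:.2f} ms -> {lockfree_ms:.2f} ms")
print(f"Saved per frame : {saved_ms:.2f} ms")
```

The same percentage at a lower base FPS would correspond to more milliseconds saved per frame, which is why quoting both numbers can be less ambiguous than the percentage alone.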

Post by **Silentwings** » 27 Jan 2015, 16:40

Google_Frog wrote: I have run a benchmark and called upon others to do the same.

I don't know where the files to run your benchmark are kept.

Post by **malric** » 27 Jan 2015, 20:25

It is explained here: https://github.com/ZeroK-RTS/Benchmarks ... k-Jan-2015

Post by **abma** » 27 Jan 2015, 21:57

Google_Frog, Silentwings: i guess you did run the tests on windows?

~15-20% difference doesn't look like results from linux are transferable to windows.

Post by **Silentwings** » 28 Jan 2015, 11:35

My test was on Win7.

Not saying all tests should be relied on, but have there been any results (from anyone) on Windows with Spring itself that showed approximately even performance between current dev and the lock free branch?

Related question: Suppose I just want to test lua locks, isolated from all else, on an empty map with no units around. Is it possible to write an addon to do so or are they buried too deep in engine stuff?

Post by **Google_Frog** » 01 Feb 2015, 03:13

Here is a summary of the recent data. All the data is here. mojjj has windows 8.1 and the rest have windows 7.

FPS is the average number of draw frames per second from 8:00 to 9:00 in the test games. Speedy is the time taken (in seconds) to reach 8:00.

Max DF and Max GF are measures of stuttering. Each engine version was benchmarked with 8 repeated runs. Each of these runs output a list of game frame and draw frame gaps from 8:00 to 9:00. Each of these lists has a maximum entry and that is called the maximum frame gap for the run. Max DF is the average of the maximum draw frame gap for the runs and Max GF is the corresponding value for game frames.
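As a sketch of how Max DF (or Max GF) is defined above: take the largest gap in each run, then average those maxima over the runs. The gap values below are HYPOTHETICAL and only three runs are shown instead of eight; the function name is made up for illustration.

```python
from statistics import mean


def max_frame_gap_metric(runs):
    """Average, over all runs, of the largest frame gap seen in each run."""
    return mean(max(gaps) for gaps in runs)


# HYPOTHETICAL draw-frame gaps for three runs (arbitrary time units),
# each list standing in for the gaps recorded from 8:00 to 9:00.
draw_frame_gaps = [
    [25, 30, 90, 28],
    [26, 31, 70, 27],
    [24, 29, 80, 30],
]

max_df = max_frame_gap_metric(draw_frame_gaps)
print(f"Max DF = {max_df}")  # mean of the per-run maxima 90, 70 and 80
```

Max GF would be computed the same way from the game-frame gap lists.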

The FPS improvement ranges from 5% to 18%, so it is fairly inconsistent across machines. But LFL is still an improvement, and a significant one for some people. Speedy has the same range of improvement, which shows that game frames take less time with LFL. The last two values are a bit dodgy but always suggest that LFL runs smoother; for Dev vs 91.0 this is backed up by player feedback.

Post by **abma** » 05 Feb 2015, 02:48

i've merged the LockFreeLua branch into develop. thanks for your help!

*cross fingers that results are the same "in the wild"*
