98.0 Performance

hokomoko
Spring Developer
Posts: 593
Joined: 02 Jun 2014, 00:46

98.0 Performance

Post by hokomoko »

After some profiling, I've pinpointed the main cycle hog to be the Lua mutexes, but apparently the worker threads don't run any Lua, so is there a need for these mutexes at the moment?

Their removal results in significant performance improvement, and from some tests I haven't seen any desyncs (even in games between a version with mutexes and one without).

Even if they're needed and I've missed something, they can probably be heavily optimised.

Thanks!
PicassoCT
Journeywar Developer & Mapper
Posts: 10454
Joined: 24 Jan 2006, 21:12

Re: 98.0 Performance

Post by PicassoCT »

You guys rock :)

Did I mention that jw will quit Spring if you don't squeeze another 10% out of the engine code?
gajop
Moderator
Posts: 3051
Joined: 05 Aug 2009, 20:42

Re: 98.0 Performance

Post by gajop »

Very interesting. Even if it turns out not to be completely correct, it might offer additional insight into how testing is done. Have you tried it? Does it feel faster?
hokomoko
Spring Developer
Posts: 593
Joined: 02 Jun 2014, 00:46

Re: 98.0 Performance

Post by hokomoko »

The tests were conducted on a machine with a decent CPU (i7 2760) and a crappy GPU (Intel), so graphical operations take a long time. Using ZK's Benchmarker, which lets 12 AIs duke it out, the average fps for both 91.0 and patched 98.0 was ~12, while for unpatched 98.0 it was ~6.

Update took ~50 ms in all cases.
With 10 ms per frame (91.0, lock-free 98.0) at 30 frames per second, 300 ms are taken every second, which leaves 700/50 = 14 update frames.
With 20 ms per frame (current 98.0), 600 ms are taken every second, which leaves 400/50 = 8 update frames.
Deduct 100 ms per second for assorted things (600/50 = 12 and 300/50 = 6) and you get the aforementioned values.
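
The arithmetic above can be written out as a short C++ sketch. The function name and parameters are illustrative only, and the rate of 30 frames per second is an assumption implied by "10ms per frame" adding up to "300ms taken every second".

```cpp
#include <cassert>

// Assumption: 30 frames per second, implied by 10 ms/frame -> 300 ms/second.
constexpr int kFramesPerSecond = 30;

// Estimates how many update frames fit into one second.
//   frameMs    - cost of one frame in milliseconds (10 or 20 in the post)
//   updateMs   - cost of one update in milliseconds (~50 in the post)
//   overheadMs - time per second lost to "assorted things" (~100 in the post)
int EstimateUpdateFps(int frameMs, int updateMs, int overheadMs) {
    assert(updateMs > 0);
    const int frameMsPerSecond = frameMs * kFramesPerSecond;
    const int remainingMs = 1000 - frameMsPerSecond - overheadMs;
    return (remainingMs > 0) ? remainingMs / updateMs : 0;
}
```

With the numbers from the post, `EstimateUpdateFps(10, 50, 100)` gives 12 and `EstimateUpdateFps(20, 50, 100)` gives 6, which matches the measured averages.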

So yes, it definitely felt faster, and the data agrees.

btw, since I profiled with prints, I had to comment out the info console update, because everything got clogged up if I didn't (I suspect it has to do with console widgets that try to align the text).
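
One way to profile without flooding the console is to accumulate timings and print a single summary at the end. The following is only a sketch of that idea; `TimerStats` and `ScopedTimer` are hypothetical names, not engine classes, and this is not the method used for the measurements above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: collects total time and call count for one code path.
struct TimerStats {
    std::chrono::nanoseconds total{0};
    std::uint64_t calls = 0;

    // Average cost of one timed scope in milliseconds.
    double AverageMs() const {
        assert(calls > 0);
        return std::chrono::duration<double, std::milli>(total).count() /
               static_cast<double>(calls);
    }
};

// Measures the lifetime of the object and adds it to the given stats.
class ScopedTimer {
public:
    explicit ScopedTimer(TimerStats& stats)
        : stats_(stats), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        stats_.total +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        ++stats_.calls;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimerStats& stats_;
    std::chrono::steady_clock::time_point start_;
};
```

Placing a `ScopedTimer` at the top of the function under test and printing `AverageMs()` once at shutdown keeps the output to one line per measured code path.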
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: 98.0 Performance

Post by abma »

hokomoko wrote: I've pinpointed the main cycle hog to be the Lua mutexes, but apparently the worker threads don't run any Lua, so is there a need for these mutexes at the moment?

jk should answer this. idk.
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: 98.0 Performance

Post by abma »

For reference, this is where these locks were added:

https://github.com/spring/spring/commit ... 888174R747

It seems they were added between Spring 94 and 95.0 (luasynced split merge).
hokomoko
Spring Developer
Posts: 593
Joined: 02 Jun 2014, 00:46

Re: 98.0 Performance

Post by hokomoko »

relevant #sy discussion:

Code:

[23:29:49] <ashdnazg> I'll leave this here http://springrts.com/phpbb/viewtopic.php?f=12&t=32922
[23:30:11] <ashdnazg> [LCC]jK your feedback is appreciated ^
[23:35:50] <[LCC]jK> when you claim something you always have to say what you did
[23:37:18] <ashdnazg> what did I forget to say?
[23:37:45] <[LCC]jK> raw facts?
[23:37:49] <[LCC]jK> = code?
[23:37:54] <ashdnazg> diffs
[23:37:57] <ashdnazg> are there
[23:38:11] <ashdnazg> http://zero-k.info/Forum/Thread/10027#115894
[23:38:19] <ashdnazg> http://zero-k.info/Forum/Thread/10027#115935
[23:38:25] <[LCC]jK> too less info
[23:38:44] <ashdnazg> what info do you need?
[23:38:57] <ashdnazg> you git pull latest develop
[23:38:59] <ashdnazg> apply patch
[23:39:02] <ashdnazg> and compile
[23:40:45] <[LCC]jK> k you just reverted edited the lua lib
[23:41:20] <ashdnazg> note the second diff
[23:41:49] <ashdnazg> only commented out the lua mutex #define
[23:42:17] <ashdnazg> rest of changes aren't as significant
[23:45:50] <[LCC]jK> it's always said that locking a mutex (when no other has locked it currently) is `for free`
[23:46:04] <[LCC]jK> maybe boost::mutex implementation on windows sucks
[23:46:41] ** [PRO]Jools left the channel (Connection timed out).
[23:47:02] <ashdnazg> possible, but I believe I have data on linux to show similar if somewhat smaller effects
[23:50:17] <[LCC]jK> http://stackoverflow.com/a/878228/3650440
[23:52:34] <ashdnazg> what's the performance implications of that?
[23:53:23] <[LCC]jK> http://stackoverflow.com/questions/9997473/stdmutex-performance-compared-to-win32-critical-section
[23:53:44] <[LCC]jK> that's not gcc, but the numbers are fearing
[23:54:12] <[LCC]jK> for sure it's not for free when calling windows kernel
[23:57:20] <ashdnazg> I have perf logs thanks to ikinz with a big presence for LuaMutexLock and Unlock
[23:57:45] <ashdnazg> but the benchmarks there are less accurate
[23:58:00] <ashdnazg> is there currently a reason for the mutexes to stay?
[23:59:50] <[LCC]jK> yes
[00:02:23] <ashdnazg> elaborate please
[00:04:31] <[LCC]jK> GC & checking if events are used
[00:06:25] <ashdnazg> when can they run in parallel?
[00:07:15] <ashdnazg> or alternatively, can the mutexes be limited to these scenarios?
Kloot
Spring Developer
Posts: 1867
Joined: 08 Oct 2006, 16:58

Re: 98.0 Performance

Post by Kloot »

You can use this branch for testing.

And FTR,
[23:58:00] <ashdnazg> is there currently a reason for the mutexes to stay?
[00:04:31] <[LCC]jK> GC & checking if events are used
doesn't make any sense: GC updates run in the main thread, so do the event checks.

At present LoadingMT=1 is the only case where the locks matter, and even then marginally.
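
If the locks only matter while multi-threaded loading is active, one option is a guard that takes the mutex only in that case. This is a minimal sketch under stated assumptions: `OptionalLock` and `IncrementGuarded` are hypothetical names, the `loadingMT` flag stands in for however the engine exposes the LoadingMT state, and the flag must not change while a guard is alive.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical guard: locks the mutex only when `enabled` is true.
// The destructor of std::unique_lock releases it only if it was taken.
class OptionalLock {
public:
    OptionalLock(std::recursive_mutex& m, bool enabled)
        : lock_(m, std::defer_lock) {
        if (enabled)
            lock_.lock();
    }

    bool OwnsLock() const { return lock_.owns_lock(); }

private:
    std::unique_lock<std::recursive_mutex> lock_;
};

// Example call site: the guarded code is identical in both modes.
int IncrementGuarded(std::recursive_mutex& m, bool loadingMT, int& counter) {
    OptionalLock guard(m, loadingMT);
    assert(guard.OwnsLock() == loadingMT);
    return ++counter;
}
```

When the flag is false, the per-call cost drops to one branch instead of a lock/unlock pair. Whether that is safe depends on no other thread touching the Lua state outside of loading.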
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: 98.0 Performance

Post by abma »

jk added a small test that measures the performance; this is the output for me:

Code:

test_Mutex.exe |more
Running 1 test case...
raw: 241 ms
mutex: 1745 ms
recursive_mutex: 2645 ms
critical section: 683 ms

Windows 7 64-bit on an AMD X4 945.

On the buildslave the test is run via Wine and outputs this:

Code:

17: raw: 256 ms
17: boost::mutex: 2046 ms
17: boost::recursive_mutex: 3992 ms
17: critical section: 895 ms



If someone wants to run the test as well and post the result, download:
http://springrts.com/dl/buildbot/defaul ... itTests.7z

Extract test_Mutex.exe and run:

Code:

test_Mutex.exe |more

test_Mutex.exe is not a console app, which is why |more is needed.

But very likely the output will be very similar.
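
For readers without the binary, a micro-benchmark of this kind can be sketched as below. This is not the source of test_Mutex; it is a guess at the general shape, using std::mutex and std::recursive_mutex from the standard library. `NoLock`, `BenchResult` and `BenchUncontended` are made-up names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

// Dummy lock for the "raw" baseline: lock and unlock do nothing.
struct NoLock {
    void lock() {}
    void unlock() {}
};

// Result of one run: elapsed wall time and the final counter value.
struct BenchResult {
    std::chrono::milliseconds elapsed;
    std::uint64_t counter;
};

// Locks and unlocks `m` `iterations` times from a single thread, so the
// lock is never contended, and increments a counter inside the lock.
template <typename Mutex>
BenchResult BenchUncontended(Mutex& m, std::uint64_t iterations) {
    assert(iterations > 0);
    // volatile keeps the compiler from optimising the loop away
    volatile std::uint64_t counter = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        m.lock();
        counter = counter + 1;
        m.unlock();
    }
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration_cast<std::chrono::milliseconds>(stop - start),
            counter};
}
```

Calling `BenchUncontended` with a `NoLock`, a `std::mutex` and a `std::recursive_mutex` and the same iteration count gives three timings that can be compared in the same way as the "raw", "mutex" and "recursive_mutex" lines above. Absolute numbers will differ by compiler, OS and CPU.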
Google_Frog
Moderator
Posts: 2464
Joined: 12 Oct 2007, 09:24

Re: 98.0 Performance

Post by Google_Frog »

My output:

Code:

Running 1 test case...
raw: 116 ms
boost::mutex: 1026 ms
boost::recursive_mutex: 1715 ms
critical section: 434 ms
very_bad_soldier
Posts: 1397
Joined: 20 Feb 2007, 01:10

Re: 98.0 Performance

Post by very_bad_soldier »

Similar results here:

Code:

Running 1 test case...
raw: 88 ms
boost::mutex: 770 ms
boost::recursive_mutex: 1297 ms
critical section: 332 ms
abma
Spring Developer
Posts: 3798
Joined: 01 Jun 2009, 00:08

Re: 98.0 Performance

Post by abma »

Thanks! It seems worthwhile to use critical sections on Windows.
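
A wrapper along these lines could look like the sketch below. It is an illustration only, not engine code: `FastRecursiveMutex` and `LockTwice` are hypothetical names, and falling back to std::recursive_mutex on other platforms is an assumption. The Win32 calls used are InitializeCriticalSection, EnterCriticalSection, TryEnterCriticalSection, LeaveCriticalSection and DeleteCriticalSection.

```cpp
#include <cassert>
#include <mutex>

#ifdef _WIN32
#include <windows.h>

// Thin wrapper around a Win32 CRITICAL_SECTION with the lock()/try_lock()/
// unlock() interface that std::lock_guard expects.
// Note: the owning thread may enter a CRITICAL_SECTION again.
class FastRecursiveMutex {
public:
    FastRecursiveMutex() { InitializeCriticalSection(&cs_); }
    ~FastRecursiveMutex() { DeleteCriticalSection(&cs_); }
    FastRecursiveMutex(const FastRecursiveMutex&) = delete;
    FastRecursiveMutex& operator=(const FastRecursiveMutex&) = delete;

    void lock() { EnterCriticalSection(&cs_); }
    bool try_lock() { return TryEnterCriticalSection(&cs_) != 0; }
    void unlock() { LeaveCriticalSection(&cs_); }

private:
    CRITICAL_SECTION cs_;
};
#else
// Assumption: on other platforms the standard recursive mutex is used.
using FastRecursiveMutex = std::recursive_mutex;
#endif

// Takes the lock twice from the same thread and increments the counter.
int LockTwice(FastRecursiveMutex& m, int& counter) {
    std::lock_guard<FastRecursiveMutex> outer(m);
    const bool reentered = m.try_lock(); // same thread, so re-entry succeeds
    assert(reentered);
    ++counter;
    if (reentered)
        m.unlock();
    return counter;
}
```

Because the class exposes the same three member functions as the standard mutexes, existing `std::lock_guard` call sites would only need the mutex type changed.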
Kloot
Spring Developer
Posts: 1867
Joined: 08 Oct 2006, 16:58

Re: 98.0 Performance

Post by Kloot »

Code:

raw: 117 ms
boost::mutex: 331 ms
boost::recursive_mutex: 884 ms
std::mutex: 325 ms
std::recursive_mutex: 341 ms
futex: 272 ms

1) So much for the "locking a free mutex is cheap" theory (never believe any claim without profiling it yourself).
2) What the hell is wrong with boost::recursive_mutex?
Super Mario
Posts: 823
Joined: 21 Oct 2008, 02:54

Re: 98.0 Performance

Post by Super Mario »

Kloot wrote: 2) what the hell is wrong with boost::recursive_mutex?

Maybe it has something to do with the bug that was fixed in version 1.5.7.0.
Silentwings
Posts: 3720
Joined: 25 Oct 2008, 00:23

Re: 98.0 Performance

Post by Silentwings »

Code:

Running 1 test case...
raw: 187 ms
boost::mutex: 1326 ms
boost::recursive_mutex: 2231 ms
critical section: 561 ms

Repeated 3 times with essentially identical results, on Win7, i7-3740QM. I had a look around some blogs and several places recommended using critical section objects on Windows.
gajop
Moderator
Posts: 3051
Joined: 05 Aug 2009, 20:42

Re: 98.0 Performance

Post by gajop »

Shitlaptop:

Code:

Running 1 test case...
raw: 551 ms
boost::mutex: 4087 ms
boost::recursive_mutex: 7007 ms
critical section: 1682 ms
Super Mario
Posts: 823
Joined: 21 Oct 2008, 02:54

Re: 98.0 Performance

Post by Super Mario »

Code:

raw: 218 ms
boost::mutex: 1342 ms
boost::recursive_mutex: 2293 ms
critical section: 515 ms
FLOZi
MC: Legacy & Spring 1944 Developer
Posts: 6241
Joined: 29 Apr 2005, 01:14

Re: 98.0 Performance

Post by FLOZi »

Code:

raw: 96 ms
boost::mutex: 798 ms
boost::recursive_mutex: 1336 ms
critical section: 343 ms
i5 3570K
Misolavera
Posts: 8
Joined: 20 Aug 2014, 22:21

Re: 98.0 Performance

Post by Misolavera »

repeated several times:
Win7
AMD Phenom II X4 965

Code:

Running 1 test case...
raw: 203 ms +-10ms
boost::mutex: 1516 ms +-20ms
boost::recursive_mutex: 2240 ms +-80ms
critical section: 492 ms +-10ms