98.0 Performance
After some profiling I've pinpointed the main cycles hog to be the lua mutexes, but apparently the worker threads don't run any lua, so is there a need for these mutexes at the moment?
Their removal results in a significant performance improvement, and in some tests I haven't seen any desyncs (even in games between a version with mutexes and one without).
Even if they're needed and I've missed something, they can probably be heavily optimised.
Thanks!
Re: 98.0 Performance
You guys rock :)
Did I mention that jw will quit Spring if you don't squeeze another 10% out of the engine code?
Re: 98.0 Performance
Very interesting. Even if it turns out not to be completely correct, it might offer additional insight into how testing is done. Have you tried it? Does it feel faster?
Re: 98.0 Performance
The tests were conducted on a machine with a decent CPU (i7 2760) and a crappy GPU (Intel), so graphical operations take a long time. Using ZK's Benchmarker which lets 12 AIs duke it out, the average fps for both 91.0 and patched 98.0 was ~12, while for unpatched it was ~6.
Update took ~50 ms in all.
With 10ms per frame (91, no-lock 98), you have 300ms taken every second (i.e. 30 frames per second), which leaves 700/50 = 14 update frames.
With 20ms per frame (current 98), you have 600ms taken every second, which leaves 400/50 = 8 update frames.
Deduct 100ms for assorted things and you get the aforementioned values (600/50 = 12 and 300/50 = 6).
So yes, it definitely felt faster, and the data agrees.
btw, since I profiled using prints, I had to comment out the info console update since everything got clogged up if I didn't (I suspect it has to do with console widgets which try to align the text).
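The estimate above can be written down as a small Python sketch. It only restates the numbers from the post; the 30 frames per second figure is an assumption inferred from "10ms per frame" giving "300ms", and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope model of the estimate in the post above.
# Assumption: 30 frames per second, inferred from 10 ms -> 300 ms.
FRAMES_PER_SECOND = 30
UPDATE_MS = 50      # "Update took ~50 ms in all"
OVERHEAD_MS = 100   # "Deduct 100ms for assorted things"


def update_frames_per_second(frame_ms, overhead_ms=OVERHEAD_MS):
    """Estimate how many ~50 ms updates fit into what is left of one second."""
    frame_total_ms = FRAMES_PER_SECOND * frame_ms
    return (1000 - frame_total_ms - overhead_ms) / UPDATE_MS


if __name__ == "__main__":
    cases = (("91.0 / 98.0 without locks", 10), ("unpatched 98.0", 20))
    for label, frame_ms in cases:
        print(label, update_frames_per_second(frame_ms))
```

Passing `overhead_ms=0` gives the intermediate 14 and 8 figures from the post.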
Re: 98.0 Performance
hokomoko wrote: I've pinpointed the main cycles hog to be the lua mutexes, but apparently the worker threads don't run any lua, so is there a need for these mutexes at the moment?
jk should answer this. idk.
Re: 98.0 Performance
For reference, these locks were added here:
https://github.com/spring/spring/commit ... 888174R747
It seems they were added between Spring 94 and 95.0 (luasynced split merge).
Re: 98.0 Performance
relevant #sy discussion:
Code: Select all
[23:29:49] <ashdnazg> I'll leave this here http://springrts.com/phpbb/viewtopic.php?f=12&t=32922
[23:30:11] <ashdnazg> [LCC]jK your feedback is appreciated ^
[23:35:50] <[LCC]jK> when you claim something you always have to say what you did
[23:37:18] <ashdnazg> what did I forget to say?
[23:37:45] <[LCC]jK> raw facts?
[23:37:49] <[LCC]jK> = code?
[23:37:54] <ashdnazg> diffs
[23:37:57] <ashdnazg> are there
[23:38:11] <ashdnazg> http://zero-k.info/Forum/Thread/10027#115894
[23:38:19] <ashdnazg> http://zero-k.info/Forum/Thread/10027#115935
[23:38:25] <[LCC]jK> too less info
[23:38:44] <ashdnazg> what info do you need?
[23:38:57] <ashdnazg> you git pull latest develop
[23:38:59] <ashdnazg> apply patch
[23:39:02] <ashdnazg> and compile
[23:40:45] <[LCC]jK> k you just reverted edited the lua lib
[23:41:20] <ashdnazg> note the second diff
[23:41:49] <ashdnazg> only commented out the lua mutex #define
[23:42:17] <ashdnazg> rest of changes aren't as significant
[23:45:50] <[LCC]jK> it's always said that locking a mutex (when no other has locked it currently) is `for free`
[23:46:04] <[LCC]jK> maybe boost::mutex implementation on windows sucks
[23:46:41] ** [PRO]Jools left the channel (Connection timed out).
[23:47:02] <ashdnazg> possible, but I believe I have data on linux to show similar if somewhat smaller effects
[23:50:17] <[LCC]jK> http://stackoverflow.com/a/878228/3650440
[23:52:34] <ashdnazg> what's the performance implications of that?
[23:53:23] <[LCC]jK> http://stackoverflow.com/questions/9997473/stdmutex-performance-compared-to-win32-critical-section
[23:53:44] <[LCC]jK> that's not gcc, but the numbers are fearing
[23:54:12] <[LCC]jK> for sure it's not for free when calling windows kernel
[23:57:20] <ashdnazg> I have perf logs thanks to ikinz with a big presence for LuaMutexLock and Unlock
[23:57:45] <ashdnazg> but the benchmarks there are less accurate
[23:58:00] <ashdnazg> is there currently a reason for the mutexes to stay?
[23:59:50] <[LCC]jK> yes
[00:02:23] <ashdnazg> elaborate please
[00:04:31] <[LCC]jK> GC & checking if events are used
[00:06:25] <ashdnazg> when can they run in parallel?
[00:07:15] <ashdnazg> or alternatively, can the mutexes be limited to these scenarios?
Re: 98.0 Performance
You can use this branch for testing.
And FTR,
[23:58:00] <ashdnazg> is there currently a reason for the mutexes to stay?
[00:04:31] <[LCC]jK> GC & checking if events are used
doesn't make any sense: GC updates run in the main thread, so do the event checks.
At present LoadingMT=1 is the only case where the locks matter, and even then marginally.
Re: 98.0 Performance
jk added a small test that measures mutex performance; this is the output for me:
Windows 7 64-bit on an AMD X4 945
test_Mutex.exe is not a console app, which is why |more is needed.
Code: Select all
test_Mutex.exe |more
Running 1 test case...
raw: 241 ms
mutex: 1745 ms
recursive_mutex: 2645 ms
critical section: 683 ms
On the buildslave the test is run via Wine and outputs this:
Code: Select all
17: raw: 256 ms
17: boost::mutex: 2046 ms
17: boost::recursive_mutex: 3992 ms
17: critical section: 895 ms
If someone wants to run the test as well and post the result, here is what to do:
download:
http://springrts.com/dl/buildbot/defaul ... itTests.7z
extract test_Mutex.exe
run
Code: Select all
test_Mutex.exe |more
but very likely the output is very similar.
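The source of the test is not shown in this thread, so what follows is only a rough sketch of the shape such a microbenchmark might take, written in Python rather than the engine's C++. It times a plain loop against the same loop taking an uncontended lock on every pass, with `threading.Lock` and `threading.RLock` standing in for a mutex and a recursive mutex. `time_loop` and `ITERATIONS` are made-up names, and the absolute numbers are not comparable to the boost::mutex figures above.

```python
import threading
import time

ITERATIONS = 1_000_000  # hypothetical count, not taken from the engine's test


def time_loop(lock=None, iterations=ITERATIONS):
    """Return (seconds, counter) for a loop that optionally locks on each pass."""
    counter = 0
    start = time.perf_counter()
    if lock is None:
        # "raw" case: no locking at all
        for _ in range(iterations):
            counter += 1
    else:
        for _ in range(iterations):
            with lock:  # never contended: only this thread touches the lock
                counter += 1
    return time.perf_counter() - start, counter


if __name__ == "__main__":
    cases = (
        ("raw", None),
        ("Lock", threading.Lock()),
        ("RLock", threading.RLock()),
    )
    for name, lock in cases:
        seconds, _ = time_loop(lock)
        print(f"{name}: {seconds * 1000:.0f} ms")
```

The point of the "raw" case is the same as in the outputs above: whatever the locked loops take beyond it is the cost of locking and unlocking when nobody else holds the lock.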
Re: 98.0 Performance
My output:
Code: Select all
Running 1 test case...
raw: 116 ms
boost::mutex: 1026 ms
boost::recursive_mutex: 1715 ms
critical section: 434 ms
- very_bad_soldier
Re: 98.0 Performance
Similar results here:
Code: Select all
Running 1 test case...
raw: 88 ms
boost::mutex: 770 ms
boost::recursive_mutex: 1297 ms
critical section: 332 ms
Re: 98.0 Performance
Thanks! It seems worth using a critical section on Windows.
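To put a rough number on that, here is a small Python sketch that divides the boost::mutex time by the critical section time for the four results posted so far. The timings are copied from the posts above; the labels and function name are made up for illustration.

```python
# (raw, boost::mutex, boost::recursive_mutex, critical section) in ms,
# copied from the results posted earlier in this thread.
RESULTS_MS = {
    "Win7 64-bit, AMD X4 945": (241, 1745, 2645, 683),
    "buildslave via Wine": (256, 2046, 3992, 895),
    "first reply": (116, 1026, 1715, 434),
    "second reply": (88, 770, 1297, 332),
}


def mutex_vs_critical_section(row):
    """How many times slower boost::mutex was than the critical section."""
    _raw, mutex, _recursive, critical_section = row
    return mutex / critical_section


if __name__ == "__main__":
    for label, row in RESULTS_MS.items():
        print(f"{label}: {mutex_vs_critical_section(row):.2f}x")
```

On these four runs boost::mutex comes out somewhere between two and three times slower than the critical section, and that includes the loop's own raw cost in both figures.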
Re: 98.0 Performance
Code: Select all
raw: 117 ms
boost::mutex: 331 ms
boost::recursive_mutex: 884 ms
std::mutex: 325 ms
std::recursive_mutex: 341 ms
futex: 272 ms
2) what the hell is wrong with boost::recursive_mutex?
Re: 98.0 Performance
Kloot wrote: 2) what the hell is wrong with boost::recursive_mutex?
Maybe it has something to do with the bug that was fixed in version 1.5.7.0
- Silentwings
Re: 98.0 Performance
Code: Select all
Running 1 test case...
raw: 187 ms
boost::mutex: 1326 ms
boost::recursive_mutex: 2231 ms
critical section: 561 ms
Re: 98.0 Performance
Shitlaptop:
Code: Select all
Running 1 test case...
raw: 551 ms
boost::mutex: 4087 ms
boost::recursive_mutex: 7007 ms
critical section: 1682 ms
Re: 98.0 Performance
Code: Select all
raw: 218 ms
boost::mutex: 1342 ms
boost::recursive_mutex: 2293 ms
critical section: 515 ms
Re: 98.0 Performance
Code: Select all
raw: 96 ms
boost::mutex: 798 ms
boost::recursive_mutex: 1336 ms
critical section: 343 ms
Re: 98.0 Performance
repeated several times:
Win7
AMD Phenom II X4 965
Code: Select all
Running 1 test case...
raw: 203 ms +-10ms
boost::mutex: 1516 ms +-20ms
boost::recursive_mutex: 2240 ms +-80ms
critical section: 492 ms +-10ms