just another oprofile run

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

just another oprofile run

Post by jK » 17 Sep 2012, 21:59

I also marked some very interesting time eaters (in total they make up ~20%).

Notes:
  • Fixing LocalModelPiece::Draw needs advanced OpenGL knowledge (VBO, UBO, TBO, shaders)
  • CMinimap::DrawUnit is quite easy to fix (it wastes all its time on glBindTexture)
  • CSolidObject::Block is slow because it is managed via many std::maps instead of a single container
  • AllowUnitBuildStep & QueryNanoPiece could likely be merged somehow or even skipped for most units (e.g. by adding a new unitdef tag/UnitScript function for using a fixed NanoPiece)
  • QTPFS was used; the legacy one is twice as slow
Sim was a 15 min ZK 3v3 CAI game on crossing4final with `gameplay zoom` (not full map visible) and ~40 fps.
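
A minimal sketch of the glBindTexture point above, assuming the minimap icons can be drawn in any order: sort the icons by texture and bind only when the texture changes. `MinimapIcon`, `BindCounter`, `DrawNaive` and `DrawBatched` are hypothetical names, not engine code, and the bind is mocked as a counter so the example runs without OpenGL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one minimap icon; not the engine's real type.
struct MinimapIcon {
    std::uint32_t textureId; // texture the icon is drawn with
    float x, y;              // minimap position
};

// Mocked GL state: only counts binds. In real code bind() would be
// glBindTexture(GL_TEXTURE_2D, id).
struct BindCounter {
    std::size_t binds = 0;
    void bind(std::uint32_t /*id*/) { ++binds; }
};

// Naive pass: one bind per icon, as a per-unit draw function would do.
inline std::size_t DrawNaive(const std::vector<MinimapIcon>& icons) {
    BindCounter gl;
    for (const MinimapIcon& icon : icons) {
        gl.bind(icon.textureId);
        // the icon quad would be drawn here
    }
    return gl.binds;
}

// Batched pass: sort by texture, then bind only when the texture changes.
inline std::size_t DrawBatched(std::vector<MinimapIcon> icons) {
    std::stable_sort(icons.begin(), icons.end(),
        [](const MinimapIcon& a, const MinimapIcon& b) { return a.textureId < b.textureId; });

    BindCounter gl;
    bool haveBound = false;
    std::uint32_t bound = 0;
    for (const MinimapIcon& icon : icons) {
        if (!haveBound || icon.textureId != bound) {
            gl.bind(icon.textureId);
            bound = icon.textureId;
            haveBound = true;
        }
        // the icon quad would be drawn here
    }
    assert(gl.binds <= icons.size()); // never more binds than icons
    return gl.binds;
}
```

With five icons alternating between two textures, the naive pass issues five binds and the batched pass two. If the draw order matters for overlapping icons, sorting like this is not safe as-is.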
Attachments
profile_.png
(2.83 MiB) Downloaded 5 times

smoth
Posts: 22300
Joined: 13 Jan 2005, 00:46

Re: just another oprofile run

Post by smoth » 17 Sep 2012, 22:25

nice

Beherith
Moderator
Posts: 4934
Joined: 26 Oct 2007, 16:21

Re: just another oprofile run

Post by Beherith » 17 Sep 2012, 22:46

I loooove profiles. Could you run one for BA if I gave you a replay?

Kloot
Spring Developer
Posts: 1865
Joined: 08 Oct 2006, 16:58

Re: just another oprofile run

Post by Kloot » 17 Sep 2012, 23:12

the subtree labeled QTPFS is actually filled with legacy-PFS functions :!:

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: just another oprofile run

Post by jK » 17 Sep 2012, 23:22

Beherith wrote:I loooove profiles. Could you run one for BA if I gave you a replay?
np
Kloot wrote:the subtree labeled QTPFS is actually filled with legacy-PFS functions :!:
Damn, my fault. The mod option tag is case-sensitive ...

Beherith
Moderator
Posts: 4934
Joined: 26 Oct 2007, 16:21

Re: just another oprofile run

Post by Beherith » 17 Sep 2012, 23:50

Thanks man, a large dworld FFA replay:
Attachments
20120917_221156_Dworld_V1_91.7z
(1.05 MiB) Downloaded 26 times

Google_Frog
Moderator
Posts: 2441
Joined: 12 Oct 2007, 09:24

Re: just another oprofile run

Post by Google_Frog » 19 Sep 2012, 16:05

ZK uses AllowBuildStep in a few places and it can't be removed. Also, at a glance there is little optimisation to do Lua-side.

What is the cost of AllowWeaponTarget? It is now used in ZK to completely rewrite weapon target priorities, so it would be nice to see its impact.

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: just another oprofile run

Post by jK » 19 Sep 2012, 16:25

Beherith wrote:Thanks man, a large dworld FFA replay:
CSolidObject::Block + CMinimap::DrawUnit ~= 15%
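
As a rough illustration of the "single container" idea for CSolidObject::Block from the first post, the sketch below keeps all blocking data in one contiguous vector with a small fixed number of slots per square. Everything here is an assumption made for illustration: `FlatBlockingMap`, `kMaxPerSquare` and the rectangle-based `Block`/`Unblock` signatures are hypothetical and do not mirror the engine's actual interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxPerSquare = 4; // assumed cap per square, not an engine value
constexpr int kEmptySlot = -1;           // marks an unused slot

// Hypothetical flat blocking map: one allocation for the whole grid.
class FlatBlockingMap {
public:
    FlatBlockingMap(int width, int height) : width_(width), height_(height) {
        assert(width > 0 && height > 0);
        Cell empty;
        empty.fill(kEmptySlot);
        cells_.assign(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), empty);
    }

    // Marks the squares in [x0, x1) x [z0, z1) as blocked by objectId.
    // Returns false if any square had no free slot left.
    bool Block(int objectId, int x0, int z0, int x1, int z1) {
        bool allStored = true;
        for (int z = z0; z < z1; ++z) {
            for (int x = x0; x < x1; ++x) {
                if (!Insert(At(x, z), objectId)) allStored = false;
            }
        }
        return allStored;
    }

    // Removes objectId from the squares in [x0, x1) x [z0, z1).
    void Unblock(int objectId, int x0, int z0, int x1, int z1) {
        for (int z = z0; z < z1; ++z) {
            for (int x = x0; x < x1; ++x) {
                for (int& slot : At(x, z)) {
                    if (slot == objectId) slot = kEmptySlot;
                }
            }
        }
    }

    bool IsBlocked(int x, int z) const {
        for (int slot : At(x, z)) {
            if (slot != kEmptySlot) return true;
        }
        return false;
    }

private:
    using Cell = std::array<int, kMaxPerSquare>;

    std::size_t Index(int x, int z) const {
        assert(x >= 0 && x < width_ && z >= 0 && z < height_);
        return static_cast<std::size_t>(z) * static_cast<std::size_t>(width_) + static_cast<std::size_t>(x);
    }
    Cell& At(int x, int z) { return cells_[Index(x, z)]; }
    const Cell& At(int x, int z) const { return cells_[Index(x, z)]; }

    static bool Insert(Cell& cell, int objectId) {
        for (int& slot : cell) {
            if (slot == objectId) return true; // already registered on this square
        }
        for (int& slot : cell) {
            if (slot == kEmptySlot) { slot = objectId; return true; }
        }
        return false; // square is full
    }

    int width_;
    int height_;
    std::vector<Cell> cells_;
};
```

The fixed slot count trades flexibility for locality: blocking and unblocking touch contiguous memory and never allocate, but a real implementation would need a fallback for squares that hold more objects than the cap.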
Attachments
profile.png
(2.51 MiB) Downloaded 3 times

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: just another oprofile run

Post by jK » 19 Sep 2012, 16:53

Kloot wrote:the subtree labeled QTPFS is actually filled with legacy-PFS functions :!:
OK, QTPFS has a (massive) problem with my setup:

>80% CPU usage after a few minutes, and a 100% reproduction rate with the script.txt
Attachments
script_benchmark.txt
(3.1 KiB) Downloaded 32 times
profile.png
(1.32 MiB) Downloaded 3 times

Beherith
Moderator
Posts: 4934
Joined: 26 Oct 2007, 16:21

Re: just another oprofile run

Post by Beherith » 19 Sep 2012, 18:19

Thanks jK! It seems we are even more expensive in AllowUnitBuildStep than ZK. BA does use a lot of nanos....
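
jK's suggestion from the first post, a fixed NanoPiece so that the per-unit script query can be skipped, might look roughly like the sketch below. It is only an illustration: `BuilderDef`, `fixedNanoPiece`, `Builder` and `GetNanoPiece` are hypothetical names, and the script callin is replaced by a plain `std::function`. It only touches the nano piece query and leaves AllowUnitBuildStep alone.

```cpp
#include <cassert>
#include <functional>

// Hypothetical unitdef data: fixedNanoPiece is an assumed tag, -1 means "not set".
struct BuilderDef {
    int fixedNanoPiece = -1;
};

// Hypothetical builder unit.
struct Builder {
    const BuilderDef* def = nullptr;
    std::function<int()> queryNanoPiece; // stand-in for the per-unit script call
};

// Returns the piece to emit nano particles from, calling the script
// only when the unitdef does not name a fixed piece.
inline int GetNanoPiece(const Builder& builder) {
    assert(builder.def != nullptr);
    if (builder.def->fixedNanoPiece >= 0)
        return builder.def->fixedNanoPiece; // fast path: no script call

    assert(builder.queryNanoPiece);         // slow path needs a script callback
    return builder.queryNanoPiece();
}
```

Units that rotate between several nano pieces would leave the tag unset and keep the script path.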

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: just another oprofile run

Post by jK » 24 Sep 2012, 04:14

QTPFS update

Some things I noticed:
1. QTPFS causes CPU spikes at the beginning (it locks the CPU for >100 ms). Maybe it is possible to spread the work across multiple game frames somehow, e.g. by calculating the path iterations in stages?
2. Is it possible that stuck units cause massive CPU load?
3. If the 2nd issue is not the cause, the spikes later accumulate into constant lag, so the total CPU usage needs to be decreased too.

edit:
I recompiled with RELEASE and noticed it runs MUCH smoother there (note: I cannot create profile graphs with such a build). It seems something in DEBUG2 & PROFILE multiplies the load (some massive recursive functions?). Still, QTPFS makes up 50% of SimLoad, and the spikes in particular are still there. So (1) & (2) are still valid.
Attachments
profile.png
(711.97 KiB) Downloaded 3 times

SirMaverick
Posts: 834
Joined: 19 May 2009, 21:10

Re: just another oprofile run

Post by SirMaverick » 24 Sep 2012, 22:14

BA 7.72, 15 vehicle constructors.
Every time you place a new building there is a small spike after the building is created. This does not happen when using just a single builder.
Attachments
qtpfs.sdf
(26.55 KiB) Downloaded 21 times

Kloot
Spring Developer
Posts: 1865
Joined: 08 Oct 2006, 16:58

Re: just another oprofile run

Post by Kloot » 25 Sep 2012, 01:52

jK wrote:QTPFS update

Some things I noticed:
1. QTPFS causes CPU spikes at the beginning (it locks the CPU for >100 ms). Maybe it is possible to spread the work across multiple game frames somehow, e.g. by calculating the path iterations in stages?
2. Is it possible that stuck units cause massive CPU load?
3. If the 2nd issue is not the cause, the spikes later accumulate into constant lag, so the total CPU usage needs to be decreased too.
1. I considered that, but it's hard to do because each search would need its own context (plus memory) and terrain deformations could invalidate any partial results. In any case the searches themselves are not so expensive, but the extra work being done during each search (updating neighbor references for every explored node) was, which should now be fixed by d51db40a9f0. Might need a new oprofile run though.
2. Unlikely, but units trying to find a path *to* unreachable nodes could cause spikes, and AIs often spam such orders.
edit:
I recompiled with RELEASE and noticed it runs MUCH smoother there (note: I cannot create profile graphs with such a build). It seems something in DEBUG2 & PROFILE multiplies the load (some massive recursive functions?). Still, QTPFS makes up 50% of SimLoad, and the spikes in particular are still there. So (1) & (2) are still valid.
QTPFS has its own special debugging #definitions (DEBUG(2) only enables some trivial asserts), so I assume the code is just very cache-inefficient without compiler optimizations.
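
To make the trade-off in point 1 concrete, here is a minimal sketch of a search that is spread across several game frames. It uses a plain breadth-first search on a small grid, not QTPFS's quadtree search, and every name in it (`SearchContext`, `BeginSearch`, `StepSearch`, `terrainVersion`) is hypothetical. The per-search context is the extra state and memory mentioned above, and the version check stands in for terrain deformations invalidating partial results.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical per-search context: the state each in-flight search
// has to carry from one frame to the next.
struct SearchContext {
    int width = 0, height = 0;
    int goal = 0;                // goal cell index
    unsigned terrainVersion = 0; // terrain state the partial result is valid for
    std::deque<int> open;        // frontier cells
    std::vector<int> parent;     // -1 = unvisited, else predecessor cell index
};

enum class SearchState { Running, Found, Failed, Invalidated };

inline SearchContext BeginSearch(int width, int height, int start, int goal, unsigned terrainVersion) {
    assert(width > 0 && height > 0);
    assert(start >= 0 && start < width * height && goal >= 0 && goal < width * height);
    SearchContext ctx;
    ctx.width = width;
    ctx.height = height;
    ctx.goal = goal;
    ctx.terrainVersion = terrainVersion;
    ctx.parent.assign(static_cast<std::size_t>(width) * static_cast<std::size_t>(height), -1);
    ctx.parent[start] = start; // the start cell is its own predecessor
    ctx.open.push_back(start);
    return ctx;
}

// Expands at most maxNodes cells, then returns so the rest can run next frame.
inline SearchState StepSearch(SearchContext& ctx, const std::vector<bool>& blocked,
                              unsigned currentTerrainVersion, int maxNodes) {
    assert(maxNodes > 0);
    assert(blocked.size() == ctx.parent.size());
    if (currentTerrainVersion != ctx.terrainVersion)
        return SearchState::Invalidated; // terrain changed: the partial result is stale

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    for (int expanded = 0; expanded < maxNodes; ++expanded) {
        if (ctx.open.empty()) return SearchState::Failed; // goal unreachable
        const int cur = ctx.open.front();
        ctx.open.pop_front();
        if (cur == ctx.goal) return SearchState::Found;

        const int x = cur % ctx.width;
        const int y = cur / ctx.width;
        for (int i = 0; i < 4; ++i) {
            const int nx = x + dx[i];
            const int ny = y + dy[i];
            if (nx < 0 || ny < 0 || nx >= ctx.width || ny >= ctx.height) continue;
            const int next = ny * ctx.width + nx;
            if (blocked[next] || ctx.parent[next] != -1) continue;
            ctx.parent[next] = cur;
            ctx.open.push_back(next);
        }
    }
    return ctx.open.empty() ? SearchState::Failed : SearchState::Running;
}
```

A caller would invoke `StepSearch` once per frame with a small budget until it stops returning `Running`, and restart the search from scratch on `Invalidated`. Searches towards unreachable goals still visit every reachable cell before failing, so staging only spreads that cost out rather than removing it.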

Beherith
Moderator
Posts: 4934
Joined: 26 Oct 2007, 16:21

Re: just another oprofile run

Post by Beherith » 04 Oct 2012, 17:46

jK, what springsettings did you use for the profile run? Some things that I thought were more expensive didn't show up.

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: just another oprofile run

Post by jK » 04 Oct 2012, 21:56

AllowAdditionalPlayers = 1
FSAALevel = 8
Fullscreen = 0
GroundDetail = 84
HardwareCursor = 1
LinkIncomingPeakBandwidth = 0
LinkIncomingSustainedBandwidth = 0
LinkOutgoingBandwidth = 0
ReflectiveWater = 4
ShadowMapSize = 4096
Shadows = 1
ShowFPS = 1
ShowPlayerInfo = 0
ShowSpeed = 1
WindowedEdgeMove = 0

Beherith
Moderator
Posts: 4934
Joined: 26 Oct 2007, 16:21

Re: just another oprofile run

Post by Beherith » 05 Oct 2012, 09:18

Thanks man, I suspect BuildingGrounddecals are a bigger load than they seem, though your config and the default
decalLevel = std::max(0, configHandler->GetInt("GroundDecals"));
make it seem decals were off.

jK
Spring Developer
Posts: 2299
Joined: 28 Jun 2007, 07:30

Re: just another oprofile run

Post by jK » 05 Oct 2012, 13:21

Ah no, they are on, but I disable them from time to time because I am currently writing a shader replacement.
