Even the most basic custom unit shaders are very expensive
In this example, I performance tested 500 corraid tanks from BAR (4 pieces per model).
My setup is an old i5 750 paired with a gtx970. I am far from GPU limited in either case.
Without customunitshaders: 72 FPS.
With customunitshaders this drops to 41 FPS (normal mapping only, nothing in the DrawUnit call (just return), no setting of uniforms or anything, just the extra texture).
Is there anything I am doing wrong to get this massive performance impact from CUS?
Is there a point to using CUS on features such as trees for vertex shading and normal mapping, when they easily number hundreds on screen?
Engine level normal maps are practically free, if an agreed-upon convention for assigning normal maps to models can be implemented. See viewtopic.php?f=12&t=34024
EDIT: so correctly writing down your problems is a solution to them (partly). In the above test, the DrawUnit call had nothing in it, just return false end.
After removing the Drawunit = Drawunit assignment from the material definition itself, the FPS goes back up to 62 while retaining the normal mapping (as that is set on a per-material basis, not per unit).
But doing things that require uniforms to be set on a per-unit basis (e.g. nearly everything that one would do with CUS, vertex animation, flashing lights, custom blending colors, etc), is still very expensive.
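To put the FPS figures above into per-frame terms, here is a small arithmetic sketch. The per-unit figure assumes the extra cost scales linearly with the 500 units and ignores that a unit may be drawn in more than one pass, so treat it as a rough order of magnitude only.

```lua
-- Convert the FPS figures quoted above into per-frame and per-unit costs.
local UNIT_COUNT = 500  -- units in the test scene

local function frameTimeMs(fps)
    return 1000 / fps
end

-- Extra milliseconds per frame when going from baseFps down to testFps.
local function extraMsPerFrame(baseFps, testFps)
    return frameTimeMs(testFps) - frameTimeMs(baseFps)
end

local withCallin    = extraMsPerFrame(72, 41)  -- DrawUnit assigned in the material
local withoutCallin = extraMsPerFrame(72, 62)  -- DrawUnit assignment removed

print(string.format("DrawUnit assigned: +%.2f ms/frame (%.1f us per unit)",
    withCallin, withCallin * 1000 / UNIT_COUNT))
print(string.format("DrawUnit removed:  +%.2f ms/frame (%.1f us per unit)",
    withoutCallin, withoutCallin * 1000 / UNIT_COUNT))
```

That works out to roughly 10.5 ms of extra frame time with the callin assigned versus about 2.2 ms without it.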
Re: Even the most basic custom unit shaders are very expensive
Beherith wrote: ...nothing in Drawunit Call
As I have said before: it doesn't really matter what's *in* DrawUnit, the callin itself is the problem.
One solution would be to use caching, e.g.
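Code:
    -- glUniform, spUnitRenderingSetUnitLuaDraw and spGetGameFrame are assumed to be
    -- local aliases of gl.Uniform, Spring.UnitRendering.SetUnitLuaDraw and Spring.GetGameFrame.
    -- N is the minimum number of game frames between uniform updates.
    local unitUniformData = {}

    function gadget:DrawUnit(unitID)
        -- upload the cached value once, then switch the callin off again
        glUniform(unitUniformData[unitID].loc, unitUniformData[unitID].val)
        spUnitRenderingSetUnitLuaDraw(unitID, false)
    end

    function gadget:SomeFunction(unitID)
        -- assume this callin requires updating uniforms
        local unitUniforms = unitUniformData[unitID] or {}
        unitUniformData[unitID] = unitUniforms -- store the entry so DrawUnit can find it
        unitUniforms.loc = ...
        unitUniforms.val = ...
        unitUniforms.frame = unitUniforms.frame or 0
        if ((spGetGameFrame() - unitUniforms.frame) > N) then
            unitUniforms.frame = spGetGameFrame()
            spUnitRenderingSetUnitLuaDraw(unitID, true)
        end
    end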
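For anyone who wants to try the idea outside the engine, here is a minimal self-contained sketch of the same dirty-flag pattern. Everything engine-side is mocked: setUnitLuaDraw stands in for spUnitRenderingSetUnitLuaDraw, renderFrame stands in for the engine's draw pass, and onHealthChanged is a hypothetical event handler. None of these names are real Spring API.

```lua
-- Mock of the engine side (assumption, for illustration only): the engine
-- calls drawUnit only for units whose Lua-draw flag is set.
local luaDrawEnabled = {}   -- unitID -> true while the callin is wanted
local uploads = 0           -- counts simulated uniform uploads

local function setUnitLuaDraw(unitID, enabled)  -- stand-in for spUnitRenderingSetUnitLuaDraw
    luaDrawEnabled[unitID] = enabled or nil
end

local unitUniformData = {}

-- Called by the mock engine for flagged units only.
local function drawUnit(unitID)
    local data = unitUniformData[unitID]
    if data then
        uploads = uploads + 1        -- a real gadget would call glUniform here
    end
    setUnitLuaDraw(unitID, false)    -- the value is cached now, stop the callin
end

-- Hypothetical event handler: record the new value and request one callin.
local function onHealthChanged(unitID, health)
    local data = unitUniformData[unitID] or {}
    unitUniformData[unitID] = data
    data.val = health
    setUnitLuaDraw(unitID, true)
end

-- Mock render pass: invoke drawUnit for every flagged unit.
local function renderFrame(unitIDs)
    for _, unitID in ipairs(unitIDs) do
        if luaDrawEnabled[unitID] then
            drawUnit(unitID)
        end
    end
end

local units = {}
for id = 1, 500 do units[id] = id end

onHealthChanged(7, 0.5)                  -- only one unit changed
for _ = 1, 10 do renderFrame(units) end  -- ten frames, 500 units each
print("uploads over 10 frames:", uploads)
```

The point of the design is that the callin cost is paid only in the frame where a value actually changes, instead of once per unit per frame.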
Re: Even the most basic custom unit shaders are very expensive
Ok, so just to be clear, if I set a unit health uniform, then call spUnitRenderingSetUnitLuaDraw(unitID, false), then that uniform value will be passed as the same as I last set it for all subsequent frames that unit is drawn in? E.g., I could just watch all the healths and change as needed?
This would be great for setting the unitID as a random seed.
What about uniform values that change every frame, but are the same for every unit sharing that material, e.g. gameframe?
One can already do some neat things with unitID seed and gameframe :D
Re: Even the most basic custom unit shaders are very expensive
A lot of times DrawUnit really just needs to do the update of the time (frame) uniform and use it with the unitID/featureID (which are constant for the unit and could probably be set in the shader beforehand).
It would be really beneficial if such a time uniform were available globally (provided by the engine custom material code) in all shaders without having to update it via DrawUnit/DrawFeature calls for every object. I don't know enough about shaders to say whether global/shared uniforms are a thing.
Re: Even the most basic custom unit shaders are very expensive
Beherith wrote: if I set a unit health uniform, then call spUnitRenderingSetUnitLuaDraw(unitID, false), then that uniform value will be passed as the same as I last set it for all subsequent frames that unit is drawn in?
Yes, the GPU will cache the last value of each uniform declared in a shader.
Beherith wrote: I could just watch all the healths and change as needed?
That's the approach you should always take, uniforms should be considered semi-constants and only updated when absolutely necessary.
Beherith wrote: What about uniform values that change every frame, but are the same for every unit sharing that material, e.g. gameframe?
If the shader is shared, then you only need to set the new value for *one* of the units that uses it.
gajop wrote: It would be really beneficial if such a time uniform would be available globally (provided by the engine custom material code)
That code already exists (for parameters like id/speed/health/time/etc), but was never finished by trepan and I got lazy.
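A toy illustration of the "set it once for a shared shader" point, as a sketch only. The shader here is a plain Lua table (newShader, setUniform and drawUnit are hypothetical helpers, not Spring or OpenGL calls); the one assumption it models is that a uniform keeps its last value until it is set again.

```lua
-- Mock shader program (assumption): a uniform keeps its last value until it
-- is set again, which is the behaviour described above.
local function newShader()
    return { uniforms = {}, uploads = 0 }
end

local function setUniform(shader, name, value)
    shader.uniforms[name] = value
    shader.uploads = shader.uploads + 1
end

-- Every unit drawn with this shader reads the same stored value.
local function drawUnit(shader, unitID)
    return shader.uniforms.frame
end

local shared = newShader()
local seen = {}   -- unitID -> frame value that unit was drawn with

for frame = 1, 3 do
    setUniform(shared, "frame", frame)   -- one upload per frame, not per unit
    for unitID = 1, 500 do
        seen[unitID] = drawUnit(shared, unitID)
    end
end

print("uploads:", shared.uploads)
```

Three frames of 500 units cost three uploads here, and every unit still sees the current frame value.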
Re: Even the most basic custom unit shaders are very expensive
Kloot wrote: That's the approach you should always take, uniforms should be considered semi-constants and only updated when absolutely necessary.
Indeed. It's something I learned recently when doing the sun lighting thing in the engine. Changed my perspective on shaders quite a bit.
Kloot wrote: If the shader is shared, then you only need to set the new value for *one* of the units that uses it.
Is it possible to share them efficiently if you have different unitID constants for each unit? Wouldn't you have to change the ID for each unit? Same for hp, and other instance data.
Kloot wrote: That code already exists (for parameters like id/speed/health/time/etc), but was never finished by trepan and I got lazy.
Cool. That's the next thing on my wishlist then. It's probably required for efficient & good looking time-based animations (doing it every Nth frame probably gets choppy soon after N>5).
Re: Even the most basic custom unit shaders are very expensive
Kloot wrote:
    Beherith wrote: I could just watch all the healths and change as needed?
    That's the approach you should always take, uniforms should be considered semi-constants and only updated when absolutely necessary.
Kloot wrote:
    Beherith wrote: What about uniform values that change every frame, but are the same for every unit sharing that material, e.g. gameframe?
    If the shader is shared, then you only need to set the new value for *one* of the units that uses it.
So the above two use cases are mutually exclusive. Because all units share the same material (shader), if I update health for one of them, it will change for all of them. Should each unit have a separate material?
Am I interpreting the above correctly?
One material for all units
1. Frameloc: cached, great
2. Healthloc: must be changed for each unit
3. unitIDloc: must be changed for each unit
Separate material for each unitID
1. Frameloc: must be changed for each unit
2. healthloc: great, cached
3. unitIDloc: Great cached.
Also, is there anything I can do about the remaining 20% fps drop from 74 to 62, with all drawunit calls disabled?
Re: Even the most basic custom unit shaders are very expensive
Now you know why clever state management is so important in modern engines.
Fortunately, it shouldn't take much to allow materials to register uniforms they want to have automatically updated by the engine (like health) so Draw{Unit,Feature} can be bypassed entirely.
- If you have a shared material, but N>0 per-object parameters (health, speed, ...), you need Draw{Unit,Feature} for all objects sharing it.
- If you have a per-object material, but only common parameters (simframe, sundir, ...), you should use a shared material.
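The two rules above can be turned into a rough counting model. This is a sketch under simplifying assumptions, not engine behaviour: it assumes every per-object value has to be re-sent for each unit drawn with a shared material, that per-unit materials keep per-object values cached, and that each per-unit material costs one material switch. updatesPerFrame is a hypothetical helper.

```lua
-- Rough count of per-frame uniform updates and material switches for the two
-- layouts discussed above. The model is an assumption for illustration only.
local function updatesPerFrame(numUnits, numPerObject, numShared, layout)
    if layout == "shared" then
        -- shared values once per material, per-object values once per unit
        return numShared + numUnits * numPerObject, 1
    elseif layout == "per-unit" then
        -- per-object values stay cached, shared values repeat for each material
        return numUnits * numShared, numUnits
    end
    error("unknown layout: " .. tostring(layout))
end

-- 500 units, two per-object uniforms (health, unitID), one shared (frame).
local sharedUpdates, sharedSwitches = updatesPerFrame(500, 2, 1, "shared")
local perUnitUpdates, perUnitSwitches = updatesPerFrame(500, 2, 1, "per-unit")

print("shared material:    ", sharedUpdates, "updates,", sharedSwitches, "switch")
print("per-unit materials: ", perUnitUpdates, "updates,", perUnitSwitches, "switches")
```

In this model the per-unit layout halves the uniform updates but trades them for 500 material switches, which is the tradeoff being described.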
Beherith wrote: Am I interpreting the above correctly?
    One material for all units
    1. Frameloc: cached, great
    2. Healthloc: must be changed for each unit
    3. unitIDloc: must be changed for each unit
    Separate material for each unitID
    1. Frameloc: must be changed for each unit
    2. healthloc: great, cached
    3. unitIDloc: Great cached.
Yup.
Beherith wrote: Should each unit have separate material?
Minimize dependence on Draw{Unit,Feature} AMAP, but also group objects where you can to avoid material sorting/switching costs. Finding the right tradeoff might be painful, but TNSTAAFL.
Beherith wrote: Also, is there anything I can do about the remaining 20% fps drop from 74 to 62?
Assuming you have optimized your shaders, probably not.
Last edited by Kloot on 08 Mar 2016, 13:40, edited 1 time in total.
Re: Even the most basic custom unit shaders are very expensive
Thank you for the concise explanation, Kloot!
The engine-level id/speed/health/time uniforms would be quite the treat.
Also, if I wish to update, say the gameframe once per material, is putting a function in the material's predl field a sensible solution?
Re: Even the most basic custom unit shaders are very expensive
No; your function would be called only once (and its return value cached) when the list gets compiled.
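A toy model of why that happens, as a sketch only. compileList is a hypothetical stand-in for display-list compilation (it is not the real gl.CreateList): the function runs once when the list is compiled, and calling the list later only replays what was recorded.

```lua
-- Mock display list (assumption): compiling runs the function once and
-- stores what it produced; calling the list replays the stored result.
local currentFrame = 0

local function compileList(fn)
    local recorded = fn()              -- evaluated exactly once, at compile time
    return function() return recorded end
end

local preDL = compileList(function() return currentFrame end)

currentFrame = 30
print("list replays frame:", preDL())  -- still the value captured at compile time
```

So a gameframe uniform set from inside the list would stay frozen at whatever the frame was when the list was built.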
Re: Even the most basic custom unit shaders are very expensive
Beherith wrote: Thank you for the concise explanation, Kloot!
+1
Beherith wrote: Also, if I wish to update, say the gameframe once per material, is putting a function in the material's predl field a sensible solution?
We can just extend it with some other callins; it already supports UnitCreated and UnitDestroyed along with DrawUnit, so I don't see a problem with adding GameFrame (and all other callins available for unsynced gadgets for that matter).
WRT per-object materials, is there any limit to how many we can have? What is the memory/performance cost of different materials? Wouldn't rendering them cause shader switches, which might be slow?
Re: Even the most basic custom unit shaders are very expensive
Gajop, I don't think there is a hard limit on the number of per-object materials you can have, but I'm certain that switching materials costs more than changing a uniform.
Also, about that 20% perf loss: I did some profiling on engine version 101.0-63 (it's what I had on hand). I gave 500 corraid on BAR and toggled CUS.
It seems that the cost comes from the individual setting of teamcolor uniforms in
Code:
    <LuaObjectDrawer.cpp> m->SetUniformData(LuaObjectUniforms::UNIFORM_TCOLOR, std::move(IUnitDrawer...
    C:/spring101_mine/spring-101.0.1-63-g5ebf047/sauce/spring/rts/Rendering/LuaObjectDrawer.cpp:189
Is this even fixable?
If anyone is interested, I will gladly post the full results of the profiler.
EDIT: Kloot is my hero! https://github.com/spring/spring/commit ... f6613cc5ad
Re: Even the most basic custom unit shaders are very expensive
Can you retest with the https://springrts.com/dl/buildbot/defau ... -g9ea3b9b/ build?
note: this changes "/give 500 corraid" from worst to best case (500 to 1 updates per pass), but in real games the ordering of objects will be less favorable so you are unlikely to see the same gain.
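To see why draw order matters here, a small counting sketch. It assumes, as a simplification, that the team-colour uniform is re-sent only when the team changes between two consecutive objects; the actual engine change is not reproduced here, and countTeamColorUpdates is a hypothetical helper.

```lua
-- Count team-colour uniform updates for a draw order, assuming (as a
-- simplification) that the uniform is re-sent only when the team changes
-- between two consecutive objects.
local function countTeamColorUpdates(drawOrderTeams)
    local updates, lastTeam = 0, nil
    for _, team in ipairs(drawOrderTeams) do
        if team ~= lastTeam then
            updates = updates + 1
            lastTeam = team
        end
    end
    return updates
end

local oneTeam, alternating = {}, {}
for i = 1, 500 do
    oneTeam[i] = 0            -- "/give 500 corraid": all units on one team
    alternating[i] = i % 2    -- hypothetical worst-case ordering, two teams
end

print("one team:", countTeamColorUpdates(oneTeam))
print("alternating teams:", countTeamColorUpdates(alternating))
```

A single-team test scene needs one update per pass under this model, while a badly interleaved two-team order is back to one update per unit.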
Re: Even the most basic custom unit shaders are very expensive
Yes, that was the exact cause of the overhead. CUS and non-CUS are now identical in performance in this edge test case. Thank you very much!
Re: Even the most basic custom unit shaders are very expensive
Kloot, thank you very much for all of your help with CUS; it wouldn't have been remotely possible without you.
I applied the last patch you attached to my mantis ticket https://springrts.com/mantis/view.php?id=5158, but didn't want to reopen the ticket again (do tell me if that should have been my course of action).
If there is any change in the value of vec4 vertex in the pixel shader, the fragment is fully shadowed. This is only visible after the game starts, and the FrameLoc starts counting.
It can also be reproduced by pausing the game, then /luarules reload. The trees have correct shadows, but when you unpause, they become fully self-shadowed.
Also, is there any specific path you think we should take to implement per-material frameLoc updating?
[Screenshots: before unpausing; after unpausing]
Re: Even the most basic custom unit shaders are very expensive
frameLoc can (and will soon) be an engine-side uniform.
Did you look into setting up a shadow material? Self-shadowing is easy to break when moving vertices.
Alternatively you could use the unperturbed vertex positions to generate shadowtex coordinates.
Re: Even the most basic custom unit shaders are very expensive
I tried using the unperturbed vertex coords to generate the shadow coords, but that had the same result. Note that when the game is paused, the frameLoc is already not zero, the breakage happens when I unpause the game.
I have not yet looked into adding a shadow material.
Re: Even the most basic custom unit shaders are very expensive
As it happens this was both a framework and an engine issue (obvious in hindsight).
Engine side is now fixed, BAR's framework just needs to have the attached patch applied.
HF testing.
edit #1: ignore the Spring.Echo diffs, wanted to kill infolog spam and forgot to revert them.
edit #2: plus a fix for the arm_tanks material
Code:
Index: ModelMaterials/2_arm_tanks.lua
===================================================================
--- ModelMaterials/2_arm_tanks.lua (revision 5212)
+++ ModelMaterials/2_arm_tanks.lua (working copy)
@@ -12,9 +12,15 @@
 local GADGET_DIR = "LuaRules/Configs/"
+local etcLoc = -2
+
 local function DrawUnit(unitID, material,drawMode)
     -- Spring.Echo('Arm Tanks drawmode',drawMode)
     if (drawMode ==1)then -- we can skip setting the uniforms as they only affect fragment color, not fragment alpha or vertex positions, so they dont have an effect on shadows, and drawmode 2 is shadows, 1 is normal mode.
+        if (etcLoc == -2) then
+            etcLoc = gl.GetUniformLocation(material.standardShader, "etcLoc")
+        end
+
         --Spring.Echo('drawing',UnitDefs[Spring.GetUnitDefID(unitID)].name,GetGameFrame())
         local health,maxhealth=GetUnitHealth(unitID)
         health= 2*maximum(0, (-2*health)/(maxhealth)+1) --inverse of health, 0 if health is 100%-50%, goes to 1 by 0 health
@@ -21,7 +27,7 @@
         local _ , _ , _ , speed = Spring.GetUnitVelocity(unitID)
         if speed >0.01 then speed =1 end
         local offset= (((GetGameFrame())%9) * (2.0/4096.0))*speed
-        glUniform(material.etcLoc, 2* maximum(0,sine((unitID%10)+GetGameFrame()/((unitID%7)+6))), health,offset) --etcloc.z is the track offset pos.
+        glUniform(etcLoc, 2* maximum(0,sine((unitID%10)+GetGameFrame()/((unitID%7)+6))), health,offset) --etcloc.z is the track offset pos.
     end
     --// engine should still draw it (we just set the uniforms for the shader)
Attachment: framework.diff (10.47 KiB)
Re: Even the most basic custom unit shaders are very expensive
It currently spews countless errors for arm tanks since etcLoc isn't set.
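The lazy lookup used in the arm_tanks fix can be tried in isolation with a sketch like the one below. getUniformLocation is a mock standing in for gl.GetUniformLocation, and the shader table is hypothetical; the only thing taken from the patch is the -2 sentinel. The mock returns -1 for unknown names, mirroring what OpenGL does, so -2 stays distinct from "looked up but not found".

```lua
-- Stand-in for gl.GetUniformLocation (assumption): counts lookups and
-- returns a fixed location for known uniforms, -1 otherwise.
local lookups = 0
local function getUniformLocation(shader, name)
    lookups = lookups + 1
    return shader.locations[name] or -1
end

local UNRESOLVED = -2      -- sentinel that no real lookup returns
local etcLoc = UNRESOLVED

-- Resolve the location on first use, then reuse the cached value.
local function getEtcLoc(shader)
    if etcLoc == UNRESOLVED then
        etcLoc = getUniformLocation(shader, "etcLoc")
    end
    return etcLoc
end

local shader = { locations = { etcLoc = 4 } }   -- hypothetical shader object
for _ = 1, 100 do getEtcLoc(shader) end
print("location:", etcLoc, "lookups:", lookups)
```

The lookup runs once no matter how many units are drawn, and nothing depends on the material table carrying a pre-filled etcLoc field.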
Re: Even the most basic custom unit shaders are very expensive
Thank you Kloot, I've applied your patch! I even glimpsed some frameloc coming soon :D Super excited!