My setup is an old i5 750 paired with a GTX 970. I am far from GPU limited in either case.
Without CustomUnitShaders (CUS): 72 FPS

With CustomUnitShaders this drops to 41 FPS (normal mapping only; the DrawUnit call does nothing but return, no uniforms are set, just the extra texture).

Is there anything I am doing wrong to get this massive performance impact from CUS?
Is there a point to using CUS on features such as trees for vertex shading and normal mapping, when they easily number hundreds on screen?
Engine level normal maps are practically free, if an agreed-upon convention for assigning normal maps to models can be implemented. See viewtopic.php?f=12&t=34024
EDIT: So writing down your problems correctly is (partly) a solution to them. In the above test, the DrawUnit call had nothing in it, just "return false end".
After removing the DrawUnit = DrawUnit assignment from the material definition itself, the FPS goes back up to 62 while retaining the normal mapping (as that is set on a per-material basis, not per unit).
But doing things that require uniforms to be set on a per-unit basis (e.g. nearly everything that one would do with CUS: vertex animation, flashing lights, custom blending colors, etc.) is still very expensive.
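
To put the three FPS readings above on a comparable scale, here is a rough sketch that converts them to frame times. It assumes all three numbers were measured on the same scene and that the costs simply add up, which is a simplification; the variable names are made up for illustration and are not part of CUS.

```python
# Convert the FPS figures quoted in this post into per-frame times (ms).
# Assumption: same scene and camera for all three measurements.

def frame_ms(fps: float) -> float:
    """Milliseconds spent per frame at the given frame rate."""
    return 1000.0 / fps

fps_without_cus = 72.0        # no CustomUnitShaders
fps_cus_with_drawunit = 41.0  # normal mapping + empty DrawUnit callback
fps_cus_no_drawunit = 62.0    # normal mapping, DrawUnit assignment removed

baseline_ms = frame_ms(fps_without_cus)
with_drawunit_ms = frame_ms(fps_cus_with_drawunit)
no_drawunit_ms = frame_ms(fps_cus_no_drawunit)

# Extra frame time attributable to each part (simple subtraction).
normalmap_cost_ms = no_drawunit_ms - baseline_ms
drawunit_cost_ms = with_drawunit_ms - no_drawunit_ms
total_cus_cost_ms = with_drawunit_ms - baseline_ms
drawunit_share = drawunit_cost_ms / total_cus_cost_ms

print(f"baseline:                {baseline_ms:.2f} ms")
print(f"normal mapping adds:     {normalmap_cost_ms:.2f} ms")
print(f"empty DrawUnit adds:     {drawunit_cost_ms:.2f} ms")
print(f"DrawUnit share of cost:  {drawunit_share:.0%}")
```

By this estimate the normal mapping itself costs a bit over 2 ms per frame, while the empty per-unit DrawUnit callback costs roughly 8 ms, i.e. most of the slowdown, before any uniforms are even set.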