P.O.P.S.- Demo 3.1 available!

Argh · Post by **Argh** » 10 Sep 2009, 09:21

OK... been working on this for about a week (in between other crap and RL) now. Normally I wouldn't show code this rough, but what the hell, this is actually pretty cool.

Basically, this is a prototype of a new particle system I'm working on.

Thought I'd share the early WIP code for this, as maybe other people will join in and help out. That, and this is an open project where I want to make sure that people can understand my source code, because I have a feeling that this will be worthwhile when it's done.

How does it work?

It uses the GPU to do everything, other than iterate through the list of named textures and particle properties. It iterates by texture, because, ironically, gl.Texture callouts are the most expensive by far- that may be an area of the engine that needs some optimization, or maybe it's just plain expensive, IDK enough about Deep OpenGL to know for sure.
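To illustrate the iterate-by-texture idea, here is a rough sketch in Python (not the actual Lua widget code). The particle dictionaries and the `draw_pass` counter are made up for the example; the point is only that grouping first means one bind per texture rather than one per particle.

```python
from collections import defaultdict

def group_by_texture(particles):
    """Bucket particles by texture name so each texture is bound once per pass."""
    buckets = defaultdict(list)
    for p in particles:
        buckets[p["texture"]].append(p)
    return buckets

def draw_pass(particles):
    """Return (texture_binds, draws) for one render pass over the buckets."""
    binds = 0
    draws = 0
    for texture, group in group_by_texture(particles).items():
        binds += 1            # stands in for the expensive texture bind
        draws += len(group)   # stands in for one draw per particle
    return binds, draws

particles = [
    {"texture": "smoke", "pos": (0, 0, 0)},
    {"texture": "spark", "pos": (1, 0, 0)},
    {"texture": "smoke", "pos": (2, 0, 0)},
    {"texture": "spark", "pos": (3, 0, 0)},
]
print(draw_pass(particles))  # (2, 4): two binds for four particles
```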

All animations are vertex-based, on the GPU, using the timer to determine distance moved, so there are no absolute guarantees that motion will precisely match gamestate, but it should be reasonably close. The goal is to keep all systems in two mat4s (one for ColorMap emulation) for fast transfer. That may be hard to achieve, but I'd like to try.
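A minimal sketch of what "using the timer to determine distance moved" can look like, written in Python for readability rather than GLSL. The function name and arguments are hypothetical; the real shader may differ. Position is derived from elapsed time alone, so nothing has to be updated per frame on the CPU.

```python
def position_at(start, velocity, birth_time, now):
    """Position computed purely from elapsed time; nothing is stored per frame."""
    elapsed = now - birth_time
    return tuple(s + v * elapsed for s, v in zip(start, velocity))

print(position_at((0.0, 10.0, 0.0), (2.0, 0.0, -1.0), birth_time=5.0, now=7.5))
# (5.0, 10.0, -2.5)
```

Because the result depends on the render timer and not on gamestate, two machines can show slightly different positions at the same gameframe, which is the trade-off described above.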

All particles are independent agents. You can create a single system or 100 or mix them up or throw a lot of randomness in, or even write some custom algorithm that will do funky stuff (that may be a good use for that last matrix position, come to think).

It's not like CEG, where you have a given spawner with a min/max cap, it's pretty much what you want, when you want it.

CPU loading will rise concurrently, of course, but it's almost totally linear. The cool part about that is that you can handle load-balancing however you want. I haven't structured this for that yet, except a really crude example of time-based culling, but basically you're free to implement that how you want. I'm planning for the "stock" version to cull the oldest first, if we're going to exceed the cap, so that new effects play, and we're most likely culling stuff that's nearly done anyhow. May need to add a test for time left, etc., to keep stuff that's supposed to stick around.
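The "cull the oldest first" policy could be sketched like this in Python. `ParticlePool` and its fields are hypothetical names for illustration, and the sketch leaves out the "time left" test mentioned above.

```python
from collections import deque

class ParticlePool:
    """Fixed-cap pool that drops the oldest agents to make room for new ones."""

    def __init__(self, cap):
        self.cap = cap
        self.live = deque()  # oldest on the left, newest on the right

    def spawn(self, name, birth_time):
        """Add an agent, returning whatever had to be culled to fit it."""
        culled = []
        while len(self.live) >= self.cap:
            culled.append(self.live.popleft())
        self.live.append((name, birth_time))
        return culled

pool = ParticlePool(cap=3)
for t, name in enumerate(["a", "b", "c", "d"]):
    culled = pool.spawn(name, t)
print([n for n, _ in pool.live], culled)  # ['b', 'c', 'd'] [('a', 0)]
```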

It culls for AirLOS. It's going to have the same problems that everything using LOS does- it's checking a sector's state, and I only check the LOS of the starting points. I don't think it's possible to track the current points without eating a lot of CPU, so I'm probably not going to bother.
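Here is a small Python sketch of the sector problem. The grid, `SECTOR_SIZE`, and function names are assumptions for illustration and are not Spring's LOS API. Visibility is looked up once, from the sector containing the spawn point, so a particle that drifts into a neighbouring sector keeps its spawn-time answer.

```python
SECTOR_SIZE = 64  # hypothetical world units per LOS sector

def sector_of(x, z):
    """Sector indices for a world-space (x, z) position."""
    return int(x // SECTOR_SIZE), int(z // SECTOR_SIZE)

def visible_at_spawn(los_grid, x, z):
    """True if the sector containing the spawn point is currently in LOS."""
    return los_grid.get(sector_of(x, z), False)

los_grid = {(0, 0): True, (1, 0): False}
print(visible_at_spawn(los_grid, 63.0, 10.0))  # True  (sector 0, 0)
print(visible_at_spawn(los_grid, 65.0, 10.0))  # False (sector 1, 0)
```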

Anyhow, that's a lot of verbiage for a relatively short piece of code, heh. I just wanted to make certain things understood, since it looks so simplistic at first. It's Not Done Yet, but I accidentally created a really cool goof while playing around with the rudiments of physics logic, thought I'd share it, maybe somebody can find a use for it...

aegis · Post by **aegis** » 10 Sep 2009, 09:24

how's this different from trepan's gpu particles?

Argh · Post by **Argh** » 10 Sep 2009, 09:32

Well, tbh, I couldn't tell you, I only dimly remember the volcano code, which was scary-looking, and the snow code, which was pretty slow and didn't work on ATi until I debugged the GLSL, tbh.

I am mainly building this from scratch, though. Can't say how performance will compare to anything else atm, it's an open question how badly it'll bog down when the GPU has to handle the rest of the physics logic. Thus far it handles loads up to 2500 particles with a World Builder map's CPU load without anywhere near the cost of the equivalent number of CSimpleParticleSystems, though again, I should emphasize that I do not know what the final results will be, until I complete the physics logic.

aegis · Post by **aegis** » 10 Sep 2009, 09:42

trepan wrote: Even without the VBO interface, you could do a lot (and faster than CEGs), by using vertex textures (as demo'ed by the volcano widget). The downside is that the technique only works on newer video cards (so an alternative would have to be provided anyways)

Argh · Post by **Argh** » 10 Sep 2009, 09:59

I'll take a look at VBO, but I have my doubts that's going to help a lot. Practically all of the load is iterating through the list of agents every render frame, and each agent needs to have a fairly big chunk of data assigned. All I do in the render pass on the CPU is dump data into uniforms and call a display list, though, so I just dunno how I'm supposed to make that any cleaner, the only logic is that AirLOS check. But I could very easily run out of floats with two agents per event, I suspect- that would be 4 mat4s. Guess I could look at that, though, as a way to possibly double performance. Gotta sleep first.

It's a lot harder to make use of that kind of trick when you need to build a general-purpose system, basically. Most of the load's CPU-side... the question is how much total performance do I lose when I add more to the GPU side. Right now it's a lot faster than CEG, but that could easily change if GPU load is a significant part of time spent on each render pass.

Post by jK » 10 Sep 2009, 10:31

how does this differ from LUPS' particle class, which is even more optimized and can handle tens of thousands of particles with <10% fps drop?

Argh · Post by **Argh** » 10 Sep 2009, 23:53

Well, I hadn't done a direct comparison test, for one thing.

For the record, jK is right, LUPS SimpleParticles2 is considerably faster in all comparison tests, and can handle much larger particle counts atm. At least I know where the bar is.

On a more serious note: jK, LUPS remains un-portable and is too tightly integrated with camain, etc. And since I have no idea what you changed to address the desync problems with the revision that ships with P.U.R.E., I have to assume that my modularized version is worthless and breaks online play atm. If you'd like to look at the modularized version and fix it up, I'd be more than happy to put this particular project to rest and move on to finding a use for systems that can throw that many particles. Otherwise, I'd rather keep working on this, where it's intended to be completely modular from the start, and I know what it does and why. I am dead serious about the whole issue of portability and modularity, basically- I'd rather have a slower solution than to be stuck with one that I can't maintain or replace if a newer (modular) one exists.

Here's the module version, and what I want, in terms of structure, so that it's totally modular, and game developers can simply add a one-line dependency and start coding their weapons effects, etc., using the example code that's already in P.U.R.E. Let me know if you're interested. I like LUPS a lot, and would be happy to champion its use, I'm just stuck with some bad choices here because it's simply not structured like I'd like it to be.

Argh · Post by **Argh** » 11 Sep 2009, 07:12

OK, here's a fixed version of the LUPS module, minus the camain, etc., and cleaned up. Tested, seems to be fixed. Only remaining issue is that the manager is a big mess, even after removing all the CA-specific garbage (and replacing it with P.U.R.E.-specific garbage). I'll have to see if that stuff can all be put into the config instead somehow, it really should not be in the middle of the manager code.

I still don't know what was changed / fixed that addressed the desync problems, though. I'll go take a look at CA's changelog.

<looks>

Whatever was causing desyncs must have been fixed months ago. Care to confirm that, jK? My revision's 7 months old, and I don't see much that's changed since then...

Argh · Post by **Argh** » 11 Sep 2009, 11:33

Wheeeee.

Tried SimpleParticles2 for weapon trails... its performance when called per-frame is terrible. That, and its ColorMap emulation is kinda funky. I see borders of fragments overwriting fragments behind, instead of blending properly, etc.

I'll have to test what's wrong with the ColorMaps in more depth before I have a better idea what's going on.

I guess that this might be useful after all, even if I don't go the route jK did to speed it up (using the multitexture coords to store uniforms is pretty clever, btw, props).

I may be able to fix whatever's wrong with the blending, but I can't do anything about the per-event CPU hit, and that's pretty important.

Post by jK » 11 Sep 2009, 12:04

I haven't touched LUPS for over a year now (even though I have many items on my todo list for it).

Also, none of the particle classes is designed to be created each frame (see CA's flamethrower FX, which has problems if the unit turns while shooting, etc.). Btw, this affects the engine's SimpleParticles class too (though not as badly, because the engine doesn't create a display list). The number of particles is secondary (it still matters much more in the engine than in LUPS, because the engine does basic parts such as billboarding on the CPU); the primary factor is the NUMBER of spawned FX, which makes CEGTags very critical! The reason is that the frustum check is done at the per-FX level: more FX -> more frustum checks.

Blending in LUPS isn't safe across multiple FX, because it doesn't do any depth sorting yet (I feared the CPU usage of that), instead there is just the level tag, so you can force FXs to get rendered in a specific order.

Using VertexAttributes aka MultiTexCoords instead of uniforms is called pseudo instancing and can double the performance of crowd rendering, it is a pretty neat way to avoid the slow writing of uniforms (at runtime they shouldn't make any difference).

PS: I don't have any experience with desync'ing. I just can say that LUPS runs for a very long time now in CA w/o any desyncs.

Argh · Post by **Argh** » 11 Sep 2009, 21:53

Wait, what? Complete sentences full of information?

Basically, as you probably gathered, I wrote the foundations for a complete replacement of CEG last night. It's still crummy and crude.

I ported all of the stuff that can be a config out to configs, as well, so that code and data don't have to mix so much, for us poor bloody end-users who can't readily tell the difference. I'll post all of this when it's not just an ugly skeleton and has some examples.

But it already meets most of the requirements- what particle system to generate depending on the event type. Still working on all of that, mind you, and it's far from optimized, but I think I see some serious speed opportunities, if I can determine what concept fits best where.

Looks like POPS might be the one for small events with a limited scale, like CEGtag. There, I don't have the CPU load of building the display lists, it's just table entries sent over the sync / unsync and added to the table, so the performance load for a given event should be low. It just won't ramp beyond a certain point count-wise without fairly serious ramifications, so I'll have to put a limit on it and cull older events. Not a perfect solution, but there doesn't appear to be one- each approach has advantages, but only in certain contexts.

I'd have to vote for depth-testing to be on, at least for the fundamental particle systems used commonly for CEG-like effects. That will hurt performance, but it's a beauty requirement, they look ragged if they overwrite each other's fragments. CEG emulation pretty much requires multiple systems with different bitmaps, and you never know when multiple systems will be right next to each other, so there's no way around that, imo.

Argh · Post by **Argh** » 17 Sep 2009, 02:13

Hmmph!

Well, I finished getting POPS set up, to emulate CEG. Problem is, it's still too damn slow for per-frame events (CEGtrail, Unit dust trail effects, etc.), which, now that I'm reasonably certain SimpleParticles2 will do the "heavy lifting" and replace CEG for big particle events, is the major hassle.

Dunno whether it's worth continuing to mess with this. I'm just not sure how I'm supposed to find more speed without a pretty radical alteration to the gameplan.

I've thought about pre-built events in a SimpleParticles2-like framework, i.e. it'd just be setting the position, but the events would be one of several pre-designed ones, so very little new math per creation event. That's going to require an entirely different approach, though. I don't know how to store and call a list of display lists- can they be treated as table entries?

Anyhow, here's the source, in case anybody wants to see particle showers- I got gravity, drag, growth, etc. all working, all on the GPU.
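For anyone curious how gravity, drag, and growth can all be evaluated statelessly from a particle's age, here is one possible closed form in Python. This is a sketch assuming linear drag (deceleration proportional to velocity, `drag > 0`) and gravity on the y axis only; the parameter names and default values are hypothetical, and the actual shader may use a different model.

```python
import math

def particle_state(p0, v0, t, gravity=-9.8, drag=0.5, size0=1.0, growth=0.25):
    """Closed-form position and size at age t (linear drag, gravity on y only).

    Solves dv/dt = a - drag * v per axis, so no per-frame integration is needed.
    """
    decay = (1.0 - math.exp(-drag * t)) / drag
    accel = (0.0, gravity, 0.0)
    pos = tuple(
        p + (a / drag) * t + (v - a / drag) * decay
        for p, v, a in zip(p0, v0, accel)
    )
    return pos, size0 + growth * t

pos, size = particle_state((0.0, 0.0, 0.0), (2.0, 5.0, 0.0), t=1.0)
print(pos, size)
```

With gravity switched off, a particle launched at speed `v` coasts to a stop after travelling `v / drag`, which is a quick sanity check on the formula.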

It's aggravating, though. Seems there has to be a good way to quickly schedule small events that have enough particles that they can look good, yet not overwhelm CPUs. Yet this thing creates big problems when merely throwing 1000 particles around, and all it does is make some table entries and write them to uniforms.

It's a lot faster than SimpleParticles2 for that specific purpose, but it's still pathetically slow vs. CEG- a single CEGtrail on a rocket, spewing between 2-4 particles per frame with a lifetime of 30 frames, is near that limit, and performs quite a bit faster.

Post by **trepan** » 17 Sep 2009, 02:59

radical alteration to the gameplan

Take another look at the volcano code (hint: segment the vertex texture into manageable chunks for particle effects).

Argh · Post by **Argh** » 17 Sep 2009, 03:50

I'll take a look, but I'm not sure how to "segment the vertex texture". Sorry, that's entirely gibberish atm.

Argh · Post by **Argh** » 17 Sep 2009, 04:39

<reads>

Ok... so... lemme get this straight. You're using several textures to store values, and reading the pixel value of the texture to give the shader velocity and other information?

Hrmm. So, you basically have no new costs, besides changing texture values. Which you only do when the particle is born or dies.

That's very slick. I can see how that saves a lot of horsepower in the end. And I can see how combining that with this simplistic vertex shader might work out, by cutting out a huge amount of complexity.

But...

I need to alter one particular pixel of the textures. How do I specify that I want to change one pixel of textures or RBOs, but leave the rest of the data intact?
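The indexing side of that question can be sketched in Python. In plain OpenGL, `glTexSubImage2D` updates a sub-rectangle of an existing texture; this sketch only models which texel would be touched and makes no assumption about how Spring's Lua interface exposes that. `TEX_WIDTH`, `SEGMENT_ROWS`, and the function names are hypothetical. Each effect gets a fixed block of rows (a "segment"), and a particle slot maps to one texel inside it.

```python
TEX_WIDTH = 64     # hypothetical texture width in texels
SEGMENT_ROWS = 4   # hypothetical rows reserved per effect

def texel_for(segment, slot):
    """Map (effect segment, particle slot) to an (x, y) texel coordinate."""
    capacity = TEX_WIDTH * SEGMENT_ROWS
    if not 0 <= slot < capacity:
        raise IndexError("slot outside segment")
    x = slot % TEX_WIDTH
    y = segment * SEGMENT_ROWS + slot // TEX_WIDTH
    return x, y

def write_texel(texture, x, y, rgba):
    """Overwrite one texel in a row-major texture, leaving the rest untouched."""
    texture[y * TEX_WIDTH + x] = rgba

EMPTY = (0.0, 0.0, 0.0, 0.0)
texture = [EMPTY] * (TEX_WIDTH * SEGMENT_ROWS * 2)  # room for two segments
x, y = texel_for(segment=1, slot=70)
write_texel(texture, x, y, (12.5, 3.0, -4.0, 1.0))
print((x, y), sum(1 for t in texture if t != EMPTY))  # (6, 5) 1
```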

Also... hrmm... how do I avoid accuracy issues? I need to accurately position the objects in space. If I use RGB values, won't that be a fairly inaccurate representation of the position? Won't they get flattened to the nearest integer value when stored in the texture? Or will they remain floats?

<reads more>

Ok, I have that last bit answered. They can be floats. Costly, but meh, only need it for position, maybe velocity.
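To put rough numbers on the accuracy worry, here is a Python comparison of one coordinate round-tripped through an 8-bit channel versus a 32-bit float channel. The map width and helper names are assumptions made up for the example.

```python
import struct

def store_8bit(value, world_size):
    """Round-trip a coordinate through one 8-bit channel spanning [0, world_size]."""
    level = round(value / world_size * 255)
    return level / 255 * world_size

def store_float32(value):
    """Round-trip a coordinate through a 32-bit float channel."""
    return struct.unpack("f", struct.pack("f", value))[0]

x = 1234.567
world = 8192.0  # hypothetical map width in world units
print(abs(store_8bit(x, world) - x))   # can be off by up to half a step (~16 units here)
print(abs(store_float32(x) - x))       # well below a thousandth of a unit
```

So 8-bit channels are fine for things like colour, but position really does want the float format.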

Hrmm. What about visibility? I don't want them showing up, if they aren't in LOS. Is the LOS texture available, or am I going to have to build it periodically and render to texture?

Argh · Post by **Argh** » 18 Sep 2009, 10:01

Solved the main problems. Drawing to FBO is easy enough, using the textures as control values is straightforward. Even found a solution for LOS, although I should say I haven't tested it yet. One of the big things is that I don't need the fragment shader at all- the entire thing can use that vertex shader I developed, so that part's going to be quite clean.

However... I should say that, even though I don't have all of the algorithm designed yet, I have this sneaking suspicion that its real-time performance for frequent events will not be fast enough.

The problem there is the math hit per-frame events are going to cause, as opposed to the render-frame hit of iterating through that huge list. As a typical example... if we have 100 robots, and they release a small cloud of, say, 8 particles every 5 frames... that's 800 events every 5 frames, or 160 per frame (ideally- usually the load's not evenly distributed, not by a long shot). That's a hell of a lot of math for Lua to handle efficiently. The synced portion must pass a call to unsynced, which will then take the values from synced and input them into the FBO texture objects.

When I look at that kind of math vs. what Spring must be doing with CEG, I am surprised that this isn't more of a performance drag than I know it already is. Even if I group the events and soften the load a bit, by not repeating unnecessary steps (those 8 events all have a common origin, let's say), it's still pretty scary.

IOW, it's very impressive when it's a static system where the particles are directed mainly by the shader and don't touch the CPU, but I can see some problems looming ahead. It's the same problem as with SimpleParticles2, where it has to run through a fairly lengthy bit of code per particle system spawned, which means that the costs are pretty damn high, even if the number of entities in a given system is low. This will be a lot lower cost, per event, so... maybe it'll be fast enough. Dunno whether it'll out-perform CEG, though, which means it may still be a waste of my time.

It'll perform like hell on wheels once the particle system's set up, though, that I know already- it's considerably cheaper, GPU-side, than the Volcano code or SimpleParticles2. It may end up being a cheap way to do sprays, and save SimpleParticles2 for stuff where I need more flexibility.

I'll report on the results when I'm done talking about it and have actually finished the beast, but that's my prediction, just looking at the number of steps I need to make to initialize a "new" particle. It may be fast enough, just on savings in terms of per-frame costs during the render cycle, but looking at what's going to be required to make it work, I am less than totally sanguine about the results.

manolo_ · Post by **manolo_** » 18 Sep 2009, 11:20

Request:

use ur particle system to make a firework when the game is ended and won

Argh · Post by **Argh** » 22 Sep 2009, 00:37

Hey, when I get it working, I'll be happy to develop some particle systems for fireworks. In fact, that's pretty easy- I'll bet I could set up a fireworks system just using a slightly-modified volcano app, frankly.

It's been slow going, but I've figured out some of what needs to be done. Easiest solution is to pre-compile the various particle cases, so that commands to create a particle agent are basically just, "agentname, position (xyz), time(to death)".

This obviously limits flexibility- you have to pre-design the cases, and you have to plan that really far in advance. But it does appear that that provides enough flexibility. It's cheaper by far to have 10 display lists running through the shader (for the most part, halting when the lifetime check fails), with a single set of uniforms to define colormap emulation, etc. at the beginning of the rendering pass, than it is to define those parameters many times a second when the agents are produced.
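A sketch of the pre-compiled-case idea in Python. The preset names and values are invented for illustration. Everything expensive is defined once up front, and a spawn command carries only the three fields described above: agent name, position, and time of death.

```python
PRESETS = {  # hypothetical pre-designed cases; values are illustrative only
    "dust_puff":   {"count": 8,  "texture": "smoke", "lifetime": 30},
    "spark_burst": {"count": 16, "texture": "spark", "lifetime": 12},
}

def spawn(agents, name, position, time_to_death):
    """Record a new agent; all other parameters come from the preset."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset: {name}")
    agents.append((name, position, time_to_death))

def live_agents(agents, now):
    """Agents whose lifetime check still passes at time `now`."""
    return [a for a in agents if a[2] > now]

agents = []
spawn(agents, "dust_puff", (100.0, 0.0, 250.0), time_to_death=40)
spawn(agents, "spark_burst", (90.0, 5.0, 240.0), time_to_death=20)
print([a[0] for a in live_agents(agents, now=25)])  # ['dust_puff']
```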

But I am still not happy with the performance yet. I have most of it working now, and LOS from a shader works, but it's been a pain in the rear. It really appears to me, though, that there must be some significant costs in the Lua interface to Spring's rendering code- if this was being implemented on the engine side, it would probably run about 100X faster overall.

Argh · Post by **Argh** » 22 Sep 2009, 22:49

Ok, more progress, and some source. This is really boring unless you are interested in the algorithms, though.

Basically, I went back to start for a bit, and re-examined the speed issue.

This demo shows a single-texture, single-FBO solution, which is probably fast enough for practical use. It is using simulation of 160 events every 10th of a second, with the lowest CPU use per event I've been able to get thus far. The next stage is to build an equivalent one for the ColorMap emulation (maybe 2, for 8 colors max, dunno yet) and to rebuild the shader.

Argh · Post by **Argh** » 27 Sep 2009, 19:43

OK, I'm almost done, should be able to present the final algorithm design sometime tonight. I'm having some minor issues with texture coordinate stuff that are causing some minor hassles, and various little details are still getting cleaned up.

The final algorithm appears to be fast enough for its intended purpose- per event, it is no more expensive than the second POPS version (and usually less); per render cycle, it is a very short function (on the CPU) with a fairly small data transfer load.

I have tested its "burst" performance, and it is very fast for large numbers of events- just for testing's sake, I've been doing 512 new particles every 0.3 seconds, and the load is very small- I suspect that it will give LUPS a serious run for its money in that department, although that's really not what it's for. I think it's fair to say that if that load's spread over several gameframes, it will be OK, although there will be some additional load due to the table operations required.
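Spreading a burst over several gameframes could look something like this Python sketch. `SpawnScheduler` and the budget of 64 are hypothetical; the queue operations are the kind of extra table work mentioned above.

```python
from collections import deque

class SpawnScheduler:
    """Queue spawn requests and release a bounded number per gameframe."""

    def __init__(self, per_frame_budget):
        self.budget = per_frame_budget
        self.pending = deque()

    def request(self, events):
        """Queue events for later release."""
        self.pending.extend(events)

    def tick(self):
        """Release at most `budget` queued events for this gameframe."""
        n = min(self.budget, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]

sched = SpawnScheduler(per_frame_budget=64)
sched.request(range(512))  # one 512-particle burst
frames = []
while sched.pending:
    frames.append(len(sched.tick()))
print(frames)  # [64, 64, 64, 64, 64, 64, 64, 64]
```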

Once it's done... I guess the next step is creating a demo of use, assuming that the documentation makes sense to people, and then writing some better management systems. Fireworks sounds fun to me, I'll do that.
