Phoronix Test Suite

Post by **Tim Blokdijk** » 01 Jan 2012, 22:52

As claimed by http://www.phoronix-test-suite.com:

The Phoronix Test Suite is the most comprehensive testing and benchmarking platform available that provides an extensible framework for which new tests can be easily added.

So I was browsing the Phoronix forum and came across the following discussion: Phoronix forum thread
In which people ask the PTS devs to add Spring as a test.
Michael (PTS lead dev and the Phoronix founder) had this to say about it:

Because I am always busy and have no experience with the Spring engine... If there is some Spring fan that wants to code the patch(es), go right ahead and then it will be in PTS like you guys are wanting.

I'm still busy with that interview I did with Ton Roosendaal (Blender conference), I'm progressing but it's just a lot of work.
But, just as an intellectual exercise, how difficult would it be to make a "nice benchmark / timed demo mode"? What would be the best way to approach this?

My second question is how useful a PTS based test/benchmark would be for Spring development?

The Phoronix Test Suite code can be downloaded with "git clone http://www.phorogit.com/repo/phoronix-test-suite.git". It seems to be written in PHP and run from the command line (PHP CLI).

Post by **Tim Blokdijk** » 01 Jan 2012, 23:14

Just to make things a little easier, the thread at Phoronix links to a feature request on this forum: http://springrts.com/phpbb/viewtopic.php?f=21&t=22604
The suggestions there were using a replay or using the save/load system to have something that works across Spring versions.

And there is a link to http://trac.caspring.org/browser/trunk/mods/benchmark with the last commit 2 years ago. I don't know if Zero-K still has this in the current version?

Post by **hoijui** » 01 Jan 2012, 23:15

so if i understand the software right it works like this:
you create a test-scenario/profile, which in our case could be... 1h BA DSD KAIK_vs_KAIK, and this scenario can then be easily run on different systems. when you run it, performance is measured, and then compared with other systems' performance.

anyway... as answer, i would say one has to know the software much better to be able to say how much work it is.. or in other words, you would just start doing it, and if you are half done you may know how much time it would need, or that it is not (reasonably) possible.

Post by **knorke** » 01 Jan 2012, 23:29

Tim Blokdijk wrote: My second question is how useful a PTS based test/benchmark would be for Spring development?

if you just need something like the quake timedemos: replay with camera following a fixed path? like http://www.youtube.com/watch?v=oSdvrBcBkU0 or use the camerapath thing from carrepairer.

Post by **zwzsg** » 01 Jan 2012, 23:37

Something like KP menu background?

(Save this in a text file and pass it to Spring as argument.)

Post by **dansan** » 01 Jan 2012, 23:58

Other related forum thread: http://springrts.com/phpbb/viewtopic.php?f=23&t=26983

Tim Blokdijk wrote:My second question is how useful a PTS based test/benchmark would be for Spring development?

IMO it would not only be a way to gain some attention from linux/gfx/power geeks (a few new players), but its primary win for Spring could be for the dev-team: the possibility of fully automated tests to monitor the performance of Spring's pathing, gfx, ai, rendering, etc subsystems, ST/MT-performance, ST/MT-sync'iness etc during development, automated, every day/commit/whatever.

If there is a real request/demand from the devs I could set up a VM with PTS and make it auto-pull from the build bot or auto-build on its own and run relevant test scenarios. The VM can run on Licho's host.
hmm... a VM on a host with lots of other duties... bad for benchmarking... hmm... I think I'll take that back and think about it again IF there is demand...

Post by **Cheesecan** » 02 Jan 2012, 00:22

I was searching forums for something like this the other day and all I could find was SpringBench which wasn't a real benchmark. A benchmark would indeed be very useful for optimizing MT.

Post by **Tim Blokdijk** » 02 Jan 2012, 01:04

This post is a joke, don't read it if you are interested in the test/benchmark topic.

hoijui wrote:.. or in other words, you would just start doing it, and if you are half done you may know how much time it would need, or that it is not (reasonably) possible.

That's one way of doing it.. but I have a formal education in this stuff.
I first define a long term vision. In discussion with key people this vision is further enhanced and clarified. Once there's a working agreement and people have clearly communicated their commitment to the vision; I move on to identify key goals that would need to be achieved to give meaning to our shared vision.
Per goal I list the benefits and how they relate to the long term vision.
Then I make an analysis of our current situation. This will include the current strategy, management style, systems, personnel, culture and structure. Based on this I can do a strengths and weaknesses analysis followed by the external and internal analysis. This together will be the basis for a full SWOT analysis.
Next I will write a strategic document that will explain how we can go from the current situation to a situation where we have completed our goals. Several projects will be defined within this strategic document each with SMART defined objectives.
After a formal round of discussion with key people where again I get the necessary commitment I move into a phase where I talk with experts in the specific fields relating to my projects.
This process starts with installing Redmine on a server, defining the projects and creating accounts for the people involved. Then together with the experts I write a document per project that outlines the major milestones; per milestone, key deliverables are defined. Relations with other milestones, projects, goals and ultimately the vision are clearly made so everybody understands the bigger picture.
Then individual tasks are made within the milestones; these are assigned to people with a general estimate of the time needed to complete them. All tasks combined would need to lead to a completed milestone.
Once all milestones within a project have the necessary tasks defined I multiply the time needed by 2.5 and discuss the full project with key individuals. Once this is signed off I actually tell the people that they can start to do the work. Completed tasks are closed after review, which I may or may not do. Reviewing other people's work has this "real work" thing to it. I don't do real work.
If I get nervous (for whatever reason) during the reviewing work I semi-randomly reassign people to tasks and hire/fire people.
After a project fails (or is completed) I write an evaluation that's again discussed with key individuals. Then I get a pay raise.

Sorry, derailing my own thread..

But it's nice to see that a lot of fundamental work for a test/benchmark based on the Spring engine is already done before.

Post by **abma** » 02 Jan 2012, 04:56

to get a benchmark imo three things are needed:

1. check if it's possible to run spring on different platforms / hardware with the same config (screen-resolution, graphic-decals, ...), since these settings affect performance
2. make some lua-ai / gadget which is deterministic (does the same thing every run). maybe easiest would be something like a give all / camera fly over the scene.
3. create a widget that quits the game / dumps stats

it shouldn't be soooo hard to do that, because the validation-test already does something similar.

easiest for 1. would be, to use spring-headless...
for 2. easiest would be to replay a demo, but this would break on each new engine-version, so a lua-widget should be preferred, as it could be fixed if something breaks.
3. already done i guess
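
As a rough illustration of the "dumps stats" step, the sketch below summarizes per-frame render times from one benchmark run into the kind of figures a timedemo usually reports. It is written in Python rather than as a Spring widget, and the function name, the input format (a list of frame times in milliseconds) and the choice of statistics are all assumptions made for this example.

```python
import math
from statistics import median


def summarize_frame_times(frame_times_ms):
    """Summarize per-frame render times (in milliseconds) from one run.

    Hypothetical helper: the input format is an assumption, not something
    Spring or PTS is known to produce in this shape.
    """
    if not frame_times_ms:
        raise ValueError("no frames recorded")
    ordered = sorted(frame_times_ms)
    total_seconds = sum(ordered) / 1000.0
    # Nearest-rank 99th percentile: the frame time that 99% of frames stay under.
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return {
        "frames": len(ordered),
        "avg_fps": len(ordered) / total_seconds,
        "median_ms": median(ordered),
        "p99_ms": ordered[rank - 1],
        "worst_ms": ordered[-1],
    }


if __name__ == "__main__":
    print(summarize_frame_times([10, 10, 20, 40]))
```

Reporting a percentile and the worst frame next to the average is a design choice: an average FPS alone can hide short stalls that matter in an RTS with many units.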

edit: where are binaries taken from? would the ppa be used or is self-compiling essential?

Post by **dansan** » 02 Jan 2012, 10:50

abma wrote:1. check if it's possible to run spring on different platforms / hardware with the same config (screen-resolution, graphic-decals, ...) which affects performance
[..]
easiest for 1. would be, to use spring-headless...

Would it be possible to benchmark gfx performance with headless?

abma wrote:edit: where are binaries taken from? would the ppa be used or is self-compiling essential?

The idea would be to automate the tests to run daily or per-commit. IMO daily is enough, if something relevant happens, and there were lots of commits that day, an automated git-bisect can be run by PTS to identify the relevant commit.
Anyway: self-building (or binaries from the build-bot?) is necessary for daily tests (or is the ppa updated daily? I don't know - I'm not an Ubuntu user.).
In any case it would be important to know about the versions of the relevant libraries and build-chain, esp. boost, mesa/nv/ati and gcc versions.
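
The bisect idea can be sketched as a binary search over an ordered list of commits. This is a minimal Python illustration of the principle, not how PTS or `git bisect` is implemented; `first_slow_commit`, `is_slow` and the commit labels are hypothetical, and the sketch assumes the oldest commit is known to be fast, the newest is known to be slow, and performance does not recover in between.

```python
def first_slow_commit(commits, is_slow):
    """Return the first commit for which is_slow(commit) is True.

    commits: list ordered oldest to newest.
    is_slow: callable that builds/benchmarks one commit and returns a bool
             (hypothetical; in practice this is the expensive step).
    Assumes commits[0] is fast, commits[-1] is slow, and that every commit
    after the first slow one is also slow.
    """
    if len(commits) < 2:
        raise ValueError("need at least one fast and one slow commit")
    fast, slow = 0, len(commits) - 1  # indices of known-fast / known-slow
    while slow - fast > 1:
        mid = (fast + slow) // 2
        if is_slow(commits[mid]):
            slow = mid
        else:
            fast = mid
    return commits[slow]


if __name__ == "__main__":
    # Made-up timings in seconds, purely for demonstration.
    timings = {"a": 1.0, "b": 1.0, "c": 1.1, "d": 1.0,
               "e": 1.6, "f": 1.6, "g": 1.7, "h": 1.6}
    print(first_slow_commit(list(timings), lambda c: timings[c] > 1.3))
```

With a day's worth of commits this needs only about log2(n) benchmark runs, which is why a daily run plus bisecting can be cheaper than benchmarking every commit.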

Post by **abma** » 02 Jan 2012, 14:04

dansan wrote: Would it be possible to benchmark gfx performance with headless?

no, this is why it's headless (no gfx-output) :)

headless could only check i/o performance + cpu. but as headless behaviour is the same as "normal" spring, it can be easily switched...

what i wanted to say: you need to check that config settings are identical to allow the comparison of the performance on different hardware.
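
One way to picture that check is a small comparison of the performance-relevant settings of two machines before their results are compared. The Python sketch below is only an illustration: the setting names (`screen_width`, `screen_height`, `ground_decals`) are hypothetical placeholders, not actual Spring config keys, and reading the settings from disk is left out.

```python
def config_differences(reference, candidate, keys):
    """Return {key: (reference_value, candidate_value)} for each differing key.

    reference, candidate: dicts of already-parsed settings.
    keys: the settings that are assumed to influence performance.
    A key missing on one side shows up as None on that side.
    """
    differences = {}
    for key in keys:
        ref_value = reference.get(key)
        cand_value = candidate.get(key)
        if ref_value != cand_value:
            differences[key] = (ref_value, cand_value)
    return differences


if __name__ == "__main__":
    # Hypothetical setting names and values, for illustration only.
    keys = ["screen_width", "screen_height", "ground_decals"]
    machine_a = {"screen_width": 1920, "screen_height": 1080, "ground_decals": 1}
    machine_b = {"screen_width": 1920, "screen_height": 1200}
    print(config_differences(machine_a, machine_b, keys))
```

A benchmark run could refuse to record a result when the returned dict is not empty, so that results from differently configured machines never end up in the same comparison.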

dansan wrote: The idea would be to automate the tests to run daily or per-commit. IMO daily is enough, if something relevant happens, and there were lots of commits that day, an automated git-bisect can be run by PTS to identify the relevant commit.
Anyway: self-building (or binaries from the build-bot?) is necessary for daily tests (or is the ppa updated daily? I don't know - I'm not an Ubuntu user.).
In any case it would be important to know about the versions of the relevant libraries and build-chain, esp. boost, mesa/nv/ati and gcc versions.

this is what the validation tests already do, but currently on every commit: see http://buildbot.springrts.com/waterfall, the validationtests row. the ppa isn't updated automatically.
the validation doesn't check performance info, as the test currently just runs an ai vs ai. another drawback: the output of the validation test isn't parsed. only ERROR and messages to stderr are covered, other errors have to be found by humans in the output. it just checks if the engine starts up and doesn't crash.
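
A first step toward parsing that output could be a scan for lines that merely look suspicious, so a human only has to read those. The Python sketch below assumes the log is plain text; the keyword list is a guess for illustration and is not taken from real Spring or buildbot output.

```python
import re

# Hypothetical keywords; a real list would be tuned against actual logs.
SUSPICIOUS = re.compile(r"\b(error|failed|exception)\b", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return (line_number, line) pairs for lines matching SUSPICIOUS."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if SUSPICIOUS.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "Loading map\n"
        "Warning: shader compile failed\n"
        "Game started\n"
        "Lua Exception in gadget\n"
    )
    for number, line in suspicious_lines(sample):
        print(number, line)
```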

imo the buildbot seems to be really successful: the development branch was broken more often and for longer before we started doing such extensive compiling / validation tests.

Post by **knorke** » 02 Jan 2012, 14:26

abma wrote: for 2. easiest would be to replay a demo, but this would break on each new engine-version, so a lua-widget should be preferred, as it could be fixed if something breaks.

with replays it would be simple to make new ones. If the engine changes, even games with LuaAIs would play out differently (eg every change to simulation like pathfinding, collision,...)
Not sure if it is possible to disable the sync-check for replays: when a dev is eg changing stuff unrelated to sync, like rendering, will this test version still play replays from another spring version? (with the same simulation code)
If not, then the only thing I can think of that would behave the same every time is movectrl-ing units with some test gadget. But that seems cumbersome, can still change and is not a realistic game situation. (no projectiles etc)

Post by **abma** » 02 Jan 2012, 14:35

you can't change both if you want to compare the performance:

either keep the engine version to get comparable results, or keep the same hardware to see if a commit makes the engine slower or faster.

demos are bound to the engine version: if a command is changed / deprecated / added... bad stuff happens. :)

yes, movectrl sounds unrealistic, so some gadget ai, that has no random() inside could be used.

Post by **knorke** » 02 Jan 2012, 14:40

abma wrote: so some gadget ai, that has no random() inside could be used.

I think no, because with each engine version the simulation always changes a bit. (for example pathfinding -> units move slightly differently -> game plays out differently)

abma wrote: demos are bound to the engine version: if a command is changed / deprecated / added... bad stuff happens. :)

yes, changes to commands/simulation etc. break sync of replays. But they also make gadgetAI games play out differently.
What I meant: if somebody were to compile 85.0 with some non-sync change (like a different water renderer), would that still play normal 85.0 replays?
Technically it should be able to?

Post by **Tobi** » 02 Jan 2012, 14:43

I think both situations are completely different.

For Phoronix I assume they don't want a new engine version every few months, that might have quite different performance characteristics, because then hardware/drivers can not be compared. So for Phoronix it may actually be easy: fixed engine version + fixed game version + replay + some widget that moves the camera around in a predefined manner (one of those auto spec widgets?)

For us (spring devs), I think the most useful would be performance tests coded in Lua that exercise a particular part of the engine. Then, with a bunch more work to store data over time, we can find performance regressions in particular subsystems of the engine. For example, a gadget could create a huge number of units, give them an order, and somehow measure the CPU time it takes the pathing subsystem to create paths for them all. I think jK has been working on something like this at some point.
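
The "store data over time" part could look roughly like the following Python sketch, which flags a measurement as a possible regression when it is noticeably slower than the median of the preceding runs. The function name, the window size, the 10% tolerance and the sample numbers are all assumptions for illustration; the measurements themselves would come from whatever the Lua-side test reports.

```python
from statistics import median


def find_regressions(history, window=5, tolerance=0.10):
    """Flag runs that are slower than the recent baseline.

    history: list of (label, seconds) tuples, oldest first, for ONE test
             on ONE machine (so only the engine changes between runs).
    window: how many preceding runs form the baseline (assumed value).
    tolerance: allowed relative slowdown before flagging (assumed value).
    Returns a list of (label, seconds, baseline_seconds).
    """
    flagged = []
    for index in range(1, len(history)):
        label, seconds = history[index]
        previous = [t for _, t in history[max(0, index - window):index]]
        baseline = median(previous)
        if seconds > baseline * (1.0 + tolerance):
            flagged.append((label, seconds, baseline))
    return flagged


if __name__ == "__main__":
    # Made-up pathing timings per commit, purely for demonstration.
    runs = [("c1", 1.00), ("c2", 1.02), ("c3", 0.99), ("c4", 1.30), ("c5", 1.31)]
    for label, seconds, baseline in find_regressions(runs):
        print(f"{label}: {seconds:.2f}s vs baseline {baseline:.2f}s")
```

Using the median of a window instead of only the previous run makes the check less sensitive to a single noisy measurement, at the cost of continuing to flag runs for a while after a real slowdown has landed.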

(Personally I'd rather first see an exhaustive test suite of the functionality of the Lua API, and only then performance tests though. But still, even performance tests help a bit in testing Lua API functionality.)
