Automated Release Test
Re: Automated Release Test
It's like a tennis match, but the ball is work...
3:2 Licho. Serve abma.
The only thing that could better this is if they both played worse and were hot chicks.
Re: Automated Release Test
Licho wrote: Well, there is a Spring-based general mission service:
http://zero-k.info/Missions?featured=false
All missions there are always available through the standard plasma service download.
- can the gadget easily be customized? (for example, to add custom commands / checks / use Spring.Log())
- how can missions be updated/uploaded/downloaded?
- can parts of the "mission" be disabled by a modinfo setting?
- does the mission editor run on Linux? (I get "Cannot open assembly 'MissionEditor.exe': File does not contain a valid CIL image.")
- does it work nicely with other games?
To me, it seems like this can't be done with the mission editor.
Much more work would be needed to use the plasma missions for regression testing; this is why I would prefer not to use them. If any of these points are wrong, please correct me. If the gadgets are in the game files, the files are certainly kept up to date; on the plasma service, I bet they won't be updated.
Requirements for regression testing are (a rough sketch of such a gadget follows the list):
- possibility of disabling a single test
- possibility to run only specific test(s)
- tests should run fast (restarting Spring for every test isn't fast)
- every game(-maker) can use it
- game / map version independent (as far as possible)
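A minimal sketch of what such a gadget could look like, assuming a hypothetical test table and a hypothetical "disabledtests" modoption for switching individual tests off; none of these names come from an existing framework:

function gadget:GetInfo()
    return {
        name    = "Regression Test Runner",
        desc    = "runs registered tests and logs pass/fail",
        author  = "example",
        layer   = 0,
        enabled = true,
    }
end

if not gadgetHandler:IsSyncedCode() then
    return
end

-- hypothetical test registry: each test has a name, a setup and a check
local tests = {
    {
        name  = "move-a-to-b",
        setup = function() --[[ spawn and order units here ]] end,
        check = function(frame)
            -- return true (passed), false (failed) or nil (still running)
        end,
    },
}

-- tests listed in the hypothetical "disabledtests" modoption are skipped
local disabled = {}
for name in string.gmatch(Spring.GetModOptions().disabledtests or "", "[^;]+") do
    disabled[name] = true
end

local current = 1

function gadget:GameFrame(frame)
    local test = tests[current]
    if not test then return end
    if disabled[test.name] then
        current = current + 1
        return
    end
    if not test.started then
        test.started = true
        test.setup()
        return
    end
    local result = test.check(frame)
    if result ~= nil then
        Spring.Echo("[test] " .. test.name .. ": " .. (result and "PASSED" or "FAILED"))
        current = current + 1
    end
end

Running all registered tests sequentially inside one Spring instance avoids the restart cost, and the same modoption mechanism covers both disabling a single test and running only specific ones.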
Re: Automated Release Test
.. and that's a brilliant hit by abma, the ball is in Licho's court, how will he react? Because even if he wants to return this volley, he has to make a precise argument, which would count as work, and therefore as a point lost. What a game, ladies and gentlemen, what a game!
Re: Automated Release Test
@PicassoCT:
Be a bit more constructive, please.

Re: Automated Release Test
public class Classy {
Classy (boolean amIdoingItright)
{this.tldr=amIdoingItright;} //very constructive, can't deny that
Fine: this whole testing game only works if all games are constantly updated, and after the test version is in, frozen. Otherwise new mistakes might creep in; the same goes for lobbies, by the way. So you have the "Gatherers Day", and after that a huge group of lobby devs and mod devs asking every day in sy "Are you done yet" or "Can you repeat my test with this updated version"..
So how do you make tests for something that is in constant motion? What you basically really need is a replay of the "perfect" game, a game where every feature is used, every unit is used in every combination, and where the result, even if the unit stats are changed, is... wow, this is getting nowhere.
Okay, let's take one step back. Let's remember famous "bugs" and ask: would this system detect them?
88.0 vs 88?
Yes.
Catapulting units?
No.
Pathing problems?
No.
Certain COB amputations disallowing speedSet?
No. It doesn't really catch disproportional effectiveness drops.
Well, Sir, I've done my very best. Seems I can't contribute much.
~Classy();
}
Re: Automated Release Test
My example test is about pathfinding, so that would have been tested.
SpeedChange can be tested if a test for it is made. For example, I could make a test where I spawn a flow and many minifacs and check whether the flow can travel from A to B within a given time (see the sketch below).
But you're right that writing a test for everything would take forever, and that there would always be bugs creeping into places no one thought about.
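A sketch of such a timed path test as a synced gadget callin (it would plug into a runner like the skeleton earlier in the thread); the unit name "flow", the coordinates, and the two-minute limit are placeholders, not values from an existing test:

-- spawn one unit, order it from A to B, fail if it hasn't arrived in time
local START = { x = 100,  z = 100 }
local GOAL  = { x = 3000, z = 3000 }
local TIME_LIMIT_FRAMES = 30 * 120  -- 2 minutes at 30 sim frames per second

local testUnit, startFrame, done

function gadget:GameFrame(frame)
    if done then return end
    if not testUnit then
        local y = Spring.GetGroundHeight(START.x, START.z)
        testUnit = Spring.CreateUnit("flow", START.x, y, START.z, 0, 0)
        Spring.GiveOrderToUnit(testUnit, CMD.MOVE,
            { GOAL.x, Spring.GetGroundHeight(GOAL.x, GOAL.z), GOAL.z }, {})
        startFrame = frame
        return
    end
    local x, _, z = Spring.GetUnitPosition(testUnit)
    if not x then  -- unit died on the way
        Spring.Echo("[test] move-a-to-b: FAILED (unit died)")
        done = true
        return
    end
    local dx, dz = x - GOAL.x, z - GOAL.z
    if dx * dx + dz * dz < 128 * 128 then
        Spring.Echo("[test] move-a-to-b: PASSED")
        done = true
    elseif frame - startFrame > TIME_LIMIT_FRAMES then
        Spring.Echo("[test] move-a-to-b: FAILED (time limit)")
        done = true
    end
end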
Re: Automated Release Test
PicassoCT wrote: So how do you make tests for something that is in constant motion?
By grabbing the latest version of the game through rapid... imo it would make sense to run the tests with both the gamename:stable and gamename:test tags. Results for :test are interesting for game devs, :stable for engine devs.
And it's clear: you can't write a test for everything, that's impossible. But that's not how regression testing works: you write a test for stuff that broke, so that it doesn't break again.
With the regression tests I want every game-maker to be able to easily add a test, so he can check whether his game (or at least its important parts) still works with the current development engine.
It's clear that these tests won't detect visual bugs, for example, but they can check whether game rules still work. The tests are an addition to the "normal" tests by humans; they can't replace manual testing.
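For illustration, such a run could be wired up by resolving the rapid tag (e.g. gamename:test) and handing a start script to the headless engine. All values below are placeholders, and GameType normally takes the archive name the tag resolves to:

[GAME]
{
    GameType = MyGame test-1234;  // placeholder
    MapName  = SmallDivide;       // placeholder
    // plus the usual [PLAYER0] / [TEAM0] / [ALLYTEAM0] sections
}

started with something like "spring-headless script.txt", so no graphics are needed.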
Re: Automated Release Test
Why not let regression tests inherit their testing focus from the elements that were changed on the master branch?
Basically.. if Kloot readies the red-hot pathfinding iron, why not put a warning out to all mod devs that they have to put up a mission in which every unit occurs and that allows for all sorts of pathfinding checks? If it's not delivered.. who's to blame?
Re: Automated Release Test
PicassoCT wrote: Why not let regression tests inherit their testing focus from the elements that were changed on the master branch?
That's up to the test writer. In general, tests should check a broad range of stuff in the engine. Imo it makes no sense to check every unit; it would be more useful to check every type of unit. Keep in mind that it doesn't help if the tests run for a few days (we also don't have the computing power for that).
PicassoCT wrote: If it's not delivered.. who's to blame?
If it's not delivered: Kloot then knows he broke something and can fix it / the game dev has to check whether he broke it. Sometimes a breakage is wanted; this is why the possibility to disable a test is needed.
It's not useful for all game devs to write the same check; it would be more useful if every game dev checks the important stuff for his own game (to get a broad range of functionality checked).
Re: Automated Release Test
But the same goes for testing stuff that remained untouched in the engine? Why test weapons if no one did anything with them? Broad-range tests definitely use more machine power. Might as well just shove some replays through the headless machine and compare hashes every 3 minutes to see if they go out of sync.
Oh, right, replays don't exist for mods that are under constant development.
/^^\
\°v°/
Re: Automated Release Test
PicassoCT wrote: Might as well just shove some replays through the headless machine and compare hashes every 3 minutes to see if they go out of sync. Oh, right, replays don't exist for mods that are under constant development.
Power on brain, write tests, ... (writing good tests isn't an easy task).
Exactly, replays won't work; this is why some Lua gadgets / widgets / (missions?) have to be created for that.
Re: Automated Release Test
Hm, maybe I should have noted that currently, on each commit, all AIs play a game against themselves (KAIK, RAI, E323, Shard, CAI). See the "validationtests": http://buildbot.springrts.com/waterfall
This is why I want some real regression testing and not just some missions...
Re: Automated Release Test
Imo if you want content developers to be able to make tests, then you should indeed think about missions (whether made with the Zero-K mission editor or something else).
I see no concrete limitation of missions for higher-level tests, assuming you're prepared to wait a bit for them to be executed. Also, the way dependencies are resolved/updated (including the mission-running gadget) can surely be made according to wishes.
While I'm still working on it, I will eventually make an initial release of my mission editor, which should run on every Spring-compatible system as it's Lua-only. I could add a way for users to mark certain trigger activations as test passing/failing, if you want to run multiple tests with a single Spring instance. Of course, if someone wants to, he could probably do the same with the current Zero-K mission editor and use that regardless of its limitations.
Re: Automated Release Test
I had spoken with someone in #dev, I believe it was abma, about this scripted testing thing.
My thought is that I would set up a unit to go from point A to point B, and along the way it would have to fight things / pass through repair structures. If it doesn't make it to point B, then I'll know something broke.
Re: Automated Release Test
The problem I see with zwzsg's pathfinding test (and similar tests) is that you cannot be sure whether the test succeeded/failed because of an improvement in the engine, or whether it was just random.
Maybe the pathfinding was not improved but just made different (or even worse), but for this specific scenario it made a positive difference.
Small differences in start conditions sometimes snowball:
maybe the test results would be inverted if all units were shuffled a few elmos?
You would have to run lots of tests, each with slightly varying unit positions, different waypoints, etc.
abma wrote: on each commit all AIs play a game against themselves (KAIK, RAI, E323, Shard, CAI). See the "validationtests": http://buildbot.springrts.com/waterfall
It seems that this only tests "engine problems" like "Waiting packet limit was reached" but not "gameplay problems"? E.g. if both commanders just bounced up and down on their start positions, it would go unnoticed?
Re: Automated Release Test
knorke wrote: The problem I see with zwzsg's pathfinding test (and similar tests) is that you cannot be sure whether the test succeeded/failed because of an improvement in the engine, or whether it was just random. [...] You would have to run lots of tests, each with slightly varying unit positions, different waypoints, etc.
Well, there are few "random" things just "randomly" happening unless you design it to be so.
When I had my AIs play against each other (with nearly no random elements outside of some algorithm initialization), the games would look exactly the same, and AI skirmish games are much more complex than a simple "give unit A a move command to position B".
Create simple tests for simple scenarios, stuff that should work nearly 100% of the time; if such a test fails, you can take a minute of your time to see whether it was really a weird random occurrence or an actual bug.
knorke wrote: It seems that this only tests "engine problems" like "Waiting packet limit was reached" but not "gameplay problems"?
That's why you should have AIs fight NullAIs, as well as other inferior/superior AIs.
Unless AIs act completely randomly, they tend to beat their opponents under the same conditions (same map, mod, sides and positions).
When testing my AI using SpringGrid, as well as other AIs among themselves, you could see patterns that can easily be converted into rules; e.g. E323AI and my AI would easily beat RAI and KAIK, every time, which could be converted into a regression test, for a while at least.
Re: Automated Release Test
I tell units to move from A to B. If they don't make it to B in under 2 minutes, then the pathfinding is not "different", it is non-functional.
Re: Automated Release Test
gajop wrote: That's why you should have AIs fight NullAIs, as well as other inferior/superior AIs. Unless AIs act completely randomly, they tend to beat their opponents under the same conditions (same map, mod, sides and positions). When testing my AI using SpringGrid, as well as other AIs among themselves, you could see patterns that can easily be converted into rules; e.g. E323AI and my AI would easily beat RAI and KAIK, every time, which could be converted into a regression test, for a while at least.
I let the AIs fight against themselves to easily find out which AI made Spring crash. Also, if they fight against themselves, it could be an endless game in which the AI should exercise most of its code.
Against a NullAI, the AIs would build just a few units and the game would be over.
It's currently just a validation test of whether the engine / AIs crash.
Re: Automated Release Test
More precisely, I check for presence in a 256x256 square centered on the destination, which I hope is enough tolerance to accept units ending up scattered near the destination (see the sketch below).
Most of the tests involve one unit on an empty area of the map, so there should be no distraction and no chaotic behavior from unit interaction.
There is also one test where I move a small group (nine units) at once. That is much harder for Spring, and many Spring versions fail it. But when you tell a small group of units to move and some get lost on the way, then again the pathfinding is not "different", it's not "improved", it's broken.
The point is that to find this out, instead of having players complain on the forum that pathfinding is broken and devs ignoring them, you get a simple, repeatable, unambiguous test.
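The arrival check itself stays simple; a sketch, with the unit list and goal coordinates as placeholders (a 256x256 square means ±128 elmos per axis):

-- count how many of the ordered units ended inside the square around the goal
local function countArrived(unitIDs, goalX, goalZ)
    local arrived = 0
    for i = 1, #unitIDs do
        local x, _, z = Spring.GetUnitPosition(unitIDs[i])  -- nil if the unit died
        if x and math.abs(x - goalX) <= 128 and math.abs(z - goalZ) <= 128 then
            arrived = arrived + 1
        end
    end
    return arrived
end

-- the nine-unit group test would pass only if countArrived(group, gx, gz) == 9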
Re: Automated Release Test
gajop wrote: Well, there are few "random" things just "randomly" happening unless you design it to be so.
Yes, but there is a difference between something always happening the same way and being able to predict how it will happen.
If something is complex enough, the output can be "random" (and hard to predict) when parameters are changed just a bit. I think pathing in Spring is like that.
Sometimes a unit will be too stupid to go from A to B, but have it start just a little bit to the side, and it will reach its destination.
zwzsg wrote: But when you tell a small group of units to move and some get lost on the way, then again the pathfinding is not "different", it's not "improved", it's broken.
Well, I once noticed this in my mod:
two harvesters would always meet head-on and block each other forever (one going out to mine, the other returning to the dropoff/refinery).
-> Had this been a test, it would have failed.
I moved the dropoff/refinery a few pixels and they got past each other.
-> Had the test been run like that, it would have passed.
So such small (basically random) things can have a huge impact on tests, just like a new pathing engine. In some cases the small things might even overshadow the changes in the engine. So for pathing it is imo important to run many tests with slightly different coordinates.
(Just spawn some more units pseudo-randomly at 1-minute intervals or so; a sketch follows.)
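A sketch of that jittering, with arbitrary run counts and offset ranges; in synced code Spring seeds math.random deterministically, so the shuffled runs stay reproducible across clients:

-- generate several shuffled start positions for the same A-to-B test
local RUNS   = 10
local JITTER = 64  -- max offset in elmos
local BASE   = { x = 100, z = 100 }  -- placeholder start position

local starts = {}
for run = 1, RUNS do
    starts[run] = {
        x = BASE.x + math.random(-JITTER, JITTER),
        z = BASE.z + math.random(-JITTER, JITTER),
    }
end
-- a frame-driven runner (like the earlier sketches) then works through
-- `starts` one position at a time and counts passes and failures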