Failing gracefully

AF · Post by AF » 22 Jul 2009, 15:40

If your code does not work, it should fail gracefully.

It's a nice and simple concept. It allows you to tell the user why it failed (or that it failed at all). "My spring crashes randomly" versus "ABCAI does not support this version of BA, XYZAI carries on working": which sounds more helpful?

* If your AI only supports a handful of games, make sure it tells the user when it finds itself running under an unsupported game.
* Don't crash.
* Don't continue to run code once you know that a crash is inevitable. Check a boolean flag to see whether it's OK to continue, and halt before any AI routines are executed if it isn't.
* Self-destruct all AI units if you want; it's still better than crashing Spring.
* Write out error logs.
* Stop execution when an exception is thrown, and catch exceptions; don't let them propagate to the engine.
* Collect bug reports on a crash.
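The first point in the list above could look roughly like the following. This is only a minimal sketch: `GameSupport`, `startupMessage` and the game names are hypothetical, nothing here comes from the Spring AI interface, and a hard-coded list of tested games is just one way to decide what counts as supported.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: the games the AI author has actually tested against.
struct GameSupport {
    std::vector<std::string> supported;

    bool isSupported(const std::string& gameName) const {
        return std::find(supported.begin(), supported.end(), gameName) != supported.end();
    }
};

// Returns an empty string when the AI may run, otherwise a message
// that can be shown to the user and written to the log.
std::string startupMessage(const GameSupport& support,
                           const std::string& aiName,
                           const std::string& gameName) {
    if (support.isSupported(gameName)) {
        return "";
    }
    return aiName + " does not support " + gameName + " and has disabled itself.";
}
```

An AI would call `startupMessage` once at start-up and, if the result is not empty, print it and skip all further AI routines.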

I have seen arguments along the following lines used in these forums to counter this:

We should let spring crash so that the user knows something is wrong because the AI is not going to work anyway

This is nonsense, lazy programming, failure to accept responsibility for your own crashes, ignorance of bugs in your code base, improper debugging, and downright irresponsible development.

A program should never crash due to improper environments or bugs. It should detect or catch the problem in progress, shut itself down, send off error reports for analysis, give the user details of what happened, and tell them the program is unable to continue.

Tobi · Post by **Tobi** » 22 Jul 2009, 16:07

+1

It would help Spring devs too if we got fewer crashes reported from AIs just because it was a mod they don't know how to play.

Post by **Kloot** » 22 Jul 2009, 16:42

Sorry, but get off your high horse.

AF wrote: * If your AI only supports a handful of games, make sure it tells the user when it finds itself running under a non supported game.

Requires maintaining either a whitelist or a blacklist, both of which are a PITA since games evolve much more quickly than AI's do.

AF wrote: * Don't crash

1. Don't kick down wide-open doors. Your point is analogous to saying "don't ever make programming errors" which we would all like, maybe you're volunteering to formally verify correctness of all AI code?
2. AI's can crash through unknown engine inconsistencies, who'ya gonna call then? (most recent example: E323AI and http://github.com/spring/spring/commit/ ... d25d7d1ae6)

AF wrote: * Don't continue to run code once you know that a crash is inevitable, check for a boolean if it's OK to continue and halt before any AI routines are executed if it's not possible.

Crashes actually tend to be unpredictable, any non-trivial code will have a chance of breaking even after adding the usual NULL guards or spamming try / catch blocks everywhere (which is itself bad practice). If it could be done so systematically all AI's would be crash-free by now.

AF wrote: * Stop continuing execution when an exception is thrown, and catch exceptions, don't let them propagate to the engine

I'd expect an AI dev who uses exceptions to also catch them, no?

AF wrote: * Write out error logs
* Collect bug reports on a crash

Still depends on the user to provide said logs, which is a rarity even for Spring crashes.

Pxtl · Post by **Pxtl** » 22 Jul 2009, 17:07

Correct me if I'm wrong - but doesn't the Spring engine itself handle exceptions thrown by AIs? So the only way an AI should crash the engine altogether is by segfault, isn't it?

And the only excuse for segfault involves cosmic rays flipping bits on your computer.

AF · Post by AF » 22 Jul 2009, 17:20

Kloot wrote:Sorry, but get off your high horse.

Well, I'm sorry, but if this is my high horse then I'm going to have to kick a lot of other people off to claim it, Tobi included.

Jeff Atwood argues my case much better though:

May 19, 2008 - Twitter: How Not To Crash Responsibly
August 2, 2007 - What's Worse Than Crashing?
May 18, 2008 - Crash Responsibly

Kloot wrote:
AF wrote: * If your AI only supports a handful of games, make sure it tells the user when it finds itself running under a non supported game.
Requires maintaining either a whitelist or a blacklist, both of which are a PITA since games evolve much more quickly than AI's do.

So you're saying my ABCAI, which plays TA-based mods, should attempt to play Kernel Panic despite me having no intention of supporting Kernel Panic? Perhaps I should let my Kernel Panic-only AI play Fibre?

When you know for certain that your AI won't work under a certain game, you shouldn't attempt to play it anyway.

This is like saying we should all use the same lead and connectors for power cords and USB and kettles and microphones because we don't know how fast microphones and kettles are advancing.

Why am I sure that this is not me talking out my arse?

Because I have done it before

Kloot wrote:
AF wrote: * Don't crash
1. Don't kick down wide-open doors. Your point is analogous to saying "don't ever make programming errors" which we would all like, maybe you're volunteering to formally verify correctness of all AI code?
2. AI's can crash through unknown engine inconsistencies, who'ya gonna call then? (most recent example: E323AI and http://github.com/spring/spring/commit/ ... d25d7d1ae6)

When your code crashes, I suggest the following code as a safety net:

try
{
    // code here
}
catch (...)
{
    // print out error to log
}

When a device or algorithm fails, and that failure is detectable, isolate and disable it to prevent further damage, and tell the user that something went wrong.

Why am I sure that this is not me talking out my arse?

Because I have done it before

Kloot wrote:
AF wrote: * Don't continue to run code once you know that a crash is inevitable, check for a boolean if it's OK to continue and halt before any AI routines are executed if it's not possible.
Crashes actually tend to be unpredictable, any non-trivial code will have a chance of breaking even after adding the usual NULL guards or spamming try / catch blocks everywhere (which is itself bad practice). If it could be done so systematically all AI's would be crash-free by now.

NTai was built around a proxy class that acted as a divide: on one side was the engine, on the other side was the AI. The class itself was basic. When the AI crashed, the exception propagated up to this class, where it was caught and added to an error log. A boolean flag was then set, and any further calls from the engine into the AI were stopped by the following one-liner at the start of every function:

if(!okaytorun) return;

This code was simple. Yes, it could itself have crashed, but the chances of that were astronomically small compared to the AI code sitting in the other classes. I would rather that code with a tiny chance of crashing caught my exception and printed it to a log than have my main code crash and generate a cryptic message.

Why am I sure that this is not me talking out my arse?

Because I have done it before

Kloot wrote:
AF wrote: * Stop continuing execution when an exception is thrown, and catch exceptions, don't let them propagate to the engine
I'd expect an AI dev who uses exceptions to also catch them, no?

Are you sure it was the AI developer who threw the exception? Or a library? Perhaps it was the STL, due to an unknown bug? Would I rather catch this bug so I can fix it in the next version? Maybe. Or maybe I want an obscure crash and complaining users who give no useful information?

Why am I sure that this is not me talking out my arse?

Because I have done it before

Kloot wrote:
AF wrote: * Write out error logs
* Collect bug reports on a crash
Still depends on the user to provide said logs, which is rarity for Spring crashes even.

Nonsense! NTai, AAI, OTAI and JCAI all output logs. We AI developers asked for these logs on a regular basis, and these logs were detailed. Since the AI you mainly worked on is rooted in KAI, which had most of its logging code removed or never used, you do not have the same experiences we do. Any AI developer who has written a major AI here will tell you logs are not just important, they are essential.

I personally would have been unable to fix bugs, or at least severely slowed down, if it weren't for logging in NTai, and I know submarine and veylon would say the same thing about AAI and OTAI.

Why am I sure that this is not me talking out my arse?

Because I have done it before

AF · Post by AF » 22 Jul 2009, 17:22

Pxtl wrote:Correct me if I'm wrong - but doesn't the Spring engine itself handle exceptions thrown by AIs? So the only way an AI should crash the engine altogether is by segfault, isn't it?

And the only excuse for segfault involves cosmic rays flipping bits on your computer.

For a while, things like divide by zero would not be caught by the exception handling on the engine side of the divide unless Spring was built using Visual Studio. I'm not sure what it's like now with the latest MinGW, but I'd rather catch and log the exception on the AI side so I can implement logging or even file it away automatically on my site's bug tracker.

Error323 · Post by **Error323** » 22 Jul 2009, 17:31

I do agree with Kloot on this. When I find my A.I. is crashing for some reason, I try to fix it properly (or in case of an engine bug, report it), not hack around it with weird code or ehhh self-d the units (wtf)?

I'm not really sure what it is that you actually want to say here. When a SIGSEGV occurs it seems quite clear to me that it should be debugged or be dealt with in some way. Perhaps you wanna share a new method on how to do this? Otherwise I don't understand this thread lol.

IIRC spring devs are working on the AI's being implemented as git-submodules, at which point there will be at least two branches:

stable
another, e.g. master, unstable...

This will hopefully reduce crashes.

AF · Post by AF » 22 Jul 2009, 17:37

In NTai I had my exception proxy class, and a macro switch. When I built release AIs I had exception handling turned on so that when an error occurred on the end user's system, the error was caught and the AI wound down and disabled itself.

The player was told the AI had crashed, a report was filed in the error log, and the user was told to send the error log to me.

If I was running a build in a debugger, the macro switch would be set to not use exception handling. Crashes would propagate up and get caught in the debugger where I could fix them properly as you suggested.

I am not suggesting we throw away the debugger and rely on ancient methods. What I am saying is that when our code fails in the end user's environment, it should not crash and burn; we need to provide mechanisms to collect information about what happened and wind down operations.

Without this we may be baffled by errors and problems that we cannot fix using a debugger because they depend on the end user's environment setup, and as such are not reproducible. Figuring out these kinds of bugs is a lot harder when you crash and burn, but thanks to failing gracefully and logging information for later debugging use, I and others have come across these kinds of bugs and quickly figured out what went wrong.

AF · Post by AF » 22 Jul 2009, 17:40

As for git submodules, what's wrong with having your own git repository at GitHub and merging in stable builds to master? That's how NTai works. Whenever I consider a build of NTai to be stable or non-experimental and I'm happy, all I have to do is poke someone with access to the Spring git repo to merge my changes in.

Error323 · Post by **Error323** » 22 Jul 2009, 17:45

Well, that does make sense indeed. If I understand correctly, you are talking about a method that allows you to prevent a segfault for end users wherever in the AI it may occur? Can you shed some more light on how to achieve this in detail?

Error323 · Post by **Error323** » 22 Jul 2009, 17:47

AF wrote: As for git submodules, what's wrong with having your own git repository at GitHub and merging in stable builds to master? That's how NTai works. Whenever I consider a build of NTai to be stable or non-experimental and I'm happy, all I have to do is poke someone with access to the Spring git repo to merge my changes in.

Well, nothing I guess, if you have the privileges. But using a stable/non-stable system this could be automated.

AF · Post by AF » 22 Jul 2009, 18:15

In NTai I had a macro switch along the lines of ENABLE_EXCEPTION_HANDLING.

Then in the proxy class I would have something along the lines of:

void UnitFinished(int unit){
    START_EXCEPTION_HANDLING
    ai->unitfinished(unit);
    HANDLE_EXCEPTION("exception in ai->unitfinished")
}

If ENABLE_EXCEPTION_HANDLING was defined, START_EXCEPTION_HANDLING and the other macros would be defined to expand to the handling code, e.g. try { and so on.
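One way the macros above could be defined is sketched here. The macro names come from the post; the expansions, the in-memory `errorLog()` and the `Ai` stand-in are assumptions for illustration and are not NTai's real definitions.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Release builds define the switch. For debugger builds, remove this line so
// that exceptions propagate to the debugger instead of being caught.
#define ENABLE_EXCEPTION_HANDLING

// Hypothetical in-memory error log; a real AI would write to a file instead.
inline std::vector<std::string>& errorLog() {
    static std::vector<std::string> entries;
    return entries;
}

#ifdef ENABLE_EXCEPTION_HANDLING
#define START_EXCEPTION_HANDLING try {
#define HANDLE_EXCEPTION(message) \
    } \
    catch (const std::exception& e) { \
        errorLog().push_back(std::string(message) + ": " + e.what()); \
    } \
    catch (...) { \
        errorLog().push_back(message); \
    }
#else
// Assumed behaviour with the switch off: the macros expand to nothing.
#define START_EXCEPTION_HANDLING
#define HANDLE_EXCEPTION(message)
#endif

// Stand-in for the real AI object behind the proxy.
struct Ai {
    void unitfinished(int unit) {
        if (unit < 0) throw std::out_of_range("negative unit id");
    }
};

static Ai theAi;
static Ai* ai = &theAi;

void UnitFinished(int unit) {
    START_EXCEPTION_HANDLING
    ai->unitfinished(unit);
    HANDLE_EXCEPTION("exception in ai->unitfinished")
}
```

With the switch defined, `UnitFinished(-1)` leaves one entry in `errorLog()` instead of throwing into the caller.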

That's a generic and rather crude example of it, and I'm sure there's a far cleaner way of doing it. In some places I would just call ai->unitfinished() and inside that method I would have multiple exception blocks, some critical, some non-critical, so that the AI didn't always disable itself if trivial code crashed and was able to pick itself back up and carry on.

I also had extensions to my logging system that were very verbose, so that if normal logging was insufficient and there was no chance of a debugger on the end user's system, I could rebuild NTai with extra logging to help figure out how to reproduce the problem. I used macros for that too, but I am thinking that macros are unnecessary for that and I'm considering changing that soon.

Post by **hoijui** » 23 Jul 2009, 09:29

I kind of agree with AF and Kloot.

First, to the facts of the AI Interface:
The engine does catch exceptions thrown by AIs, but the AI should not rely on that. Each call from the engine into the AI is an AIEvent. These are handed over to the AI via an int handleEvent(event) call. The return value of this call is meant to be used for error reports, e.g. 0 is OK, everything else is an error. Legacy C++ AIs cannot return this value, as their events are all void functions. As I was too lazy to change this, I made the legacy wrapper catch exceptions and then return an error int to the engine. So legacy AIs have to use exceptions, because of a dirty workaround.
One reason the engine should not catch exceptions is that it is actually a C interface, not a C++ one. It would need ugly code to allow e.g. Java or C AIs to throw C++ exceptions.
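The wrapper workaround described above might be sketched as follows. The types and the signature are hypothetical and simplified; the real Spring interface differs. Only the convention that 0 means OK and anything else means an error is taken from the post.

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical legacy AI whose event handlers are all void functions.
struct LegacyAi {
    int finished = 0;
    void unitFinished(int unit) {
        if (unit < 0) throw std::invalid_argument("bad unit id");
        ++finished;
    }
};

// Hypothetical, simplified event record and topic id.
enum { EVENT_UNIT_FINISHED = 1 };
struct Event {
    int topic;
    int unit;
};

// Translates exceptions from the void handlers into an error code, so that
// no C++ exception crosses the C-style boundary. 0 = OK, non-zero = error.
int handleEvent(LegacyAi& ai, const Event& event) noexcept {
    try {
        switch (event.topic) {
            case EVENT_UNIT_FINISHED:
                ai.unitFinished(event.unit);
                return 0;
            default:
                return 0;   // events this AI does not handle are ignored
        }
    } catch (const std::exception&) {
        return -1;
    } catch (...) {
        return -2;
    }
}
```

Because the function is `noexcept` and catches everything, a caller written in C only ever sees an integer result.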

Also, Kloot is right of course, in that a lot of crashes, the bigger part since the introduction of the new interface, are caused by engine bugs, not AI ones. Before that, though, it was more the AIs. The problem there is, as Kloot said already, that the AIs are not as heavily maintained as the engine, and most AI devs are not as closely in touch with the engine devs as Kloot. Also, AI devs test their AI on one system only. Nearly all of the AI bugs I fixed before the introduction of the new interface showed up because the AI devs only tested on 32-bit and not on 64-bit, or because they used Visual Studio and did not get some warnings they would have gotten with MinGW, or similar things. Of course you cannot expect an AI dev to test his AIs with all compilers and system types Spring runs on, so it is OK in my eyes that an engine dev has to fix these things. Well... of course it would be better if AI devs wrote perfect code instead.

Nearly all AI-related crashes happen because of segfaults, and then try-catch around all the AI's events would not really help: C++ exceptions are things thrown by throw, and segfaults are not exceptions (though as I understood AF, Visual Studio has an option to handle segfaults as exceptions too).

Git submodules:
This shifts all the responsibility of maintaining the AI over to the AI devs. It is the cleaner method, and overall more flexible. It allows the buildbot to be rewritten to automatically fetch the latest stable version of an AI whenever rebuilding, for example. If the AI dev wants the engine devs to be able to maintain the AI too, he can give them commit access to his repo. If even a stable branch of an AI fails to compile, the buildbot could have a switch to exclude this particular AI or all AIs.

edit: engine does catch AI exceptions, but you should not rely on that
