We are developing an AI with RL (Reinforcement Learning). When using RL we need to train the AI over several thousand games.
For this purpose we use headless Spring, which works very well. However, there is a problem: you can't reset a game directly from the AI, which means you need a complete restart and reload of the game for every match. This loading takes ~13 seconds, which is a lot compared to the actual game time of 5-10 seconds at speed 120. Waiting ~13 seconds for every single match is a huge waste of time.
As a workaround, we self-destruct all our units and add a new commander (using a cheat) at the start position. This way we can more or less start a new game, but it is an ugly approach. It also has the problem that the self-destruct explosions destroy the ground underneath, slowly revealing water and making the area unusable.
So now to our actual feature request: why not either add a remove-unit cheat (without explosion and countdown), or even better, add a way to reset/restart the game without having to reload all the Spring resources again? Such a reset/restart would simply reset the map and reload the AIs.
You have to edit the mod file itself (unzip it with 7z, and zip it back with 7z using normal compression mode). Look at luarules\gadgets inside the mod. Those are Lua scripts that are, how do I put this, unlimited in their power to change game state.
What RL algorithm are you using specifically? (Q-Learning, Sarsa,...)
This AI is part of a Master's thesis in Software Engineering, which I am doing along with three other people (on this forum: allanmc, initram, jepperc and shredguitar).
Currently we are trying to use a BN (Bayesian network) to classify the opponent, and RL only to build the base (reach many labs as quickly as possible). In this simple RL setup we have just 3 build actions: Lab, Solar, Mex. The counts of these are used as state variables: Solar and Mex can be in the range 0-19, and Lab in the range 0-4. The time used to build any of these buildings is used as a negative reward, which ensures that the AI eventually figures out that it needs resources to build quickly. The goal state is reached when 4 labs have been built, and this gives a high positive reward. This has been implemented with standard Q-Learning, and works very well.
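To make this concrete, here is a minimal sketch of what such a tabular Q-learner could look like. The build times, speedup model, reward sizes, and learning parameters below are all illustrative assumptions, not the values from our actual AI; only the overall structure (counts as state, three build actions, build time as negative reward, big positive reward at 4 labs) follows what is described above.

```python
import random

random.seed(0)  # deterministic runs for this illustration

ACTIONS = ["Solar", "Mex", "Lab"]
BUILD_TIME = {"Solar": 2.0, "Mex": 3.0, "Lab": 8.0}  # assumed build times
GOAL_LABS = 4
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = {}  # Q[(state, action)] -> value, missing entries default to 0.0


def step(state, action):
    """Apply a build action; return (next_state, reward, done)."""
    solar, mex, lab = state
    if action == "Solar":
        solar += 1
    elif action == "Mex":
        mex += 1
    else:
        lab += 1
    # Assumption: resource buildings speed up later construction,
    # so building them reduces the time penalty of future actions.
    speedup = 1.0 + 0.1 * (solar + mex)
    reward = -BUILD_TIME[action] / speedup
    done = lab >= GOAL_LABS
    if done:
        reward += 100.0  # high positive reward in the goal state
    return (solar, mex, lab), reward, done


def choose(state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))


def train(episodes=2000):
    for _ in range(episodes):
        state, done = (0, 0, 0), False
        while not done:
            action = choose(state)
            nxt, reward, done = step(state, action)
            best_next = max(Q.get((nxt, a), 0.0) for a in ACTIONS)
            old = Q.get((state, action), 0.0)
            # Standard one-step Q-learning update.
            Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
            state = nxt


train()
```

The state space here is small (20 × 20 × 5 = 2000 states, 3 actions), which is exactly why a plain table works; the follow-up post below is about what happens when that stops being true.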
Right now we are in the middle of expanding the use of RL to more useful cases, such as complete base building, attacking, scouting, etc. You are right that with these bigger problems the state space, among other things, can become a problem. Therefore we are currently looking into the possibilities of hierarchical RL: http://www.ijcai.org/papers/1552.pdf