- How did you define the states?
- What actions have you got?
- What is the reward function?
- What RL algorithm are you using specifically? (Q-Learning, Sarsa,...)
This AI is part of a Master's Thesis in Software Engineering, which I am doing together with three other people (on this forum: allanmc, initram, jepperc and shredguitar).
Currently we are using BN only to classify the opponent, and RL only to build the base (reach many labs as quickly as possible). In this simple RL setup we have just 3 build actions: Lab, Solar, Mex. The counts of these are used as state variables - Solar and Mex can be in the range 0-19, and Lab in the range 0-4. The time spent building each structure is used as a negative reward, which ensures that the AI eventually figures out that it needs resources to build quickly. The goal state is reached when 4 labs have been built, which yields a large positive reward. This has been implemented with standard Q-Learning, and it works very well.
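To make the setup concrete, here is a minimal tabular Q-Learning sketch of the base-building problem described above. The state is the (Mex, Solar, Lab) count triple, the three actions are the build orders, the reward is the negative build time plus a large bonus at the 4-lab goal. The build times, the economy speedup model, and the reward magnitudes are my own hypothetical stand-ins, not values from our implementation:

```python
import random

# Hypothetical build times and caps (NOT the real in-game values).
BUILD_TIME = {"Mex": 10.0, "Solar": 15.0, "Lab": 30.0}
MAX = {"Mex": 19, "Solar": 19, "Lab": 4}
ACTIONS = ["Mex", "Solar", "Lab"]
GOAL_REWARD = 1000.0  # large positive reward at the 4-lab goal state

def step(state, action):
    """Return (next_state, reward, done) for one build action."""
    mex, solar, lab = state
    # Toy assumption: more economy -> faster building.
    speedup = 1.0 + 0.1 * (mex + solar)
    time_cost = BUILD_TIME[action] / speedup
    next_state = (mex + (action == "Mex"),
                  solar + (action == "Solar"),
                  lab + (action == "Lab"))
    done = next_state[2] >= MAX["Lab"]
    # Build time is a negative reward; reaching the goal pays off.
    return next_state, -time_cost + (GOAL_REWARD if done else 0.0), done

def legal(state):
    counts = dict(zip(ACTIONS, state))
    return [a for a in ACTIONS if counts[a] < MAX[a]]

def train(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}  # (state, action) -> value, filled lazily
    for _ in range(episodes):
        state, done = (0, 0, 0), False
        while not done:
            acts = legal(state)
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                action = rng.choice(acts)
            else:
                action = max(acts, key=lambda a: Q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            target = reward
            if not done:
                target += gamma * max(Q.get((nxt, a), 0.0) for a in legal(nxt))
            q = Q.get((state, action), 0.0)
            Q[(state, action)] = q + alpha * (target - q)
            state = nxt
    return Q

def greedy_rollout(Q):
    """Follow the learned greedy policy from the start state."""
    state, done, trace = (0, 0, 0), False, []
    while not done:
        action = max(legal(state), key=lambda a: Q.get((state, a), 0.0))
        trace.append(action)
        state, _, done = step(state, action)
    return trace
```

With the discounting and the time penalty, the learned greedy policy tends to build some economy before committing to labs, which is the behaviour we see in the real agent as well.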
Right now we are in the middle of expanding the use of RL to more useful cases, such as complete base building, attacking, scouting, etc. You are right that with these bigger problems, the state space, among other things, can become a problem. Therefore we are currently looking into hierarchical RL: http://www.ijcai.org/papers/1552.pdf