Everglades AI Battle Bots

View on Github ~~>
reinforcement learning
machine learning

Everglades AI Battle Bots is a custom OpenAI Gym environment created by Lockheed Martin for researching reinforcement learning algorithms. The environment is a turn-based strategy game in which two agents each control a team of units and attempt to destroy the opposing team's base. The game is played on a node map with a variety of node and unit types; on each turn an agent moves its units and resolves attacks. A match ends when one team's base is destroyed or the game time limit is reached, which makes the environment a challenging benchmark for reinforcement learning algorithms.
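
As a rough illustration of how a match is driven programmatically, the loop below follows the standard Gym pattern. The environment id, the gym_everglades import name, and the idea of passing both players' actions each turn are assumptions here; the authoritative version of this loop lives in test_battle.py.

import gym
import gym_everglades  # assumed import name; registers the custom environment

# Hypothetical sketch only -- the real reset()/step() signatures are defined by
# the Everglades environment and driven from test_battle.py.
env = gym.make('everglades-v0')
observations = env.reset()

done = False
while not done:
    # Each turn, every player submits an action; here both simply sample at random.
    actions = {player: env.action_space.sample() for player in (0, 1)}
    observations, rewards, done, info = env.step(actions)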

Installation Instructions

Dependencies

Everglades runs in a Python 3 environment. Ensure the Python packages gym and numpy are installed. This can be done with:

$ pip install numpy
$ pip install gym

If your computing environment requires it, make sure to include the --cert and --proxy flags with the pip commands.

Installation

From the root Everglades directory, install the Everglades environment with:

pip install -e gym-everglades/

Next, install the Everglades server with:

pip install -e everglades-server/

Finally, edit the test_battle.py script to reflect your working environment. Update the following values with their paths on your filesystem (a sketch follows the list below):

  • agent 0 file
  • agent 1 file
  • config directory
  • output directory
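
The assignments might look roughly like the following; the variable names and paths are illustrative only, so match them to whatever test_battle.py actually defines:

# Illustrative only -- use the variable names that test_battle.py actually defines.
agent0_file = '/path/to/Everglades/agents/some_agent.py'   # agent 0 file
agent1_file = '/path/to/Everglades/agents/some_agent.py'   # agent 1 file
config_dir  = '/path/to/Everglades/config/'                # config directory
output_dir  = '/path/to/Everglades/game_telemetry/'        # output directory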

File and Directory Descriptions

./agents/

This is a common directory where any created agents for the Everglades game can be stored. Some example files are included with the package.

./config/

This directory contains setup files used for game logic. Currently only the DemoMap.json and UnitDefinitions.json files are used for gameplay. They can be swapped for files defining a different map or different units, but note that the server logic is currently inflexible, so such swaps will likely break it.

./everglades-server/

This directory contains the main logic for the Everglades game.

./game_telemetry/

This is the default output directory for any match telemetry output. It is only populated locally and not stored in the git repository.

./gym-everglades/

This directory contains the OpenAI Gym environment for project Everglades. It follows the Gym API standards.

./test_battle.py

This is the script to execute for running two agents against each other.

./README.md

This file, explaining important directory structure and installation requirements.

./.gitignore

This file tells git to ignore compiled files and telemetry output.

Running the Game

To run the game, execute the test_battle.py script. It pits the two agents specified in the script against each other and runs until one team's base is destroyed or the game time limit is reached. Match telemetry is written to the game_telemetry directory in JSON format and can be used to analyze the game afterwards.
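
Because the telemetry files are plain JSON, they can be inspected with a few lines of Python. The snippet below only assumes the files end in .json and live in game_telemetry; the exact file naming and schema depend on the server:

import json
from pathlib import Path

# Pick the most recently written telemetry file (no naming pattern assumed).
telemetry_dir = Path('./game_telemetry')
latest = max(telemetry_dir.glob('*.json'), key=lambda p: p.stat().st_mtime)

with latest.open() as f:
    match = json.load(f)

# Print the top-level keys to see what the server recorded for this match.
print(latest.name, list(match))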

Our Agents

Random Agent

The random agent is a simple baseline that randomly selects a unit to move and a node to move it to, then randomly selects a unit to attack and a target to attack. It is used as the baseline for comparison against the other agents.
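
In code, such a baseline reduces to sampling from the action space. The class below is a hypothetical sketch; the constructor arguments and the get_action interface are assumptions about how test_battle.py invokes agents:

class RandomAgent:
    """Hypothetical baseline agent: every decision is made uniformly at random."""

    def __init__(self, action_space, player_num):
        self.action_space = action_space
        self.player_num = player_num

    def get_action(self, obs):
        # Ignore the observation entirely and return a random action.
        return self.action_space.sample()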

DQN Agent

The DQN agent is a Deep Q-Learning agent that uses a neural network to approximate the Q-function. It stores past experiences in a replay buffer and uses a target network to stabilize training, and it is optimized with Adam using the Huber loss. Trained for 1000 episodes, it achieves a win rate of 50% against the random agent.
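
The core update described above can be sketched generically as follows. This is a PyTorch-style sketch with placeholder network sizes and hyperparameters, not the exact network used for Everglades:

import random
from collections import deque

import torch
import torch.nn as nn

# Placeholder observation/action sizes -- the real ones come from the environment.
OBS_DIM, N_ACTIONS = 105, 132

def make_q_net():
    return nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())            # target network starts as a copy

buffer = deque(maxlen=100_000)                             # replay buffer of (obs, act, rew, next_obs, done) tensor tuples
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
loss_fn = nn.SmoothL1Loss()                                # Huber loss
gamma = 0.99

def dqn_update(batch_size=64):
    obs, act, rew, next_obs, done = map(torch.stack, zip(*random.sample(buffer, batch_size)))
    # Q-values for the actions actually taken.
    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    # Bootstrapped targets from the frozen target network.
    with torch.no_grad():
        target = rew + gamma * (1 - done) * target_net(next_obs).max(dim=1).values
    loss = loss_fn(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()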

A3C Agent

The A3C agent is an Asynchronous Advantage Actor-Critic agent: a neural network with a policy (actor) head and a value (critic) head is updated from the experience of worker copies running in parallel, with the advantage estimate weighting the policy updates. The agent is trained using the Adam optimizer for 1000 episodes and achieves a win rate of 50% against the random agent.
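
The advantage actor-critic loss at the heart of A3C can be sketched as follows (a generic formulation, not the project's exact implementation):

import torch

def a3c_loss(log_probs, values, returns, entropies, value_coef=0.5, entropy_coef=0.01):
    """Generic advantage actor-critic loss for one worker's rollout.

    log_probs: log pi(a_t | s_t) for the actions taken
    values:    critic estimates V(s_t)
    returns:   bootstrapped discounted returns R_t
    entropies: policy entropy at each step (encourages exploration)
    """
    advantages = returns - values
    policy_loss = -(log_probs * advantages.detach()).mean()   # raise probability of better-than-expected actions
    value_loss = advantages.pow(2).mean()                      # fit the critic to the returns
    entropy_bonus = entropies.mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus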

PPO Agent

The PPO agent is a Proximal Policy Optimization agent: an actor-critic method that approximates the policy with a neural network and limits how far each update can move the policy by optimizing a clipped surrogate objective over batches of recently collected experience. The agent is trained using the Adam optimizer for 1000 episodes and achieves a win rate of 50% against the random agent.
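
The clipped surrogate objective can be sketched as follows (again generic, not the project's exact implementation):

import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective: keep the updated policy close to the one that collected the data."""
    ratio = torch.exp(new_log_probs - old_log_probs)       # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the minimum of the two terms; negate the mean to get a loss to minimize.
    return -torch.min(unclipped, clipped).mean()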

Future Work

The future work for this project includes:

  • Implementing an agent with a more sophisticated neural network architecture
  • Implementing an agent trained with a more informative reward function
  • Implementing a multi-agent approach in which the agent coordinates its individual units
