3 Pitches


Machine Learning and Dueling:

A fighter's skill comes from training, learned experience through failure, and thousands of reps. This is also exactly how reinforcement learning agents gain their skills. I propose a multi-agent learning experiment in which we teach two agents to fight with swords and shields.

There are a few common issues that hinder how AI agents learn in a 3D space. First, multi-agent learning (learning with two or more agents) is notoriously challenging: an agent must not only learn how to move in general, but also how to move around another agent. MADDPG, an algorithm introduced in 2017, proposes a solution to this problem, but it is still rough around the edges and could present issues. Additionally, without proper motion capture data, the agents will learn incorrect movements, most commonly seen as body parts clipping through the ground. While I have motion capture data for individual fighting movements, such as slashing a sword or blocking with a shield, it is much harder to find motion capture data of two humans actually sword fighting. Finally, the swords and shields will be weighted, throwing off the model's sense of balance, which will take a lot of training to adjust to.

Using Unity 3D machine learning agents, made possible through PyTorch, CUDA, and Unity's ML-Agents toolkit, I propose creating a deep reinforcement learning agent. The control strategy has two parts: a pretraining stage (a deep learning model that builds a neural network to solve the problem) and a transfer learning stage, which reuses the pretrained network from the first stage to solve the full problem.

Steps of pretraining:

  1. Feed the network a collection of motion capture data of sword fighting sourced from Adobe Mixamo so it can learn the basics, such as how to hold a sword and shield and how to walk while carrying them.
  2. This motion capture data is passed through an Imitation Policy Learning network, which learns by imitating the motion-captured movement (a minimal sketch of this stage follows the list).
  3. The goal from this point is to translate our Imitation Policy into a Competitive Learning Policy, which learns by trying to win. To do this, we must perform two steps:
    1. Task encoding, which attaches an understanding of goals to the movements
    2. Motor decoding, which decodes motor function into a form the Competitive Learning Policy can understand
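As a rough sketch of what the imitation stage could look like in PyTorch: the network below simply regresses the agent's pose observation onto the motion-captured action (behavioral cloning). The dimensions, layer sizes, and the placeholder tensors standing in for processed Mixamo clips are all assumptions, not part of the actual pipeline.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 120 joint/velocity observations in, 28 joint targets out.
OBS_DIM, ACT_DIM = 120, 28

class ImitationPolicy(nn.Module):
    """Small MLP that maps the agent's pose observation to a motor action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM),
        )

    def forward(self, obs):
        return self.net(obs)

def pretrain(policy, mocap_states, mocap_actions, epochs=50):
    """Behavioral cloning: regress the policy's output onto the mocap actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        pred = policy(mocap_states)
        loss = loss_fn(pred, mocap_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

# Placeholder data standing in for processed Mixamo clips.
states = torch.randn(1024, OBS_DIM)
actions = torch.randn(1024, ACT_DIM)
policy = pretrain(ImitationPolicy(), states, actions)
```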

Steps of Transfer Learning:

  1. Turn the Motor Decoding and Task Encoding data into learning policies for both agents (Agent 1 and Agent 2)
  2. Run these policies through the Competitive Learning Policy, which mimics the way humans learn a skill: through trial and error against each other (see the sketch below)
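A minimal sketch of how the transfer stage could reuse the pretrained network: both duelists start from a copy of the imitation policy and are then fine-tuned against each other with a simple win/loss reward. The `run_duel` callback is a placeholder for a Unity episode rollout, not an ML-Agents API.

```python
import copy

def transfer_to_competitive(imitation_policy):
    """Initialise both duelists from the same pretrained imitation network."""
    return copy.deepcopy(imitation_policy), copy.deepcopy(imitation_policy)

def self_play_round(agent_1, agent_2, run_duel):
    """One round of competitive learning. `run_duel` stands in for a Unity
    episode rollout and is assumed to return 1 or 2, the winning agent."""
    winner = run_duel(agent_1, agent_2)
    reward_1 = 1.0 if winner == 1 else -1.0   # win/loss reward, a placeholder
    reward_2 = -reward_1                       # zero-sum duel
    return reward_1, reward_2
```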

Social Simulation of Scarcity:


Scarcity: how do we divide resources fairly when there isn't enough for everyone? The entire subject of economics boils down to this question, and scarcity has caused many conflicts throughout history. Sometimes, we have to make tough decisions because of it.

I propose an experiment in a simulated world with two "tribes" of agents. The agents must eat once a day, which they do by collecting food blocks. Each agent has the ability to move, eat, collect food for their village, ask others to share, and kill other agents. Each agent's main priority is to eat for as many days as possible; if it goes a full day without eating, it starves to death.
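A minimal sketch of the agent bookkeeping this implies, with placeholder names; the action list and the starvation rule mirror the description above.

```python
from dataclasses import dataclass

# The five actions available to every agent.
ACTIONS = ["move", "eat", "collect_food", "ask_to_share", "attack"]

@dataclass
class TribeAgent:
    tribe: int
    alive: bool = True
    ate_today: bool = False
    days_survived: int = 0

def end_of_day(agent: TribeAgent) -> None:
    """An agent that goes a full day without eating starves to death;
    otherwise it banks another day of survival."""
    if not agent.alive:
        return
    if agent.ate_today:
        agent.days_survived += 1
    else:
        agent.alive = False
    agent.ate_today = False
```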

Then, we test these tribes under different levels of scarcity:

  1. Plenty: There is enough food for both tribes.
  2. Moderate: There is enough food for 75% of agents, but one tribe will not have enough.
  3. Scarce: There is only enough food for 1 tribe.
  4. Unfair distribution: There is only enough food for 1 tribe, and 1 tribe controls all of it (see the configuration sketch after this list).
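These four levels could be expressed as a small configuration, something like the sketch below; the exact fractions and the flag for who controls the food are assumptions about how the environment would spawn food blocks.

```python
# Hypothetical food supply per scenario, expressed as a fraction of the total
# agent population (each agent needs exactly one food block per day).
SCENARIOS = {
    "plenty":              {"food_fraction": 1.00, "one_tribe_controls_food": False},
    "moderate":            {"food_fraction": 0.75, "one_tribe_controls_food": False},
    "scarce":              {"food_fraction": 0.50, "one_tribe_controls_food": False},
    "unfair_distribution": {"food_fraction": 0.50, "one_tribe_controls_food": True},
}

def daily_food_blocks(scenario: str, total_agents: int) -> int:
    """Number of food blocks to spawn each day for a given scarcity level."""
    return round(SCENARIOS[scenario]["food_fraction"] * total_agents)
```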

If the agents have plenty, will they remain peaceful? If there's a moderate distribution, will the tribes share with each other and lose the same number of members, or will they fight the other tribe for resources? If resources are scarce, will a tribe sacrifice its own members, or steal from the other tribe? Is fighting inevitable with an unfair distribution?

In order to run this experiment, we must allow for a level of teamwork. Each tribe has its own Competitive Learning Policy, which builds a neural network judged on how many tribe members are left after a given number of days. These networks may assign tasks such as move, attack, or collect food to any tribe member.
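A sketch of that tribe-level score, reusing the `TribeAgent` bookkeeping from the earlier sketch; the policy that hands out tasks would be trained to maximize this number.

```python
def tribe_reward(agents: list[TribeAgent], tribe_id: int) -> int:
    """Tribe-level score: how many of its members are still alive at the end
    of the evaluation window. The tribe's policy, which assigns tasks like
    move, attack, or collect food to individual members, is judged on this."""
    return sum(1 for a in agents if a.tribe == tribe_id and a.alive)
```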

Usually, two agents learning against each other means slower learning, but because the neural networks are essentially playing a simple strategy game against each other, their learning might actually be accelerated by the competition.

Creating Unique Architecture with Machine Learning:

Necessity is the mother of invention. From the water-carrying aqueducts of Rome to the houses stacked across the mountains of Tibet, people have built unique and intricate structures out of necessity. Typically, neural networks are trained on data sets, which they then attempt to imitate according to a metric of success. What they produce isn't truly unique; it's an imitation of what a researcher asks them to create. I propose an experiment to create entirely unique, AI-generated 3D architecture by providing the necessity to build.

First, we must make an environment simple enough for a neural network to learn to create in, but sufficiently complex to cultivate interesting innovation. Unity 3D is ideal for this: it already has a well-developed physics engine, adding environmental conditions is simple, map creation is easy, there are many free assets to use, and it supports machine learning.

Second, we must create necessity. Two agents will be created: a Creature and a Creator. The Creature is a simple bot whose only goal is to keep its three needs satisfied: it wants to stay warm, dry, and well fed. Warmth decreases when "wind" comes in contact with the Creature, dryness decreases when rain comes in contact with it, and hunger is reduced by collecting food blocks.
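A minimal sketch of how those three needs could be tracked each simulation step; the decay and recovery rates, and the assumption that hunger also rises steadily over time, are placeholders to be tuned.

```python
from dataclasses import dataclass

@dataclass
class Creature:
    warmth: float = 1.0    # 1.0 = fully satisfied, 0.0 = need completely unmet
    dryness: float = 1.0
    fullness: float = 1.0

def tick(c: Creature, hit_by_wind: bool, hit_by_rain: bool, food_collected: int) -> None:
    """One simulation step: wind lowers warmth, rain lowers dryness, and
    collected food blocks reduce hunger. All rates here are placeholders."""
    if hit_by_wind:
        c.warmth = max(0.0, c.warmth - 0.05)
    if hit_by_rain:
        c.dryness = max(0.0, c.dryness - 0.05)
    c.fullness = max(0.0, c.fullness - 0.01)                  # steady hunger (assumed)
    c.fullness = min(1.0, c.fullness + 0.2 * food_collected)  # eating restores it
```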

The Creator agent has a simple goal: keep the Creature as warm, fed, and dry as possible. The Creator is a neural network with the ability to build, under the constraints of the physics engine. It has permission to use a limited amount of two different resources: wood and stone. Stone is much stronger than wood, but also much heavier. Wood is weaker, so it breaks under weight, but it is lighter and more accessible. Wood and stone come in blocks, and wood also comes in plank form, which is worse for walls but better for roofs. The Creator has the ability to "nail" materials together, meaning materials that touch may be connected, unless the weight of the materials attached is too heavy for the supporting piece to hold.
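A sketch of how the material properties and the "nail" rule could be encoded; the strength and weight numbers are placeholders, and the real check would of course run inside Unity's physics engine rather than a lookup table.

```python
# Hypothetical material catalogue: strength is the load a piece can carry
# before it breaks, weight is the load it adds to whatever supports it.
MATERIALS = {
    "stone_block": {"strength": 10.0, "weight": 4.0},
    "wood_block":  {"strength": 4.0,  "weight": 1.5},
    "wood_plank":  {"strength": 2.0,  "weight": 0.5},
}

def can_nail(supporting_piece: str, supported_weights: list[float]) -> bool:
    """A 'nail' only holds if the supporting piece is strong enough to
    carry the total weight hanging off it."""
    return sum(supported_weights) <= MATERIALS[supporting_piece]["strength"]

# e.g. a wood plank can carry another plank, but not a stone block:
can_nail("wood_plank", [0.5])   # True
can_nail("wood_plank", [4.0])   # False
```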

Beyond shipping with a well-developed and well-tested physics engine, the beauty of the Unity engine is its ease of customization. Because it's a game engine, adding new conditions is easy. Once our Creator network has learned to build basic structures, we can increase the complexity of the task by adding more needs. Possibilities include (a sketch of how these could be staged follows the list):

  1. Pooling water: If rain can collect in pools, structures may have to be built with slanted roofs
  2. Stronger winds: Stronger winds destroy less stable structures, so structures may have to be built more sturdily
  3. Creature needs sunlight: The need to let sunlight into the house may lead to structures having windows, or interesting alternatives
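One way to stage these additions is as a simple curriculum of toggles, sketched below with placeholder values; each stage would only be switched on once the Creator has learned to build under the previous one.

```python
# Hypothetical curriculum of added conditions for the building environment.
CONDITION_STAGES = [
    {"rain_pools": False, "wind_strength": 1.0, "creature_needs_sunlight": False},
    {"rain_pools": True,  "wind_strength": 1.0, "creature_needs_sunlight": False},
    {"rain_pools": True,  "wind_strength": 2.0, "creature_needs_sunlight": False},
    {"rain_pools": True,  "wind_strength": 2.0, "creature_needs_sunlight": True},
]
```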

The goal is to create unique and interesting structures, and a simulation made too simple will lead to uninteresting building. Unity 3D's ability to easily add and remove these conditions will be vital to this experiment, as each new condition may drive innovation, or break and confuse our neural network.
