Pitches

Use deep reinforcement learning to tune the hyperparameters (learning rate, lambda – regularization parameter, number of layers, number of units in each layer, different activation functions) of a Neural Network. The overall cost function of RL agent will include the metrics such as accuracy of the NN (or F1 score) on training and validation sets, time taken to learn, the measures of over/underfitting. This network would be trained on different types of problems.
For this idea, I’m using the game of Pong (ATARI) as a test environment. My plan is to introduce a specific pipeline in training the AI agent to play the game. Instead of directly using the Policy Gradients, I will train the agent to guess the next frames in the game. First, I will use RNN to learn (approximate) the transition function in an unknown environment. The transition function, modeled by a Recurrent Neural Network, will take previous n states of the game(in raw pixel form) and agent’s action, and output the state representation that corresponds to the future state of the environment. The intuition behind this is that the agent will first learn the ‘laws of physics’ of a certain environment (exploration) and this will help the agent learn how to play the game more efficiently. After learning the weights of the transition function, I will implement the Reinforcement Learning algorithm (Policy Gradients) that reuses the learned weights (transfer learning) and train this deep neural network by letting in play a number of games and learn from experience.
I will train a CNN to be able to verify, given the images of handwritten text, if two handwritings belong to the same person. In order to generate more labeled data, I will use a dataset with images of handwritten texts and break up each image into the windows containing a few words. I will assume that each word written on a single image belongs to one person.

Davit Kvartskhava

Leave a Reply Cancel reply