At a glance: Our goal is to explore models for observation, state estimation, and control in OpenAI Gym's Doom environment.

  • First Stage: Observation and Classification
  • Second Stage: Motion Model
  • Third Stage: DQN Learned Policy
  • By 11/15, we intend to complete the observation portion. This entails training a modified version of the You Only Look Once (YOLO) algorithm to predict bounding boxes for monsters from the screen images provided by the environment.
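
    A rough sketch of how the observation loop might look. The environment id, the MonsterDetector wrapper around the modified YOLO network, its checkpoint path, and the detection tuple format are all assumptions, not final interfaces:

```python
import gym

# Hypothetical wrapper around our modified YOLO network (not a real package);
# the interface below is a placeholder.
from monster_detector import MonsterDetector

# Assumes ppaquette's gym-doom package is installed and its environments are
# registered; the environment id is an assumption.
env = gym.make('ppaquette/DoomDefendCenter-v0')
detector = MonsterDetector('yolo_monsters.ckpt')  # assumed checkpoint path

frame = env.reset()  # RGB screen buffer, shape (height, width, 3)
for _ in range(300):
    # Assumed output: a list of (x_center, y_center, width, height, confidence)
    # tuples in pixel coordinates, one per detected monster.
    detections = detector.predict(frame)
    action = env.action_space.sample()  # placeholder: random action for now
    frame, reward, done, info = env.step(action)
    if done:
        frame = env.reset()
env.close()
```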

    For the midterm evaluation on 11/22, we plan to demo the observation portion, along with the conversion from the bounding boxes provided by YOLO to a polar grid of monster probabilities.
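
    A minimal sketch of that conversion, assuming detections are (x_center, y_center, width, height, confidence) tuples in pixel coordinates; the 90-degree field of view and 36 angular bins are placeholder values, not final design choices:

```python
import numpy as np

def boxes_to_polar_grid(detections, frame_width, fov_degrees=90.0, n_bins=36):
    """Map YOLO detections onto a polar grid of monster probabilities.

    The grid covers a full 360 degrees around the player; only bins inside
    the field of view can be filled by a single frame's detections.
    """
    grid = np.zeros(n_bins)
    for x_center, _y, _w, _h, confidence in detections:
        # The horizontal offset from the screen centre maps (approximately
        # linearly) to an angle relative to the player's heading.
        angle = ((x_center / frame_width) - 0.5) * fov_degrees
        bin_index = int((angle % 360.0) / 360.0 * n_bins)
        # Keep the strongest detection per bin as that sector's probability.
        grid[bin_index] = max(grid[bin_index], confidence)
    return grid
```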

    The next step after observation will be to build a model of the state through time (where monsters are located relative to our character). The character has a limited field of view, so we will need a motion model to reason about the likely positions of monsters that have left the field of view. This should be completed by 11/29.
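
    One possible form for this is a simple Bayes-filter-style predict/update over the polar grid: rotate the belief when the player turns, blur and decay it to account for monster motion and growing uncertainty, and overwrite the bins currently in view with fresh detections. The blur weights and decay factor below are placeholder assumptions:

```python
import numpy as np

def predict(belief, turn_degrees, decay=0.95):
    """Prediction step: account for the player's rotation and monster motion."""
    n_bins = len(belief)
    # Turning the player by theta shifts the world by -theta in the body frame.
    shift = int(round(-turn_degrees / 360.0 * n_bins))
    belief = np.roll(belief, shift)
    # Spread probability into neighbouring sectors (monsters may move) and
    # decay it so estimates for unseen sectors fade over time.
    blurred = 0.5 * belief + 0.25 * np.roll(belief, 1) + 0.25 * np.roll(belief, -1)
    return decay * blurred

def update(belief, observed_grid, visible_bins):
    """Measurement step: trust fresh YOLO detections inside the field of view."""
    belief = belief.copy()
    belief[visible_bins] = observed_grid[visible_bins]
    return belief
```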

    Lastly, we plan to use the state estimate produced by the motion model as the input for learning a control policy. We aim to complete this by the final deadline.
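
    For this stage, the polar-grid belief would be the state fed to a DQN. Below is a minimal sketch of the Q-network and epsilon-greedy action selection, written in PyTorch purely for illustration; the framework, layer sizes, and number of discrete actions are all undecided placeholders:

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q-network over the polar-grid state."""

    def __init__(self, n_bins=36, n_actions=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),  # one Q-value per discrete action
        )

    def forward(self, state):
        return self.net(state)

def select_action(q_network, state, epsilon, n_actions=8):
    """Epsilon-greedy action selection over the estimated Q-values."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_network(torch.as_tensor(state, dtype=torch.float32))
    return int(q_values.argmax().item())
```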

    If all of these parts are completed, there are a number of potentially interesting extensions. One would be to compare our results with those of an end-to-end approach similar to https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf, in which researchers at DeepMind used deep reinforcement learning to learn a policy from raw pixels alone. Conversely, there are simplifying changes that could be made should the project prove too difficult. For example, by restricting the enemies to a type that runs toward the player at a constant rate, we could eliminate the need for a motion model.