Off-Policy Meta-Reinforcement Learning


  • Our team explored off-policy algorithms for meta-reinforcement learning, benchmarking them on Meta-World
  • The two main approaches were based on latent context variables and on off-policy MAML-style gradient updates
  • Developed code to empirically test these approaches using rlkit; results and observations are outlined in the report