This week, Prathyush and I worked on implementing the algorithms we discussed in our blog posts from last week (here and here). So far, we have implemented the baseline algorithm from previous research and are in the process of implementing our proposed algorithm. The two main components of the algorithm are the PPO steps and the PGPE steps. For the PPO steps, we are using the version implemented in TensorForce, a reinforcement learning framework built on top of TensorFlow. For the PGPE steps, we implemented a version of multimodal PGPE in NumPy. Now that we have an implementation of the algorithm from the papers we read, we hope to have our proposed algorithm working soon. In theory this should not take long, as all of its components are already implemented.
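
For readers who have not seen PGPE before, here is a minimal sketch of the basic idea in NumPy. It is not our actual code and it is not the multimodal variant: it keeps a single Gaussian over the policy parameters and nudges its mean and standard deviation toward samples that scored better than average. The function name `pgpe_maximize`, the toy reward, the hyperparameter values, and the reward normalization are all assumptions made for illustration.

```python
import numpy as np


def pgpe_maximize(reward_fn, mu, sigma, n_iters=200, pop_size=50,
                  lr_mu=0.2, lr_sigma=0.1, min_sigma=1e-3, seed=0):
    """Basic (unimodal) PGPE sketch: adapt a Gaussian over parameters."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float).copy()
    sigma = np.asarray(sigma, dtype=float).copy()
    for _ in range(n_iters):
        # Sample perturbations in parameter space, one row per candidate.
        eps = rng.normal(size=(pop_size, mu.size)) * sigma
        rewards = np.array([reward_fn(mu + e) for e in eps])
        # The mean reward acts as the baseline. Dividing by the std is an
        # extra stabilizing assumption, not part of the basic update rule.
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Update directions for the mean and the std of the search
        # distribution, in the commonly cited PGPE form.
        grad_mu = (adv[:, None] * eps).mean(axis=0)
        grad_sigma = (adv[:, None] * (eps ** 2 - sigma ** 2) / sigma).mean(axis=0)
        mu += lr_mu * grad_mu
        # Keep the std strictly positive so sampling stays well defined.
        sigma = np.maximum(sigma + lr_sigma * grad_sigma, min_sigma)
    return mu, sigma


# Toy usage: the reward is highest when the parameters equal `target`.
target = np.array([3.0, -2.0])
mu, sigma = pgpe_maximize(lambda theta: -np.sum((theta - target) ** 2),
                          mu=np.zeros(2), sigma=np.ones(2))
```

In a real reinforcement learning setting, `reward_fn` would run one or more episodes with the sampled parameters and return the total reward. The toy quadratic above just makes it easy to check that the mean moves toward the optimum.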