This week, I worked on implementing an API that the PGPE algorithm can use to train both the parameters of the neural network that controls the robot and the parameters that dictate the robot's structure. Gym is a framework that contains a collection of preset environments on which machine learning models can be trained. It provides a few useful functions that create a layer of abstraction between the parameter optimization done by the PGPE algorithm and the task-specific simulation and training needed to acquire data. Gym, however, has a few shortcomings: all the robots and worlds it supports are hardcoded, and many of them do not allow the robot to be controlled in a 3D space. To address this, I have been looking into creating an interface tied to Gazebo, a 3D robot simulation package, that offers a Gym-like abstraction between the training algorithm and the physical task. For now, I plan to implement the following functions:
- Load: This will load the robot model into Gazebo and initialize the environment so that it can accept commands that change the robot's state.
- Reset: This will reset the environment to the initial state from which training starts.
- Step: This function takes an action from the partially trained neural network, runs the action in simulation, and returns the results of the action relative to the task: a reward, along with the state the world is left in after executing the action.
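As a rough sketch, the planned interface might look like the following. The class name, method signatures, and return values here are my assumptions for now, not a finalized API, and the toy implementation exists only to show how the interface would be called:

```python
from abc import ABC, abstractmethod


class GazeboEnv(ABC):
    """Hypothetical Gym-like wrapper around a Gazebo simulation."""

    @abstractmethod
    def load(self, model_path):
        """Load the robot model into Gazebo and initialize the world so it
        can accept commands that change the robot's state."""

    @abstractmethod
    def reset(self):
        """Reset the environment to the initial training state and return
        the first observation."""

    @abstractmethod
    def step(self, action):
        """Run one action from the controller network in simulation.
        Returns (observation, reward, done): the resulting world state,
        the task reward, and whether the episode has ended."""


# Toy in-memory stand-in, just to demonstrate the calling pattern.
class DummyEnv(GazeboEnv):
    def load(self, model_path):
        self.model_path = model_path

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return 0.0, 1.0, self.t >= 3  # fixed reward, 3-step episodes


env = DummyEnv()
env.load("robot_model.sdf")
obs = env.reset()
obs, reward, done = env.step(0.0)
```

Keeping Gazebo-specific details behind these three calls is the point of the abstraction: the training code never needs to know how the simulator is driven.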
For now, I believe these functions will be sufficient for PGPE to optimize the parameters, but should they prove insufficient, I will add more at a later date.
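To make the connection to PGPE concrete, here is a minimal sketch of how an episode rollout through such an interface could serve as the fitness score PGPE maximizes. The stub environment and the single-number "policy" are purely illustrative; a real rollout would run the controller network inside Gazebo:

```python
class StubEnv:
    """Toy 1-D stand-in for a simulated task: reward is higher the
    closer the action is to a hidden target value."""

    def __init__(self, target=0.5, horizon=10):
        self.target = target
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        reward = -abs(action - self.target)  # closer to target => higher reward
        done = self.t >= self.horizon
        return 0.0, reward, done             # observation, reward, done


def episode_fitness(env, params):
    """Run one episode with a trivial policy defined by `params` and
    return the total reward -- the scalar PGPE uses to rank samples."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = params  # a real policy would be the controller network
        obs, reward, done = env.step(action)
        total += reward
    return total


# PGPE would sample many parameter vectors and call this once per sample.
fitness = episode_fitness(StubEnv(), params=0.4)
```

Because PGPE only ever sees this one scalar per parameter sample, any environment exposing Load, Reset, and Step can be plugged in without touching the optimizer.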