Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning
This paper uses policy gradients to optimize, concurrently, the parameters of a given robot's structure and of its neural-network-based controller. The paper applies this algorithm to legged robots on uniform, flat terrain. The policy is first trained for 50 million timesteps. Training then alternates between 2 iterations on the structure parameters and 50 iterations on the policy parameters. Applied to three different parameterized robots, the method substantially increased the rewards achieved.
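To make the alternating schedule concrete, here is a minimal Python sketch under heavy assumptions. The scalar `design` and `policy` parameters, the quadratic reward behind `reward_grads`, the learning rate, and the scaled-down warm-up length are all hypothetical. Exact gradients of this toy reward stand in for the policy-gradient estimates the paper uses. Only the shape of the schedule (a policy-only warm-up, then 2 design iterations alternating with 50 policy iterations) follows the description above.

```python
"""Toy sketch of an alternating design/policy schedule (hypothetical reward)."""


def reward_grads(design: float, policy: float) -> tuple[float, float]:
    """Exact gradients of a hypothetical reward R(d, p) = -(d - 3)^2 - (p - d/2)^2.

    R peaks at d = 3, p = 1.5. It stands in for the return of a real simulator;
    the paper estimates gradients with policy gradients instead.
    """
    mismatch = policy - 0.5 * design
    grad_design = -2.0 * (design - 3.0) + mismatch  # dR/dd
    grad_policy = -2.0 * mismatch                   # dR/dp
    return grad_design, grad_policy


def co_optimize(warmup_steps: int = 500, rounds: int = 100,
                design_iters: int = 2, policy_iters: int = 50,
                lr: float = 0.05) -> tuple[float, float]:
    design, policy = 0.0, 0.0
    # Phase 1: train the controller alone (hypothetical, scaled-down stand-in
    # for the 50 million warm-up timesteps).
    for _ in range(warmup_steps):
        policy += lr * reward_grads(design, policy)[1]
    # Phase 2: alternate a few design updates with many policy updates.
    for _ in range(rounds):
        for _ in range(design_iters):
            design += lr * reward_grads(design, policy)[0]
        for _ in range(policy_iters):
            policy += lr * reward_grads(design, policy)[1]
    return design, policy


if __name__ == "__main__":
    d, p = co_optimize()
    print(f"design={d:.4f}, policy={p:.4f}")
```

Running it should print values close to the toy optimum (design 3, policy 1.5).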

In contrast, our work aims to concurrently optimize any parameterized structure together with an arbitrary parameterized controller (PID, neural network, etc.). We also hope to optimize robots for various tasks in different environments and to compare the results in simulation and in the real world.

Concurrent Optimization of Mechanical Design and Locomotion Control of a Legged Robot
This work optimizes 6 design parameters and 25 control parameters for the StarlETH quadruped. The authors construct a cost function that targets the robot's running speed for a set of specified gaits, solve for the parameters that minimize it using evolutionary algorithms, and succeed in improving the robot's speed.
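The description does not say which evolutionary algorithm was used, so the sketch below uses a simple (μ/μ, λ) evolution strategy as a stand-in. The 6 + 25 parameter split follows the text. The `cost` function is a hypothetical placeholder for a simulated, speed-based cost, and all hyperparameters (`lam`, `mu`, `sigma`, `decay`) are illustrative assumptions.

```python
import random

N_DESIGN, N_CONTROL = 6, 25  # parameter counts reported for StarlETH


def cost(params: list[float]) -> float:
    """Hypothetical stand-in for the speed-based cost (lower is better).

    A real evaluation would simulate the robot with these design and control
    parameters over the specified gaits; here it is the squared distance to
    an arbitrary target vector of ones.
    """
    return sum((x - 1.0) ** 2 for x in params)


def evolve(generations: int = 200, lam: int = 20, mu: int = 5,
           sigma: float = 0.5, decay: float = 0.97,
           seed: int = 0) -> tuple[list[float], float]:
    rng = random.Random(seed)
    # Design and control parameters are optimized together in one vector.
    mean = [0.0] * (N_DESIGN + N_CONTROL)
    for _ in range(generations):
        # Sample lam offspring around the current mean.
        offspring = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                     for _ in range(lam)]
        # Keep the mu lowest-cost offspring and recombine them by averaging.
        parents = sorted(offspring, key=cost)[:mu]
        mean = [sum(col) / mu for col in zip(*parents)]
        sigma *= decay  # simple fixed step-size schedule
    return mean, cost(mean)


if __name__ == "__main__":
    best, final_cost = evolve()
    print(f"initial cost {cost([0.0] * 31):.1f}, final cost {final_cost:.6f}")
```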

Joint Optimization of Robot Design and Motion Parameters using the Implicit Function Theorem
This paper formulates a method for optimizing both the trajectories and the structure of end-effectors. The authors find a set of parameters that satisfies the constraints on motion and design imposed by the properties of the robot, using the implicit function theorem to solve for the parameters. However, this assumes that the robot's characteristics and design are completely known, and it requires a human to derive the equations that describe the robot.
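The implicit-function-theorem step can be illustrated on a single scalar constraint. In this hypothetical sketch, `g(x, p) = x^3 + p*x - 1 = 0` ties a motion parameter `x` to a design parameter `p`. The theorem then gives dx/dp = -(∂g/∂x)^-1 · ∂g/∂p at a solution, without re-solving the constraint for every perturbed design. The constraint and function names are invented for illustration and are not taken from the paper.

```python
def g(x: float, p: float) -> float:
    """Hypothetical constraint g(x, p) = x^3 + p*x - 1 = 0 linking motion x to design p."""
    return x**3 + p * x - 1.0


def solve_motion(p: float, x0: float = 1.0, iters: int = 50) -> float:
    """Newton's method for the x with g(x, p) = 0 (unique when p > 0,
    because dg/dx = 3x^2 + p is then always positive)."""
    x = x0
    for _ in range(iters):
        x -= g(x, p) / (3 * x**2 + p)
    return x


def dx_dp(p: float) -> float:
    """Implicit function theorem: dx/dp = -(dg/dx)^-1 * dg/dp at the solution."""
    x = solve_motion(p)
    dg_dx = 3 * x**2 + p
    dg_dp = x
    return -dg_dp / dg_dx


if __name__ == "__main__":
    # Compare the IFT sensitivity with a central finite difference.
    h = 1e-6
    fd = (solve_motion(1.0 + h) - solve_motion(1.0 - h)) / (2 * h)
    print(f"IFT: {dx_dp(1.0):.8f}  finite difference: {fd:.8f}")
```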

Functional Co-Optimization Of Articulated Robots
This paper optimizes the structure and control of a robot by constructing a parameterized mathematical model of the robot and optimizing that model with nonlinear programming. The model is built from the robot's state, actuation, and contact forces. Because it is strictly tied to the robot's morphology, a new set of equations must be derived for each type of robot. Furthermore, the optimization, which uses SNOPT, is only guaranteed to converge if the initial values are close to the optimal values; otherwise, new random initial values are chosen and the optimization is run again.
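The dependence on initial values, and the random-restart remedy, can be seen on a one-dimensional toy problem. This sketch uses plain gradient descent as a stand-in for SNOPT (a sequential quadratic programming solver whose interface is not reproduced here) and a hypothetical objective `f` with two local minima. Started near x = 1.5, the local solver settles in the worse minimum; keeping the best of several random restarts recovers the global one.

```python
import random


def f(x: float) -> float:
    """Hypothetical nonconvex objective: global minimum near x = -1.30,
    local minimum near x = 1.13."""
    return x**4 - 3 * x**2 + x


def df(x: float) -> float:
    return 4 * x**3 - 6 * x + 1


def local_solve(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent: like any local solver, its answer depends on x0."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x


def multistart(n_starts: int = 20, seed: int = 0) -> float:
    """Run the local solver from random initial values and keep the best result."""
    rng = random.Random(seed)
    candidates = [local_solve(rng.uniform(-2.0, 2.0)) for _ in range(n_starts)]
    return min(candidates, key=f)


if __name__ == "__main__":
    print(f"single start from 1.5: x = {local_solve(1.5):.4f}")
    print(f"best of random restarts: x = {multistart():.4f}")
```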

Concurrent Design Optimization of Mechanical Structure and Control for High Speed Robots
Integrated Structure/Control Design of High Speed Flexible Robots Based on Time Optimal Control
These two papers are quite similar. They document work done to manually optimize the PD gains of a PID controller together with the mass of each link in a robotic arm. A mathematical model of a two-link arm is constructed, and the authors attempt to find a controller that enables high-speed positioning. Kd and Kp are then derived analytically as functions of the masses, and the masses are varied to find the weight for each link that gives the fastest positioning. All the work presented in these papers is specific to this problem, but parts of it could potentially be generalized to other robots and controllers.
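A minimal sketch of expressing PD gains in terms of a mass, assuming a rigid single-axis model m·x'' = u rather than the papers' two-link arm. Placing both closed-loop poles at a chosen natural frequency ωn and damping ratio ζ gives Kp = m·ωn² and Kd = 2ζωn·m. The actuator limit `u_max`, the step target, and all numbers are hypothetical. Under that limit, heavier links settle more slowly, so sweeping the mass plays the role of the mass variation described above.

```python
def pd_gains(mass: float, wn: float = 10.0, zeta: float = 1.0) -> tuple[float, float]:
    """Gains placing both poles of m*x'' = Kp*(r - x) - Kd*x' at the target.

    The closed loop m*s^2 + Kd*s + Kp matches m*(s^2 + 2*zeta*wn*s + wn^2)
    when Kp = m*wn^2 and Kd = 2*zeta*wn*m, so both gains scale with the mass.
    """
    return mass * wn**2, 2.0 * zeta * wn * mass


def settling_time(mass: float, u_max: float = 20.0, target: float = 1.0,
                  dt: float = 1e-3, t_end: float = 10.0, band: float = 0.02) -> float:
    """Simulate a step response; return the last time the error was outside the band."""
    kp, kd = pd_gains(mass)
    x = v = 0.0
    last_outside = 0.0
    for step in range(int(t_end / dt)):
        u = kp * (target - x) - kd * v
        u = max(-u_max, min(u_max, u))  # hypothetical actuator force limit
        v += (u / mass) * dt             # semi-implicit Euler integration
        x += v * dt
        if abs(target - x) > band * abs(target):
            last_outside = (step + 1) * dt
    return last_outside


if __name__ == "__main__":
    # Sweep candidate masses; with the same pole placement, only the force
    # limit makes the response depend on mass.
    for m in (1.0, 2.0, 5.0, 10.0):
        print(f"mass={m:>4}: settling time {settling_time(m):.3f} s")
```

Nothing in this toy penalizes a light link, so the sweep simply favors the lightest mass. A genuine trade-off would need additional model terms, such as the link flexibility named in the second title.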

Integrated structure/control design of mechatronic systems using a recursive experimental optimization method
This work optimizes the control and the structure recursively. First, the controller gains are tuned on a physical robot, and measurements are taken to verify that the required metrics are optimal. The gains are then evaluated on the given system. If the best performance achievable with the current structure is not good enough, the design is modified slightly based on a sensitivity Jacobian. The controller is then retuned to find the optimal gains for the modified structure, and the new gains are evaluated. The process repeats until both the gains and the structure are optimal, at which point the recursion ends.
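The recursive procedure can be sketched as follows, assuming a scalar design parameter (so the sensitivity Jacobian reduces to a single derivative, estimated here by central differences) and a hypothetical `measured_cost` function standing in for measurements on the physical robot. Golden-section search plays the part of tuning the gains; the paper's actual tuning procedure and Jacobian computation may differ. The recursion is written as a loop whose stopping rule is a hypothetical "good enough" threshold.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def measured_cost(design: float, gain: float) -> float:
    """Hypothetical stand-in for an experimental performance measurement."""
    return (gain - 2.0 * design) ** 2 + (design - 1.0) ** 2 + 0.5


def tune_gain(design: float, lo: float = 0.0, hi: float = 10.0,
              tol: float = 1e-6) -> float:
    """Inner loop: golden-section search for the best gain on the current structure."""
    a, b = lo, hi
    while b - a > tol:
        x1 = b - INV_PHI * (b - a)
        x2 = a + INV_PHI * (b - a)
        if measured_cost(design, x1) < measured_cost(design, x2):
            b = x2
        else:
            a = x1
    return 0.5 * (a + b)


def best_cost(design: float) -> float:
    """Best performance achievable with this structure after tuning."""
    return measured_cost(design, tune_gain(design))


def recursive_optimize(design: float = 3.0, good_enough: float = 0.5001,
                       step: float = 0.3, h: float = 1e-3, max_rounds: int = 50):
    for _ in range(max_rounds):
        if best_cost(design) < good_enough:  # performance acceptable: stop
            break
        # Sensitivity of the tuned cost to the design (a 1x1 "Jacobian").
        sens = (best_cost(design + h) - best_cost(design - h)) / (2.0 * h)
        design -= step * sens  # modify the structure slightly, then retune
    return design, tune_gain(design), best_cost(design)


if __name__ == "__main__":
    d, k, c = recursive_optimize()
    print(f"design={d:.4f}, gain={k:.4f}, cost={c:.6f}")
```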
