Delta Tolling for Improved Route Planning
Pehuen Moure

Keywords: reinforcement learning, multi-agent path planning, marginal cost pricing, delta tolling

Delta-Tolling in Grid-Based Multiagent Systems
For a single agent, warehouse navigation reduces to a simple pathfinding problem over a gridworld environment. With multiple agents acting in the same environment, however, coordinating actions across all agents becomes more difficult: to learn the optimal sequence of actions to reach its individual goal, each agent must also learn to plan around the actions of others so as to avoid deadlock. This problem is exacerbated in environment configurations prone to congestion.
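To make the single-agent baseline concrete, the following is a minimal sketch of gridworld pathfinding using breadth-first search; the function name `bfs_path` and the grid encoding (`True` marks an obstacle) are illustrative choices, not part of the framework described here.

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest path for one agent on a gridworld (True = obstacle).

    Returns a list of (row, col) cells from start to goal, or None
    if the goal is unreachable. Illustrative sketch only.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}       # visited set doubling as parent pointers
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk parent pointers back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]] and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    return None
```

With multiple agents, a plan that is optimal in isolation may route several agents through the same cells at the same time, which is exactly the congestion the tolling scheme below addresses.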
This problem of multi-agent pathfinding can be framed as a traffic flow problem to which the delta-tolling framework introduced in [1] can be applied. Under this framework, the environment is characterized by a directed, weighted graph in which the goal of each agent is to traverse from its current vertex to some other vertex via a least-cost path. Each edge in the graph is associated with a latency l, representing the travel time over that edge, and a dynamically updated toll t estimating the cost of traversing that edge. Delta-tolling is a model-free method of approximating marginal-cost tolling (charging each agent a cost equivalent to the cost it inflicts on all other agents) which requires only the latency on each edge and no internal traffic model [1]. Delta-tolling is parameterized only by a proportionality constant beta and a smoothing parameter R. In each iteration, the toll on each edge is updated as a weighted average, with weights determined by R, of the current toll and beta times the difference between the observed and freeflow latencies.
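The toll update described above can be sketched as follows; the function signature, the dictionary-based edge representation, and the default values for beta and R are illustrative assumptions, not prescribed by the framework.

```python
def update_tolls(tolls, observed_latency, freeflow_latency, beta=2.0, R=0.1):
    """One delta-tolling iteration over all edges (illustrative sketch).

    tolls, observed_latency, freeflow_latency: dicts mapping an edge
    key to its current toll, observed latency, and freeflow latency.
    Each new toll is a weighted average (weights set by R) of the old
    toll and beta times the observed-minus-freeflow latency difference.
    """
    new_tolls = {}
    for edge, toll in tolls.items():
        delta = observed_latency[edge] - freeflow_latency[edge]
        new_tolls[edge] = R * beta * delta + (1 - R) * toll
    return new_tolls
```

Because R smooths the update, a transient spike in congestion raises the toll gradually rather than all at once, and the toll decays back toward zero once the observed latency returns to freeflow.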
The multi-agent warehouse pathfinding problem can be framed as a degenerate case of this delta-tolling framework. To transform the gridworld environment into a traffic flow network of the kind used in delta-tolling, we construct a graph whose vertices are the reachable (i.e. non-obstacle) states and whose edges are the actions that allow agents to transition between these reachable states. The freeflow latency of each edge in this graph is 1, since a move between adjacent states takes a single timestep. Because only one agent can occupy a state at a given time, the difference between an edge's observed and freeflow latencies is either 0 or 1, indicating that the state into which the agent is transitioning is unobstructed or obstructed, respectively.
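A minimal sketch of this gridworld-to-graph construction is given below; the function name `grid_to_graph`, the boolean obstacle encoding, and the choice to return a dict mapping each directed edge to its freeflow latency are assumptions made for illustration.

```python
def grid_to_graph(grid):
    """Convert a gridworld (True = obstacle) into a directed graph.

    Vertices are reachable (non-obstacle) cells; each directed edge
    connects a cell to an adjacent reachable cell and carries a
    freeflow latency of 1 (one timestep per move). Illustrative sketch.
    """
    rows, cols = len(grid), len(grid[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                continue  # obstacle cells are not vertices
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc]:
                    edges[((r, c), (nr, nc))] = 1  # freeflow latency
    return edges
```

On this graph, the delta-tolling update reduces to raising the toll on edges whose destination cells are currently occupied and letting the tolls on uncontested edges decay.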
References:
[1] Ramos et al., 2018. Learning System-Efficient Equilibria in Route Choice Using Tolls.
[2] Mirzaei et al., 2018. Enhanced Delta-tolling: Traffic Optimization via Policy Gradient Reinforcement Learning.