Title :
Decentralized learning for traffic signal control
Author :
Prabuchandran, K.J. ; Hemanth Kumar, A.N. ; Bhatnagar, Shalabh
Author_Institution :
Dept. of Comput. Sci. & Autom., Indian Inst. of Sci., Bangalore, India
Abstract :
In this paper, we study the problem of finding the optimal order of the phase sequence [14] in a road network so as to manage traffic flow efficiently. We model this problem as a Markov decision process (MDP). The MDP is hard to solve when all the junctions in the road network are considered simultaneously. We therefore propose a decentralized multi-agent reinforcement learning (MARL) algorithm that treats each junction in the road network as a separate agent (controller). Each agent optimizes the order of its phase sequence using Q-learning with either ε-greedy or UCB [3] based exploration. Coordination between junctions is achieved through a cost feedback signal received from the neighbouring junctions, which the learning algorithm of each agent uses to update its Q-factors. We show through simulations in VISSIM that our algorithms perform significantly better than standard fixed signal timing (FST), saturation balancing (SAT) [14] and round-robin multi-agent reinforcement learning algorithms [11] on two real road networks.
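The per-junction learner described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the state encoding, the cost function, or how neighbour feedback is combined, so the class name `JunctionAgent`, the hashable `state` tuples, the "own cost plus mean neighbour cost" feedback rule, and all parameter values below are assumptions made for illustration only. Because the agents minimize cost, greedy selection takes the smallest Q-factor and the UCB bonus is subtracted rather than added.

```python
import math
import random
from collections import defaultdict


class JunctionAgent:
    """Tabular Q-learning agent for one junction (cost-minimizing sketch)."""

    def __init__(self, n_phases, alpha=0.1, gamma=0.9, epsilon=0.1,
                 ucb_c=1.0, seed=0):
        self.n_phases = n_phases      # actions: which phase to serve next
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.epsilon = epsilon        # exploration probability
        self.ucb_c = ucb_c            # UCB exploration weight (assumed value)
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_phases)     # Q-factors
        self.visits = defaultdict(lambda: [0] * n_phases)  # update counts

    def act_epsilon_greedy(self, state):
        """Explore uniformly with probability epsilon, else pick min-cost."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_phases)
        q = self.q[state]
        return min(range(self.n_phases), key=q.__getitem__)

    def act_ucb(self, state):
        """UCB for costs: try unvisited actions first, then minimize
        the Q-factor minus an optimism bonus."""
        counts = self.visits[state]
        for action, n in enumerate(counts):
            if n == 0:
                return action
        total = sum(counts)
        q = self.q[state]
        return min(
            range(self.n_phases),
            key=lambda a: q[a] - self.ucb_c * math.sqrt(math.log(total) / counts[a]),
        )

    def update(self, state, action, own_cost, neighbour_costs, next_state):
        """One Q-learning step using the combined cost feedback signal."""
        # Assumption: feedback = own cost + mean of the neighbours' costs.
        feedback = own_cost
        if neighbour_costs:
            feedback += sum(neighbour_costs) / len(neighbour_costs)
        self.visits[state][action] += 1
        target = feedback + self.gamma * min(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        return feedback
```

In a simulation loop, each junction would hold its own `JunctionAgent`, choose a phase with `act_epsilon_greedy` or `act_ucb`, observe its cost and the costs reported by adjacent junctions, and then call `update`. No agent needs access to the global network state, which is the point of the decentralized formulation.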
Keywords :
Markov processes; decentralised control; decision making; greedy algorithms; learning (artificial intelligence); learning systems; multi-agent systems; network theory (graphs); optimal control; road traffic control; ε-greedy; FST; MARL algorithm; MDP; Markov decision process; Q-learning; Q-factors; SAT; UCB-based exploration strategies; VISSIM; cost feedback signal; decentralized learning; decentralized multiagent reinforcement learning algorithm; learning algorithm; phase sequence; road network junctions; round-robin multiagent reinforcement learning algorithms; saturation balancing; standard fixed signal timing; traffic flow; traffic signal control; Approximation algorithms; Delays; Junctions; Q-factor; Roads; Sensors; Vehicles; UCB; multi-agent reinforcement learning; optimal phase sequence
Conference_Title :
Communication Systems and Networks (COMSNETS), 2015 7th International Conference on
Conference_Location :
Bangalore
DOI :
10.1109/COMSNETS.2015.7098712