University of Tasmania

File(s) under permanent embargo

A reinforcement learning-based vehicle platoon control strategy for reducing energy consumption in traffic oscillations

journal contribution
posted on 2023-05-20, 23:39 authored by Li, M, Cao, Z, Li, Z
The vehicle platoon will be the dominant driving mode on future roads. To the best of our knowledge, few reinforcement learning (RL) algorithms have been applied to vehicle platoon control, which has large-scale action and state spaces. Some RL-based methods have been applied to single-agent problems. Multiagent problems call for multiagent RL algorithms, since the parameter space grows exponentially with the number of agents involved. Previous multiagent RL algorithms often provide redundant information to agents, that is, a large amount of useless or unrelated information, which makes training convergence and pattern extraction from shared information difficult. Also, random actions often lead to crashes, especially at the beginning of training. In this study, a communication proximal policy optimization (CommPPO) algorithm is proposed to tackle these issues. Specifically, the CommPPO model adopts a parameter-sharing structure that allows the number of agents to vary dynamically, so it can handle various platoon dynamics, including splitting and merging. The communication protocol of CommPPO consists of two parts. In the state part, the widely used predecessor-leader follower topology is adopted to transmit global and local state information to agents. In the reward part, a new reward communication channel is proposed to solve the spurious reward and "lazy agent" problems in some existing multiagent RL algorithms. Moreover, a curriculum learning approach is adopted to reduce crashes and speed up training. To validate the proposed strategy for platoon control, two existing multiagent RL algorithms and a traditional platoon control strategy were applied in the same scenarios for comparison. Results showed that the CommPPO algorithm gained more rewards and achieved the largest fuel consumption reduction (11.6%).
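
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind a predecessor-leader follower state channel combined with parameter sharing, not the authors' code. The names `VehicleState`, `plf_observations`, and `shared_policy`, the four observation features, and the linear weights are all hypothetical choices made for illustration. The point it shows is that each follower receives a fixed-size observation built from its predecessor (local information) and the leader (global information), so one shared policy can serve a platoon of any length.

```python
from dataclasses import dataclass
from typing import List, Tuple

Observation = Tuple[float, float, float, float]


@dataclass
class VehicleState:
    position: float  # metres along the lane (hypothetical unit choice)
    speed: float     # metres per second


def plf_observations(platoon: List[VehicleState]) -> List[Observation]:
    """Build one fixed-size observation per follower.

    platoon[0] is treated as the leader; the rest are followers ordered
    front to back. The feature set is an assumption, not taken from the paper.
    """
    leader = platoon[0]
    observations = []
    for i in range(1, len(platoon)):
        ego, pred = platoon[i], platoon[i - 1]
        observations.append((
            pred.position - ego.position,  # local: gap to predecessor
            pred.speed - ego.speed,        # local: speed difference to predecessor
            leader.speed - ego.speed,      # global: speed difference to leader
            ego.speed,                     # own speed
        ))
    return observations


def shared_policy(obs: Observation,
                  weights: Tuple[float, float, float, float] = (0.1, 0.5, 0.3, 0.0)) -> float:
    """Stand-in for a shared policy network: every follower uses the same weights."""
    return sum(w * x for w, x in zip(weights, obs))


if __name__ == "__main__":
    platoon = [VehicleState(100.0, 20.0), VehicleState(80.0, 18.0), VehicleState(60.0, 19.0)]
    for obs in plf_observations(platoon):
        print(obs, shared_policy(obs))
```

Because the observation size does not depend on the platoon length, adding or removing a follower (as in merging or splitting) changes only how many times the shared policy is evaluated, not its input shape.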


Publication title

IEEE Transactions on Neural Networks and Learning Systems






Department/School

School of Information and Communication Technology

Publisher

Institute of Electrical and Electronics Engineers

Place of publication

United States

Rights statement

Copyright 2021 IEEE

Repository Status

  • Restricted

Socio-economic Objectives

Artificial intelligence
