File(s) under permanent embargo
Concurrent Q-Learning: Reinforcement Learning for Dynamic Goals and Environments
Journal contribution, posted on 2023-05-16 17:30, authored by Robert Ollington and P. W. Vamplew
This article presents a powerful new algorithm for reinforcement learning in problems where both the goals and the environment may change. The algorithm is completely goal independent, allowing the mechanics of the environment to be learned independently of the task being undertaken. Conventional reinforcement learning techniques, such as Q-learning, are goal dependent: when the goal or reward conditions change, previous learning interferes with the new task being learned, resulting in very poor performance. Previously, the Concurrent Q-Learning algorithm was developed, based on Watkins' Q-learning, which learns the relative proximity of all states simultaneously. This learning is completely independent of the reward experienced at those states and, through a simple action selection strategy, may be applied to any given reward structure. Here it is shown that the extra information obtained may be used to replace the eligibility traces of Watkins' Q-learning, allowing many more value updates to be made at each time step. The new algorithm is compared to the previous version and also to DG-learning in tasks involving changing goals and environments. The new algorithm is shown to perform significantly better than these alternatives, especially in situations involving novel obstructions. The algorithm adapts quickly and intelligently to changes in both the environment and the reward structure, and does not suffer interference from training undertaken prior to those changes.
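The core idea of goal-independent learning can be illustrated with a minimal tabular sketch. This is not the authors' published algorithm; it is an illustrative toy (all names and parameters are assumptions) showing how a single transition can update discounted-proximity estimates toward every possible goal state at once, so that a change of goal requires no retraining:

```python
import random

# Illustrative sketch only: a tabular, goal-independent learner on a
# 1-D corridor. For every observed transition (s, a, s') we update the
# estimated discounted proximity Q[s][a][g] toward EVERY candidate goal
# state g concurrently, so no learning is tied to one reward structure.

N = 8               # corridor states 0..N-1 (hypothetical toy environment)
GAMMA = 0.9         # discount; Q[s][a][g] approaches GAMMA**(steps_to_g - 1)
ACTIONS = (-1, +1)  # move left / move right

Q = [[[0.0] * N for _ in ACTIONS] for _ in range(N)]

def step(s, a):
    """Deterministic corridor dynamics with walls at both ends."""
    return min(N - 1, max(0, s + ACTIONS[a]))

random.seed(0)
s = random.randrange(N)
for _ in range(20000):
    a = random.randrange(len(ACTIONS))        # pure random exploration
    s2 = step(s, a)
    for g in range(N):                        # concurrent update, all goals
        if s2 == g:
            target = 1.0                      # this action reaches goal g
        else:
            target = GAMMA * max(Q[s2][b][g] for b in range(len(ACTIONS)))
        Q[s][a][g] = max(Q[s][a][g], target)  # deterministic world: keep best
    s = s2

def greedy_action(s, g):
    """Action selection for an arbitrary goal g, with no goal-specific training."""
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a][g])

# Navigate from state 0 to goal N-1 purely from the proximity table.
path = [0]
while path[-1] != N - 1:
    path.append(step(path[-1], greedy_action(path[-1], N - 1)))
print(path)
```

Because the table stores proximity to all states rather than value under one reward, swapping the goal (e.g. calling `greedy_action(s, 2)` instead) works immediately, which is the property the abstract attributes to goal-independent learning; the paper's actual method additionally exploits this information to replace eligibility traces.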
Publication title: International Journal of Intelligent Systems
Department/School: School of Information and Communication Technology
Publisher: John Wiley & Sons, Inc.
Place of publication: United States
Rights statement: The definitive published version is available online at: http://www3.interscience.wiley.com/