Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence

Authors

  • Tim Brys Vrije Universiteit Brussel
  • Ann Nowé Vrije Universiteit Brussel
  • Daniel Kudenko University of York
  • Matthew Taylor Washington State University

DOI:

https://doi.org/10.1609/aaai.v28i1.8998

Keywords:

Reinforcement Learning, Multi-Objectivization, Ensemble Techniques

Abstract

Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-) optimal solutions for any objective are (near-) optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems being solved.
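
The loop described in the abstract (per-objective Q-estimates learned in parallel, a confidence score, and action selection from the most confident objective) can be sketched roughly as follows. This is an illustrative tabular Q-learning sketch, not the authors' implementation. The class name `AdaptiveObjectiveSelector`, its method names, and the hyperparameter defaults are hypothetical. The abstract does not specify the confidence metric, so the gap between the highest and lowest action value in a state stands in for it here as a placeholder assumption.

```python
import random
from collections import defaultdict


class AdaptiveObjectiveSelector:
    """Tabular Q-learning over several correlated reward signals (illustrative only)."""

    def __init__(self, n_objectives, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # One Q-table per objective: q[objective][state][action] -> estimated return.
        self.q = [
            defaultdict(lambda: {a: 0.0 for a in self.actions})
            for _ in range(n_objectives)
        ]

    def confidence(self, objective, state):
        # ASSUMPTION: placeholder confidence proxy, not the paper's metric.
        # A larger spread between best and worst action values is read as
        # the objective discriminating more clearly between actions.
        values = self.q[objective][state].values()
        return max(values) - min(values)

    def select_objective(self, state):
        # Pick the objective whose estimates look most confident in this state.
        scores = [self.confidence(o, state) for o in range(len(self.q))]
        return max(range(len(scores)), key=scores.__getitem__)

    def act(self, state):
        # Epsilon-greedy over the selected objective's Q-values.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        values = self.q[self.select_objective(state)][state]
        return max(self.actions, key=values.__getitem__)

    def update(self, state, action, rewards, next_state, done):
        # Every objective learns from the same transition, each with its own
        # reward signal (for example, the base reward plus one shaping term).
        for o, reward in enumerate(rewards):
            bootstrap = 0.0 if done else self.gamma * max(self.q[o][next_state].values())
            td_error = reward + bootstrap - self.q[o][state][action]
            self.q[o][state][action] += self.alpha * td_error
```

In use, an environment loop would call `act(state)` to choose an action, then pass one reward per objective to `update(...)`. Because the objectives are assumed to be correlated, acting greedily on any one of them is reasonable; the confidence score only decides which estimate to trust in the current state.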

Published

2014-06-21

How to Cite

Brys, T., Nowé, A., Kudenko, D., & Taylor, M. (2014). Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://doi.org/10.1609/aaai.v28i1.8998

Section

Main Track: Novel Machine Learning Algorithms