TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration

Authors

  • Bruno da Silva, University of Massachusetts Amherst
  • Andrew Barto, University of Massachusetts Amherst

DOI:

https://doi.org/10.1609/aaai.v26i1.8286

Keywords:

reinforcement learning, exploration, Markov process, PAC-MDP, control

Abstract

We study the problem of finding efficient exploration policies for the case in which an agent is momentarily not concerned with exploiting, and instead tries to compute a policy for later use. We first formally define the Optimal Exploration Problem as one of sequential sampling and show that its solutions correspond to paths of minimum expected length in the space of policies. We derive a model-free, local linear approximation to such solutions and use it to construct efficient exploration policies. We compare our model-free approach to other exploration techniques, including one with the best known PAC bounds, and show that ours is both based on a well-defined optimization problem and empirically efficient.
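The abstract frames exploration as its own objective: the agent temporarily ignores exploitation and acts so as to learn a good policy for later use. The sketch below is a generic illustration of that idea, not the paper's TD-DeltaPi algorithm: a tabular Q-learner on a toy chain MDP whose behavior policy follows a "novelty" signal (the magnitude of the most recent TD error) rather than the reward. All names, constants, and the toy MDP are assumptions made for the example.

```python
import random

random.seed(0)

# Toy 6-state chain MDP: actions 0 (left) and 1 (right); reward 1 only on
# reaching the rightmost state. Purely illustrative, not from the paper.
N_STATES, ACTIONS, GAMMA, ALPHA = 6, (0, 1), 0.95, 0.5

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, float(s2 == N_STATES - 1)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
# "Novelty" signal: magnitude of the most recent TD error per state-action.
# Acting greedily on it steers the agent toward poorly-learned regions.
novelty = {(s, a): 1.0 for s in range(N_STATES) for a in ACTIONS}

s = 0
for t in range(2000):
    # Exploration policy: follow the novelty signal, not the reward
    # (tiny random noise breaks ties).
    a = max(ACTIONS, key=lambda b: novelty[(s, b)] + 1e-6 * random.random())
    s2, r = step(s, a)
    td = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)]
    Q[(s, a)] += ALPHA * td
    novelty[(s, a)] = abs(td)  # decays to zero as the estimate converges
    s = s2 if r == 0 else 0    # restart each episode at the left end

# The policy computed "for later use": greedy in Q ("right" in states 0-4).
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
print(policy)
```

The key design choice, in the spirit of the abstract, is that the behavior policy is optimized for learning (visiting state-actions whose value estimates are still changing) rather than for return; the greedy policy extracted at the end is the product of that exploration.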

Published

2021-09-20

How to Cite

da Silva, B., & Barto, A. (2021). TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 886-892. https://doi.org/10.1609/aaai.v26i1.8286

Section

AAAI Technical Track: Machine Learning