Reinforcement Learning and the Bayesian Control Rule

Ortega, Pedro Alejandro; Braun, Daniel Alexander; Godsill, Simon

doi:10.1007/978-3-642-22887-2_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6830)

Included in the following conference series:

International Conference on Artificial General Intelligence

Abstract

We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intractable planning problem reduces to a simple multi-armed bandit problem, where each lever stands for a potentially arbitrarily complex policy. Furthermore, we use the Bayesian control rule to construct an adaptive bandit player that is universal with respect to a given class of optimal bandit players, thus indirectly constructing an adaptive agent that is universal with respect to a given class of policies.
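
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the bandit reduction under stated assumptions: each lever stands for a fixed policy whose episode outcome is summarized as a binary success, and the adaptive player is Thompson sampling with Beta posteriors, which is a common way to realize a belief-sampling player of this kind for Bernoulli levers. The function name `thompson_bandit` and the parameter `success_probs` are hypothetical and do not come from the paper.

```python
import random


def thompson_bandit(success_probs, n_rounds=2000, seed=0):
    """Play a Bernoulli bandit in which each lever stands for one policy.

    success_probs[i] is the (hidden) chance that running policy i for one
    episode succeeds. This is an illustrative assumption, not the paper's model.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    alpha = [1.0] * n  # Beta posterior: 1 + observed successes per lever
    beta = [1.0] * n   # Beta posterior: 1 + observed failures per lever
    pulls = [0] * n

    for _ in range(n_rounds):
        # Sample one plausible success rate per lever from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # Act as if the sampled beliefs were true: run the best-looking policy.
        arm = max(range(n), key=lambda i: samples[i])
        # Simulated environment: the chosen policy succeeds or fails.
        reward = 1 if rng.random() < success_probs[arm] else 0
        # Update only the posterior of the lever that was actually pulled.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


pulls, alpha, beta = thompson_bandit([0.2, 0.5, 0.8])
```

In this sketch the player never plans inside a policy; it only decides which policy to run next, which is the separation of planning from I/O dynamics that the abstract describes. Over many rounds the pull counts concentrate on the lever with the highest success rate.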

This research was supported by the European Commission FP7-ICT, “GUIDE—Gentle User Interfaces for Disabled and Elderly citizens”.

Author information

Authors and Affiliations

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK
Pedro Alejandro Ortega, Daniel Alexander Braun & Simon Godsill

Editor information

Editors and Affiliations

Istituto Dalle Molle di Studi sull’Intelligenza Artificiale (IDSIA), Lugano, Switzerland
Jürgen Schmidhuber
Reykjavik University, CADIA, Menntavegi 1, 101, Reykjavik, Iceland
Kristinn R. Thórisson
Google Research, 1600 Amphitheatre Parkway, Mountain View, 94043, CA, USA
Moshe Looks

About this paper

Cite this paper

Ortega, P.A., Braun, D.A., Godsill, S. (2011). Reinforcement Learning and the Bayesian Control Rule. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds) Artificial General Intelligence. AGI 2011. Lecture Notes in Computer Science, vol 6830. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22887-2_30

DOI: https://doi.org/10.1007/978-3-642-22887-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22886-5
Online ISBN: 978-3-642-22887-2
eBook Packages: Computer Science (R0)
