On the Asymptotic Behavior of a Constant Stepsize Temporal-Difference Learning Algorithm

  • Conference paper
Computational Learning Theory (EuroCOLT 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1572)

Abstract

The mean-square asymptotic behavior of constant stepsize temporal-difference algorithms is analyzed in this paper. The analysis is carried out for the case of a linear (cost-to-go) function approximation and for the case of Markov chains with an uncountable state space. An asymptotic upper bound for the mean-square deviation of the algorithm iterations from the optimal value of the parameter of the (cost-to-go) function approximator achievable by temporal-difference learning is determined as a function of stepsize.
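
The setting described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy Markov chain on the interval [-1, 1], the one-step cost g(x) = x, the feature map (1, x), the discount factor, and the use of TD(0) rather than a general TD(λ) scheme. The function `run_td0` runs the constant stepsize update and reports the empirical mean-square deviation of the iterates from the known TD fixed point.

```python
import random


def features(x):
    """Hypothetical feature map phi(x) = (1, x) for the linear approximator."""
    return (1.0, x)


def run_td0(stepsize, n_steps=40000, burn_in=10000, discount=0.9, seed=0):
    """Run constant-stepsize TD(0) on a toy chain with state space [-1, 1].

    Returns the final parameter vector and the empirical mean-square
    deviation of the iterates from the TD fixed point after burn-in.
    """
    rng = random.Random(seed)
    # Toy chain (assumption): x' = 0.5 * x + 0.5 * u, u ~ Uniform(-1, 1),
    # with one-step cost g(x) = x. The discounted cost-to-go is then
    # J(x) = x / (1 - 0.5 * discount), which lies in the span of the
    # features, so the TD fixed point is known in closed form.
    theta_star = (0.0, 1.0 / (1.0 - 0.5 * discount))
    theta = [0.0, 0.0]
    x = 0.0
    sq_dev_sum = 0.0
    for k in range(n_steps):
        x_next = 0.5 * x + 0.5 * rng.uniform(-1.0, 1.0)
        phi, phi_next = features(x), features(x_next)
        value = sum(t * p for t, p in zip(theta, phi))
        value_next = sum(t * p for t, p in zip(theta, phi_next))
        # Temporal difference: observed cost plus discounted next value,
        # minus the current value estimate.
        td_error = x + discount * value_next - value
        # Constant stepsize update along the current feature vector.
        theta = [t + stepsize * td_error * p for t, p in zip(theta, phi)]
        if k >= burn_in:
            sq_dev_sum += sum((t - s) ** 2 for t, s in zip(theta, theta_star))
        x = x_next
    return theta, sq_dev_sum / (n_steps - burn_in)


if __name__ == "__main__":
    for gamma in (0.2, 0.02):
        _, msd = run_td0(gamma)
        print(f"stepsize={gamma}: empirical mean-square deviation={msd:.4f}")
```

With a constant stepsize the iterates do not settle at the fixed point but keep fluctuating around it. In this toy example the fluctuation should shrink as the stepsize is reduced, which is the kind of dependence on stepsize that the paper's bound quantifies. The sketch only measures that deviation empirically and does not reproduce the bound itself.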

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tadić, V. (1999). On the Asymptotic Behavior of a Constant Stepsize Temporal-Difference Learning Algorithm. In: Fischer, P., Simon, H.U. (eds) Computational Learning Theory. EuroCOLT 1999. Lecture Notes in Computer Science, vol 1572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49097-3_11

  • DOI: https://doi.org/10.1007/3-540-49097-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65701-9

  • Online ISBN: 978-3-540-49097-5

  • eBook Packages: Springer Book Archive
