References
Barto, A.G., Bradtke, S.J. & Singh, S.P. (1991). Real-time learning and control using asynchronous dynamic programming (Technical Report 91-57). Amherst, MA: University of Massachusetts, Computer Science Department.
Barto, A.G. & Sutton, R.S. (1981). Landmark learning: An illustration of associative search. Biological Cybernetics, 42, 1–8.
Barto, A.G., Sutton, R.S. & Anderson, C.W. (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834–846.
Barto, A.G., Sutton, R.S. & Brouwer, P.S. (1981). Associative search network: A reinforcement learning associative memory. Biological Cybernetics, 40, 201–211.
Booker, L.B. (1988). Classifier systems that learn world models. Machine Learning, 3, 161–192.
Grefenstette, J.J., Ramsey, C.L. & Schultz, A.C. (1990). Learning sequential decision rules using simulation models and competition. Machine Learning, 5, 355–382.
Hampson, S.E. (1983). A neural model of adaptive behavior. Ph.D. dissertation, Department of Information and Computer Science, University of California, Irvine (Technical Report #213). A revised edition appeared as Connectionist Problem Solving, Boston: Birkhäuser, 1990.
Holland, J.H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.
Holland, J.H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach, Volume II (pp. 593–623). Los Altos, CA: Morgan Kaufmann.
Kaelbling, L.P. (1990). Learning in embedded systems. Ph.D. dissertation, Computer Science Department, Stanford University.
Mahadevan, S. & Connell, J. (1990). Automatic programming of behavior-based robots using reinforcement learning. IBM technical report. To appear in Artificial Intelligence.
Minsky, M.L. (1961). Steps toward artificial intelligence. Proceedings of the IRE, 49, 8–30. Reprinted in E.A. Feigenbaum & J. Feldman (Eds.), Computers and Thought (pp. 406–450). New York: McGraw-Hill, 1963.
Narendra, K.S. & Thathachar, M.A.L. (1974). Learning automata—a survey. IEEE Transactions on Systems, Man, and Cybernetics, 4, 323–334. (Or see their textbook, Learning Automata: An Introduction, Englewood Cliffs, NJ: Prentice Hall, 1989.)
Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210–229. Reprinted in E.A. Feigenbaum & J. Feldman (Eds.), Computers and Thought (pp. 71–105). New York: McGraw-Hill, 1963.
Waltz, M.D. & Fu, K.S. (1965). A heuristic approach to reinforcement learning control systems. IEEE Transactions on Automatic Control, AC-10, 390–398.
Watkins, C.J.C.H. (1989). Learning from delayed rewards. Ph.D. dissertation, Psychology Department, Cambridge University.
Werbos, P.J. (1987). Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, Jan–Feb.
Whitehead, S.D. & Ballard, D.H. (1991). Learning to perceive and act by trial and error. Machine Learning, 7, 45–84.
Sutton, R.S. Introduction: The challenge of reinforcement learning. Mach Learn 8, 225–227 (1992). https://doi.org/10.1007/BF00992695