Abstract
We discuss reinforcement learning from an intertemporal choice perspective. Unlike previous research, this paper emphasizes the importance of a deeper understanding of the psychological mechanisms of human decision-making. We aim to improve the Q learning algorithm in light of new results from intertemporal choice experiments. We begin with a brief introduction to recent findings in intertemporal choice theory and to reinforcement learning, and then propose a new reinforcement learning algorithm with selective discounting (SD-Q). Experiments show that SD-Q outperforms both the traditional Q learning algorithm and a reinforcement learning method that does not apply discounting.
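The abstract contrasts SD-Q with standard Q learning, whose update rule always applies a fixed discount factor. A minimal sketch of the idea is below; note that the `discount` flag and its selection rule are hypothetical stand-ins, since the abstract does not specify the paper's actual criterion (drawn from intertemporal choice results) for when to discount.

```python
def sd_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, discount=True):
    """One selective-discount Q update (sketch, not the paper's exact method).

    Standard Q learning always uses target = r + gamma * max_a' Q(s', a').
    A selective-discount variant applies gamma only when some criterion
    holds (here modeled by the hypothetical `discount` flag); otherwise
    the future value is left undiscounted, as in the gamma = 1 baseline.
    """
    g = gamma if discount else 1.0
    target = r + g * max(Q[s_next].values())   # bootstrapped TD target
    Q[s][a] += alpha * (target - Q[s][a])      # standard TD(0) correction
    return Q[s][a]
```

With `discount=True` this reduces to the traditional Q learning update of Watkins and Dayan; with `discount=False` it matches the no-discount baseline the abstract compares against.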
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhao, F., Qin, Z. (2011). SD-Q: Selective Discount Q Learning Based on New Results of Intertemporal Choice Theory. In: Deng, H., Miao, D., Lei, J., Wang, F.L. (eds) Artificial Intelligence and Computational Intelligence. AICI 2011. Lecture Notes in Computer Science(), vol 7003. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23887-1_88
DOI: https://doi.org/10.1007/978-3-642-23887-1_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23886-4
Online ISBN: 978-3-642-23887-1