Abstract
We discuss reinforcement learning from an intertemporal choice perspective. Unlike previous research, this paper emphasizes the importance of a deeper understanding of the psychological mechanisms of human decision-making. We aim to improve the Q learning algorithm in light of new results from intertemporal choice experiments. We begin with a brief introduction to recent findings in intertemporal choice theory and to reinforcement learning, and then propose a new reinforcement learning algorithm with selective discounting (SD-Q). Experiments show that SD-Q outperforms both the traditional Q learning algorithm and a reinforcement learning method that does not apply discounting.
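The abstract contrasts SD-Q with standard Q learning, whose update rule always applies a fixed discount factor. A minimal sketch of the idea is below; note that the `discount` flag and its selection rule are hypothetical stand-ins, since the abstract does not specify the paper's actual criterion (drawn from intertemporal choice results) for when to discount.

```python
def sd_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, discount=True):
    """One selective-discount Q update (sketch, not the paper's exact method).

    Standard Q learning always uses target = r + gamma * max_a' Q(s', a').
    A selective-discount variant applies gamma only when some criterion
    holds (here modeled by the hypothetical `discount` flag); otherwise
    the future value is left undiscounted, as in the gamma = 1 baseline.
    """
    g = gamma if discount else 1.0
    target = r + g * max(Q[s_next].values())   # bootstrapped TD target
    Q[s][a] += alpha * (target - Q[s][a])      # standard TD(0) correction
    return Q[s][a]
```

With `discount=True` this reduces to the traditional Q learning update of Watkins and Dayan; with `discount=False` it matches the no-discount baseline the abstract compares against.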
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhao, F., Qin, Z. (2011). SD-Q: Selective Discount Q Learning Based on New Results of Intertemporal Choice Theory. In: Deng, H., Miao, D., Lei, J., Wang, F.L. (eds) Artificial Intelligence and Computational Intelligence. AICI 2011. Lecture Notes in Computer Science(), vol 7003. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23887-1_88
DOI: https://doi.org/10.1007/978-3-642-23887-1_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23886-4
Online ISBN: 978-3-642-23887-1