
A Q-Learning Approach for Investment Decisions

Chapter in Trends in Mathematical Economics

Abstract

This work applies the Q-learning technique to investment decision-making: the system issues recommendations on whether it is advisable to invest in a particular asset. Reinforcement learning, and Q-learning in particular, allows continuous learning based on the decisions the system itself proposes. The technique offers several advantages, such as the ability to make decisions regardless of the stage of learning, adaptability to the application domain, and a goal-oriented logic; these characteristics are very useful in financial problems. Results of experiments evaluating the learning capacity of the method in this application domain are presented, and its decision-making capacity in the same domain is also assessed. The result is a Q-learning-based system that learns from its own decisions in an investment context. The system shows some limitations when the state space is large, owing to the lack of generalization in the Q-learning variant used.
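To make the core mechanism concrete, the following is a minimal sketch of a tabular Q-learning loop for asset recommendations. It is an illustration only: the state encoding (sign of the last price move), the buy/hold/sell action set, the reward definition (next-period return of the taken position), and the hyperparameters are assumptions for the example, not the design used in the chapter.

```python
import random
from collections import defaultdict

# Illustrative action set and hyperparameters (assumptions, not the chapter's design).
ACTIONS = ("buy", "hold", "sell")
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

# Tabular Q-function: Q[state][action], defaulting to 0 for unseen state-action pairs.
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state):
    """Epsilon-greedy policy: explore with probability EPSILON, otherwise exploit Q."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def q_update(state, action, reward, next_state):
    """Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

def run_episode(prices):
    """Walk through a price series; the state is the sign of the last return (toy discretization)."""
    for t in range(1, len(prices) - 1):
        state = "up" if prices[t] > prices[t - 1] else "down"
        action = choose_action(state)
        next_return = (prices[t + 1] - prices[t]) / prices[t]
        # Reward: realized return if long, its negative if short, zero if flat.
        reward = {"buy": next_return, "sell": -next_return, "hold": 0.0}[action]
        next_state = "up" if prices[t + 1] > prices[t] else "down"
        q_update(state, action, reward, next_state)

# Example usage on a toy price series:
run_episode([100.0, 101.2, 100.7, 102.3, 103.0])
```

Note that the table Q grows one entry per distinct state and shares nothing across states; this is the lack of generalization that, as the abstract points out, limits this kind of tabular variant when the state space becomes large.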



Author information

Correspondence to Martín Varela.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Varela, M., Viera, O., Robledo, F. (2016). A Q-Learning Approach for Investment Decisions. In: Pinto, A., Accinelli Gamba, E., Yannacopoulos, A., Hervés-Beloso, C. (eds) Trends in Mathematical Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-32543-9_18

