
Risk-Sensitive Markov Decision Processes

  • Living reference work entry
Encyclopedia of Optimization

Abstract

Traditionally, Markov decision processes are Markov processes whose transition law is controlled by a decision maker who aims to maximize the expected (total, discounted, or average) reward. Risk-sensitive Markov decision processes generalize these models. Instead of maximizing the expected reward, the decision maker maximizes the expected exponential utility of the reward, so higher-order moments are also taken into account. We introduce the main ideas for the finite-horizon case, interpret the optimization criterion, and give some applications that highlight the effect of risk sensitivity. Further criteria and extensions, as well as other definitions of risk sensitivity, are also discussed.
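The following Python sketch illustrates the finite-horizon case. It is not taken from the entry. It runs backward induction for the entropic certainty equivalent CE_γ(R) = −(1/γ) log E[exp(−γR)], which ranks policies in the same order as the expected exponential utility of the total reward R. The sign convention is an assumption and may differ from the entry's notation: γ > 0 is risk-averse, γ < 0 is risk-seeking, and γ = 0 is risk-neutral. For small γ, CE_γ(R) ≈ E[R] − (γ/2) Var(R), which shows how the variance (and, through higher-order terms, further moments) enters the criterion. The recursion relies on two properties of CE_γ. First, constants can be pulled out: CE_γ(c + X) = c + CE_γ(X). Second, it can be nested over stages. Together these give V_n(s) = max_a CE_γ[r(s, a, S') + V_{n+1}(S')] with V_N = 0. The model format, the function names, and the two-action toy example are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical model format: model[s][a] is a list of
# (probability, next_state, reward) triples.
Model = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def certainty_equivalent(outcomes: List[Tuple[float, float]], gamma: float) -> float:
    """Entropic certainty equivalent -(1/gamma) * log E[exp(-gamma * X)].

    outcomes holds (probability, value) pairs. gamma > 0 is risk-averse,
    gamma < 0 risk-seeking; gamma == 0 falls back to the plain expectation.
    """
    if gamma == 0.0:
        return sum(p * x for p, x in outcomes)
    # Log-sum-exp with a max shift so exp() cannot overflow.
    exps = [(p, -gamma * x) for p, x in outcomes if p > 0.0]
    m = max(e for _, e in exps)
    log_mean = m + math.log(sum(p * math.exp(e - m) for p, e in exps))
    return -log_mean / gamma


def risk_sensitive_backward_induction(model: Model, horizon: int, gamma: float):
    """Backward induction V_n(s) = max_a CE_gamma[r + V_{n+1}(S')], V_N = 0.

    Returns the stage-0 value function and one decision rule per stage.
    """
    values = {s: 0.0 for s in model}  # terminal reward assumed to be zero
    policy: List[Dict[str, str]] = []
    for _ in range(horizon):
        new_values, decision = {}, {}
        for s, actions in model.items():
            best_a, best_v = None, -math.inf
            for a, transitions in actions.items():
                v = certainty_equivalent(
                    [(p, r + values[s2]) for p, s2, r in transitions], gamma
                )
                if v > best_v:
                    best_a, best_v = a, v
            new_values[s], decision[s] = best_v, best_a
        values = new_values
        policy.insert(0, decision)  # stages are computed backwards in time
    return values, policy


# Hypothetical toy model: a sure reward of 1 versus a fair gamble on 0 or 2.2.
toy: Model = {
    "s": {
        "safe": [(1.0, "s", 1.0)],
        "risky": [(0.5, "s", 0.0), (0.5, "s", 2.2)],
    }
}

for g in (0.0, 1.0, -1.0):
    v, pol = risk_sensitive_backward_induction(toy, horizon=3, gamma=g)
    # Print gamma, the stage-0 value and the stage-0 action.
    print(g, round(v["s"], 4), pol[0]["s"])
```

With γ = 1, the sketch chooses the sure reward at every stage. With γ = 0 and γ = −1, it chooses the gamble, which has the higher mean. Working in log space keeps the computation stable even when exp(−γR) would overflow over long horizons.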

Author information

Correspondence to Christiane Barz.

Copyright information

© 2023 Springer Nature Switzerland AG

About this entry

Cite this entry

Barz, C., Bäuerle, N. (2023). Risk-Sensitive Markov Decision Processes. In: Pardalos, P.M., Prokopyev, O.A. (eds) Encyclopedia of Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-54621-2_819-1

  • DOI: https://doi.org/10.1007/978-3-030-54621-2_819-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-54621-2

  • Online ISBN: 978-3-030-54621-2

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering