Abstract

In this book, we introduce the relative optimization approach to the performance optimization of continuous-time and continuous-state stochastic systems, also known as stochastic control. The centerpiece of this approach is the performance-difference formula, from which one can find an improved policy by analyzing only the current policy, and from which the optimality conditions can be derived. In this chapter, we introduce the main features of relative optimization through a simple example and show that its main advantage over dynamic programming is that, while the latter provides local information at a particular time and state, the former provides global information comparing performance over the entire time horizon. This advantage leads to new insights and results, including state classification, multi-class optimization, explicit conditions for optimal policies at non-smooth value functions in stochastic control, optimal stopping, and singular control, optimal control of degenerate processes, and gradient-based optimal control of non-linear and non-additive performance measures. The results extend the famous Hamilton-Jacobi-Bellman (HJB) optimality condition from smooth value functions to semi-smooth value functions, and viscosity solutions are not needed. Some philosophical and historical remarks are given to aid understanding of the content. The details are discussed in later chapters.
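
As a concrete preview, here is the performance-difference formula in its simplest, discrete-time, finite-state form (see [2, 24]; its counterpart for continuous-time processes is Eq. (2.47) in Chap. 2). For two policies with transition matrices \(P\) and \(P'\), reward vectors \(f\) and \(f'\), and long-run average rewards \(\eta \) and \(\eta '\),

\[
\eta ' - \eta = \pi '\bigl[(f' + P'g) - (f + Pg)\bigr],
\]

where \(\pi '\) is the stationary distribution of the new policy and the performance potential \(g\) of the current policy solves the Poisson equation \((I - P)g + \eta e = f\), with \(e\) the all-ones vector. Since \(\pi '\) has strictly positive components for an ergodic chain, any policy satisfying \(f' + P'g \ge f + Pg\) componentwise, with strict inequality in some state, improves the average reward; only quantities of the current policy, \(g\) and \(\eta \), need to be analyzed.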

Scientific developments can always be made logical and rational with sufficient hindsight [1].

Richard Bellman

Notes

  1. See Problem 3.16 for a definition of viscosity solution.

  2. Readers who are not concerned with the comparison of the two approaches may omit this section.

  3. In this formulation, \(\alpha \) can be any entity, such as “moving up” or “moving down”. This differs from the notation \(f(t, x, \alpha )\), which requires \(\alpha \) to be a number.

  4. It is called a “performance criterion” in [9] and a “payoff” in [11].

  5. An irreducible (meaning any state can reach any other state in a finite number of transitions), aperiodic, finite Markov chain is ergodic [21]; a numerical sketch is given after these notes.

  6. For the black-and-white version, ignore the line colors and use the capital letters to identify the lines.

  7. See [2, 24] for a rigorous proof; its counterpart for continuous-time processes is Eq. (2.47) in Chap. 2.

  8. This subsection may be omitted without affecting the understanding of other parts of this book.

  9. A degenerate point \(x\) of a diffusion process is a point at which the quadratic variation is zero, i.e., the diffusion term \(\sigma (x)=0\); in its neighborhood, the process behaves deterministically. A simulation sketch is given after these notes.

  10. This is a very small system; e.g., in a three-queue system (e.g., a bank with three tellers) in which each queue has a finite buffer of size 5, the number of states is \(5^3=125\).
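
Note 5 can be checked numerically. The following minimal Python sketch (the \(3\times 3\) transition matrix is a hypothetical example, not one from the book) iterates the update \(\pi \leftarrow \pi P\) from two different initial states; for an irreducible, aperiodic, finite chain both runs converge to the same stationary distribution.

import numpy as np

# Hypothetical irreducible, aperiodic transition matrix on 3 states.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

pi1 = np.array([1.0, 0.0, 0.0])  # start deterministically in state 0
pi2 = np.array([0.0, 0.0, 1.0])  # start deterministically in state 2
for _ in range(100):             # iterate pi <- pi P
    pi1 = pi1 @ P
    pi2 = pi2 @ P

print(pi1, pi2)  # both print the same stationary distribution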
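For Note 9, the following Euler-Maruyama sketch simulates a one-dimensional diffusion \(dX_t = b(X_t)\,dt + \sigma (X_t)\,dW_t\) whose diffusion term vanishes at \(x = 0\), making \(x = 0\) a degenerate point; the drift and diffusion coefficients are hypothetical choices for illustration only.

import numpy as np

rng = np.random.default_rng(0)

def simulate(x0, steps=5000, dt=1e-3):
    """One Euler-Maruyama path started at x0."""
    b = lambda x: -x      # drift; b(0) = 0, so 0 is also a fixed point
    sigma = lambda x: x   # sigma(0) = 0: zero quadratic variation at x = 0
    x = x0
    for _ in range(steps):
        x += b(x) * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal()
    return x

print(simulate(0.0))  # exactly 0.0: at the degenerate point the path is deterministic
print(simulate(1.0))  # a random value: away from 0 the path is stochastic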

References

  1. Dreyfus SE (2002) Richard Bellman on the birth of dynamic programming. Oper Res 50:48–51

  2. Cao XR (2007) Stochastic learning and optimization - a sensitivity-based approach. Springer, Berlin

  3. Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes. Springer, Berlin

  4. Ho YC, Cao XR (1991) Perturbation analysis of discrete-event dynamic systems. Kluwer Academic Publisher, Boston

  5. Cassandras CG, Lafortune S (2008) Introduction to discrete event systems, 2nd edn. Springer, Berlin

  6. Glasserman P (1991) Gradient estimation via perturbation analysis. Kluwer Academic Publishers, Boston

  7. Fu MC, Hu JQ (1997) Conditional Monte Carlo: gradient estimation and optimization applications. Kluwer Academic Publishers, Boston

8. Bertsekas DP (2007) Dynamic programming and optimal control, vols I and II. Athena Scientific, Belmont

  9. Fleming WH, Soner HM (2006) Controlled Markov processes and viscosity solutions, 2nd edn. Springer, Berlin

  10. Kumar PR, Varaiya P (1986) Stochastic systems: estimation, identification, and adaptive control. Prentice Hall, Upper Saddle River

  11. Nisio M (2015) Stochastic control theory - dynamic programming principle, 2nd edn. Springer, Berlin

  12. Borkar VS (1989) Optimal control of diffusion processes, vol 203. Pitman research notes in mathematics series. Longman Scientific and Technical, Harlow

  13. Brockett R (2009) Stochastic control. Lecture notes. Harvard University, Cambridge

  14. Kushner HJ (1977) Probability methods for approximations in stochastic control and for elliptic equations. Academic, New York

  15. Øksendal B, Sulem A (2007) Applied stochastic control of jump diffusions. Springer, Berlin

  16. Soner HM (2003) Stochastic optimal control in finance. Cattedra Galileiana, Scuola Normale, Pisa

  17. Taksar MI (2008) Diffusion optimization models in insurance and finance. Lecture notes. University of Texas, Austin

18. Yong J, Zhou XY (1999) Stochastic controls: Hamiltonian systems and HJB equations. Springer, Berlin

  19. Chong EKP, Zak SH (2008) An introduction to optimization, 3rd edn. Wiley, New York

  20. Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes: basic optimality criteria. Springer, New York

  21. Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York

  22. Bryson AE, Ho YC (1969) Applied optimal control: optimization, estimation, and control. Blaisdell, Waltham

  23. Çinlar E (1975) Introduction to stochastic processes. Prentice Hall, Englewood Cliffs

  24. Cao XR (2015) Optimization of average rewards of time nonhomogeneous Markov chains. IEEE Trans Autom Control 60:1841–1856

  25. Folland GB (1984) Real analysis: modern techniques and their applications. Wiley, New York

  26. Cao XR (2004) The potential structure of sample paths and performance sensitivities of Markov systems. IEEE Trans Autom Control 49:2129–2142

  27. Cao XR (2003) Semi-Markov decision problems and performance sensitivity analysis. IEEE Trans Autom Control 48:758–769

  28. Ho YC, Cao XR (1983) Perturbation analysis and optimization of queueing networks. J Optim Theory Appl 40:559–582

29. Wardi Y, Cassandras CG, Cao XR (2018) Perturbation analysis: a framework for data-driven control and optimization of discrete event and hybrid systems. Annu Rev Control 45:267–280

  30. Cao XR (2005) Basic ideas for event-based optimization of Markov systems. Discret Event Dyn Syst: Theory Appl 15:169–197

  31. Baxter J, Bartlett PL (2001) Infinite-horizon policy-gradient estimation. J Artif Intell Res 15:319–350

  32. Cao XR, Wan YW (1998) Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Trans Control Syst Technol 6:482–494

  33. Li YJ, Cao F, Cao XR (2010) On-line policy gradient estimation with multi-step sampling. Discret Event Dyn Syst: Theory Appl 20:3–17

34. Marbach P, Tsitsiklis JN (2001) Simulation-based optimization of Markov reward processes. IEEE Trans Autom Control 46:191–209

  35. Cao XR (2005) A basic formula for on-line policy-gradient algorithms. IEEE Trans Autom Control 50:696–699

  36. Cao XR, Wan XW (2017) Sensitivity analysis of nonlinear behavior with distorted probability. Math Financ 27:115–150

  37. Cao XR (1994) Realization probabilities: the dynamics of queueing systems. Springer, New York

  38. Cao XR (1985) Convergence of parameter sensitivity estimates in a stochastic experiment. IEEE Trans Autom Control 30:845–853

  39. Cao XR (1989) Estimates of performance sensitivity of a stochastic system. IEEE Trans Inf Theory 35:1058–1068

  40. Heidelberger P, Cao XR, Zazanis MA, Suri R (1988) Convergence properties of infinitesimal perturbation analysis estimates. Manag Sci 34:1281–1302

  41. Cao XR (2017) Relative time and stochastic control with non-smooth features. IEEE Trans Autom Control 62:837–852

  42. Zhang JY, Cao XR (2009) Continuous-time Markov decision processes with \(n\)th-bias optimality criteria. Automatica 45:1628–1638

  43. Cao XR (2016) State classification of time nonhomogeneous Markov chains and average reward optimization of multi-chains. IEEE Trans Autom Control 61:3001–3015

  44. Cao XR (2020) Foundation of optimization of time nonhomogeneous Markov chains, manuscript

  45. Fang HT, Cao XR (2004) Potential-based online policy iteration algorithms for Markov decision processes. IEEE Trans Autom Control 49:493–505

  46. Zhang KJ, Xu YK, Chen X, Cao XR (2008) Policy iteration based feedback control. Automatica 44:1055–1061

  47. Ho YC, Zhao QC, Jia QS (2007) Ordinal optimization: soft optimization for hard problems. Springer, Berlin

  48. Cao XR, Ren ZY, Bhatnagar S, Fu MC, Marcus SI (2002) A time aggregation approach to Markov decision processes. Automatica 38:929–943

  49. Cao XR, Wang DX, Qiu L (2014) Partially observable Markov decision processes and separation principle. IEEE Trans Autom Control 59:921–937

  50. Cao XR, Zhang JY (2008) Event-based optimization of Markov systems. IEEE Trans Autom Control 53:1076–1082

51. Xia L, Jia QS, Cao XR (2014) A tutorial on event-based optimization — a new optimization framework. Discret Event Dyn Syst: Theory Appl 24:103–132 (invited paper)

  52. Xu YK, Cao XR (2011) Lebesgue-sample-based optimal control problems with time aggregation. IEEE Trans Autom Control 56:1097–1109

  53. Xia L (2016) Optimization of Markov decision processes under the variance criterion. Automatica 73:269–278

  54. Huang YH, Chen X (2019) A sensitivity-based construction approach to variance minimization of Markov decision processes. Asian J Control 21:1166–1178

  55. Cao XR (2017) Stochastic feedback control with one-dimensional degenerate diffusions and non-smooth value functions. IEEE Trans Autom Control 62:6136–6151

56. Cao XR (2017) Optimality conditions for long-run average rewards with underselectivity and non-smooth features. IEEE Trans Autom Control 62:4318–4332

  57. Cao XR (2018) Semi-smooth potentials of stochastic systems with degenerate diffusions. IEEE Trans Autom Control 63:3566–3572

  58. Cao XR (2019) State classification and multi-class optimization of continuous-time and continuous-state Markov processes. IEEE Trans Autom Control 64:3632–3646

  59. Cao XR (2020) Stochastic control of multi-dimensional systems with relative optimization. IEEE Trans Autom Control. https://doi.org/10.1109/TAC.2019.2925469

  60. Cao XR, Wang DX, Lu T, Xu YF (2011) Stochastic control via direct comparison. Discret Event Dyn Syst: Theory Appl 21:11–38

  61. Einstein A, Infeld L (1938) The evolution of physics. Cambridge University Press, Cambridge

  62. Ye XS, Xue RB, Gao JJ, Cao XR (2018) Optimization in curbing risk contagion among financial institutes. Automatica 94:214–220

  63. Kushner HJ (2001) Heavy traffic analysis of controlled queueing and communication networks. Springer, Berlin

  64. Cassandras CG, Wardi Y, Melamed B, Sun G, Panayiotou CG (2002) Perturbation analysis for on-line control and optimization of stochastic fluid models. IEEE Trans Autom Control 47(8):1234–1248

65. Cassandras CG, Wardi Y, Panayiotou CG, Yao C (2010) Perturbation analysis and optimization of stochastic hybrid systems. Eur J Control 16(6):642–664

66. Zwart AP (2000) A fluid queue with a finite buffer and subexponential input. Adv Appl Probab 32:221–243

Author information

Correspondence to Xi-Ren Cao.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cao, XR. (2020). Introduction. In: Relative Optimization of Continuous-Time and Continuous-State Stochastic Systems. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-41846-5_1
