Abstract
In this book, we introduce the relative optimization approach to the performance optimization of continuous-time and continuous-state stochastic systems, also known as stochastic control. The centerpiece of this approach is the performance-difference formula, from which one can find an improved policy by analyzing only the current policy, and from which optimality conditions can be derived. In this chapter, we introduce the main features of relative optimization with a simple example and show that its main advantage over dynamic programming is that, while the latter provides local information at a particular time and state, the former provides global information comparing performance over the entire time horizon. This advantage leads to new insights and results, including state classification; multi-class optimization; explicit optimality conditions at non-smooth value functions for stochastic control, optimal stopping, and singular control; optimal control of degenerate processes; and gradient-based optimal control with non-linear and non-additive performance measures. These results extend the famous Hamilton-Jacobi-Bellman (HJB) optimality condition from smooth value functions to semi-smooth value functions, and viscosity solutions are not needed. Some philosophical and historical remarks are also given to aid understanding of the content. The details are discussed in later chapters.
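In the simplest discrete setting, a finite ergodic Markov chain under the long-run average reward, the performance-difference formula takes the form \(\eta' - \eta = \pi'[(P' - P)g + (f' - f)]\), where \(g\) is the potential (bias) of the current policy; an improved policy can thus be found from the current policy's data alone. The following is a minimal numerical sketch; the transition matrices and reward vectors are hypothetical examples, not taken from the book.

```python
import numpy as np

# Hypothetical 3-state example: transition matrices P, P2 and reward
# vectors f, f2 for the "current" and an alternative policy.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
f = np.array([1.0, 2.0, 3.0])
P2 = np.array([[0.1, 0.6, 0.3],
               [0.4, 0.4, 0.2],
               [0.2, 0.5, 0.3]])
f2 = np.array([2.0, 1.0, 3.5])

def stationary(P):
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def potential(P, f):
    # Potential (bias) g: solve the Poisson equation with the
    # normalization pi @ g = 0, via (I - P + e pi) g = f - eta e.
    n = len(f)
    pi = stationary(P)
    eta = pi @ f
    A = np.eye(n) - P + np.outer(np.ones(n), pi)
    return np.linalg.solve(A, f - eta * np.ones(n))

eta, eta2 = stationary(P) @ f, stationary(P2) @ f2
g = potential(P, f)
pi2 = stationary(P2)

# Performance-difference formula:
#   eta2 - eta = pi2 @ [ (P2 - P) g + (f2 - f) ],
# computable without ever solving for the other policy's potential.
diff = pi2 @ ((P2 - P) @ g + (f2 - f))
print(eta2 - eta, diff)  # the two numbers agree
```

Because the stationary distribution \(\pi'\) has strictly positive components, any policy that makes the bracketed term componentwise at least as large as under the current policy is at least as good; choosing the maximizing action state by state is one policy-improvement step.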
Scientific developments can always be made logical and rational with sufficient hindsight [1].
Richard Bellman
Notes
- 1. See Problem 3.16 for a definition of viscosity solution.
- 2. This section may be omitted by readers who are not concerned with the comparison of the two approaches.
- 3. In this formulation, \(\alpha \) can be any entity, such as “moving up” or “moving down”. This differs from the notation \(f(t, x, \alpha )\), which requires \(\alpha \) to be a number.
- 4.
- 5. An irreducible (meaning that any state can reach any other state in a finite number of transitions), aperiodic, finite Markov chain is ergodic [21].
- 6. For the black-and-white version, ignore the line colors and use the capital letters to identify the lines.
- 7.
- 8. This subsection may be omitted without affecting the understanding of other parts of this book.
- 9. A degenerate point x of a diffusion process is a point at which the quadratic variation is zero, i.e., the diffusion term \(\sigma (x)=0\); the process behaves deterministically in a neighborhood of such a point.
- 10. This is a very small system: e.g., in a three-queue system (e.g., a bank with three tellers), each queue with a finite buffer of size 5, the number of states is \(5^3=125\).
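The notion of a degenerate point in note 9 can be illustrated by a short simulation. The drift and diffusion coefficients below are hypothetical, chosen only so that \(\sigma (0)=0\); at that point every sample path of the Euler-Maruyama scheme evolves deterministically.

```python
import numpy as np

def euler_maruyama(x0, b, sigma, dt=1e-3, n=2000, seed=0):
    # Simulate dX_t = b(X_t) dt + sigma(X_t) dW_t from X_0 = x0.
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(n):
        x += b(x) * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal()
    return x

b = lambda x: -x        # hypothetical drift
sigma = lambda x: x     # hypothetical diffusion term, degenerate at x = 0

# Started exactly at the degenerate point x = 0, both b(0) and sigma(0)
# vanish, so the path never moves: the process is deterministic there.
print(euler_maruyama(0.0, b, sigma))  # 0.0
```

Started away from the degenerate point (say at \(x_0 = 1\)), the same scheme produces genuinely random paths, since \(\sigma (x) \ne 0\) along them.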
References
Dreyfus SE (2002) Richard Bellman on the birth of dynamic programming. Oper Res 50:48–51
Cao XR (2007) Stochastic learning and optimization - a sensitivity-based approach. Springer, Berlin
Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes. Springer, Berlin
Ho YC, Cao XR (1991) Perturbation analysis of discrete-event dynamic systems. Kluwer Academic Publisher, Boston
Cassandras CG, Lafortune S (2008) Introduction to discrete event systems, 2nd edn. Springer, Berlin
Glasserman P (1991) Gradient estimation via perturbation analysis. Kluwer Academic Publishers, Boston
Fu MC, Hu JQ (1997) Conditional Monte Carlo: gradient estimation and optimization applications. Kluwer Academic Publishers, Boston
Bertsekas DP (2007) Dynamic programming and optimal control, vols I and II. Athena Scientific, Belmont
Fleming WH, Soner HM (2006) Controlled Markov processes and viscosity solutions, 2nd edn. Springer, Berlin
Kumar PR, Varaiya P (1986) Stochastic systems: estimation, identification, and adaptive control. Prentice Hall, Upper Saddle River
Nisio M (2015) Stochastic control theory - dynamic programming principle, 2nd edn. Springer, Berlin
Borkar VS (1989) Optimal control of diffusion processes, vol 203. Pitman research notes in mathematics series. Longman Scientific and Technical, Harlow
Brockett R (2009) Stochastic control. Lecture notes. Harvard University, Cambridge
Kushner HJ (1977) Probability methods for approximations in stochastic control and for elliptic equations. Academic, New York
Øksendal B, Sulem A (2007) Applied stochastic control of jump diffusions. Springer, Berlin
Soner HM (2003) Stochastic optimal control in finance. Cattedra Galileiana, Scuola Normale, Pisa
Taksar MI (2008) Diffusion optimization models in insurance and finance. Lecture notes. University of Texas, Austin
Yong J, Zhou XY (1999) Stochastic controls - Hamilton systems and HJB equations. Springer, Berlin
Chong EKP, Zak SH (2008) An introduction to optimization, 3rd edn. Wiley, New York
Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes: basic optimality criteria. Springer, New York
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Bryson AE, Ho YC (1969) Applied optimal control: optimization, estimation, and control. Blaisdell, Waltham
Çinlar E (1975) Introduction to stochastic processes. Prentice Hall, Englewood Cliffs
Cao XR (2015) Optimization of average rewards of time nonhomogeneous Markov chains. IEEE Trans Autom Control 60:1841–1856
Folland GB (1984) Real analysis: modern techniques and their applications. Wiley, New York
Cao XR (2004) The potential structure of sample paths and performance sensitivities of Markov systems. IEEE Trans Autom Control 49:2129–2142
Cao XR (2003) Semi-Markov decision problems and performance sensitivity analysis. IEEE Trans Autom Control 48:758–769
Ho YC, Cao XR (1983) Perturbation analysis and optimization of queueing networks. J Optim Theory Appl 40:559–582
Wardi Y, Cassandras CG, Cao XR (2018) Perturbation analysis: a framework for data-driven control and optimization of discrete event and hybrid systems. Annu Rev Control 45:267–280
Cao XR (2005) Basic ideas for event-based optimization of Markov systems. Discret Event Dyn Syst: Theory Appl 15:169–197
Baxter J, Bartlett PL (2001) Infinite-horizon policy-gradient estimation. J Artif Intell Res 15:319–350
Cao XR, Wan YW (1998) Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Trans Control Syst Technol 6:482–494
Li YJ, Cao F, Cao XR (2010) On-line policy gradient estimation with multi-step sampling. Discret Event Dyn Syst: Theory Appl 20:3–17
Marbach P, Tsitsiklis JN (2001) Simulation-based optimization of Markov reward processes. IEEE Trans Autom Control 46:191–209
Cao XR (2005) A basic formula for on-line policy-gradient algorithms. IEEE Trans Autom Control 50:696–699
Cao XR, Wan XW (2017) Sensitivity analysis of nonlinear behavior with distorted probability. Math Financ 27:115–150
Cao XR (1994) Realization probabilities: the dynamics of queueing systems. Springer, New York
Cao XR (1985) Convergence of parameter sensitivity estimates in a stochastic experiment. IEEE Trans Autom Control 30:845–853
Cao XR (1989) Estimates of performance sensitivity of a stochastic system. IEEE Trans Inf Theory 35:1058–1068
Heidelberger P, Cao XR, Zazanis MA, Suri R (1988) Convergence properties of infinitesimal perturbation analysis estimates. Manag Sci 34:1281–1302
Cao XR (2017) Relative time and stochastic control with non-smooth features. IEEE Trans Autom Control 62:837–852
Zhang JY, Cao XR (2009) Continuous-time Markov decision processes with \(n\)th-bias optimality criteria. Automatica 45:1628–1638
Cao XR (2016) State classification of time nonhomogeneous Markov chains and average reward optimization of multi-chains. IEEE Trans Autom Control 61:3001–3015
Cao XR (2020) Foundation of optimization of time nonhomogeneous Markov chains, manuscript
Fang HT, Cao XR (2004) Potential-based online policy iteration algorithms for Markov decision processes. IEEE Trans Autom Control 49:493–505
Zhang KJ, Xu YK, Chen X, Cao XR (2008) Policy iteration based feedback control. Automatica 44:1055–1061
Ho YC, Zhao QC, Jia QS (2007) Ordinal optimization: soft optimization for hard problems. Springer, Berlin
Cao XR, Ren ZY, Bhatnagar S, Fu MC, Marcus SI (2002) A time aggregation approach to Markov decision processes. Automatica 38:929–943
Cao XR, Wang DX, Qiu L (2014) Partially observable Markov decision processes and separation principle. IEEE Trans Autom Control 59:921–937
Cao XR, Zhang JY (2008) Event-based optimization of Markov systems. IEEE Trans Autom Control 53:1076–1082
Xia L, Jia QS, Cao XR (2014) A tutorial on event-based optimization — a new optimization framework. Discret Event Dyn Syst: Theory Appl 24:103–132 (invited paper)
Xu YK, Cao XR (2011) Lebesgue-sample-based optimal control problems with time aggregation. IEEE Trans Autom Control 56:1097–1109
Xia L (2016) Optimization of Markov decision processes under the variance criterion. Automatica 73:269–278
Huang YH, Chen X (2019) A sensitivity-based construction approach to variance minimization of Markov decision processes. Asian J Control 21:1166–1178
Cao XR (2017) Stochastic feedback control with one-dimensional degenerate diffusions and non-smooth value functions. IEEE Trans Autom Control 62:6136–6151
Cao XR (2017) Optimality conditions for long-run average rewards with underselectivity and non-smooth features. IEEE Trans Autom Control 62:4318–4332
Cao XR (2018) Semi-smooth potentials of stochastic systems with degenerate diffusions. IEEE Trans Autom Control 63:3566–3572
Cao XR (2019) State classification and multi-class optimization of continuous-time and continuous-state Markov processes. IEEE Trans Autom Control 64:3632–3646
Cao XR (2020) Stochastic control of multi-dimensional systems with relative optimization. IEEE Trans Autom Control. https://doi.org/10.1109/TAC.2019.2925469
Cao XR, Wang DX, Lu T, Xu YF (2011) Stochastic control via direct comparison. Discret Event Dyn Syst: Theory Appl 21:11–38
Einstein A, Infeld L (1938) The evolution of physics. Cambridge University Press, Cambridge
Ye XS, Xue RB, Gao JJ, Cao XR (2018) Optimization in curbing risk contagion among financial institutes. Automatica 94:214–220
Kushner HJ (2001) Heavy traffic analysis of controlled queueing and communication networks. Springer, Berlin
Cassandras CG, Wardi Y, Melamed B, Sun G, Panayiotou CG (2002) Perturbation analysis for on-line control and optimization of stochastic fluid models. IEEE Trans Autom Control 47(8):1234–1248
Cassandras CG, Wardi Y, Panayiotou CG, Yao C (2010) Perturbation analysis and optimization of stochastic hybrid systems. Eur J Control 16(6):642–664
Zwart AP (2000) A fluid queue with a finite buffer and superexponential input. Ann Probab 32:221–243
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Cao, XR. (2020). Introduction. In: Relative Optimization of Continuous-Time and Continuous-State Stochastic Systems. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-41846-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41845-8
Online ISBN: 978-3-030-41846-5
eBook Packages: Intelligent Technologies and Robotics (R0)