Abstract

In this book, we introduce the relative optimization approach to the performance optimization of continuous-time and continuous-state stochastic systems, also known as stochastic control. The centerpiece of this approach is the performance-difference formula, from which one can find an improved policy by analyzing only the current policy, and from which the optimality conditions can be derived. In this chapter, we introduce the main features of relative optimization through a simple example and show that its main advantage over dynamic programming is that, while the latter provides local information at a particular time and state, the former provides global information comparing performance over the entire time horizon. This advantage leads to new insights and results, including state classification, multi-class optimization, explicit conditions for optimal policies at non-smooth value functions in stochastic control, optimal stopping, and singular control, optimal control of degenerate processes, and gradient-based optimal control of non-linear and non-additive performance measures. The results extend the famous Hamilton-Jacobi-Bellman (HJB) optimality condition from smooth value functions to semi-smooth value functions, and viscosity solutions are not needed. Some philosophical and historical remarks are given to aid understanding of the content. The details are discussed in later chapters.
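
As a concrete preview, here is the performance-difference formula in its simplest, discrete-time, finite-state form (see [2, 24]; its counterpart for continuous-time processes is Eq. (2.47) in Chap. 2). For two policies with transition matrices \(P\) and \(P'\), reward vectors \(f\) and \(f'\), and long-run average rewards \(\eta \) and \(\eta '\),

\[
\eta ' - \eta = \pi '\bigl[(f' + P'g) - (f + Pg)\bigr],
\]

where \(\pi '\) is the stationary distribution of the new policy and the performance potential \(g\) of the current policy solves the Poisson equation \((I - P)g + \eta e = f\), with \(e\) the all-ones vector. Since \(\pi '\) has strictly positive components for an ergodic chain, any policy satisfying \(f' + P'g \ge f + Pg\) componentwise, with strict inequality in some state, improves the average reward; only quantities of the current policy, \(g\) and \(\eta \), need to be analyzed.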

Scientific developments can always be made logical and rational with sufficient hindsight [1].

Richard Bellman

Notes

  1. See Problem 3.16 for a definition of viscosity solution.

  2. Readers who are not concerned with the comparison of the two approaches may omit this section.

  3. In this formulation, \(\alpha \) can be any entity, such as “moving up” or “moving down”. This differs from the notation \(f(t, x, \alpha )\), which requires \(\alpha \) to be a number.

  4. It is called a “performance criterion” in [9] and a “payoff” in [11].

  5. An irreducible (meaning any state can reach any other state in a finite number of transitions), aperiodic, finite Markov chain is ergodic [21]; a numerical sketch is given after these notes.

  6. For the black-and-white version, ignore the line colors and use the capital letters to identify the lines.

  7. See [2, 24] for a rigorous proof; its counterpart for continuous-time processes is Eq. (2.47) in Chap. 2.

  8. This subsection may be omitted without affecting the understanding of other parts of this book.

  9. A degenerate point \(x\) of a diffusion process is a point at which the quadratic variation is zero, i.e., the diffusion term \(\sigma (x)=0\); in its neighborhood, the process behaves deterministically. A simulation sketch is given after these notes.

  10. This is a very small system; e.g., in a three-queue system (e.g., a bank with three tellers) in which each queue has a finite buffer of size 5, the number of states is \(5^3=125\).
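
Note 5 can be checked numerically. The following minimal Python sketch (the \(3\times 3\) transition matrix is a hypothetical example, not one from the book) iterates the update \(\pi \leftarrow \pi P\) from two different initial states; for an irreducible, aperiodic, finite chain both runs converge to the same stationary distribution.

import numpy as np

# Hypothetical irreducible, aperiodic transition matrix on 3 states.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

pi1 = np.array([1.0, 0.0, 0.0])  # start deterministically in state 0
pi2 = np.array([0.0, 0.0, 1.0])  # start deterministically in state 2
for _ in range(100):             # iterate pi <- pi P
    pi1 = pi1 @ P
    pi2 = pi2 @ P

print(pi1, pi2)  # both print the same stationary distribution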
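For Note 9, the following Euler-Maruyama sketch simulates a one-dimensional diffusion \(dX_t = b(X_t)\,dt + \sigma (X_t)\,dW_t\) whose diffusion term vanishes at \(x = 0\), making \(x = 0\) a degenerate point; the drift and diffusion coefficients are hypothetical choices for illustration only.

import numpy as np

rng = np.random.default_rng(0)

def simulate(x0, steps=5000, dt=1e-3):
    """One Euler-Maruyama path started at x0."""
    b = lambda x: -x      # drift; b(0) = 0, so 0 is also a fixed point
    sigma = lambda x: x   # sigma(0) = 0: zero quadratic variation at x = 0
    x = x0
    for _ in range(steps):
        x += b(x) * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal()
    return x

print(simulate(0.0))  # exactly 0.0: at the degenerate point the path is deterministic
print(simulate(1.0))  # a random value: away from 0 the path is stochastic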

References

  1. Dreyfus SE (2002) Richard Bellman on the birth of dynamic programming. Oper Res 50:48–51

  2. Cao XR (2007) Stochastic learning and optimization - a sensitivity-based approach. Springer, Berlin

  3. Guo XP, Hernández-Lerma O (2009) Continuous-time Markov decision processes. Springer, Berlin

  4. Ho YC, Cao XR (1991) Perturbation analysis of discrete-event dynamic systems. Kluwer Academic Publisher, Boston

  5. Cassandras CG, Lafortune S (2008) Introduction to discrete event systems, 2nd edn. Springer, Berlin

  6. Glasserman P (1991) Gradient estimation via perturbation analysis. Kluwer Academic Publishers, Boston

  7. Fu MC, Hu JQ (1997) Conditional Monte Carlo: gradient estimation and optimization applications. Kluwer Academic Publishers, Boston

8. Bertsekas DP (2007) Dynamic programming and optimal control, vols I and II. Athena Scientific, Belmont

  9. Fleming WH, Soner HM (2006) Controlled Markov processes and viscosity solutions, 2nd edn. Springer, Berlin

  10. Kumar PR, Varaiya P (1986) Stochastic systems: estimation, identification, and adaptive control. Prentice Hall, Upper Saddle River

  11. Nisio M (2015) Stochastic control theory - dynamic programming principle, 2nd edn. Springer, Berlin

  12. Borkar VS (1989) Optimal control of diffusion processes, vol 203. Pitman research notes in mathematics series. Longman Scientific and Technical, Harlow

  13. Brockett R (2009) Stochastic control. Lecture notes. Harvard University, Cambridge

  14. Kushner HJ (1977) Probability methods for approximations in stochastic control and for elliptic equations. Academic, New York

  15. Øksendal B, Sulem A (2007) Applied stochastic control of jump diffusions. Springer, Berlin

  16. Soner HM (2003) Stochastic optimal control in finance. Cattedra Galileiana, Scuola Normale, Pisa

  17. Taksar MI (2008) Diffusion optimization models in insurance and finance. Lecture notes. University of Texas, Austin

18. Yong J, Zhou XY (1999) Stochastic controls: Hamiltonian systems and HJB equations. Springer, Berlin

  19. Chong EKP, Zak SH (2008) An introduction to optimization, 3rd edn. Wiley, New York

  20. Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes: basic optimality criteria. Springer, New York

  21. Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York

  22. Bryson AE, Ho YC (1969) Applied optimal control: optimization, estimation, and control. Blaisdell, Waltham

  23. Çinlar E (1975) Introduction to stochastic processes. Prentice Hall, Englewood Cliffs

  24. Cao XR (2015) Optimization of average rewards of time nonhomogeneous Markov chains. IEEE Trans Autom Control 60:1841–1856

  25. Folland GB (1984) Real analysis: modern techniques and their applications. Wiley, New York

  26. Cao XR (2004) The potential structure of sample paths and performance sensitivities of Markov systems. IEEE Trans Autom Control 49:2129–2142

  27. Cao XR (2003) Semi-Markov decision problems and performance sensitivity analysis. IEEE Trans Autom Control 48:758–769

  28. Ho YC, Cao XR (1983) Perturbation analysis and optimization of queueing networks. J Optim Theory Appl 40:559–582

29. Wardi Y, Cassandras CG, Cao XR (2018) Perturbation analysis: a framework for data-driven control and optimization of discrete event and hybrid systems. Annu Rev Control 45:267–280

  30. Cao XR (2005) Basic ideas for event-based optimization of Markov systems. Discret Event Dyn Syst: Theory Appl 15:169–197

  31. Baxter J, Bartlett PL (2001) Infinite-horizon policy-gradient estimation. J Artif Intell Res 15:319–350

  32. Cao XR, Wan YW (1998) Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Trans Control Syst Technol 6:482–494

  33. Li YJ, Cao F, Cao XR (2010) On-line policy gradient estimation with multi-step sampling. Discret Event Dyn Syst: Theory Appl 20:3–17

34. Marbach P, Tsitsiklis JN (2001) Simulation-based optimization of Markov reward processes. IEEE Trans Autom Control 46:191–209

  35. Cao XR (2005) A basic formula for on-line policy-gradient algorithms. IEEE Trans Autom Control 50:696–699

  36. Cao XR, Wan XW (2017) Sensitivity analysis of nonlinear behavior with distorted probability. Math Financ 27:115–150

  37. Cao XR (1994) Realization probabilities: the dynamics of queueing systems. Springer, New York

  38. Cao XR (1985) Convergence of parameter sensitivity estimates in a stochastic experiment. IEEE Trans Autom Control 30:845–853

  39. Cao XR (1989) Estimates of performance sensitivity of a stochastic system. IEEE Trans Inf Theory 35:1058–1068

  40. Heidelberger P, Cao XR, Zazanis MA, Suri R (1988) Convergence properties of infinitesimal perturbation analysis estimates. Manag Sci 34:1281–1302

  41. Cao XR (2017) Relative time and stochastic control with non-smooth features. IEEE Trans Autom Control 62:837–852

  42. Zhang JY, Cao XR (2009) Continuous-time Markov decision processes with \(n\)th-bias optimality criteria. Automatica 45:1628–1638

  43. Cao XR (2016) State classification of time nonhomogeneous Markov chains and average reward optimization of multi-chains. IEEE Trans Autom Control 61:3001–3015

  44. Cao XR (2020) Foundation of optimization of time nonhomogeneous Markov chains, manuscript

  45. Fang HT, Cao XR (2004) Potential-based online policy iteration algorithms for Markov decision processes. IEEE Trans Autom Control 49:493–505

  46. Zhang KJ, Xu YK, Chen X, Cao XR (2008) Policy iteration based feedback control. Automatica 44:1055–1061

  47. Ho YC, Zhao QC, Jia QS (2007) Ordinal optimization: soft optimization for hard problems. Springer, Berlin

  48. Cao XR, Ren ZY, Bhatnagar S, Fu MC, Marcus SI (2002) A time aggregation approach to Markov decision processes. Automatica 38:929–943

  49. Cao XR, Wang DX, Qiu L (2014) Partially observable Markov decision processes and separation principle. IEEE Trans Autom Control 59:921–937

  50. Cao XR, Zhang JY (2008) Event-based optimization of Markov systems. IEEE Trans Autom Control 53:1076–1082

51. Xia L, Jia QS, Cao XR (2014) A tutorial on event-based optimization — a new optimization framework. Discret Event Dyn Syst: Theory Appl 24:103–132 (invited paper)

  52. Xu YK, Cao XR (2011) Lebesgue-sample-based optimal control problems with time aggregation. IEEE Trans Autom Control 56:1097–1109

  53. Xia L (2016) Optimization of Markov decision processes under the variance criterion. Automatica 73:269–278

  54. Huang YH, Chen X (2019) A sensitivity-based construction approach to variance minimization of Markov decision processes. Asian J Control 21:1166–1178

  55. Cao XR (2017) Stochastic feedback control with one-dimensional degenerate diffusions and non-smooth value functions. IEEE Trans Autom Control 62:6136–6151

56. Cao XR (2017) Optimality conditions for long-run average rewards with underselectivity and non-smooth features. IEEE Trans Autom Control 62:4318–4332

  57. Cao XR (2018) Semi-smooth potentials of stochastic systems with degenerate diffusions. IEEE Trans Autom Control 63:3566–3572

  58. Cao XR (2019) State classification and multi-class optimization of continuous-time and continuous-state Markov processes. IEEE Trans Autom Control 64:3632–3646

  59. Cao XR (2020) Stochastic control of multi-dimensional systems with relative optimization. IEEE Trans Autom Control. https://doi.org/10.1109/TAC.2019.2925469

  60. Cao XR, Wang DX, Lu T, Xu YF (2011) Stochastic control via direct comparison. Discret Event Dyn Syst: Theory Appl 21:11–38

  61. Einstein A, Infeld L (1938) The evolution of physics. Cambridge University Press, Cambridge

  62. Ye XS, Xue RB, Gao JJ, Cao XR (2018) Optimization in curbing risk contagion among financial institutes. Automatica 94:214–220

  63. Kushner HJ (2001) Heavy traffic analysis of controlled queueing and communication networks. Springer, Berlin

  64. Cassandras CG, Wardi Y, Melamed B, Sun G, Panayiotou CG (2002) Perturbation analysis for on-line control and optimization of stochastic fluid models. IEEE Trans Autom Control 47(8):1234–1248

65. Cassandras CG, Wardi Y, Panayiotou CG, Yao C (2010) Perturbation analysis and optimization of stochastic hybrid systems. Eur J Control 16(6):642–664

66. Zwart AP (2000) A fluid queue with a finite buffer and subexponential input. Adv Appl Probab 32:221–243

Author information

Correspondence to Xi-Ren Cao.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cao, XR. (2020). Introduction. In: Relative Optimization of Continuous-Time and Continuous-State Stochastic Systems. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-41846-5_1
