Abstract
This paper studies constrained (nonhomogeneous) continuous-time Markov decision processes on a finite horizon. The performance criterion to be optimized is the expected total reward over the finite horizon, while \(N\) constraints are imposed on similar expected costs. By introducing an appropriate notion of occupation measures for this optimal control problem, we establish the following under suitable conditions: (a) the class of Markov policies is sufficient; (b) every extreme point of the space of performance vectors is generated by a deterministic Markov policy; and (c) there exists an optimal Markov policy that is a mixture of no more than \(N+1\) deterministic Markov policies.
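As a rough illustration of the problem and of result (c), the constrained problem can be sketched in the following form. The notation is an assumption made here for illustration and is not taken from the paper: \(V_0(\pi)\) stands for the expected total reward of a policy \(\pi\) over the horizon, \(V_n(\pi)\) for the \(n\)-th expected cost, \(d_n\) for the corresponding constraint bound (the direction of the inequality is also assumed), and \(\varphi_k\) for deterministic Markov policies.

```latex
% Assumed notation: V_0 = expected total reward, V_n = n-th expected cost,
% d_n = constraint bound (hypothetical symbols, not the paper's).
\[
  \max_{\pi} \; V_0(\pi)
  \quad \text{subject to} \quad
  V_n(\pi) \le d_n, \qquad n = 1, \dots, N.
\]
% Result (c), read in this notation: the performance vector of an optimal
% policy \pi^* is a convex combination of at most N+1 performance vectors
% of deterministic Markov policies \varphi_1, \dots, \varphi_{N+1}.
\[
  V_n(\pi^*) = \sum_{k=1}^{N+1} p_k \, V_n(\varphi_k), \quad n = 0, 1, \dots, N,
  \qquad p_k \ge 0, \quad \sum_{k=1}^{N+1} p_k = 1.
\]
```

Under this reading, some of the weights \(p_k\) may be zero, which is why the abstract speaks of no more than \(N+1\) deterministic Markov policies.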
Additional information
Research supported by NSFC and Guangdong Province Key Laboratory of Computational Science at Sun Yat-Sen University. Y. Zhang’s work was carried out with a financial grant from the Research Fund for Coal and Steel of the European Commission, within the INDUSE-2-SAFETY project (Grant No. RFSR-CT-2014-00025).
Cite this article
Guo, X., Huang, Y. & Zhang, Y. Constrained Continuous-Time Markov Decision Processes on the Finite Horizon. Appl Math Optim 75, 317–341 (2017). https://doi.org/10.1007/s00245-016-9352-6
DOI: https://doi.org/10.1007/s00245-016-9352-6
Keywords
- Continuous-time Markov decision process
- Constrained-optimality
- Finite horizon
- Mixture of \(N+1\) deterministic Markov policies
- Occupation measure