Abstract
In Chapter 2, we introduced the basic principles of PA and used them to derive performance-derivative formulas for queueing networks and for Markov and semi-Markov systems. In Chapter 3, we developed sample-path-based (on-line learning) algorithms for estimating the performance derivatives, as well as sample-path-based optimization schemes. In this chapter, we show that the performance-sensitivity-based view leads to a unified approach to both PA and Markov decision processes (MDPs).
One of the principal objects of theoretical research in my department of knowledge is to find the point of view from which the subject appears in its greatest simplicity.
Josiah Willard Gibbs, American scientist (1839–1903)
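To make the MDP setting concrete, the following is a minimal policy-iteration sketch for a discounted toy MDP. The two-state model, its transition matrices `P`, rewards `r`, and discount factor are invented here for illustration only; they do not come from the chapter, which develops the sensitivity-based view in full generality.

```python
import numpy as np

# Toy discounted MDP: 2 states, 2 actions (all numbers are illustrative).
gamma = 0.9                       # discount factor
# P[a][s, s'] : transition probability under action a
P = {
    0: np.array([[0.8, 0.2], [0.3, 0.7]]),
    1: np.array([[0.5, 0.5], [0.9, 0.1]]),
}
# r[a][s] : one-step reward for taking action a in state s
r = {0: np.array([1.0, 0.0]), 1: np.array([0.5, 2.0])}

policy = np.array([0, 0])         # initial policy: action 0 everywhere

while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = np.array([P[policy[s]][s] for s in range(2)])
    r_pi = np.array([r[policy[s]][s] for s in range(2)])
    v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    # Policy improvement: greedy one-step lookahead on the Q-values.
    q = np.array([[r[a][s] + gamma * P[a][s] @ v for a in (0, 1)]
                  for s in range(2)])
    new_policy = q.argmax(axis=1)
    if np.all(new_policy == policy):
        break                     # policy is greedy w.r.t. its own value: optimal
    policy = new_policy

print(policy, v)
```

Policy iteration is the classical MDP solution method; the sensitivity-based view of this chapter interprets the improvement step through performance-difference formulas rather than through dynamic programming alone.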
© 2007 Springer Science+Business Media, LLC
Cao, X. R. (2007). Markov Decision Processes. In: Stochastic Learning and Optimization. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69082-7_4
Print ISBN: 978-0-387-36787-3
Online ISBN: 978-0-387-69082-7