Algorithms for optimization and stabilization of controlled Markov chains

  • Chance As Necessity
Sadhana

Abstract

This article reviews some recent results by the author on the optimal control of Markov chains. Two common algorithms for the construction of optimal policies are considered: value iteration and policy iteration.
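For orientation, the two algorithms can be written as follows for an average-cost problem on a countable state space. The notation (one-step cost $c$, controlled transition kernel $P_a$, policies $w_n$, relative value functions $h_n$) is a standard textbook choice assumed here and is not quoted from the article.

```latex
% Value iteration: successive approximation of the value function.
V_{n+1}(x) = \min_{a} \Bigl[ c(x,a) + \sum_{y} P_a(x,y)\, V_n(y) \Bigr],
\qquad
w_n(x) \in \arg\min_{a} \Bigl[ c(x,a) + \sum_{y} P_a(x,y)\, V_n(y) \Bigr].

% Policy iteration: evaluate w_n through Poisson's equation, then improve.
\eta_n + h_n(x) = c\bigl(x, w_n(x)\bigr) + \sum_{y} P_{w_n(x)}(x,y)\, h_n(y),
\qquad
w_{n+1}(x) \in \arg\min_{a} \Bigl[ c(x,a) + \sum_{y} P_a(x,y)\, h_n(y) \Bigr].

% Both aim at a solution (\eta^*, h^*) of the average cost optimality equation.
\eta^* + h^*(x) = \min_{a} \Bigl[ c(x,a) + \sum_{y} P_a(x,y)\, h^*(y) \Bigr].
```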

In either case, it is found that the following hold when the algorithm is properly initialized:

  (i) A stochastic Lyapunov function exists for each intermediate policy, and hence each policy is regular (a strong stability condition).

  (ii) Intermediate costs converge to the optimal cost.

  (iii) Any limiting policy is average cost optimal.
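One common way to formalize a stochastic Lyapunov function for a policy $w$ is a Foster-Lyapunov drift inequality of the kind shown below. This form is a standard one and is an assumption about what the abstract refers to; the symbols $V$, $c_w$, $P_w$ and $\bar\eta$ are introduced here for illustration only.

```latex
% Drift (Poisson) inequality for the chain controlled by policy w:
% V >= 0 is the Lyapunov function, c_w(x) = c(x, w(x)) the one-step cost,
% and \bar\eta a finite constant.
\sum_{y} P_w(x,y)\, V(y) \;\le\; V(x) - c_w(x) + \bar\eta
\qquad \text{for all states } x.

% Under suitable irreducibility assumptions, a bound of this type implies
% that the average cost of w is finite and no larger than \bar\eta:
\limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1}
  \mathsf{E}_x\bigl[ c_w(X_t) \bigr] \;\le\; \bar\eta .
```

In policy iteration, the natural candidate for $V$ at step $n+1$ is the relative value function $h_n$ of the previous policy, with $\bar\eta = \eta_n$, since the improvement step can only decrease the right-hand side of Poisson's equation.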

The network scheduling problem is considered in some detail, both as an illustration of the theory and because of the strong conclusions that the general theory yields for this important example.
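As a toy illustration of properties (ii) and (iii), the sketch below runs policy iteration and relative value iteration on a small scheduling model: one server shared by two buffers, each truncated at a level `N`. The model, its parameters, and every function name are assumptions made for this example and are not taken from the article, whose results cover far more general networks with unbounded cost.

```python
import itertools
import numpy as np

# All numbers below are illustrative assumptions, not values from the article.
N = 6                      # truncation level of each buffer
ARR = (0.1, 0.1)           # per-slot arrival probabilities
SRV = (0.4, 0.4)           # per-slot service completion probabilities
COST = (1.0, 3.0)          # holding cost per customer in each buffer

STATES = list(itertools.product(range(N + 1), repeat=2))
INDEX = {s: i for i, s in enumerate(STATES)}
n = len(STATES)


def transition_row(state, action):
    """Transition probabilities from `state` when the server works on buffer `action`."""
    row = np.zeros(n)
    for i in range(2):                       # arrivals, blocked at level N
        nxt = list(state)
        nxt[i] = min(nxt[i] + 1, N)
        row[INDEX[tuple(nxt)]] += ARR[i]
    nxt = list(state)                        # completion at the served buffer
    if nxt[action] > 0:
        nxt[action] -= 1
    row[INDEX[tuple(nxt)]] += SRV[action]
    row[INDEX[state]] += SRV[1 - action]     # unused service event: self-loop
    return row


P = np.array([[transition_row(s, a) for s in STATES] for a in (0, 1)])
c = np.array([COST[0] * s[0] + COST[1] * s[1] for s in STATES])


def evaluate(policy):
    """Solve Poisson's equation eta + h = c + P_w h with h[0] = 0."""
    P_w = P[policy, np.arange(n)]            # row i is P[policy[i], i, :]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = np.eye(n) - P_w
    A[:n, n] = 1.0                           # coefficient of eta
    A[n, 0] = 1.0                            # normalization h[0] = 0
    sol = np.linalg.solve(A, np.append(c, 0.0))
    return sol[n], sol[:n]


def improve(policy, h):
    """Greedy step; keep the current action unless another is strictly better."""
    q = c + P @ h                            # shape (2, n)
    current = q[policy, np.arange(n)]
    return np.where(q.min(axis=0) < current - 1e-10, q.argmin(axis=0), policy)


def policy_iteration(policy, max_iter=100):
    costs = []
    for _ in range(max_iter):
        eta, h = evaluate(policy)
        costs.append(eta)
        new_policy = improve(policy, h)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, costs


def relative_value_iteration(n_iter=5000):
    h = np.zeros(n)
    eta = 0.0
    for _ in range(n_iter):
        Th = np.min(c + P @ h, axis=0)
        eta = Th[0]                          # cost estimate, since h[0] = 0
        h = Th - Th[0]
    return eta, h


# Initial policy: priority to buffer 0, the cheaper one (deliberately poor).
policy0 = np.array([0 if s[0] > 0 else 1 for s in STATES])
final_policy, costs = policy_iteration(policy0)
eta_vi, h_vi = relative_value_iteration()
print("policy iteration costs:", np.round(costs, 6))
print("relative value iteration estimate:", round(float(eta_vi), 6))
```

In this finite model every stationary policy is stable, so the initialization question raised in the abstract does not arise; it only becomes delicate for models with unbounded buffers. The printed sequence of intermediate costs should be non-increasing, and the two algorithms should agree on the limiting average cost.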



Author information

Corresponding author

Correspondence to Sean Meyn.

Additional information

Work supported in part by NSF Grant ECS 940372; JSEP grant N00014-90-J-1270; and a Fulbright Research Fellowship. This work was completed with the assistance of equipment granted through the IBM Shared University Research program and managed by the Computing and Communications Services Office at the University of Illinois at Urbana-Champaign.


About this article

Cite this article

Meyn, S. Algorithms for optimization and stabilization of controlled Markov chains. Sadhana 24, 339–367 (1999). https://doi.org/10.1007/BF02823147


  • DOI: https://doi.org/10.1007/BF02823147
