A survey of algorithmic methods for partially observed Markov decision processes

Abstract

A partially observed Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. The significant applied potential for such processes remains largely unrealized, due to an historical lack of tractable solution methodologies. This paper reviews some of the current algorithmic alternatives for solving discrete-time, finite POMDPs over both finite and infinite horizons. The major impediment to exact solution is that, even with a finite set of internal system states, the set of possible information states is uncountably infinite. Finite algorithms are theoretically available for exact solution of the finite horizon problem, but these are computationally intractable for even modest-sized problems. Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high precision solutions.

Cite this article

Lovejoy, W.S. A survey of algorithmic methods for partially observed Markov decision processes. Ann Oper Res 28, 47–65 (1991). https://doi.org/10.1007/BF02055574
