A survey of algorithmic methods for partially observed Markov decision processes

Abstract

A partially observed Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. The significant applied potential for such processes remains largely unrealized, due to an historical lack of tractable solution methodologies. This paper reviews some of the current algorithmic alternatives for solving discrete-time, finite POMDPs over both finite and infinite horizons. The major impediment to exact solution is that, even with a finite set of internal system states, the set of possible information states is uncountably infinite. Finite algorithms are theoretically available for exact solution of the finite horizon problem, but these are computationally intractable for even modest-sized problems. Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high precision solutions.

Cite this article

Lovejoy, W.S. A survey of algorithmic methods for partially observed Markov decision processes. Ann Oper Res 28, 47–65 (1991). https://doi.org/10.1007/BF02055574
