Introduction
The availability of large, heterogeneous patient datasets has recently enabled the development of data-driven methods for improving healthcare. Many such efforts combine medical data with machine learning for predictive purposes, using large patient datasets to diagnose diseases [5] or to forecast patients' disease trajectories [21].
Patient data can also be used for prescriptive purposes, by optimizing the type or timing of treatment that patients receive. Because patients’ medical conditions unfold over time, such prescriptive analytics must take into account patients’ transitions among disease states, and how treatment actions in the present can affect future health outcomes.
Markov decision processes (MDPs) are a powerful tool for such modeling. An MDP characterizes a system (such as a patient) that transitions among states over time. State transition probabilities depend both on the current state of the system and on the action (such as a treatment) taken in that...
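The MDP formalism described above can be made concrete with a small numerical sketch. The example below is purely illustrative and not taken from the entry: the patient states ("healthy", "sick", "critical"), the two hypothetical treatments, and all transition probabilities and rewards are invented for demonstration. Value iteration is then used to recover the optimal treatment policy.

```python
import numpy as np

# Hypothetical toy MDP (all numbers invented for illustration):
# a patient occupies one of three health states, and the clinician
# chooses between two treatment actions at each decision epoch.
states = ["healthy", "sick", "critical"]   # |S| = 3
actions = ["conservative", "aggressive"]   # |A| = 2

# P[a][s, s'] = probability of moving from state s to s' under action a.
P = np.array([
    [[0.90, 0.08, 0.02],   # conservative treatment
     [0.30, 0.60, 0.10],
     [0.05, 0.35, 0.60]],
    [[0.85, 0.10, 0.05],   # aggressive treatment: better recovery odds
     [0.50, 0.40, 0.10],   # from sick/critical, slightly riskier overall
     [0.20, 0.40, 0.40]],
])

# R[a][s] = immediate reward (e.g., a quality-of-life score) for taking
# action a in state s; aggressive treatment carries a small added cost.
R = np.array([
    [1.0, 0.2, -1.0],
    [0.8, 0.0, -1.2],
])

gamma = 0.95  # discount factor on future rewards

# Value iteration: repeatedly apply the Bellman optimality operator
# until the value function converges.
V = np.zeros(len(states))
for _ in range(10_000):
    Q = R + gamma * (P @ V)        # Q[a, s] = R[a, s] + gamma * E[V(s')]
    V_new = Q.max(axis=0)          # best achievable value in each state
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

# The optimal policy picks, in each state, the action maximizing Q.
policy = [actions[a] for a in Q.argmax(axis=0)]
for s, (v, a) in enumerate(zip(V, policy)):
    print(f"{states[s]:>8}: V* = {v:7.2f}, optimal treatment = {a}")
```

Under these assumed dynamics, the policy prescribes a different treatment depending on the patient's current state, illustrating how an MDP couples present actions to future health outcomes through the transition probabilities.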
References
Alagoz O, Maillart L, Schaefer A, Roberts M (2004) The optimal timing of living-donor liver transplantation. Manag Sci 50(10):1420–1430
Ayer T, Alagoz O, Stout N (2012) OR Forum—A POMDP approach to personalize mammography screening decisions. Oper Res 60(5):1019–1034
Baucum M, Khojandi A, Vasudevan R (2020) Improving deep reinforcement learning with transitional variational autoencoders: A healthcare application. IEEE J Biomed Health Inform 25(6):2273–2280
Baucum M, Khojandi A, Vasudevan R, Ramdhani R (2020) Optimizing patient-specific medication regimen policies using wearable sensors in Parkinson's disease. https://bmjopen.bmj.com/content/bmjopen/7/11/e018374.full.pdf, preprint
Fatima M, Pasha M (2017) Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl 9(01):1
Feinberg E (2011) Total expected discounted reward MDPs: existence of optimal policies. In: Wiley encyclopedia of operations research and management science. Wiley, Hoboken, NJ
Feinberg E, Shwartz A (2002) Handbook of Markov decision processes: methods and applications. Kluwer Academic Publishers, Boston, MA
Grand-Clément J, Chan C, Goyal V, Chuang E (2021) Interpretable machine learning for resource allocation with application to ventilator triage. ArXiv preprint 2110.10994
Ibrahim R, Kucukyazici B, Verter V, Gendreau M, Blostein M (2016) Designing personalized treatment: an application to anticoagulation therapy. Prod Oper Manag 25(5):902–918
Komorowski M, Celi L, Badawi O, Gordon A, Faisal A (2018) The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med 24(11):1716–1720
Lakkaraju H, Rudin C (2017) Learning cost-effective and interpretable treatment regimes. In: Artificial Intelligence and Statistics. PMLR, pp 166–175
Liu Y, Li S, Li F, Song L, Rehg J (2015) Efficient learning of continuous-time hidden Markov models for disease progression. Adv Neural Inf Process Syst 28:3600–3608
Lu M, Shahn Z, Sow D, Doshi-Velez F, Lehman L (2020) Is deep reinforcement learning ready for practical applications in healthcare? A sensitivity analysis of Duel-DDQN for hemodynamic management in sepsis patients. In: AMIA Annual Symposium Proceedings, vol 2020. American Medical Informatics Association, p 773
Nemati S, Ghassemi M, Clifford G (2016) Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach. In: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, pp 2978–2981
Parbhoo S, Bogojeska J, Zazzi M, Roth V, Doshi-Velez F (2017) Combining kernel and model based learning for HIV therapy selection. AMIA Summits Transl Sci Proc 2017:239
Peine A, Hallawa A, Bickenbach J, Dartmann G, Fazlic L, Schmeink A, Ascheid G, Thiemermann C, Schuppert A, Kindle R (2021) Development and validation of a reinforcement learning algorithm to dynamically optimize mechanical ventilation in critical care. NPJ Digit Med 4(1):1–12
Prasad N, Cheng L, Chivers C, Draugelis M, Engelhardt B (2017) A reinforcement learning approach to weaning of mechanical ventilation in intensive care units. ArXiv preprint 1704.06300
Puterman M (1990) Markov decision processes. Handb Oper Res Manag Sci 2:331–434
Puterman M (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, Hoboken, NJ
Raghu A, Komorowski M, Ahmed I, Celi L, Szolovits P, Ghassemi M (2017) Deep reinforcement learning for sepsis treatment. ArXiv preprint 1711.09602
Saranya G, Pravin A (2020) A comprehensive study on disease risk predictions in machine learning. Int J Electr Comput Eng 10(4):4217
Schaefer A, Bailey M, Shechter S, Roberts M (2005) Modeling medical treatment using Markov decision processes. In: Operations research and health care. Springer, Boston, MA, pp 593–612
Schell G, Marrero W, Lavieri M, Sussman J, Hayward R (2016) Data-driven Markov decision process approximations for personalized hypertension treatment planning. MDM Policy Pract 1(1), https://doi.org/10.1177/2381468316674214
Shechter S, Bailey M, Schaefer A, Roberts M (2008) The optimal time to initiate HIV therapy under ordered health states. Oper Res 56(1):20–33
Steimle L, Denton B (2017) Markov decision processes for screening and treatment of chronic diseases. In: Markov decision processes in practice. Springer, Cham, Switzerland, pp 189–222
Tang S, Modi A, Sjoding M, Wiens J (2020) Clinician-in-the-loop decision making: Reinforcement learning with near-optimal set-valued policies. In: International Conference on Machine Learning. PMLR, pp 9387–9396
White III C, White D (1989) Markov decision processes. Eur J Oper Res 39(1):1–16
Wiesemann W, Kuhn D, Rustem B (2013) Robust Markov decision processes. Math Oper Res 38(1):153–183
Xu H, Mannor S (2010) Distributionally robust markov decision processes. Adv Neural Inf Process Syst 23:2505–2513
Yu C, Liu J, Nemati S, Yin G (2021) Reinforcement learning in healthcare: a survey. ACM Comput Surv (CSUR) 55(1):1–36
Zhou Z, Wang Y, Mamani H, Coffey D (2019) How do tumor cytogenetics inform cancer treatments? Dynamic risk stratification and precision medicine using multi-armed bandits. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3405082, preprint
© 2023 Springer Nature Switzerland AG
Baucum, M., Khojandi, A. (2023). Markov Decision Processes: Application to Treatment Planning. In: Pardalos, P.M., Prokopyev, O.A. (eds) Encyclopedia of Optimization. Springer, Cham. https://doi.org/10.1007/978-3-030-54621-2_844-1