Abstract
An effective way of limiting the diffusion of viruses when vaccines are unavailable or insufficiently potent to eradicate them is through running widespread “test and trace” programmes. Although these have been instrumental during the COVID-19 pandemic, they also lead to significant increases in public spending and societal disruptions caused by the numerous isolation requirements. What is more, after the health measures were relaxed across the world, these programmes were unable to prevent substantial upsurges in infections. Here we propose an alternative approach to conducting pathogen testing and contact tracing that is adaptable to the budgeting requirements and risk tolerances of regional policy makers, while still breaking the high-risk transmission chains. To that end, we propose several agents that rank individuals based on their role in the interaction network and on the state of the epidemic diffusing over it, showing that testing or isolating just the top-ranked individuals can achieve adequate levels of containment without incurring the costs associated with standard strategies. Additionally, we extensively compare all the policies we derive, and show that a reinforcement learning actor based on graph neural networks outperforms the more competitive heuristics by up to 15% in the containment rate, while far surpassing the standard random samplers by margins of 50% or more. Finally, we demonstrate the versatility of the learned policies by appraising the decisions taken by the deep learning agent in different contexts using a diverse set of prediction explanation and state visualization techniques.
Notes
1. COVID-19 Attitudes Survey by YouGov: https://tinyurl.com/yougov-attitudes.
2. COVID-19 Infection Survey by ONS: https://tinyurl.com/ons-covid19.
References
Abueg, M., et al.: Modeling the combined effect of digital exposure notification and non-pharmaceutical interventions on the COVID-19 epidemic in Washington state. In: medRxiv, p. 2020.08.29.20184135. Cold Spring Harbor Laboratory Press (2020). https://doi.org/10.1101/2020.08.29.20184135
Alon, U., Yahav, E.: On the bottleneck of graph neural networks and its practical implications. In: International Conference on Learning Representations (2022)
Andrews, N., et al.: COVID-19 vaccine effectiveness against the omicron (B.1.1.529) variant. New Engl. J. Med. 386(16), 1532–1546 (2022). https://doi.org/10.1056/NEJMoa2119451
Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers (2021). https://doi.org/10.48550/arXiv.2106.08254
Bastani, H., et al.: Efficient and targeted COVID-19 border testing via reinforcement learning. Nature 599(7883), 108–113 (2021). https://doi.org/10.1038/s41586-021-04014-z, https://www.nature.com/articles/s41586-021-04014-z
Beaini, D., Passaro, S., Létourneau, V., Hamilton, W.L., Corso, G., Liò, P.: Directional Graph Networks (2021). https://doi.org/10.48550/arXiv.2010.02863
Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning (2017). https://doi.org/10.48550/arXiv.1611.09940
Bodnar, C., Di Giovanni, F., Chamberlain, B.P., Liò, P., Bronstein, M.M.: Neural sheaf diffusion: a topological perspective on heterophily and oversmoothing in GNNs (2022). https://doi.org/10.48550/arXiv.2202.04579
Braha, D., Bar-Yam, Y.: From centrality to temporary fame: dynamic centrality in complex networks. Complexity 12(2), 59–63 (2006). https://doi.org/10.1002/cplx.20156
Brody, S., Alon, U., Yahav, E.: How attentive are graph attention networks? (2022). https://doi.org/10.48550/arXiv.2105.14491
Bronstein, M.: Deep learning on graphs: successes, challenges, and next steps (2022). https://towardsdatascience.com/deep-learning-on-graphs-successes-challenges-and-next-steps-7d9ec220ba8
Bruxvoort, K.J., et al.: Effectiveness of mRNA-1273 against delta, mu, and other emerging variants of SARS-CoV-2: test negative case-control study. BMJ 375, e068848 (2021). https://doi.org/10.1136/bmj-2021-068848
Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2009, p. 199. ACM Press, Paris (2009). https://doi.org/10.1145/1557019.1557047
Chung, F., Horn, P., Tsiatas, A.: Distributing antidote using PageRank vectors. Internet Math. 6(2), 237–254 (2009). https://doi.org/10.1080/15427951.2009.10129184
Clair, R., Gordon, M., Kroon, M., Reilly, C.: The effects of social isolation on well-being and life satisfaction during pandemic. Humanit. Soc. Sci. Commun. 8(1), 1–6 (2021). https://doi.org/10.1057/s41599-021-00710-3
Cohen, R., Havlin, S., ben-Avraham, D.: Efficient immunization strategies for computer networks and populations. Phys. Rev. Lett. 91(24), 247901 (2003). https://doi.org/10.1103/PhysRevLett.91.247901
Dai, H., Khalil, E.B., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial optimization algorithms over graphs (2018)
Davis, E.L., et al.: Contact tracing is an imperfect tool for controlling COVID-19 transmission and relies on population adherence. Nat. Commun. 12(1), 5412 (2021). https://doi.org/10.1038/s41467-021-25531-5
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs] (2019)
Di Domenico, L., Pullano, G., Sabbatini, C.E., Boëlle, P.Y., Colizza, V.: Impact of lockdown on COVID-19 epidemic in Île-de-France and possible exit strategies. BMC Med. 18(1), 240 (2020). https://doi.org/10.1186/s12916-020-01698-4
Dighe, A., et al.: Response to COVID-19 in South Korea and implications for lifting stringent interventions. BMC Med. 18(1), 321 (2020). https://doi.org/10.1186/s12916-020-01791-8
Erdös, P., Rényi, A.: On random graphs I. Publicationes Mathematicae Debrecen 6, 290 (1959)
Farrahi, K., Emonet, R., Cebrian, M.: Epidemic contact tracing via communication traces. PLoS ONE 9(5), e95133 (2014). https://doi.org/10.1371/journal.pone.0095133
Ferdinands, J.M.: Waning 2-Dose and 3-dose effectiveness of mrna vaccines against COVID-19–associated emergency department and urgent care encounters and hospitalizations among adults during periods of delta and omicron variant predominance—VISION network, 10 States, August 2021–January 2022. MMWR Morbidity Mortality Weekly Rep. 71 (2022). https://doi.org/10.15585/mmwr.mm7107e2
Ferguson, N., et al.: Report 9: impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. Technical report, Imperial College London (2020). https://doi.org/10.25561/77482
Ferretti, L., et al.: Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 368 (2020). https://doi.org/10.1126/science.abb6936
Fung, V., Zhang, J., Juarez, E., Sumpter, B.: Benchmarking graph neural networks for materials chemistry. NPJ Comput. Mater. 7, 84 (2021). https://doi.org/10.1038/s41524-021-00554-0
Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977). https://doi.org/10.1021/j100540a008
Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E.: Neural message passing for quantum chemistry (2017). https://doi.org/10.48550/arXiv.1704.01212
Gori, M., Monfardini, G., Scarselli, F.: A new model for learning in graph domains. In: Proceedings of the International Joint Conference on Neural Networks, vol. 2, pp. 729–734 (2005). https://doi.org/10.1109/IJCNN.2005.1555942
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor (2018)
Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. arXiv:1706.02216 [cs, stat] (2018)
He, J., et al.: Deep reinforcement learning with a combinatorial action space for predicting popular reddit threads. In: EMNLP (2019)
Henley, J.: COVID surges across Europe as experts warn not let guard down. The Guardian (2022). https://www.theguardian.com/world/2022/jun/21/covid-surges-europe-ba4-ba5-cases
Hinch, R., et al.: Effective configurations of a digital contact tracing app: a report to NHSX. Technical report (2020)
Hoang, N., Maehara, T.: Revisiting graph neural networks: all we have is low-pass filters. arXiv:1905.09550 [cs, math, stat] (2019)
Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109–137 (1983). https://doi.org/10.1016/0378-8733(83)90021-7
Holme, P., Kim, B.J.: Growing scale-free networks with tunable clustering. Phys. Rev. E 65(2), 026107 (2002). https://doi.org/10.1103/PhysRevE.65.026107
Huang, Q., Yamada, M., Tian, Y., Singh, D., Yin, D., Chang, Y.: GraphLIME: local interpretable model explanations for graph neural networks (2020). https://doi.org/10.48550/arXiv.2001.06216
Huerta, R., Tsimring, L.S.: Contact tracing and epidemics control in social networks. Phys. Rev. E 66(5), 056115 (2002). https://doi.org/10.1103/PhysRevE.66.056115
Jhun, B.: Effective vaccination strategy using graph neural network ansatz (2021). https://doi.org/10.48550/arXiv.2111.00920
Joffe, A.R.: COVID-19: rethinking the lockdown groupthink. Front. Public Health 9 (2021)
Joshi, C.K., Laurent, T., Bresson, X.: An efficient graph convolutional network technique for the travelling salesman problem (2019)
Kapoor, A., et al.: Examining COVID-19 forecasting using spatio-temporal graph neural networks. arXiv:2007.03113 [cs] (2020)
Kermack, W.O., McKendrick, A.G., Walker, G.T.: A contribution to the mathematical theory of epidemics. Proc. Roy. Soc. London Ser. A Containing Pap. Math. Phys. Character 115(772), 700–721 (1927). https://doi.org/10.1098/rspa.1927.0118
Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., Shah, M.: Transformers in vision: a survey. ACM Comput. Surv. 3505244 (2022). https://doi.org/10.1145/3505244
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2017). https://doi.org/10.48550/arXiv.1412.6980
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. Conference Track Proceedings (2017). OpenReview.net
Kiran, B.R., et al.: Deep reinforcement learning for autonomous driving: a survey. IEEE Trans. Intell. Transp. Syst. 23(6), 4909–4926 (2022). https://doi.org/10.1109/TITS.2021.3054625
Kobayashi, T.: Adaptive and multiple time-scale eligibility traces for online deep reinforcement learning. Robot. Auton. Syst. 151, 104019 (2022). https://doi.org/10.1016/j.robot.2021.104019
Kojaku, S., Hébert-Dufresne, L., Mones, E., Lehmann, S., Ahn, Y.Y.: The effectiveness of backward contact tracing in networks. Nat. Phys. 17(5), 652–658 (2021). https://doi.org/10.1038/s41567-021-01187-2
Konda, V., Tsitsiklis, J.: Actor-critic algorithms. In: Advances in Neural Information Processing Systems, vol. 12. MIT Press (1999)
Kool, W., van Hoof, H., Welling, M.: Attention, learn to solve routing problems! (2019). https://doi.org/10.48550/arXiv.1803.08475
Lazaridis, A., Fachantidis, A., Vlahavas, I.: Deep reinforcement learning: a state-of-the-art walkthrough. J. Artif. Intell. Res. 69, 1421–1471 (2020). https://doi.org/10.1613/jair.1.12412
Leung, K., Wu, J.T.: Managing waning vaccine protection against SARS-CoV-2 variants. Lancet 399(10319), 2–3 (2022). https://doi.org/10.1016/S0140-6736(21)02841-5
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. Paper presented at 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico (2016)
Liu, D., Jing, Y., Zhao, J., Wang, W., Song, G.: A fast and efficient algorithm for mining top-k nodes in complex networks. Sci. Rep. 7(1), 43330 (2017). https://doi.org/10.1038/srep43330
Lundberg, S., Lee, S.I.: A unified approach to interpreting model predictions (2017). https://doi.org/10.48550/arXiv.1705.07874
Madan, A., Cebrian, M., Moturu, S., Farrahi, K., Pentland, A.S.: Sensing the “health state” of a community. IEEE Pervasive Comput. 11(4), 36–45 (2012). https://doi.org/10.1109/MPRV.2011.79
Martinez-Garcia, M., Sansano-Sansano, E., Castillo-Hornero, A., Femenia, R., Roomp, K., Oliver, N.: Social isolation during the COVID-19 pandemic in Spain: a population study (2022). https://doi.org/10.1101/2022.01.22.22269682
Mason, R., Allegretti, A., Devlin, H., Sample, I.: UK treasury pushes to end most free Covid testing despite experts’ warnings. The Guardian (2022)
Masuda, N.: Immunization of networks with community structure. New J. Phys. 11(12), 123018 (2009). https://doi.org/10.1088/1367-2630/11/12/123018
Matrajt, L., Leung, T.: Evaluating the effectiveness of social distancing interventions to delay or flatten the epidemic curve of coronavirus disease. Emerg. Infect. Dis. 26(8), 1740–1748 (2020). https://doi.org/10.3201/eid2608.201093
Mei, J., Xiao, C., Dai, B., Li, L., Szepesvari, C., Schuurmans, D.: Escaping the gravitational pull of softmax. In: Advances in Neural Information Processing Systems, vol. 33, pp. 21130–21140. Curran Associates, Inc. (2020)
Meirom, E., Maron, H., Mannor, S., Chechik, G.: Controlling graph dynamics with reinforcement learning and graph neural networks. In: Proceedings of the 38th International Conference on Machine Learning, pp. 7565–7577. PMLR (2021)
Meirom, E., Milling, C., Caramanis, C., Mannor, S., Shakkottai, S., Orda, A.: Localized epidemic detection in networks with overwhelming noise. ACM SIGMETRICS Perform. Eval. Rev. 43(1), 441–442 (2015). https://doi.org/10.1145/2796314.2745883
Mercer, T.R., Salit, M.: Testing at scale during the COVID-19 pandemic. Nat. Rev. Genet. 22(7), 415–426 (2021). https://doi.org/10.1038/s41576-021-00360-w
Miller, J.C., Hyman, J.M.: Effective vaccination strategies for realistic social networks. Phys. A 386(2), 780–785 (2007). https://doi.org/10.1016/j.physa.2007.08.054
Mnih, V., et al.: Playing atari with deep reinforcement learning (2013). https://doi.org/10.48550/arXiv.1312.5602
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236
Morris, C., et al.: Weisfeiler and leman go neural: higher-order graph neural networks. arXiv:1810.02244 [cs, stat] (2020)
Moshiri, N.: The dual-Barabási-Albert model (2018)
Murata, T., Koga, H.: Extended methods for influence maximization in dynamic networks. Comput. Soc. Netw. 5(1), 1–21 (2018). https://doi.org/10.1186/s40649-018-0056-8
Oono, K., Suzuki, T.: Graph neural networks exponentially lose expressive power for node classification. arXiv:1905.10947 [cs, stat] (2021)
Panagopoulos, G., Nikolentzos, G., Vazirgiannis, M.: Transfer graph neural networks for pandemic forecasting. arXiv:2009.08388 [cs, stat] (2021)
Pandit, J.A., Radin, J.M., Quer, G., Topol, E.J.: Smartphone apps in the COVID-19 pandemic. Nat. Biotechnol. 40(7), 1013–1022 (2022). https://doi.org/10.1038/s41587-022-01350-x
Preciado, V.M., Zargham, M., Enyioha, C., Jadbabaie, A., Pappas, G.J.: Optimal resource allocation for network protection against spreading processes. IEEE Trans. Control Netw. Syst. 1(1), 99–108 (2014). https://doi.org/10.1109/TCNS.2014.2310911
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. Wiley, USA (1994)
Rayner, D.C., Sturtevant, N.R., Bowling, M.: Subset selection of search heuristics. In: IJCAI (2019)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should i trust you?”: explaining the predictions of any classifier (2016). https://doi.org/10.48550/arXiv.1602.04938
Rimmer, A.: Sixty seconds on . . . the pingdemic. BMJ 374, n1822 (2021). https://doi.org/10.1136/bmj.n1822
Rummery, G., Niranjan, M.: On-line Q-learning using connectionist systems. Technical report CUED/F-INFENG/TR 166 (1994)
Rusu, A., Farrahi, K., Emonet, R.: Modelling digital and manual contact tracing for COVID-19. Are low uptakes and missed contacts deal-breakers? Preprint. Epidemiology (2021). https://doi.org/10.1101/2021.04.29.21256307
Salathé, M., Jones, J.H.: Dynamics and control of diseases in networks with community structure. PLOS Comput. Biol. 6(4), e1000736 (2010). https://doi.org/10.1371/journal.pcbi.1000736, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000736
Sato, R., Yamada, M., Kashima, H.: Random features strengthen graph neural networks (2021). https://doi.org/10.48550/arXiv.2002.03155
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2009). https://doi.org/10.1109/TNN.2008.2005605
Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation (2018). https://doi.org/10.48550/arXiv.1506.02438
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 [cs] (2017)
Serafino, M., et al.: Digital contact tracing and network theory to stop the spread of COVID-19 using big-data on human mobility geolocalization. PLOS Comput. Biol. 18(4), e1009865 (2022). https://doi.org/10.1371/journal.pcbi.1009865
Shah, C., et al.: Finding patient zero: learning contagion source with graph neural networks (2020)
Sigal, A.: Milder disease with Omicron: is it the virus or the pre-existing immunity? Nat. Rev. Immunol. 22(2), 69–71 (2022). https://doi.org/10.1038/s41577-022-00678-4
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016). https://doi.org/10.1038/nature16961
Silver, D., et al.: Mastering chess and shogi by self-play with a general reinforcement learning algorithm (2017). https://doi.org/10.48550/arXiv.1712.01815
Smith, J.: Demand for Covid vaccines falls amid waning appetite for booster shots. Financial Times (2022). https://www.ft.com/content/9ac9f8fc-1ab3-4cb2-81bf-259ba612f600
Smith, R.L., et al.: Longitudinal assessment of diagnostic test performance over the course of acute SARS-CoV-2 infection. J. Infect. Dis. 224(6), 976–982 (2021). https://doi.org/10.1093/infdis/jiab337
Song, H., et al.: Solving continual combinatorial selection via deep reinforcement learning. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 3467–3474 (2019). https://doi.org/10.24963/ijcai.2019/481
Su, Z., Cheshmehzangi, A., McDonnell, D., da Veiga, C.P., Xiang, Y.T.: Mind the “Vaccine Fatigue”. Front. Immunol. 13 (2022)
Sukumar, S.R., Nutaro, J.J.: Agent-based vs. equation-based epidemiological models: a model selection case study. In: 2012 ASE/IEEE International Conference on BioMedical Computing (BioMedCom), pp. 74–79 (2012). https://doi.org/10.1109/BioMedCom.2012.19
Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3(1), 9–44 (1988). https://doi.org/10.1007/BF00115009
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning Series, 2nd edn. The MIT Press, Cambridge (2018)
Tian, S., Mo, S., Wang, L., Peng, Z.: Deep reinforcement learning-based approach to tackle topic-aware influence maximization. Data Sci. Eng. 5(1), 1–11 (2020). https://doi.org/10.1007/s41019-020-00117-1
Tomy, A., Razzanelli, M., Di Lauro, F., Rus, D., Della Santina, C.: Estimating the state of epidemics spreading with graph neural networks. Nonlinear Dyn. 109(1), 249–263 (2022). https://doi.org/10.1007/s11071-021-07160-1
van Hasselt, H., Madjiheurem, S., Hessel, M., Silver, D., Barreto, A., Borsa, D.: Expected eligibility traces. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 9997–10005 (2021). https://doi.org/10.1609/aaai.v35i11.17200
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. arXiv:1710.10903 [cs, stat] (2018)
Watkins, C.: Learning from delayed rewards (1989)
Wymant, C., et al.: The epidemiological impact of the NHS COVID-19 app. Nature 594(7863), 408–412 (2021). https://doi.org/10.1038/s41586-021-03606-z
Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? arXiv:1810.00826 [cs, stat] (2019)
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E.P., Sugiyama, M.: High-dimensional feature selection by feature-wise kernelized lasso. Neural Comput. 26(1), 185–207 (2014). https://doi.org/10.1162/NECO_a_00537
Zhou, J., et al.: Graph neural networks: a review of methods and applications. AI Open 1, 57–81 (2020). https://doi.org/10.1016/j.aiopen.2021.01.001
Acknowledgements
We thank Dr Eli Meirom for correspondence clarifying certain aspects from their study. MN was funded by EPSRC grant EP/S000356/1 Artificial and Augmented Intelligence for Automated Scientific Discovery. AR was funded by the EPSRC via a scholarship from the University of Southampton.
A Appendix
A.1 Performance Analysis
We compare the mean total elapsed time for running epidemics using each of our testing agents in Table A1. These results correspond to the wall-clock time recorded on an average Windows machine equipped with an Intel i7-7700 CPU, an NVIDIA RTX 3060 GPU and 32 GB of random access memory.
A.2 Epidemic Modelling
All our epidemic models rely on the SEIR compartmental formulation, but the diffusion process remains bound by the interaction network configuration.
The multi-site mean-field models considered in this work rely on exponential waiting times sampled via Gillespie’s algorithm to obtain subsequent events, with the state transition probabilities defined as follows:
where b is the base transmission rate, \(w_j\) are time-dependent edge weights, e is the exposed state duration, \(\rho \) is the recovery rate, while \(\Delta t\) is a time interval.
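As a rough illustration of how exponential waiting times and per-interval transition probabilities relate, the sketch below uses the standard SEIR hazard forms. These forms are an assumption made for illustration and are not taken from the paper's equations: a susceptible node is taken to be infected at rate b times the summed weights of its edges to infectious neighbours, an exposed node becomes infectious at rate 1/e, and an infectious node recovers at rate \(\rho \). The function names are hypothetical.

```python
import math
import random

def transition_rate(state, infectious_weights, b, e, rho):
    """Hazard rate of a node's next transition (assumed standard SEIR forms)."""
    if state == "S":
        return b * sum(infectious_weights)  # pressure from infectious neighbours
    if state == "E":
        return 1.0 / e                      # e is the mean exposed duration
    if state == "I":
        return rho                          # recovery rate
    return 0.0                              # R is absorbing

def transition_probability(rate, dt):
    """Probability that an exponential clock with this rate fires within dt."""
    return 1.0 - math.exp(-rate * dt)

def gillespie_step(rates, rng):
    """One Gillespie step: (waiting time, index of the event that fires)."""
    total = sum(rates)
    if total <= 0.0:
        return math.inf, None               # nothing can happen any more
    wait = rng.expovariate(total)           # time to the next event
    threshold = rng.random() * total        # pick an event proportionally to its rate
    cumulative, last_positive = 0.0, None
    for idx, rate in enumerate(rates):
        if rate > 0.0:
            last_positive = idx
        cumulative += rate
        if threshold < cumulative:
            return wait, idx
    return wait, last_positive              # guard against floating-point round-off
```

Calling `gillespie_step` repeatedly, and recomputing the rates after each event, yields a continuous-time trajectory; `transition_probability` gives the matching discrete-interval view for a fixed \(\Delta t\).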
In contrast, the agent-based model loops through every node i at every time step, executing the appropriate transition events when one of the normally-distributed samples (\(d_i\) or \(r_i\)) decreases to 0. Concurrently, every edge j is visited to check whether an infection event occurs over that connection, according to the transmission probability defined in Eq 2.
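A minimal sketch of one such time step is given below, under several assumptions that are not from the paper: the timers are drawn from normal distributions with placeholder means and standard deviations, durations are rounded and truncated to at least one step, and a constant `p_transmit` stands in for the edge-dependent transmission probability. All names are hypothetical.

```python
import random

def sample_duration(rng, mean, std):
    """Normally distributed duration in time steps, truncated to at least 1."""
    return max(1, round(rng.gauss(mean, std)))

def abm_step(states, timers, edges, p_transmit, rng,
             exposed=(3.0, 1.0), infectious=(7.0, 2.0)):
    """Advance the agent-based SEIR model by one time step, in place."""
    # Node loop: count down the pending timer and fire the transition at 0.
    for i, state in enumerate(states):
        if state not in ("E", "I"):
            continue
        timers[i] -= 1
        if timers[i] > 0:
            continue
        if state == "E":
            states[i] = "I"
            timers[i] = sample_duration(rng, *infectious)  # time until recovery
        else:
            states[i] = "R"
    # Edge loop: each infectious-susceptible contact may transmit.
    newly_exposed = set()
    for u, v in edges:
        for src, dst in ((u, v), (v, u)):
            if states[src] == "I" and states[dst] == "S" and rng.random() < p_transmit:
                newly_exposed.add(dst)
    # Apply infections after the edge pass so the visiting order does not matter.
    for i in newly_exposed:
        states[i] = "E"
        timers[i] = sample_duration(rng, *exposed)         # time until infectious
    return states
```

Collecting new exposures in a set before applying them is a design choice of this sketch: it prevents a node exposed earlier in the edge pass from being treated differently from one exposed later in the same step.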
A.3 Algorithmic Details for the Proximal Policy Optimization
We start by reminding the reader about some general reinforcement learning quantities and relations:
In Eq 3, \(R_t\) is the reward obtained by the agent after taking action \(a_t \sim \pi (a|s_t;\theta _k)\) and transitioning from state \(s_t\) to \(s_{t+1}\). The value of a given state s is approximated using a neural network \(V(s,\theta )\), which, together with \(R_t\) and the discount factor \(\gamma \), determines the TD-error \(\delta ^\gamma \). The parameters \(\theta _k\) define the acting policy, while \(\theta _T\) is a delayed copy of \(\theta _k\) that parameterizes the regression target in online learning [69]. The ratio \(r_t^{ORIG}(\theta )\) compares a policy parameterized by a given \(\theta \) with the acting policy, and \(r_t^{SARSA}(\theta )\) is an alternative formulation of this ratio that replaces the numerator with the policy of \(\theta \) evaluated at the next state \(s_{t+1}\). Finally, the \(\hat{A}_t^{(\gamma ,\lambda )}\) terms represent different forms of the advantage function, as given by [87], with two special cases: \(\lambda =0\), when the advantage equals the TD-error, and \(\lambda =1\), when the minuend on the right-hand side is the discounted return of the episode, \(G_t\).
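The two special cases of the advantage can be checked numerically. The sketch below uses the standard definitions of the TD-error and of the generalized advantage estimate; it assumes an episodic setting in which the value of the terminal state is zero, and the function names are illustrative only.

```python
def td_errors(rewards, values, gamma):
    """delta_t = R_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one more entry than `rewards`: the value of the final state.
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]

def gae(rewards, values, gamma, lam):
    """Generalized advantage estimates, computed backwards over one episode."""
    deltas = td_errors(rewards, values, gamma)
    advantages = [0.0] * len(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running  # discounted sum of TD-errors
        advantages[t] = running
    return advantages

def discounted_returns(rewards, gamma):
    """G_t = sum_l gamma^l * R_{t+l}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

With \(\lambda =0\), `gae` returns the TD-errors themselves; with \(\lambda =1\) and a zero terminal value, it returns \(G_t - V(s_t)\) for every step.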
We rewrite the Proximal Policy Optimization (PPO) equations in terms of the quantities above in Eq 4, where \(\mathcal {E}\), \(c_1\) and \(c_2\) are hyperparameters, clip(.) is a function that clips its argument to the specified range, transform(.) is a function that modifies the gradient descent update according to a specific optimizer (e.g. Adam [47]), while \(\mathcal {H}_t(\theta )\) is an entropy regularizer [31]. In contrast to the original formulation, Eq 5 describes our proposed modification of PPO to allow for optimizing the objective in a memory-efficient online manner. In particular, we rewrite the loss terms using the one-step advantage function \(\hat{A}_t^{(\gamma ,0)}(a_t;\theta )\), and introduce an intermediary operation that accumulates the gradients of our modified loss using a unified eligibility trace [100], in a similar fashion to the methodology employed by [50], obtaining a backward-view approximation of the generalized advantage estimate \(\hat{A}_t^{(\gamma ,\lambda )}\) in the process [87]. We note that, by setting \(r_t=r_t^{SARSA}\), we can eliminate the requirement of storing \(s_t\) in memory for the subsequent time step, while retaining the benefits of ratio clipping. This works well empirically since major shifts between \(s_{t+1}\) and \(s_t\) are not common in our environment. Based on previous work and our own assessment, we set \(\gamma =0.99\), \(\lambda =0.97\), \(\mathcal {E}=0.2\), \(c_1=0.5\), \(c_2=0.01\), and update the target value network every 5 episodes across all our experiments.
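To make the backward-view idea concrete, the sketch below shows the standard PPO clipped surrogate for a single sample and a scalar eligibility trace that accumulates one-step updates. It is a simplified illustration, not the update rule of Eq 5: scalars stand in for gradient vectors, the SARSA-style ratio and the optimizer transform are omitted, and all function names are hypothetical. When the gradients are evaluated at fixed parameters, the trace-based sum equals the forward-view sum weighted by \(\hat{A}_t^{(\gamma ,\lambda )}\) exactly; with online parameter updates it is only an approximation.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO objective for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

def backward_view_update(deltas, grads, gamma, lam):
    """Online accumulation: sum_t delta_t * z_t, with z_t = gamma*lam*z_{t-1} + g_t."""
    trace, update = 0.0, 0.0
    for delta, grad in zip(deltas, grads):
        trace = gamma * lam * trace + grad  # decay old gradients, add the new one
        update += delta * trace             # one-step TD-error credits the whole trace
    return update

def forward_view_update(deltas, grads, gamma, lam):
    """Reference: sum_t A_t^(gamma,lam) * g_t, which needs the full episode."""
    total, running = 0.0, 0.0
    for delta, grad in zip(reversed(deltas), reversed(grads)):
        running = delta + gamma * lam * running  # GAE at this step
        total += running * grad
    return total
```

The backward view only keeps the running trace between steps, which is what makes an online, memory-efficient update possible; the forward view needs every TD-error of the episode before any advantage can be computed.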
A.4 Supporting Figures
A.5 Control Framework
The logic behind our epidemic control framework in the continuous-time simulation scenario is outlined in Algorithm 1. The class hierarchy of the agents, together with their logic, is given in Algorithm 2. Refer to Table A2 for details about the variables involved in both algorithms.
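For readers who prefer code, one way a hierarchy of ranking agents could be organised is sketched below. This is not a transcription of Algorithm 2: the class names, the adjacency-dictionary representation of the interaction network, and the two example scorers are assumptions made for illustration. The shared idea is that every agent scores the candidate nodes and the k highest-ranked ones are selected for testing or isolation.

```python
import random
from abc import ABC, abstractmethod

class RankingAgent(ABC):
    """Base class: scores candidate nodes, then returns the k top-ranked ones."""

    @abstractmethod
    def score(self, adjacency, candidates):
        """Return a {node: score} mapping covering every candidate."""

    def select(self, adjacency, candidates, k):
        scores = self.score(adjacency, candidates)
        # Stable sort: ties keep the order in which candidates were given.
        ranked = sorted(candidates, key=lambda n: scores[n], reverse=True)
        return ranked[:k]

class RandomAgent(RankingAgent):
    """Baseline sampler: uniformly random scores."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def score(self, adjacency, candidates):
        return {n: self.rng.random() for n in candidates}

class DegreeAgent(RankingAgent):
    """Centrality heuristic: rank nodes by their number of contacts."""

    def score(self, adjacency, candidates):
        return {n: len(adjacency[n]) for n in candidates}
```

A learned agent would fit the same interface by overriding `score` with the output of its policy network, leaving the top-k selection unchanged.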
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Rusu, A.C., Farrahi, K., Niranjan, M. (2023). Flattening the Curve Through Reinforcement Learning Driven Test and Trace Policies. In: Tsanas, A., Triantafyllidis, A. (eds) Pervasive Computing Technologies for Healthcare. PH 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-031-34586-9_14
DOI: https://doi.org/10.1007/978-3-031-34586-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34585-2
Online ISBN: 978-3-031-34586-9
eBook Packages: Computer Science, Computer Science (R0)