Flattening the Curve Through Reinforcement Learning Driven Test and Trace Policies

Rusu, Andrei C.; Farrahi, Katayoun; Niranjan, Mahesan

doi:10.1007/978-3-031-34586-9_14

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering ((LNICST,volume 488))

Included in the following conference series:

International Conference on Pervasive Computing Technologies for Healthcare

Abstract

An effective way of limiting the diffusion of viruses when vaccines are unavailable or insufficiently potent to eradicate them is through running widespread “test and trace” programmes. Although these have been instrumental during the COVID-19 pandemic, they also lead to significant increases in public spending and societal disruptions caused by the numerous isolation requirements. What is more, after the health measures were relaxed across the world, these programmes were unable to prevent substantial upsurges in infections. Here we propose an alternative approach to conducting pathogen testing and contact tracing that is adaptable to the budgeting requirements and risk tolerances of regional policy makers, while still breaking the high risk transmission chains. To that end, we propose several agents that rank individuals based on the role they possess in their interaction network and the epidemic state over which this diffuses, showing that testing or isolating just the top ranked can achieve adequate levels of containment without incurring the costs associated with standard strategies. Additionally, we extensively compare all the policies we derive, and show that a reinforcement learning actor based on graph neural networks outcompetes the more competitive heuristics by up to 15% in the containment rate, while far surpassing the standard random samplers by margins of 50% or more. Finally, we clearly demonstrate the versatility of the learned policies by appraising the decisions taken by the deep learning agent in different contexts using a diverse set of prediction explanation and state visualization techniques.

Notes

1. COVID-19 Attitudes Survey by YouGov: https://tinyurl.com/yougov-attitudes.
2. COVID-19 Infection Survey by ONS: https://tinyurl.com/ons-covid19.

References

Abueg, M., et al.: Modeling the combined effect of digital exposure notification and non-pharmaceutical interventions on the COVID-19 epidemic in Washington state. In: medRxiv, p. 2020.08.29.20184135. Cold Spring Harbor Laboratory Press (2020). https://doi.org/10.1101/2020.08.29.20184135
Alon, U., Yahav, E.: On the bottleneck of graph neural networks and its practical implications. In: International Conference on Learning Representations (2022)
Andrews, N., et al.: COVID-19 vaccine effectiveness against the omicron (B.1.1.529) variant. New Engl. J. Med. 386(16), 1532–1546 (2022). https://doi.org/10.1056/NEJMoa2119451
Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers (2021). https://doi.org/10.48550/arXiv.2106.08254
Bastani, H., et al.: Efficient and targeted COVID-19 border testing via reinforcement learning. Nature 599(7883), 108–113 (2021). https://doi.org/10.1038/s41586-021-04014-z, https://www.nature.com/articles/s41586-021-04014-z
Beaini, D., Passaro, S., Létourneau, V., Hamilton, W.L., Corso, G., Liò, P.: Directional Graph Networks (2021). https://doi.org/10.48550/arXiv.2010.02863
Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning (2017). https://doi.org/10.48550/arXiv.1611.09940
Bodnar, C., Di Giovanni, F., Chamberlain, B.P., Liò, P., Bronstein, M.M.: Neural sheaf diffusion: a topological perspective on heterophily and oversmoothing in GNNs (2022). https://doi.org/10.48550/arXiv.2202.04579
Braha, D., Bar-Yam, Y.: From centrality to temporary fame: dynamic centrality in complex networks. Complexity 12(2), 59–63 (2006). https://doi.org/10.1002/cplx.20156
Brody, S., Alon, U., Yahav, E.: How attentive are graph attention networks? (2022). https://doi.org/10.48550/arXiv.2105.14491
Bronstein, M.: Deep learning on graphs: successes, challenges, and next steps (2022). https://towardsdatascience.com/deep-learning-on-graphs-successes-challenges-and-next-steps-7d9ec220ba8
Bruxvoort, K.J., et al.: Effectiveness of mRNA-1273 against delta, mu, and other emerging variants of SARS-CoV-2: test negative case-control study. BMJ 375, e068848 (2021). https://doi.org/10.1136/bmj-2021-068848
Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2009, p. 199. ACM Press, Paris (2009). https://doi.org/10.1145/1557019.1557047
Chung, F., Horn, P., Tsiatas, A.: Distributing antidote using PageRank vectors. Internet Math. 6(2), 237–254 (2009). https://doi.org/10.1080/15427951.2009.10129184
Clair, R., Gordon, M., Kroon, M., Reilly, C.: The effects of social isolation on well-being and life satisfaction during pandemic. Humanit. Soc. Sci. Commun. 8(1), 1–6 (2021). https://doi.org/10.1057/s41599-021-00710-3
Cohen, R., Havlin, S., ben-Avraham, D.: Efficient immunization strategies for computer networks and populations. Phys. Rev. Lett. 91(24), 247901 (2003). https://doi.org/10.1103/PhysRevLett.91.247901
Dai, H., Khalil, E.B., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial optimization algorithms over graphs (2018)
Davis, E.L., et al.: Contact tracing is an imperfect tool for controlling COVID-19 transmission and relies on population adherence. Nat. Commun. 12(1), 5412 (2021). https://doi.org/10.1038/s41467-021-25531-5
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs] (2019)
Di Domenico, L., Pullano, G., Sabbatini, C.E., Boëlle, P.Y., Colizza, V.: Impact of lockdown on COVID-19 epidemic in Île-de-France and possible exit strategies. BMC Med. 18(1), 240 (2020). https://doi.org/10.1186/s12916-020-01698-4
Dighe, A., et al.: Response to COVID-19 in South Korea and implications for lifting stringent interventions. BMC Med. 18(1), 321 (2020). https://doi.org/10.1186/s12916-020-01791-8
Erdös, P., Rényi, A.: On random graphs I. Publicationes Mathematicae Debrecen 6, 290 (1959)
Farrahi, K., Emonet, R., Cebrian, M.: Epidemic contact tracing via communication traces. PLoS ONE 9(5), e95133 (2014). https://doi.org/10.1371/journal.pone.0095133
Ferdinands, J.M.: Waning 2-Dose and 3-dose effectiveness of mrna vaccines against COVID-19–associated emergency department and urgent care encounters and hospitalizations among adults during periods of delta and omicron variant predominance—VISION network, 10 States, August 2021–January 2022. MMWR Morbidity Mortality Weekly Rep. 71 (2022). https://doi.org/10.15585/mmwr.mm7107e2
Ferguson, N., et al.: Report 9: impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. Technical report, Imperial College London (2020). https://doi.org/10.25561/77482
Ferretti, L., et al.: Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 368 (2020). https://doi.org/10.1126/science.abb6936
Fung, V., Zhang, J., Juarez, E., Sumpter, B.: Benchmarking graph neural networks for materials chemistry. NPJ Comput. Mater. 7, 84 (2021). https://doi.org/10.1038/s41524-021-00554-0
Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977). https://doi.org/10.1021/j100540a008
Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E.: Neural message passing for quantum chemistry (2017). https://doi.org/10.48550/arXiv.1704.01212
Gori, M., Monfardini, G., Scarselli, F.: A new model for learning in graph domains. In: Proceedings of the International Joint Conference on Neural Networks, vol. 2, pp. 729–734 (2005). https://doi.org/10.1109/IJCNN.2005.1555942
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor (2018)
Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. arXiv:1706.02216 [cs, stat] (2018)
He, J., et al.: Deep reinforcement learning with a combinatorial action space for predicting popular reddit threads. In: EMNLP (2019)
Henley, J.: COVID surges across Europe as experts warn not let guard down. The Guardian (2022). https://www.theguardian.com/world/2022/jun/21/covid-surges-europe-ba4-ba5-cases
Hinch, R., et al.: Effective configurations of a digital contact tracing app: a report to NHSX. Technical report (2020)
Hoang, N., Maehara, T.: Revisiting graph neural networks: all we have is low-pass filters. arXiv:1905.09550 [cs, math, stat] (2019)
Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109–137 (1983). https://doi.org/10.1016/0378-8733(83)90021-7
Holme, P., Kim, B.J.: Growing scale-free networks with tunable clustering. Phys. Rev. E 65(2), 026107 (2002). https://doi.org/10.1103/PhysRevE.65.026107
Huang, Q., Yamada, M., Tian, Y., Singh, D., Yin, D., Chang, Y.: GraphLIME: local interpretable model explanations for graph neural networks (2020). https://doi.org/10.48550/arXiv.2001.06216
Huerta, R., Tsimring, L.S.: Contact tracing and epidemics control in social networks. Phys. Rev. E 66(5), 056115 (2002). https://doi.org/10.1103/PhysRevE.66.056115
Jhun, B.: Effective vaccination strategy using graph neural network ansatz (2021). https://doi.org/10.48550/arXiv.2111.00920
Joffe, A.R.: COVID-19: rethinking the lockdown groupthink. Front. Public Health 9 (2021)
Joshi, C.K., Laurent, T., Bresson, X.: An efficient graph convolutional network technique for the travelling salesman problem (2019)
Kapoor, A., et al.: Examining COVID-19 forecasting using spatio-temporal graph neural networks. arXiv:2007.03113 [cs] (2020)
Kermack, W.O., McKendrick, A.G., Walker, G.T.: A contribution to the mathematical theory of epidemics. Proc. Roy. Soc. London Ser. A Containing Pap. Math. Phys. Character 115(772), 700–721 (1927). https://doi.org/10.1098/rspa.1927.0118
Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., Shah, M.: Transformers in vision: a survey. ACM Comput. Surv. 3505244 (2022). https://doi.org/10.1145/3505244
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2017). https://doi.org/10.48550/arXiv.1412.6980
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. Conference Track Proceedings (2017). OpenReview.net
Kiran, B.R., et al.: Deep reinforcement learning for autonomous driving: a survey. IEEE Trans. Intell. Transp. Syst. 23(6), 4909–4926 (2022). https://doi.org/10.1109/TITS.2021.3054625
Kobayashi, T.: Adaptive and multiple time-scale eligibility traces for online deep reinforcement learning. Robot. Auton. Syst. 151, 104019 (2022). https://doi.org/10.1016/j.robot.2021.104019
Kojaku, S., Hébert-Dufresne, L., Mones, E., Lehmann, S., Ahn, Y.Y.: The effectiveness of backward contact tracing in networks. Nat. Phys. 17(5), 652–658 (2021). https://doi.org/10.1038/s41567-021-01187-2
Konda, V., Tsitsiklis, J.: Actor-critic algorithms. In: Advances in Neural Information Processing Systems, vol. 12. MIT Press (1999)
Kool, W., van Hoof, H., Welling, M.: Attention, learn to solve routing problems! (2019). https://doi.org/10.48550/arXiv.1803.08475
Lazaridis, A., Fachantidis, A., Vlahavas, I.: Deep reinforcement learning: a state-of-the-art walkthrough. J. Artif. Intell. Res. 69, 1421–1471 (2020). https://doi.org/10.1613/jair.1.12412
Leung, K., Wu, J.T.: Managing waning vaccine protection against SARS-CoV-2 variants. Lancet 399(10319), 2–3 (2022). https://doi.org/10.1016/S0140-6736(21)02841-5
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. Paper presented at 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico (2016)
Liu, D., Jing, Y., Zhao, J., Wang, W., Song, G.: A fast and efficient algorithm for mining top-k nodes in complex networks. Sci. Rep. 7(1), 43330 (2017). https://doi.org/10.1038/srep43330
Lundberg, S., Lee, S.I.: A unified approach to interpreting model predictions (2017). https://doi.org/10.48550/arXiv.1705.07874
Madan, A., Cebrian, M., Moturu, S., Farrahi, K., Pentland, A.S.: Sensing the “health state” of a community. IEEE Pervasive Comput. 11(4), 36–45 (2012). https://doi.org/10.1109/MPRV.2011.79
Martinez-Garcia, M., Sansano-Sansano, E., Castillo-Hornero, A., Femenia, R., Roomp, K., Oliver, N.: Social isolation during the COVID-19 pandemic in Spain: a population study (2022). https://doi.org/10.1101/2022.01.22.22269682
Mason, R., Allegretti, A., Devlin, H., Sample, I.: UK treasury pushes to end most free Covid testing despite experts’ warnings. The Guardian (2022)
Masuda, N.: Immunization of networks with community structure. New J. Phys. 11(12), 123018 (2009). https://doi.org/10.1088/1367-2630/11/12/123018
Matrajt, L., Leung, T.: Evaluating the effectiveness of social distancing interventions to delay or flatten the epidemic curve of coronavirus disease. Emerg. Infect. Dis. 26(8), 1740–1748 (2020). https://doi.org/10.3201/eid2608.201093
Mei, J., Xiao, C., Dai, B., Li, L., Szepesvari, C., Schuurmans, D.: Escaping the gravitational pull of softmax. In: Advances in Neural Information Processing Systems, vol. 33, pp. 21130–21140. Curran Associates, Inc. (2020)
Meirom, E., Maron, H., Mannor, S., Chechik, G.: Controlling graph dynamics with reinforcement learning and graph neural networks. In: Proceedings of the 38th International Conference on Machine Learning, pp. 7565–7577. PMLR (2021)
Meirom, E., Milling, C., Caramanis, C., Mannor, S., Shakkottai, S., Orda, A.: Localized epidemic detection in networks with overwhelming noise. ACM SIGMETRICS Perform. Eval. Rev. 43(1), 441–442 (2015). https://doi.org/10.1145/2796314.2745883
Mercer, T.R., Salit, M.: Testing at scale during the COVID-19 pandemic. Nat. Rev. Genet. 22(7), 415–426 (2021). https://doi.org/10.1038/s41576-021-00360-w
Miller, J.C., Hyman, J.M.: Effective vaccination strategies for realistic social networks. Phys. A 386(2), 780–785 (2007). https://doi.org/10.1016/j.physa.2007.08.054
Mnih, V., et al.: Playing atari with deep reinforcement learning (2013). https://doi.org/10.48550/arXiv.1312.5602
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236
Morris, C., et al.: Weisfeiler and leman go neural: higher-order graph neural networks. arXiv:1810.02244 [cs, stat] (2020)
Moshiri, N.: The dual-Barabási-Albert model (2018)
Murata, T., Koga, H.: Extended methods for influence maximization in dynamic networks. Comput. Soc. Netw. 5(1), 1–21 (2018). https://doi.org/10.1186/s40649-018-0056-8
Oono, K., Suzuki, T.: Graph neural networks exponentially lose expressive power for node classification. arXiv:1905.10947 [cs, stat] (2021)
Panagopoulos, G., Nikolentzos, G., Vazirgiannis, M.: Transfer graph neural networks for pandemic forecasting. arXiv:2009.08388 [cs, stat] (2021)
Pandit, J.A., Radin, J.M., Quer, G., Topol, E.J.: Smartphone apps in the COVID-19 pandemic. Nat. Biotechnol. 40(7), 1013–1022 (2022). https://doi.org/10.1038/s41587-022-01350-x
Preciado, V.M., Zargham, M., Enyioha, C., Jadbabaie, A., Pappas, G.J.: Optimal resource allocation for network protection against spreading processes. IEEE Trans. Control Netw. Syst. 1(1), 99–108 (2014). https://doi.org/10.1109/TCNS.2014.2310911
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. Wiley, USA (1994)
Rayner, D.C., Sturtevant, N.R., Bowling, M.: Subset selection of search heuristics. In: IJCAI (2019)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should i trust you?”: explaining the predictions of any classifier (2016). https://doi.org/10.48550/arXiv.1602.04938
Rimmer, A.: Sixty seconds on . . . the pingdemic. BMJ 374, n1822 (2021). https://doi.org/10.1136/bmj.n1822
Rummery, G., Niranjan, M.: On-line Q-learning using connectionist systems. Technical report CUED/F-INFENG/TR 166 (1994)
Rusu, A., Farrahi, K., Emonet, R.: Modelling digital and manual contact tracing for COVID-19. Are low uptakes and missed contacts deal-breakers? Preprint. Epidemiology (2021). https://doi.org/10.1101/2021.04.29.21256307
Salathé, M., Jones, J.H.: Dynamics and control of diseases in networks with community structure. PLOS Comput. Biol. 6(4), e1000736 (2010). https://doi.org/10.1371/journal.pcbi.1000736, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000736
Sato, R., Yamada, M., Kashima, H.: Random features strengthen graph neural networks (2021). https://doi.org/10.48550/arXiv.2002.03155
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2009). https://doi.org/10.1109/TNN.2008.2005605
Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation (2018). https://doi.org/10.48550/arXiv.1506.02438
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 [cs] (2017)
Serafino, M., et al.: Digital contact tracing and network theory to stop the spread of COVID-19 using big-data on human mobility geolocalization. PLOS Comput. Biol. 18(4), e1009865 (2022). https://doi.org/10.1371/journal.pcbi.1009865
Shah, C., et al.: Finding patient zero: learning contagion source with graph neural networks (2020)
Sigal, A.: Milder disease with Omicron: is it the virus or the pre-existing immunity? Nat. Rev. Immunol. 22(2), 69–71 (2022). https://doi.org/10.1038/s41577-022-00678-4
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016). https://doi.org/10.1038/nature16961
Silver, D., et al.: Mastering chess and shogi by self-play with a general reinforcement learning algorithm (2017). https://doi.org/10.48550/arXiv.1712.01815
Smith, J.: Demand for Covid vaccines falls amid waning appetite for booster shots. Financial Times (2022). https://www.ft.com/content/9ac9f8fc-1ab3-4cb2-81bf-259ba612f600
Smith, R.L., et al.: Longitudinal assessment of diagnostic test performance over the course of acute SARS-CoV-2 infection. J. Infect. Dis. 224(6), 976–982 (2021). https://doi.org/10.1093/infdis/jiab337
Song, H., et al.: Solving continual combinatorial selection via deep reinforcement learning. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 3467–3474 (2019). https://doi.org/10.24963/ijcai.2019/481
Su, Z., Cheshmehzangi, A., McDonnell, D., da Veiga, C.P., Xiang, Y.T.: Mind the “Vaccine Fatigue”. Front. Immunol. 13 (2022)
Sukumar, S.R., Nutaro, J.J.: Agent-based vs. equation-based epidemiological models: a model selection case study. In: 2012 ASE/IEEE International Conference on BioMedical Computing (BioMedCom), pp. 74–79 (2012). https://doi.org/10.1109/BioMedCom.2012.19
Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3(1), 9–44 (1988). https://doi.org/10.1007/BF00115009
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning Series, 2nd edn. The MIT Press, Cambridge (2018)
Tian, S., Mo, S., Wang, L., Peng, Z.: Deep reinforcement learning-based approach to tackle topic-aware influence maximization. Data Sci. Eng. 5(1), 1–11 (2020). https://doi.org/10.1007/s41019-020-00117-1
Tomy, A., Razzanelli, M., Di Lauro, F., Rus, D., Della Santina, C.: Estimating the state of epidemics spreading with graph neural networks. Nonlinear Dyn. 109(1), 249–263 (2022). https://doi.org/10.1007/s11071-021-07160-1
van Hasselt, H., Madjiheurem, S., Hessel, M., Silver, D., Barreto, A., Borsa, D.: Expected eligibility traces. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 9997–10005 (2021). https://doi.org/10.1609/aaai.v35i11.17200
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. arXiv:1710.10903 [cs, stat] (2018)
Watkins, C.: Learning from delayed rewards (1989)
Wymant, C., et al.: The epidemiological impact of the NHS COVID-19 app. Nature 594(7863), 408–412 (2021). https://doi.org/10.1038/s41586-021-03606-z
Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? arXiv:1810.00826 [cs, stat] (2019)
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E.P., Sugiyama, M.: High-dimensional feature selection by feature-wise kernelized lasso. Neural Comput. 26(1), 185–207 (2014). https://doi.org/10.1162/NECO_a_00537
Zhou, J., et al.: Graph neural networks: a review of methods and applications. AI Open 1, 57–81 (2020). https://doi.org/10.1016/j.aiopen.2021.01.001

Acknowledgements

We thank Dr Eli Meirom for correspondence clarifying certain aspects from their study. MN was funded by EPSRC grant EP/S000356/1 Artificial and Augmented Intelligence for Automated Scientific Discovery. AR was funded by the EPSRC via a scholarship from the University of Southampton.

Author information

Authors and Affiliations

University of Southampton, Southampton, UK
Andrei C. Rusu, Katayoun Farrahi & Mahesan Niranjan

Corresponding author

Correspondence to Andrei C. Rusu.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Athanasios Tsanas
Centre for Research and Technology Hellas, Thessaloniki, Greece
Andreas Triantafyllidis

A Appendix

A.1 Performance Analysis

We compare the mean total elapsed time for running epidemics using each of our testing agents in Table A1. These results correspond to the wall clock time recorded on an average Windows machine equipped with an Intel i7-7700 CPU, an NVIDIA RTX 3060 GPU and 32 GB of random access memory.

Table A1. Average wall clock time per epidemic during evaluation. Configuration: Barabási-Albert networks of 2000 nodes, an average degree of approximately 3, and a daily testing budget of $k=2$.

A.2 Epidemic Modelling

All our epidemic models rely on the SEIR compartmental formulation, but the diffusion process remains bound by the interaction network configuration.

The multi-site mean-field models considered in this work rely on exponential waiting times sampled via Gillespie’s algorithm to obtain subsequent events, with the state transition probabilities defined as follows:

$$\begin{aligned} \begin{gathered} p(S \rightarrow E) = b \, w_j \, \Delta t \\ p(E \rightarrow I) = e^{-1} \, \Delta t \\ p(I \rightarrow R) = \rho \, \Delta t, \end{gathered} \end{aligned}$$

(2)

where $b$ is the base transmission rate, $w_j$ are time-dependent edge weights, $e$ is the exposed state duration, $\rho $ is the recovery rate, and $\Delta t$ is a time interval.

In contrast, the agent-based model loops through every node i at every time step, executing the appropriate transition events when one of the normally-distributed samples ($d_i$ or $r_i$) decreases to 0. Concurrently, every edge j is visited to check whether an infection event occurs over that connection, according to the transmission probability defined in Eq 2.
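To make the transition probabilities of Eq 2 concrete, the following is a minimal discrete-time sketch. It is an illustration only: it is neither the Gillespie sampler nor the agent-based model described above, and the function name `seir_step`, the dictionary-based state representation and the edge-list format are assumptions made for this example.

```python
import random


def seir_step(states, edges, b, e, rho, dt, rng):
    """Advance all nodes by one interval dt using the probabilities of Eq 2.

    states: dict mapping node -> "S", "E", "I" or "R" (hypothetical encoding)
    edges:  list of (u, v, w) tuples, where w is the current edge weight w_j
    rng:    a random.Random instance, passed in for reproducibility
    """
    new_states = dict(states)

    # S -> E: an edge joining an infectious and a susceptible node transmits
    # with probability b * w_j * dt.
    for u, v, w in edges:
        for src, dst in ((u, v), (v, u)):
            if states[src] == "I" and states[dst] == "S":
                if rng.random() < b * w * dt:
                    new_states[dst] = "E"

    # E -> I with probability dt / e, and I -> R with probability rho * dt.
    # The old state is inspected, so a node changes compartment at most once.
    for node, state in states.items():
        if state == "E" and rng.random() < dt / e:
            new_states[node] = "I"
        elif state == "I" and rng.random() < rho * dt:
            new_states[node] = "R"

    return new_states


# Usage: one infectious node on a three-node path, stepped for ten intervals.
rng = random.Random(0)
population = {0: "I", 1: "S", 2: "S"}
contacts = [(0, 1, 1.0), (1, 2, 0.5)]
for _ in range(10):
    population = seir_step(population, contacts, b=0.3, e=3.0, rho=0.1, dt=1.0, rng=rng)
```

The probabilities are only meaningful when $\Delta t$ is small enough for each product to stay below 1, which is one reason event-driven sampling is attractive for the continuous-time models.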

A.3 Algorithmic Details for the Proximal Policy Optimization

We start by reminding the reader about some general reinforcement learning quantities and relations:

$$\begin{aligned} \begin{gathered} \hat{A}_t^{(\gamma ,0)}(a_t;\theta ) = \delta _t^\gamma (\theta ) = R_t + \gamma V(s_{t+1};\theta _T) - V(s_t;\theta ) \\ \hat{A}_t^{(\gamma ,1)}(a_t;\theta ) = G_t^\gamma - V(s_t;\theta ) \\ \hat{A}_t^{(\gamma ,\lambda )}(a_t;\theta ) = \sum _{l=0}^{T} (\gamma \lambda )^l\delta _{t+l}^\gamma (\theta ) \\ r_t^{ORIG}(\theta ) = \frac{\pi (a_t|s_t;\theta )}{\pi (a_t|s_t;\theta _k)} \quad r_t^{SARSA}(\theta ) = \frac{\pi (a_t|s_{t+1};\theta )}{\pi (a_t|s_t;\theta _k)} \end{gathered} \end{aligned}$$

(3)

In Eq 3, $R_t$ is the reward obtained by the agent after taking action $a_t \sim \pi (a|s_t;\theta _k)$ and transitioning from state $s_t$ to $s_{t+1}$. The value of a given state $s$ is approximated using a neural network $V(s;\theta )$, which, together with $R_t$ and the discount factor $\gamma $, determines the TD-error $\delta _t^\gamma $. Here $\theta _k$ parameterizes the acting policy, and $\theta _T$ is a delayed copy of $\theta _k$ that parameterizes the regression target in online learning [69]. The term $r_t^{ORIG}(\theta )$ denotes the ratio between a policy parameterized by a given $\theta $ and the acting policy, while $r_t^{SARSA}(\theta )$ is an alternative formulation of this ratio that replaces the numerator with the policy of $\theta $ evaluated at the next state $s_{t+1}$. Finally, the $\hat{A}_t^{(\gamma ,\lambda )}$ terms represent different forms of the advantage function, as given by [87], with the special cases $\lambda =0$, when the advantage equals the TD-error, and $\lambda =1$, when the minuend on the right-hand side is the discounted return of the episode, $G_t^\gamma $.
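The relation between the three advantage forms in Eq 3 can be checked numerically. The sketch below assumes a finished episode, a single list of value estimates (so it ignores the distinction between $\theta $ and the target parameters $\theta _T$), and a terminal value of zero; the function names are hypothetical.

```python
def td_errors(rewards, values, gamma):
    """delta_t = R_t + gamma * V(s_{t+1}) - V(s_t).

    values must hold one more entry than rewards (the terminal state's value).
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def generalized_advantage(rewards, values, gamma, lam):
    """A_t^(gamma, lambda) = sum_l (gamma * lambda)^l * delta_{t+l}, computed backwards."""
    deltas = td_errors(rewards, values, gamma)
    advantages = [0.0] * len(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


def discounted_returns(rewards, gamma):
    """G_t = sum_l gamma^l * R_{t+l} for a finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Usage on a toy three-step episode (illustrative numbers only).
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]  # last entry: value of the terminal state
adv_td = generalized_advantage(rewards, values, gamma=0.9, lam=0.0)
adv_mc = generalized_advantage(rewards, values, gamma=0.9, lam=1.0)
```

With `lam=0.0` the result coincides with the TD-errors, and with `lam=1.0` it coincides with the discounted return minus the value estimate, matching the two special cases described above.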

We rewrite the Proximal Policy Optimization (PPO) equations in terms of the quantities above in Eq 4, where $\mathcal {E}$, $c_1$ and $c_2$ are hyperparameters, clip(.) is a function that clips its argument to the specified range, transform(.) is a function that modifies the gradient descent update according to a specific optimizer (e.g. Adam [47]), while $\mathcal {H}_t(\theta )$ is an entropy regularizer [31]. In contrast to the original formulation, Eq 5 describes our proposed modification of PPO to allow for optimizing the objective in a memory-efficient online manner. In particular, we rewrite the loss terms using the one-step advantage function $\hat{A}_t^{(\gamma ,0)}(a_t;\theta )$, and introduce an intermediary operation that accumulates the gradients of our modified loss using a unified eligibility trace [100], in a similar fashion to the methodology employed by [50], obtaining a backward-view approximation of the generalized advantage estimate $\hat{A}_t^{(\gamma ,\lambda )}$ in the process [87]. We note that, by setting $r_t=r_t^{SARSA}$, we can eliminate the requirement of storing $s_t$ in memory for the subsequent time step, while retaining the benefits of ratio clipping. This works well empirically since major shifts between $s_{t+1}$ and $s_t$ are not common in our environment. Based on previous work and our own assessment, we set $\gamma =0.99$, $\lambda =0.97$, $\mathcal {E}=0.2$, $c_1=0.5$, $c_2=0.01$, and update the target value network every 5 episodes across all our experiments.

$$\begin{aligned} \begin{gathered} \mathcal {L}^{CLIP}_t(\theta ) = \min [r_t(\theta ) \hat{A}_t^{(\gamma ,\lambda )}(a_t;\theta ), \text {clip}(r_t(\theta ), 1 - \mathcal {E}, 1 + \mathcal {E}) \hat{A}^{(\gamma ,\lambda )}_t(a_t;\theta )] \\ \mathcal {L}_t^{VF}(\theta ) = [\hat{A}_t^{(\gamma ,1)}(a_t; \theta )]^2 \quad \mathcal {H}_t(\theta ) = - \sum _{a \in A} \pi (a|s_t;\theta ) \log \pi (a|s_t;\theta )\\ \mathcal {L}^{PPO}_t(\theta ) = \mathbb {E}_{t}[-\mathcal {L}^{CLIP}_t(\theta ) + c_1 \mathcal {L}^{VF}_t(\theta ) - c_2 \mathcal {H}_t(\theta )] \\ \theta _{k+1} = \mathop {\mathrm {arg\,min}}\limits _\theta {\mathcal {L}^{PPO}_t(\theta )} \end{gathered} \end{aligned}$$

(4)
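As a rough illustration of the per-step terms in Eq 4, the sketch below evaluates the clipped surrogate, the squared value error and the entropy for scalar inputs. It is not the training code: the function names are hypothetical, the expectation over time steps is omitted, and the default hyperparameters simply reuse the values quoted in the text.

```python
import math


def clip(x, low, high):
    """Clip x to the closed interval [low, high]."""
    return max(low, min(high, x))


def clipped_surrogate(ratio, advantage, eps=0.2):
    """L^CLIP for one step: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    return min(ratio * advantage, clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)


def entropy(probs):
    """H = -sum_a pi(a) * log pi(a); zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_step_loss(ratio, advantage, value_error, probs, eps=0.2, c1=0.5, c2=0.01):
    """Per-step loss: -L^CLIP + c1 * L^VF - c2 * H, with L^VF = value_error ** 2."""
    return (
        -clipped_surrogate(ratio, advantage, eps)
        + c1 * value_error ** 2
        - c2 * entropy(probs)
    )


# Usage: a ratio above 1 + eps is clipped when the advantage is positive,
# but left untouched when the advantage is negative.
gain_when_good = clipped_surrogate(1.5, 1.0)
gain_when_bad = clipped_surrogate(1.5, -1.0)
```

The asymmetry in the usage lines reflects the pessimistic bound that clipping provides: the objective never rewards moving the ratio further than $1 \pm \mathcal {E}$ in the favourable direction, but still penalizes moves in the unfavourable one.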

$$\begin{aligned} \begin{gathered} \mathcal {L}^{OCLIP}_t(\theta ) = \min [r_t(\theta ) \hat{A}^{(\gamma ,0)}_t(a_t;\theta ), \text {clip}(r_t(\theta ), 1 - \mathcal {E}, 1 + \mathcal {E}) \hat{A}_t^{(\gamma ,0)}(a_t;\theta )] \\ \mathcal {L}_t^{OVF}(\theta ) = [\hat{A}_t^{(\gamma ,0)}(a_t; \theta )]^2 \quad \mathcal {H}_t(\theta ) = - \sum _{a \in A} \pi (a|s_t;\theta ) \log \pi (a|s_t;\theta )\\ \mathcal {L}^{OPPO}_t(\theta ) = -\mathcal {L}^{OCLIP}_t(\theta ) + c_1 \mathcal {L}^{OVF}_t(\theta ) - c_2 \mathcal {H}_t(\theta ) \\ E_t = \gamma \lambda E_{t-1} + \frac{\nabla _{\theta _k}\mathcal {L}^{OPPO}_t(\theta _k)}{s} \; \text {, with} \; s=\delta _t^\gamma (\theta _k) \; \text {or} \; s=1 \\ \Delta \theta _k = \text {transform}(\delta _t^\gamma (\theta _k)E_t) \\ \theta _{k+1} = \theta _k - \Delta \theta _k \end{gathered} \end{aligned}$$

(5)
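The last three lines of Eq 5 can be sketched as a single update routine. This is a simplified illustration under stated assumptions: parameters and gradients are plain Python lists, the gradient of $\mathcal {L}^{OPPO}_t$ is supplied by the caller, and transform(.) is replaced by plain gradient descent with a fixed learning rate `lr` rather than an optimizer such as Adam. The function name and its arguments are hypothetical.

```python
def online_trace_update(theta, trace, grad, delta,
                        gamma=0.99, lam=0.97, lr=1e-3, scale_by_delta=False):
    """One online step following the trace and update rules of Eq 5.

    theta: current parameters theta_k
    trace: eligibility trace E_{t-1}
    grad:  gradient of the one-step loss at theta_k (supplied by the caller)
    delta: TD-error delta_t at theta_k
    """
    # s = delta_t or s = 1, as in Eq 5. With scale_by_delta=True a zero
    # TD-error would divide by zero, so callers must handle that case.
    s = delta if scale_by_delta else 1.0

    # E_t = gamma * lambda * E_{t-1} + grad / s
    new_trace = [gamma * lam * e + g / s for e, g in zip(trace, grad)]

    # Delta theta = transform(delta_t * E_t); here transform is just lr * (.)
    step = [lr * delta * e for e in new_trace]

    # theta_{k+1} = theta_k - Delta theta
    new_theta = [p - d for p, d in zip(theta, step)]
    return new_theta, new_trace


# Usage with toy numbers: two consecutive updates sharing one trace.
theta, trace = [1.0, 2.0], [0.0, 0.0]
theta, trace = online_trace_update(theta, trace, grad=[0.5, -1.0], delta=2.0,
                                   gamma=0.5, lam=0.5, lr=0.1)
theta, trace = online_trace_update(theta, trace, grad=[1.0, 1.0], delta=1.0,
                                   gamma=0.5, lam=0.5, lr=0.1)
```

Only the trace and the current parameters are carried between steps, which is what makes the scheme memory-efficient compared with storing whole trajectories.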

A.4 Supporting Figures

A.5 Control Framework

The logic behind our epidemic control framework in the continuous-time simulation scenario is outlined in Algorithm 1. The class hierarchy of the agents, together with their logic, can be consulted in Algorithm 2. Refer to Table A2 for details about the variables involved in these.
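Since the pseudocode itself is not reproduced here, the following is only a loose sketch of what a ranking-based agent hierarchy could look like: a base class scores every node, and the top $k$ nodes within the daily budget are selected. All class names, method signatures and scoring rules below are hypothetical and should not be read as the contents of Algorithm 1 or Algorithm 2.

```python
import random
from abc import ABC, abstractmethod


class RankingAgent(ABC):
    """Hypothetical base class: rank nodes by a score and pick the top `budget`."""

    def __init__(self, budget):
        self.budget = budget

    @abstractmethod
    def score(self, node, adjacency, known_infected):
        """Return a number; higher means the node is tested or isolated first."""

    def select(self, adjacency, known_infected, excluded=()):
        candidates = [n for n in adjacency if n not in excluded]
        ranked = sorted(
            candidates,
            key=lambda n: self.score(n, adjacency, known_infected),
            reverse=True,
        )
        return ranked[: self.budget]


class RandomAgent(RankingAgent):
    """Random sampler baseline: every node receives a random score."""

    def __init__(self, budget, seed=None):
        super().__init__(budget)
        self.rng = random.Random(seed)

    def score(self, node, adjacency, known_infected):
        return self.rng.random()


class DegreeAgent(RankingAgent):
    """Heuristic: prefer nodes with many contacts."""

    def score(self, node, adjacency, known_infected):
        return len(adjacency[node])


class InfectedNeighbourAgent(RankingAgent):
    """Heuristic: prefer nodes with many contacts already known to be infected."""

    def score(self, node, adjacency, known_infected):
        return sum(1 for nb in adjacency[node] if nb in known_infected)


# Usage on a toy contact network with a daily budget of k = 2.
network = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
to_test = DegreeAgent(budget=2).select(network, known_infected=set())
```

A learned policy would slot into the same interface by returning the output of a trained model from `score`, which keeps the selection logic shared across heuristic and learned agents.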

Table A2. Legend for the control framework pseudocode.

About this paper

Cite this paper

Rusu, A.C., Farrahi, K., Niranjan, M. (2023). Flattening the Curve Through Reinforcement Learning Driven Test and Trace Policies. In: Tsanas, A., Triantafyllidis, A. (eds) Pervasive Computing Technologies for Healthcare. PH 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-031-34586-9_14

DOI: https://doi.org/10.1007/978-3-031-34586-9_14
Published: 11 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34585-2
Online ISBN: 978-3-031-34586-9
eBook Packages: Computer Science, Computer Science (R0)