
Flattening the Curve Through Reinforcement Learning Driven Test and Trace Policies

  • Conference paper
  • First Online:
Pervasive Computing Technologies for Healthcare (PH 2022)

Abstract

An effective way of limiting the spread of viruses when vaccines are unavailable or insufficiently potent to eradicate them is to run widespread “test and trace” programmes. Although these have been instrumental during the COVID-19 pandemic, they also lead to significant increases in public spending and to the societal disruption caused by numerous isolation requirements. Moreover, after health measures were relaxed across the world, these programmes were unable to prevent substantial upsurges in infections. Here we propose an alternative approach to conducting pathogen testing and contact tracing that is adaptable to the budgeting requirements and risk tolerances of regional policy makers, while still breaking the high-risk transmission chains. To that end, we propose several agents that rank individuals based on the role they play in their interaction network and on the epidemic state diffusing over it, showing that testing or isolating just the top-ranked individuals can achieve adequate levels of containment without incurring the costs associated with standard strategies. Additionally, we extensively compare all the policies we derive, and show that a reinforcement learning actor based on graph neural networks outperforms the strongest heuristics by up to 15% in containment rate, while surpassing standard random samplers by margins of 50% or more. Finally, we demonstrate the versatility of the learned policies by appraising the decisions taken by the deep learning agent in different contexts using a diverse set of prediction explanation and state visualization techniques.


Notes

  1. COVID-19 Attitudes Survey by YouGov: https://tinyurl.com/yougov-attitudes.

  2. COVID-19 Infection Survey by ONS: https://tinyurl.com/ons-covid19.

References

  1. Abueg, M., et al.: Modeling the combined effect of digital exposure notification and non-pharmaceutical interventions on the COVID-19 epidemic in Washington state. In: medRxiv, p. 2020.08.29.20184135. Cold Spring Harbor Laboratory Press (2020). https://doi.org/10.1101/2020.08.29.20184135

  2. Alon, U., Yahav, E.: On the bottleneck of graph neural networks and its practical implications. In: International Conference on Learning Representations (2022)

  3. Andrews, N., et al.: COVID-19 vaccine effectiveness against the omicron (B.1.1.529) variant. New Engl. J. Med. 386(16), 1532–1546 (2022). https://doi.org/10.1056/NEJMoa2119451

  4. Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers (2021). https://doi.org/10.48550/arXiv.2106.08254

  5. Bastani, H., et al.: Efficient and targeted COVID-19 border testing via reinforcement learning. Nature 599(7883), 108–113 (2021). https://doi.org/10.1038/s41586-021-04014-z, https://www.nature.com/articles/s41586-021-04014-z

  6. Beaini, D., Passaro, S., Létourneau, V., Hamilton, W.L., Corso, G., Liò, P.: Directional Graph Networks (2021). https://doi.org/10.48550/arXiv.2010.02863

  7. Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning (2017). https://doi.org/10.48550/arXiv.1611.09940

  8. Bodnar, C., Di Giovanni, F., Chamberlain, B.P., Liò, P., Bronstein, M.M.: Neural sheaf diffusion: a topological perspective on heterophily and oversmoothing in GNNs (2022). https://doi.org/10.48550/arXiv.2202.04579

  9. Braha, D., Bar-Yam, Y.: From centrality to temporary fame: dynamic centrality in complex networks. Complexity 12(2), 59–63 (2006). https://doi.org/10.1002/cplx.20156

  10. Brody, S., Alon, U., Yahav, E.: How attentive are graph attention networks? (2022). https://doi.org/10.48550/arXiv.2105.14491

  11. Bronstein, M.: Deep learning on graphs: successes, challenges, and next steps (2022). https://towardsdatascience.com/deep-learning-on-graphs-successes-challenges-and-next-steps-7d9ec220ba8

  12. Bruxvoort, K.J., et al.: Effectiveness of mRNA-1273 against delta, mu, and other emerging variants of SARS-CoV-2: test negative case-control study. BMJ 375, e068848 (2021). https://doi.org/10.1136/bmj-2021-068848

  13. Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2009, p. 199. ACM Press, Paris (2009). https://doi.org/10.1145/1557019.1557047

  14. Chung, F., Horn, P., Tsiatas, A.: Distributing antidote using PageRank vectors. Internet Math. 6(2), 237–254 (2009). https://doi.org/10.1080/15427951.2009.10129184

  15. Clair, R., Gordon, M., Kroon, M., Reilly, C.: The effects of social isolation on well-being and life satisfaction during pandemic. Humanit. Soc. Sci. Commun. 8(1), 1–6 (2021). https://doi.org/10.1057/s41599-021-00710-3

  16. Cohen, R., Havlin, S., ben-Avraham, D.: Efficient immunization strategies for computer networks and populations. Phys. Rev. Lett. 91(24), 247901 (2003). https://doi.org/10.1103/PhysRevLett.91.247901

  17. Dai, H., Khalil, E.B., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial optimization algorithms over graphs (2018)

  18. Davis, E.L., et al.: Contact tracing is an imperfect tool for controlling COVID-19 transmission and relies on population adherence. Nat. Commun. 12(1), 5412 (2021). https://doi.org/10.1038/s41467-021-25531-5

  19. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs] (2019)

  20. Di Domenico, L., Pullano, G., Sabbatini, C.E., Boëlle, P.Y., Colizza, V.: Impact of lockdown on COVID-19 epidemic in Île-de-France and possible exit strategies. BMC Med. 18(1), 240 (2020). https://doi.org/10.1186/s12916-020-01698-4

  21. Dighe, A., et al.: Response to COVID-19 in South Korea and implications for lifting stringent interventions. BMC Med. 18(1), 321 (2020). https://doi.org/10.1186/s12916-020-01791-8

  22. Erdös, P., Rényi, A.: On random graphs I. Publicationes Mathematicae Debrecen 6, 290 (1959)

  23. Farrahi, K., Emonet, R., Cebrian, M.: Epidemic contact tracing via communication traces. PLoS ONE 9(5), e95133 (2014). https://doi.org/10.1371/journal.pone.0095133

  24. Ferdinands, J.M.: Waning 2-dose and 3-dose effectiveness of mRNA vaccines against COVID-19–associated emergency department and urgent care encounters and hospitalizations among adults during periods of delta and omicron variant predominance—VISION network, 10 states, August 2021–January 2022. MMWR Morbidity Mortality Weekly Rep. 71 (2022). https://doi.org/10.15585/mmwr.mm7107e2

  25. Ferguson, N., et al.: Report 9: impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. Technical report, Imperial College London (2020). https://doi.org/10.25561/77482

  26. Ferretti, L., et al.: Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 368 (2020). https://doi.org/10.1126/science.abb6936

  27. Fung, V., Zhang, J., Juarez, E., Sumpter, B.: Benchmarking graph neural networks for materials chemistry. NPJ Comput. Mater. 7, 84 (2021). https://doi.org/10.1038/s41524-021-00554-0

  28. Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977). https://doi.org/10.1021/j100540a008

  29. Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E.: Neural message passing for quantum chemistry (2017). https://doi.org/10.48550/arXiv.1704.01212

  30. Gori, M., Monfardini, G., Scarselli, F.: A new model for learning in graph domains. In: Proceedings of the International Joint Conference on Neural Networks, vol. 2, pp. 729–734 (2005). https://doi.org/10.1109/IJCNN.2005.1555942

  31. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor (2018)

  32. Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. arXiv:1706.02216 [cs, stat] (2018)

  33. He, J., et al.: Deep reinforcement learning with a combinatorial action space for predicting popular reddit threads. In: EMNLP (2019)

  34. Henley, J.: COVID surges across Europe as experts warn not to let guard down. The Guardian (2022). https://www.theguardian.com/world/2022/jun/21/covid-surges-europe-ba4-ba5-cases

  35. Hinch, R., et al.: Effective configurations of a digital contact tracing app: a report to NHSX. Technical report (2020)

  36. Hoang, N., Maehara, T.: Revisiting graph neural networks: all we have is low-pass filters. arXiv:1905.09550 [cs, math, stat] (2019)

  37. Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109–137 (1983). https://doi.org/10.1016/0378-8733(83)90021-7

  38. Holme, P., Kim, B.J.: Growing scale-free networks with tunable clustering. Phys. Rev. E 65(2), 026107 (2002). https://doi.org/10.1103/PhysRevE.65.026107

  39. Huang, Q., Yamada, M., Tian, Y., Singh, D., Yin, D., Chang, Y.: GraphLIME: local interpretable model explanations for graph neural networks (2020). https://doi.org/10.48550/arXiv.2001.06216

  40. Huerta, R., Tsimring, L.S.: Contact tracing and epidemics control in social networks. Phys. Rev. E 66(5), 056115 (2002). https://doi.org/10.1103/PhysRevE.66.056115

  41. Jhun, B.: Effective vaccination strategy using graph neural network ansatz (2021). https://doi.org/10.48550/arXiv.2111.00920

  42. Joffe, A.R.: COVID-19: rethinking the lockdown groupthink. Front. Public Health 9 (2021)

  43. Joshi, C.K., Laurent, T., Bresson, X.: An efficient graph convolutional network technique for the travelling salesman problem (2019)

  44. Kapoor, A., et al.: Examining COVID-19 forecasting using spatio-temporal graph neural networks. arXiv:2007.03113 [cs] (2020)

  45. Kermack, W.O., McKendrick, A.G., Walker, G.T.: A contribution to the mathematical theory of epidemics. Proc. Roy. Soc. London Ser. A Containing Pap. Math. Phys. Character 115(772), 700–721 (1927). https://doi.org/10.1098/rspa.1927.0118

  46. Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., Shah, M.: Transformers in vision: a survey. ACM Comput. Surv. 3505244 (2022). https://doi.org/10.1145/3505244

  47. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2017). https://doi.org/10.48550/arXiv.1412.6980

  48. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. Conference Track Proceedings (2017). OpenReview.net

  49. Kiran, B.R., et al.: Deep reinforcement learning for autonomous driving: a survey. IEEE Trans. Intell. Transp. Syst. 23(6), 4909–4926 (2022). https://doi.org/10.1109/TITS.2021.3054625

  50. Kobayashi, T.: Adaptive and multiple time-scale eligibility traces for online deep reinforcement learning. Robot. Auton. Syst. 151, 104019 (2022). https://doi.org/10.1016/j.robot.2021.104019

  51. Kojaku, S., Hébert-Dufresne, L., Mones, E., Lehmann, S., Ahn, Y.Y.: The effectiveness of backward contact tracing in networks. Nat. Phys. 17(5), 652–658 (2021). https://doi.org/10.1038/s41567-021-01187-2

  52. Konda, V., Tsitsiklis, J.: Actor-critic algorithms. In: Advances in Neural Information Processing Systems, vol. 12. MIT Press (1999)

  53. Kool, W., van Hoof, H., Welling, M.: Attention, learn to solve routing problems! (2019). https://doi.org/10.48550/arXiv.1803.08475

  54. Lazaridis, A., Fachantidis, A., Vlahavas, I.: Deep reinforcement learning: a state-of-the-art walkthrough. J. Artif. Intell. Res. 69, 1421–1471 (2020). https://doi.org/10.1613/jair.1.12412

  55. Leung, K., Wu, J.T.: Managing waning vaccine protection against SARS-CoV-2 variants. Lancet 399(10319), 2–3 (2022). https://doi.org/10.1016/S0140-6736(21)02841-5

  56. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. Paper presented at 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico (2016)

  57. Liu, D., Jing, Y., Zhao, J., Wang, W., Song, G.: A fast and efficient algorithm for mining top-k nodes in complex networks. Sci. Rep. 7(1), 43330 (2017). https://doi.org/10.1038/srep43330

  58. Lundberg, S., Lee, S.I.: A unified approach to interpreting model predictions (2017). https://doi.org/10.48550/arXiv.1705.07874

  59. Madan, A., Cebrian, M., Moturu, S., Farrahi, K., Pentland, A.S.: Sensing the “health state’’ of a community. IEEE Pervasive Comput. 11(4), 36–45 (2012). https://doi.org/10.1109/MPRV.2011.79

  60. Martinez-Garcia, M., Sansano-Sansano, E., Castillo-Hornero, A., Femenia, R., Roomp, K., Oliver, N.: Social isolation during the COVID-19 pandemic in Spain: a population study (2022). https://doi.org/10.1101/2022.01.22.22269682

  61. Mason, R., Allegretti, A., Devlin, H., Sample, I.: UK treasury pushes to end most free Covid testing despite experts’ warnings. The Guardian (2022)

  62. Masuda, N.: Immunization of networks with community structure. New J. Phys. 11(12), 123018 (2009). https://doi.org/10.1088/1367-2630/11/12/123018

  63. Matrajt, L., Leung, T.: Evaluating the effectiveness of social distancing interventions to delay or flatten the epidemic curve of coronavirus disease. Emerg. Infect. Dis. 26(8), 1740–1748 (2020). https://doi.org/10.3201/eid2608.201093

  64. Mei, J., Xiao, C., Dai, B., Li, L., Szepesvari, C., Schuurmans, D.: Escaping the gravitational pull of softmax. In: Advances in Neural Information Processing Systems, vol. 33, pp. 21130–21140. Curran Associates, Inc. (2020)

  65. Meirom, E., Maron, H., Mannor, S., Chechik, G.: Controlling graph dynamics with reinforcement learning and graph neural networks. In: Proceedings of the 38th International Conference on Machine Learning, pp. 7565–7577. PMLR (2021)

  66. Meirom, E., Milling, C., Caramanis, C., Mannor, S., Shakkottai, S., Orda, A.: Localized epidemic detection in networks with overwhelming noise. ACM SIGMETRICS Perform. Eval. Rev. 43(1), 441–442 (2015). https://doi.org/10.1145/2796314.2745883

  67. Mercer, T.R., Salit, M.: Testing at scale during the COVID-19 pandemic. Nat. Rev. Genet. 22(7), 415–426 (2021). https://doi.org/10.1038/s41576-021-00360-w

  68. Miller, J.C., Hyman, J.M.: Effective vaccination strategies for realistic social networks. Phys. A 386(2), 780–785 (2007). https://doi.org/10.1016/j.physa.2007.08.054

  69. Mnih, V., et al.: Playing atari with deep reinforcement learning (2013). https://doi.org/10.48550/arXiv.1312.5602

  70. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236

  71. Morris, C., et al.: Weisfeiler and leman go neural: higher-order graph neural networks. arXiv:1810.02244 [cs, stat] (2020)

  72. Moshiri, N.: The dual-Barabási-Albert model (2018)

  73. Murata, T., Koga, H.: Extended methods for influence maximization in dynamic networks. Comput. Soc. Netw. 5(1), 1–21 (2018). https://doi.org/10.1186/s40649-018-0056-8

  74. Oono, K., Suzuki, T.: Graph neural networks exponentially lose expressive power for node classification. arXiv:1905.10947 [cs, stat] (2021)

  75. Panagopoulos, G., Nikolentzos, G., Vazirgiannis, M.: Transfer graph neural networks for pandemic forecasting. arXiv:2009.08388 [cs, stat] (2021)

  76. Pandit, J.A., Radin, J.M., Quer, G., Topol, E.J.: Smartphone apps in the COVID-19 pandemic. Nat. Biotechnol. 40(7), 1013–1022 (2022). https://doi.org/10.1038/s41587-022-01350-x

  77. Preciado, V.M., Zargham, M., Enyioha, C., Jadbabaie, A., Pappas, G.J.: Optimal resource allocation for network protection against spreading processes. IEEE Trans. Control Netw. Syst. 1(1), 99–108 (2014). https://doi.org/10.1109/TCNS.2014.2310911

  78. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. Wiley, USA (1994)

  79. Rayner, D.C., Sturtevant, N.R., Bowling, M.: Subset selection of search heuristics. In: IJCAI (2019)

  80. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should i trust you?”: explaining the predictions of any classifier (2016). https://doi.org/10.48550/arXiv.1602.04938

  81. Rimmer, A.: Sixty seconds on . . . the pingdemic. BMJ 374, n1822 (2021). https://doi.org/10.1136/bmj.n1822

  82. Rummery, G., Niranjan, M.: On-line Q-learning using connectionist systems. Technical report CUED/F-INFENG/TR 166 (1994)

  83. Rusu, A., Farrahi, K., Emonet, R.: Modelling digital and manual contact tracing for COVID-19: are low uptakes and missed contacts deal-breakers? Preprint, Epidemiology (2021). https://doi.org/10.1101/2021.04.29.21256307

  84. Salathé, M., Jones, J.H.: Dynamics and control of diseases in networks with community structure. PLOS Comput. Biol. 6(4), e1000736 (2010). https://doi.org/10.1371/journal.pcbi.1000736, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000736

  85. Sato, R., Yamada, M., Kashima, H.: Random features strengthen graph neural networks (2021). https://doi.org/10.48550/arXiv.2002.03155

  86. Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2009). https://doi.org/10.1109/TNN.2008.2005605

  87. Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation (2018). https://doi.org/10.48550/arXiv.1506.02438

  88. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 [cs] (2017)

  89. Serafino, M., et al.: Digital contact tracing and network theory to stop the spread of COVID-19 using big-data on human mobility geolocalization. PLOS Comput. Biol. 18(4), e1009865 (2022). https://doi.org/10.1371/journal.pcbi.1009865

  90. Shah, C., et al.: Finding patient zero: learning contagion source with graph neural networks (2020)

  91. Sigal, A.: Milder disease with Omicron: is it the virus or the pre-existing immunity? Nat. Rev. Immunol. 22(2), 69–71 (2022). https://doi.org/10.1038/s41577-022-00678-4

  92. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016). https://doi.org/10.1038/nature16961

  93. Silver, D., et al.: Mastering chess and shogi by self-play with a general reinforcement learning algorithm (2017). https://doi.org/10.48550/arXiv.1712.01815

  94. Smith, J.: Demand for Covid vaccines falls amid waning appetite for booster shots. Financial Times (2022). https://www.ft.com/content/9ac9f8fc-1ab3-4cb2-81bf-259ba612f600

  95. Smith, R.L., et al.: Longitudinal assessment of diagnostic test performance over the course of acute SARS-CoV-2 infection. J. Infect. Dis. 224(6), 976–982 (2021). https://doi.org/10.1093/infdis/jiab337

  96. Song, H., et al.: Solving continual combinatorial selection via deep reinforcement learning. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 3467–3474 (2019). https://doi.org/10.24963/ijcai.2019/481

  97. Su, Z., Cheshmehzangi, A., McDonnell, D., da Veiga, C.P., Xiang, Y.T.: Mind the “Vaccine Fatigue”. Front. Immunol. 13 (2022)

  98. Sukumar, S.R., Nutaro, J.J.: Agent-based vs. equation-based epidemiological models: a model selection case study. In: 2012 ASE/IEEE International Conference on BioMedical Computing (BioMedCom), pp. 74–79 (2012). https://doi.org/10.1109/BioMedCom.2012.19

  99. Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3(1), 9–44 (1988). https://doi.org/10.1007/BF00115009

  100. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning Series, 2nd edn. The MIT Press, Cambridge (2018)

  101. Tian, S., Mo, S., Wang, L., Peng, Z.: Deep reinforcement learning-based approach to tackle topic-aware influence maximization. Data Sci. Eng. 5(1), 1–11 (2020). https://doi.org/10.1007/s41019-020-00117-1

  102. Tomy, A., Razzanelli, M., Di Lauro, F., Rus, D., Della Santina, C.: Estimating the state of epidemics spreading with graph neural networks. Nonlinear Dyn. 109(1), 249–263 (2022). https://doi.org/10.1007/s11071-021-07160-1

  103. van Hasselt, H., Madjiheurem, S., Hessel, M., Silver, D., Barreto, A., Borsa, D.: Expected eligibility traces. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 9997–10005 (2021). https://doi.org/10.1609/aaai.v35i11.17200

  104. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)

  105. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. arXiv:1710.10903 [cs, stat] (2018)

  106. Watkins, C.: Learning from delayed rewards (1989)

  107. Wymant, C., et al.: The epidemiological impact of the NHS COVID-19 app. Nature 594(7863), 408–412 (2021). https://doi.org/10.1038/s41586-021-03606-z

  108. Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? arXiv:1810.00826 [Cs, Stat] (2019)

  109. Yamada, M., Jitkrittum, W., Sigal, L., Xing, E.P., Sugiyama, M.: High-dimensional feature selection by feature-wise kernelized lasso. Neural Comput. 26(1), 185–207 (2014). https://doi.org/10.1162/NECO_a_00537

  110. Zhou, J., et al.: Graph neural networks: a review of methods and applications. AI Open 1, 57–81 (2020). https://doi.org/10.1016/j.aiopen.2021.01.001

Acknowledgements

We thank Dr Eli Meirom for correspondence clarifying certain aspects from their study. MN was funded by EPSRC grant EP/S000356/1 Artificial and Augmented Intelligence for Automated Scientific Discovery. AR was funded by the EPSRC via a scholarship from the University of Southampton.

Author information


Corresponding author

Correspondence to Andrei C. Rusu.


A Appendix

1.1 A.1 Performance Analysis

We compare the mean total elapsed time for running epidemics with each of our testing agents in Table A1. These results correspond to the wall-clock time recorded on an average Windows machine equipped with an Intel i7-7700 CPU, an NVIDIA RTX 3060 GPU, and 32 GB of random access memory.

Table A1. Average wall clock time per epidemic during evaluation. Configuration: Barabási-Albert networks of 2000 nodes, an average degree of approximately 3, and a daily testing budget of \(k=2\).

1.2 A.2 Epidemic Modelling

All our epidemic models rely on the SEIR compartmental formulation, but the diffusion process remains constrained by the configuration of the interaction network.

The multi-site mean-field models considered in this work rely on exponential waiting times sampled via Gillespie’s algorithm to obtain subsequent events, with the state transition probabilities defined as follows:

$$\begin{aligned} \begin{gathered} p(S \rightarrow E) = b \, w_j \, \Delta t \\ p(E \rightarrow I) = e^{-1} \, \Delta t \\ p(I \rightarrow R) = \rho \, \Delta t, \end{gathered} \end{aligned}$$
(2)

where b is the base transmission rate, \(w_j\) are time-dependent edge weights, e is the mean duration of the exposed state, \(\rho \) is the recovery rate, and \(\Delta t\) is a time interval.

In contrast, the agent-based model loops through every node i at every time step, executing the appropriate transition event when one of the normally distributed duration samples (\(d_i\) or \(r_i\)) reaches 0. Concurrently, every edge j is visited to check whether an infection event occurs over that connection, according to the transmission probability defined in Eq 2.
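As an illustration, the per-step transition logic can be sketched in a few lines of Python. The function and argument names (`step_seir`, `e_mean`) are ours, not the paper's, and the normally distributed duration counters of the agent-based model are reduced here to the memoryless per-step probabilities of Eq 2:

```python
import random

def step_seir(nodes, edges, b, e_mean, rho, dt):
    """One discrete step of the SEIR transitions in Eq 2 (illustrative sketch).

    `nodes` maps node id -> state in {"S", "E", "I", "R"};
    `edges` is a list of directed contacts (i, j, w_j) with weight w_j.
    """
    updates = {}
    # Infection events are checked per edge, mirroring the agent-based model.
    for i, j, w in edges:
        if nodes[i] == "I" and nodes[j] == "S":
            if random.random() < b * w * dt:        # p(S -> E) = b * w_j * dt
                updates[j] = "E"
    # Progression events are checked per node.
    for n, state in nodes.items():
        if state == "E" and random.random() < dt / e_mean:   # p(E -> I) = e^-1 * dt
            updates[n] = "I"
        elif state == "I" and random.random() < rho * dt:    # p(I -> R) = rho * dt
            updates[n] = "R"
    nodes.update(updates)  # apply all transitions simultaneously
    return nodes
```

Transitions are collected first and applied at the end of the step, so a node cannot be exposed and become infectious within the same step.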

1.3 A.3 Algorithmic Details for the Proximal Policy Optimization

We start by reminding the reader about some general reinforcement learning quantities and relations:

$$\begin{aligned} \begin{gathered} \hat{A}_t^{(\gamma ,0)}(a_t;\theta ) = \delta _t^\gamma (\theta ) = R_t + \gamma V(s_{t+1};\theta _T) - V(s_t;\theta ) \\ \hat{A}_t^{(\gamma ,1)}(a_t;\theta ) = G_t^\gamma - V(s_t;\theta ) \\ \hat{A}_t^{(\gamma ,\lambda )}(a_t;\theta ) = \sum _{l=0}^{T} (\gamma \lambda )^l\delta _{t+l}^\gamma (\theta ) \\ r_t^{ORIG}(\theta ) = \frac{\pi (a_t|s_t;\theta )}{\pi (a_t|s_t;\theta _k)} \quad r_t^{SARSA}(\theta ) = \frac{\pi (a_t|s_{t+1};\theta )}{\pi (a_t|s_t;\theta _k)} \end{gathered} \end{aligned}$$
(3)

In Eq 3, \(R_t\) is the reward obtained by the agent after taking action \(a_t \sim \pi (a|s_t;\theta _k)\) and transitioning from state \(s_t\) to \(s_{t+1}\). The value of a given state s is approximated using a neural network \(V(s,\theta )\), which, together with \(R_t\) and the discount factor \(\gamma \), determines the TD-error \(\delta ^\gamma \); \(\theta _k\) parameterizes the acting policy, \(\theta _T\) is a delayed state of \(\theta _k\) that parameterizes the regression target in online learning [69], \(r_t^{ORIG}(\theta )\) denotes the ratio between a policy parameterized by a given \(\theta \) and the acting policy, while \(r_t^{SARSA}(\theta )\) represents an alternative formulation for the latter that replaces the numerator with the policy of \(\theta \) evaluated at the next state \(s_{t+1}\). Finally, the \(\hat{A}_t^{(\gamma ,\lambda )}\) terms represent different forms of the advantage function, as given by [87], with the special cases \(\lambda =0\), when the advantage is equal to the TD-error, and \(\lambda =1\), when the minuend of the RHS equation is the discounted return of the episode, \(G_t\).
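The advantage estimators above can be made concrete with a short, dependency-free sketch (function names are ours): `td_errors` computes \(\delta_t^\gamma\), and `gae` runs the backward recursion that yields \(\hat{A}_t^{(\gamma,\lambda)}\), reducing to the TD error for \(\lambda=0\) and to \(G_t - V(s_t)\) for \(\lambda=1\).

```python
def td_errors(rewards, values, gamma):
    """One-step TD errors: delta_t = R_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one extra entry for the terminal/bootstrapped state.
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]

def gae(rewards, values, gamma, lam):
    """Generalized advantage estimates: A_t = sum_l (gamma*lam)^l * delta_{t+l}."""
    deltas = td_errors(rewards, values, gamma)
    adv, running = [0.0] * len(deltas), 0.0
    # Backward pass: accumulate discounted TD errors from the episode end.
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```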

We rewrite the Proximal Policy Optimization (PPO) equations in terms of the quantities above in Eq 4, where \(\mathcal {E}\), \(c_1\) and \(c_2\) are hyperparameters, clip(.) is a function that clips its argument to the specified range, transform(.) is a function that modifies the gradient descent update according to a specific optimizer (e.g. Adam [47]), while \(\mathcal {H}_t(\theta )\) is an entropy regularizer [31]. In contrast to the original formulation, Eq 5 describes our proposed modification of PPO to allow for optimizing the objective in a memory-efficient online manner. In particular, we rewrite the loss terms using the one-step advantage function \(\hat{A}_t^{(\gamma ,0)}(a_t;\theta )\), and introduce an intermediary operation that accumulates the gradients of our modified loss using a unified eligibility trace [100], in a similar fashion to the methodology employed by [50], obtaining a backward-view approximation of the generalized advantage estimate \(\hat{A}_t^{(\gamma ,\lambda )}\) in the process [87]. We note that, by setting \(r_t=r_t^{SARSA}\), we can eliminate the requirement of storing \(s_t\) in memory at the subsequent time step, while retaining the benefits of ratio clipping. This works well empirically since major shifts between \(s_{t+1}\) and \(s_t\) are not common in our environment. Based on previous work and our own assessment, we set \(\gamma =0.99\), \(\lambda =0.97\), \(\mathcal {E}=0.2\), \(c_1=0.5\), \(c_2=0.01\), and update the target value network every 5 episodes across all our experiments.

$$\begin{aligned} \begin{gathered} \mathcal {L}^{CLIP}_t(\theta ) = \min [r_t(\theta ) \hat{A}_t^{(\gamma ,\lambda )}(a_t;\theta ), \text {clip}(r_t(\theta ), 1 - \mathcal {E}, 1 + \mathcal {E}) \hat{A}^{(\gamma ,\lambda )}_t(a_t;\theta )] \\ \mathcal {L}_t^{VF}(\theta ) = [\hat{A}_t^{(\gamma ,1)}(a_t; \theta )]^2 \quad \mathcal {H}_t(\theta ) = - \sum _{a \in A} \pi (a|s_t;\theta ) \log \pi (a|s_t;\theta )\\ \mathcal {L}^{PPO}_t(\theta ) = \mathbb {E}_{t}[-\mathcal {L}^{CLIP}_t(\theta ) + c_1 \mathcal {L}^{VF}_t(\theta ) - c_2 \mathcal {H}_t(\theta )] \\ \theta _{k+1} = \mathop {\mathrm {arg\,min}}\limits _\theta {\mathcal {L}^{PPO}_t(\theta )} \end{gathered} \end{aligned}$$
(4)
$$\begin{aligned} \begin{gathered} \mathcal {L}^{OCLIP}_t(\theta ) = \min [r_t(\theta ) \hat{A}^{(\gamma ,0)}_t(a_t;\theta ), \text {clip}(r_t(\theta ), 1 - \mathcal {E}, 1 + \mathcal {E}) \hat{A}_t^{(\gamma ,0)}(a_t;\theta )] \\ \mathcal {L}_t^{OVF}(\theta ) = [\hat{A}_t^{(\gamma ,0)}(a_t; \theta )]^2 \quad \mathcal {H}_t(\theta ) = - \sum _{a \in A} \pi (a|s_t;\theta ) \log \pi (a|s_t;\theta )\\ \mathcal {L}^{OPPO}_t(\theta ) = -\mathcal {L}^{OCLIP}_t(\theta ) + c_1 \mathcal {L}^{OVF}_t(\theta ) - c_2 \mathcal {H}_t(\theta ) \\ E_t = \gamma \lambda E_{t-1} + \frac{\nabla _{\theta _k}\mathcal {L}^{OPPO}_t(\theta _k)}{s} \; \text {, with} \; s=\delta _t^\gamma (\theta _k) \; \text {or} \; s=1 \\ \Delta \theta _k = \text {transform}(\delta _t^\gamma (\theta _k)E_t) \\ \theta _{k+1} = \theta _k - \Delta \theta _k \end{gathered} \end{aligned}$$
(5)
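A toy, scalar-parameter sketch of the backward-view accumulation in Eq 5 follows; the names are hypothetical, the optimizer-specific transform(.) is replaced by plain SGD, and the scaling s is assumed to be already folded into the supplied gradients:

```python
def online_trace_update(grads, deltas, gamma, lam, lr):
    """Eligibility-trace update from Eq 5 for a single scalar parameter.

    `grads[t]` stands in for the per-step loss gradient of L^OPPO divided
    by the scaling s; the trace E_t accumulates these gradients, and each
    parameter update is modulated by the TD error delta_t.
    """
    theta, trace = 0.0, 0.0
    for g, d in zip(grads, deltas):
        trace = gamma * lam * trace + g   # E_t = gamma * lam * E_{t-1} + grad / s
        theta -= lr * d * trace           # theta <- theta - transform(delta_t * E_t)
    return theta
```

Because the trace decays by \(\gamma\lambda\) each step, each gradient keeps influencing later updates, which is what recovers the \(\hat{A}_t^{(\gamma,\lambda)}\) weighting without storing the trajectory.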

1.4 A.4 Supporting Figures

Fig. A1.

Block diagram of our control framework. The Agent is passed as a parameter to the Simulator, and every time the latter samples enough events for the triggering conditions to be met, a call to the control(.) method of the former is performed. This function performs some preprocessing steps, and then calls control_test(.) and control_trace(.), which are responsible for the actual node ranking and are specific to each type of agent. Combinations of agents can be selected with the MixAgent.

Fig. A2.

Infection control performance on different network architectures of 1000 nodes and a daily testing budget of \(k=2\). Uncertainties shown as boxplots.

Fig. A3.

Epidemic curves for different network and epidemic seeds. These correspond to multiple 5000-node Barabási-Albert networks with a mean degree of 3 and a testing budget of \(k=1\%\). Two versions of the RL agent are displayed: one trained for 50 episodes and one trained for 200. The y-axis limit is set to 3200 to facilitate comparison; the random agents perform worse than this level.

Fig. A4.

Infection control performance on different static network architectures with varying budgets. The uncertainties are shown as boxplots.

Fig. A5.

Infection control performance on different static network architectures and sizes, with a budget of \(k=2\). Uncertainties are shown as boxplots.

Fig. A6.

t-SNE plots of the node hidden states and dendrogram corresponding to their hierarchical clustering into 10 groups. As can be observed, the agent mostly groups detected (blue) nodes in a region of the space, while the new undetected infections (red) are predicted to appear within the risk regions on the right. Recent negative results are plotted as dark green. The dendrogram on the right displays the cardinality and the infection probability associated with each cluster.

1.5 A.5 Control Framework

The logic behind our epidemic control framework in the continuous-time simulation scenario is outlined in Algorithm 1. The class hierarchy of the agents, together with their logic, can be consulted in Algorithm 2. Refer to Table A2 for details about the variables involved in these.
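The delegation pattern behind the framework can be sketched as a minimal class hierarchy. The hook names follow the control(.), control_test(.) and control_trace(.) interface described in Fig. A1, but the budget handling and the degree heuristic below are illustrative assumptions, not the paper's implementation:

```python
class Agent:
    """Base agent: `control` does the shared preprocessing, then delegates
    the actual node ranking to agent-specific hooks (sketch of the interface)."""

    def __init__(self, test_budget, trace_budget):
        self.test_budget = test_budget
        self.trace_budget = trace_budget

    def control(self, net, state):
        # Preprocessing: e.g. exclude nodes that are already isolated.
        candidates = [n for n in net if state.get(n) != "isolated"]
        to_test = self.control_test(net, candidates)[: self.test_budget]
        to_trace = self.control_trace(net, candidates)[: self.trace_budget]
        return to_test, to_trace

    def control_test(self, net, candidates):
        raise NotImplementedError

    def control_trace(self, net, candidates):
        raise NotImplementedError


class DegreeAgent(Agent):
    """Toy heuristic: rank candidates by degree, a stand-in for the
    centrality-based rankers compared in the paper."""

    def control_test(self, net, candidates):
        return sorted(candidates, key=lambda n: len(net[n]), reverse=True)

    control_trace = control_test
```

An RL agent would plug into the same interface by implementing the two hooks with a GNN-based ranking, while a MixAgent would dispatch each hook to a different sub-agent.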

Table A2. Legend for the control framework pseudocode.


Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Rusu, A.C., Farrahi, K., Niranjan, M. (2023). Flattening the Curve Through Reinforcement Learning Driven Test and Trace Policies. In: Tsanas, A., Triantafyllidis, A. (eds) Pervasive Computing Technologies for Healthcare. PH 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-031-34586-9_14

  • DOI: https://doi.org/10.1007/978-3-031-34586-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34585-2

  • Online ISBN: 978-3-031-34586-9

  • eBook Packages: Computer Science, Computer Science (R0)
