Optimal timing of non-pharmaceutical interventions during an epidemic

In response to the recent outbreak of the SARS-CoV-2 virus governments have aimed to reduce the virus’s spread through, inter alia, non-pharmaceutical intervention. We address the question when such measures should be implemented and, once implemented, when to remove them. These issues are viewed through a real-options lens and we develop an SIRD-like continuous-time Markov chain model to analyze a sequence of options: the option to intervene and introduce measures and, after intervention has started, the option to remove these. Measures can be imposed multiple times. We implement our model using estimates from empirical studies and, under fairly general assumptions, our main conclusions are that: (1) measures should be put in place not long after the first infections occur; (2) if the epidemic is discovered when there are many infected individuals already, then it is optimal never to introduce measures; (3) once the decision to introduce measures has been taken, these should stay in place until the number of susceptible or infected members of the population is close to zero; (4) it is never optimal to introduce a tier system to phase-in measures but it is optimal to use a tier system to phase-out measures; (5) a more infectious variant may reduce the duration of measures being in place; (6) the risk of infections being brought in by travelers should be curbed even when no other measures are in place. These results are robust to several variations of our base-case model.


Introduction
Non-pharmaceutical interventions can have a significant impact on the rate at which a virus spreads during an epidemic ("flattening the curve"), as has been shown for previous influenza outbreaks (see, e.g., Cowling et al., 2009; Hatchett, Mecher, & Lipsitch, 2007; Kamigaki & Oshitani, 2009; Stern & Markel, 2009; Wu et al., 2010). 1 These are, therefore, seen as a vital element of government policy (Ferguson et al., 2005; Leung & Nicoll, 2010). Recent contributions by, e.g., Acemoglu, Chernozhukov, Werning, & Whinston (2020), show that social and economic interventions can have a significant impact on public health and can significantly reduce economic damage. However, such interventions, ranging from social distancing to complete lockdown, come at significant costs of their own (Anderson, Heesterbeek, Klinkenberg, & Hollingsworth, 2020; Atkeson, 2020). These costs encompass obvious economic costs associated with a partial or complete lockdown, but also health-related costs linked to, e.g., reduced mental health levels in the population, missed diagnoses due to reduced availability of health-care workers, etc. In addition, there are one-off sunk costs of imposing a lockdown, related to, e.g., individuals' and institutions' adoption of homeworking. According to the Financial Times (24 August 2020), office owners can expect millions in extra costs to adhere to government guidelines. 2 An additional difficulty for policy makers who have to balance the costs and benefits of non-pharmaceutical intervention is that the evolution of an epidemic is uncertain. Their cost-benefit analysis has to take this uncertainty explicitly into account.

The authors would like to thank Job Thijssen, the participants of the webinars organized by the York Mathematical Biology Seminar Series (May 2020) and The OR Society (December 2020), and the participants of the workshop "Timing Uncertainty in Economics and Finance" in Bielefeld (June 2020) for their comments. Finally, we are grateful for the many helpful comments that we received from the Editor and three anonymous referees. This paper is dedicated to the memory of Ulco de Jong; teacher, mentor, and bon vivant.
E-mail addresses: nick.huberts@york.ac.uk (N.F.D. Huberts), jacco.thijssen@york.ac.uk (J.J.J. Thijssen).
1 A similar impact has been reported for COVID-19 lockdowns, see, e.g., Talic et al. (2021), the UK government's report "Analysis of the health, economic and so-
The trade-off between costs and benefits in an uncertain environment makes the timing of the introduction and subsequent relaxation of non-pharmaceutical interventions a crucial policy decision. A consequence of the uncertainty over a virus's evolution is that it is not straightforward to determine when measures should be implemented and once implemented, when they should be relaxed. If measures are relaxed too early, a second outbreak could occur. Interventions that last too long lead to unnecessary economic and (mental) health damage. Our contribution is to develop an appropriate model and to apply dynamic programming techniques to find the optimal intervention policy of a social planner with an application to the COVID-19 epidemic. More specifically, this paper builds a continuous-time Markov chain (CTMC) of an SIRD-like 3 model of non-pharmaceutical interventions (which we refer to as "lockdown") to study the questions of (i) when to enter a lockdown and, consequently, (ii) when to exit it, under an uncertain evolution of the epidemic, by taking into account the social (health and non-health) costs of both the epidemic and the lockdown. We, therefore, not only contribute to the literature on policy making in the context of COVID-19 but our application also relates to the long tradition in OR to use CTMCs for modeling operational systems. We then calibrate our model using parameter estimates from the literature to arrive at the optimal (dynamic) intervention strategy. From this base-case scenario we then proceed to study the robustness of our policy conclusions, thereby providing a solid theoretical basis for policy evaluation by public health officials.
As pointed out by Tsekrekos & Yannacopoulos (2016), the OR community is well equipped to deal with problems where the objective is to determine the optimal timing decisions to commence and/or terminate a process or operations, when the underlying problem is subject to uncertainty. An established wide range of studies has shown how optimal stopping problems can be applied. 4 For a review and discussion of the real option valuation literature in OR we refer to Trigeorgis & Tsekrekos (2018). We consider a model where a social planner can impose a sequence of lockdowns. Our model views the timing issues through a real options lens: for each lockdown the social planner has two, nested, options, which we value under the assumption that the social planner wants to minimize the expected present value of total costs related to the virus and the lockdown. Briefly stated, our main conclusions are that:
1. lockdown should be entered not long after the first infections occur;
2. if the epidemic is discovered too late, i.e., when there are too many infected individuals already, then it is optimal never to introduce a sequence of measures;
3. once the decision to introduce measures has been taken, these should stay in place until the number of susceptible or infected members of the population is close to zero;
4. it is never optimal to introduce a tier system to phase in measures but it is optimal to use a tier system when exiting a lockdown;
5. a more infectious variant may not only reduce the probability of lockdown being entered so that, before it is imposed, lockdown is less desirable as the virus spreads, but could also lead to a reduction in lockdown duration; and
6. lockdown is only optimal when combined with a strict policy at the country's borders that aims to minimize the risk of infections being brought in by travelers.
These results are, qualitatively, robust to several variations of our base-case model.
Some of the results are rather counter-intuitive when compared to the standard real options literature. The third result confirms the usual intuition that when switching between regimes involves sunk costs, then it is optimal to delay taking the decision. So, once lockdown is entered, one should wait until one is certain enough that the benefits of lifting lockdown are sufficiently higher than the associated costs (also see, e.g., Keogh-Brown, Wren-Lewis, Edmunds, Beutels, & Smith, 2010; Smith, Keogh-Brown, Barnett, & Tait, 2009). The first result is less intuitive. One might expect that the decision to enter lockdown should only be taken when one is sufficiently sure that its benefits outweigh its costs and, therefore, that the decision is taken when "enough" members of the population have been infected. However, our results belie this intuition in the sense that it is optimal to go into lockdown after only a few infections. The reason is that the benefits from switching arise from the difference in the rate of infection and the expected evolution of the number of infected, which is greatest at the start of the epidemic (also see, e.g., Farboodi, Jarosch, & Shimer, 2021; Shin, 2020). This is supported by our finding that when the rate at which infections occur from 'outside' is higher, the decision maker wants a few more infections to occur before going into lockdown is optimal. When there is not enough time left for benefits to be realized, as is the case when a lot of members of the population have already been infected and/or recovered, then it is never optimal to incur the costs of lockdown, which explains the second result. Even when considering constraints on health care capacity, it is unlikely that lockdown is optimal when many individuals have already been infected.
Our result is supported by Nova Scotia's (Canada) success with its strategy involving early and harsh measures to keep infections low in the COVID-19 epidemic (The Globe and Mail, 18 January 2021). 5 The same intuition feeds into our fifth result. An increase in the infection rate has several implications. When the virus spreads more quickly, the expected time between infections reduces. As a result, the benefits of staying in lockdown may no longer outweigh its costs, especially when a higher proportion of the population has had the virus. This increases the incentive to come out of lockdown earlier when the infection rate is higher.
COVID literature. The OR literature has seen various contributions in the context of the recent COVID epidemic. Examples of empirical studies include the use of predictive analytics tools for forecasting and planning (Nikolopoulos, Punia, Schäfers, Tsinopoulos, & Vasilakis, 2021), studying the effectiveness of social media as a humanitarian response (Kumar, Xu, Ghildayal, Chandra, & Yang, 2021b), establishing the socio-economic impact of the epidemic (Amaratunga et al., 2021), how the Chinese government can stimulate consumption (Liu, Shen, Li, & Chen, 2021), studying the role of social learning on closure decisions by firms (de Vaan, Mumtaz, Nagaraj, & Srivastava, 2021), providing evidence that countries that did not experience SARS in 2003 delayed action (Ru, Yang, & Zou, 2021), forecasting mortality rates in the USA (Taylor & Taylor, 2021), addressing the capacity planning decisions of a hemodialysis clinic in Istanbul (Bozkir et al., 2021), the use of an integrated epidemics-testing allocation model to minimize infections (Abdin et al., 2021), and evaluating and optimizing social distancing policies (Chen, Pun, & Wong, 2021).
Theoretical work related to health care and infections includes studies that, for example, develop a model to estimate and study local coronavirus outbreaks (Chang & Kaplan, 2021) or to study the spread of infections in an agent-based model (Ghaderi, 2022), use matheuristic algorithms for resource planning problems in home health care (Nikzad, Bashiri, & Abbasi, 2021), find facility layouts that minimize risk of infections (Fischetti, Fischetti, & Stoustrup, 2021), apply an efficient large neighbourhood search algorithm to a 'contagious disease testing' problem (Wolfinger, Gansterer, Doerner, & Popper, 2021), and address vaccine and testing kit allocations in a multi-agent sequential decision problem (Thul & Powell, 2021). Economic impact is included in work that, for example, uses game theoretical mechanisms to study supply chain networks with labor constraints (Nagurney, 2021), studies pricing during disruptions (Feng, Rong, Shen, & Snyder, 2020), finds out how to minimize disruptions to customer service level due to infrastructural deficiencies (Sinha, Kumar, & Chandra, 2021), and optimizes the implementation of measures with a reaction-diffusion process to minimize the economic burden (Rezapour, Baghaian, Naderi, & Sarmiento, 2021). The social planner is included in the theoretical work that, for example, looks at communication strategies by governments to induce compliance with measures (de Véricourt, Gurkan, & Wang, 2021), establishes how test accuracy and availability impact demand and thereby the social outcome by a social planner (Drakopoulos & Randhawa, 2021), and reduces the social planner's cost through optimal policies of safety stock and capital reserve (Zhang, Shi, Huang, Hua, & Teunter, 2021).
Other contributions include the study by Silal (2021) , who discusses opportunities for OR to contribute to infectious disease management and to improve health outcomes (also see Choi, 2021 ). Tippong, Petrovic, & Akbari (2021) review OR applications in emergency medical response coordination in disaster management. Our contribution to the OR literature is to look at the optimal timing decisions of (non-pharmaceutical) interventions.
Other recent theoretical work on optimal policy within an epidemiological framework uses optimal control theory, sometimes incorporating economic trade-offs (see, e.g., Caulkins et al., 2021; Djidjou-Demasse, Michalakis, Choisy, Sofonea, & Alizon, 2020; Garriga, Manuelli, & Sanghi, 2020; Kantner & Koprucki, 2020; Piguillem & Shi, 2020; Toxvaerd, 2020). However, only a handful of papers incorporate optimal timing, including Alvarez, Argente, & Lippi (2020); Farboodi et al. (2021); Patterson-Lomba (2020); Shin (2020); Zhang & Enns (2020), and Kruse & Strack (2020). Their findings are generally mixed. For instance, Patterson-Lomba finds that the timing strongly depends on the natural reproduction number ($R_0$) and Kruse and Strack find that social distancing can be delayed, whereas some others find that it is optimal to intervene when the first cases of infectives have been confirmed. The study by Federico & Ferrari (2020) is closest to our set-up, because they also incorporate uncertainty in an SIR-like model. However, in their model only the infection rate is stochastically evolving over time whereas in our model all transitions are stochastic. In addition, Federico and Ferrari do not consider sunk costs of switching, which leads to measures being continuously adjusted in the optimal policy. Therefore they find that immediate lockdown is not optimal. Kumar, Choi, Wamba, Gupta, & Tan (2021a) show empirically that rigid measures are less effective to combat infections than moderate long-lasting preventive measures in an SEIR model. They argue that harsh measures lead to more economic damage and only delay the peak. We allow the social planner to switch between rigid and moderate measures and find, indeed, that moderate measures may be better during the end-phase. However, especially when incorporating healthcare constraints, delaying the peak might be better despite the economic damage.
Other relevant and related studies include Jia & Chen (2021) who develop an SEIAR model with human uncertainty factors, Bliman, Duprez, Privat, & Vauchelet (2021) who look at optimal immunity control with the aim to minimize the epidemic final size in an SIR model, and Doyle (2021) who predicts the spread of infections in a SEIRD model with heterogeneous agents.
Although some studies have highlighted that any optimal policy is highly sensitive to parameterizations (e.g., Manski & Molinari, 2021 , also see Avery, Bossert, Clark, Ellison, & Ellison, 2020 for a discussion), we argue that this is only partly true of our main results. When varying the parameters associated with the process, our main conclusions hold irrespective of the specific parameter values. Although our results are qualitatively the same for a wide range of parameter values, we do not want to underestimate the importance of accuracy when it comes to estimation of the infection rate, mortality rate, and immunity.
The remainder of this paper is organized as follows. Section 2 introduces the mathematical framework. Section 3 solves the social planner's problem for our benchmark model, which is analyzed and extended in Section 4 . Section 5 concludes and an online appendix contains the proofs of propositions as well as further robustness checks.

Model description
Our model considers the problem of a social planner in the context of an infectious disease outbreak. The social planner needs to decide when to undertake action and temporarily intervene in an environment where infections of members of the host population result in a (collateral) social cost. Temporary intervention could range from self-isolation to a complete lockdown. In this paper, we consider two models: a simplified benchmark model and a more realistic extended model. This allows us to identify the impact of each model characteristic. For our benchmark model, we assume that the form of intervention is exogenously given and that the costs and benefits (in terms of reduced infection/enhanced recovery) are known. Once temporary intervention has started, the social planner has to decide when to stop it. After that second decision, the social planner has no further decisions to take. In our extended model, we allow the social planner to impose a sequence of lockdowns and to phase in and/or phase out measures.
For each lockdown, the social planner is assumed to hold two nested options, which we value under the assumption that she wants to minimize the expected present value of total social costs related to the virus and the lockdown over the planning horizon, which we assume to be infinite here. These costs are assumed to include (health and non-health) costs related to the number of infected people, the (health and non-health) cost of intervention, and the costs of lost lives and lost QALYs 6 due to the disease. For simplicity, throughout this paper, we refer to intervention as "lockdown", although, of course, interventions could entail different types of non-pharmaceutical measures.
The time parameter is continuous and is measured in days. Let $N \in \mathbb{N}$ be the size of the host population. Our model describes a continuous-time Markov chain (CTMC) in which at any time $t \ge 0$, the $N$ individuals in our population are split into four groups, as in a typical SIRD model: $I_t$ members are infected, $R_t$ members are recovered, $D_t$ members are deceased, and $S_t = S(I_t, R_t, D_t) := N - I_t - R_t - D_t$ members are susceptible. Initially, all individuals are susceptible, after which they might move to be infected, after which they will either recover or die in finite time (a.s.). 7 Our CTMC has state-dependent transition rates, in which the interarrival times of events (i.e., infections, recoveries, and deaths) are exponentially distributed. In our benchmark model we assume that there are only three groups, setting $D_t = 0$ for all $t$. In that case, $R_t$ represents both deceased and recovered individuals. Since for the benchmark case we do not have group "D", the model can be referred to as an SIR-type of model.
When the social planner decides to intervene, a lockdown starts, which is assumed to decrease the infection rate and/or increase the recovery rate. We, therefore, initially, define two policy modes m = 0 , 1 . At time t = 0 , the system is in m = 0 , where no action is undertaken, and the social planner has an option to move to mode m = 1 where a lockdown is in place. In our extended model we assume that there is an intermediate mode where only mild measures are introduced. The social planner, then, has the option to either switch to the intermediate mode first before imposing lockdown but could also choose to skip this mode. Her objective is to minimize expected costs, which consist of sunk costs , associated with switching between modes, and (state-dependent) running costs . We assume that all costs are monetized.
Let $\mathbb{N}_N = \{0, 1, \dots, N\}$. The state space of our CTMC is given by $$E = \{(i, r, d) \in \mathbb{N}_N^3 : i + r + d \le N\}.$$ Here, $i$, $r$, and $d$ denote the number of infected, recovered, and deceased members of the population, respectively. For $(i, r, d) \in E$, the number of susceptible members is given by $$s = N - i - r - d.$$ That is, the population is closed. 8 We denote by $\mathrm{int}(E)$ all points $(i, r, d) \in E$ such that $i + r + d < N$ and $i, r, d > 0$.
In our CTMC the number of states the Markov chain can transition to is at most three: either a susceptible member becomes infected so that the Markov chain moves from $(i, r, d)$ to $(i + 1, r, d)$, or an infected member recovers so that the Markov chain moves from $(i, r, d)$ to $(i - 1, r + 1, d)$, or an infected member dies, which implies a move from $(i, r, d)$ to $(i - 1, r, d + 1)$. The first can only happen if $i + r + d < N$ and the second and third can only happen if $i > 0$. This implies that any state $(0, r, d)$ such that $r + d = N$ is always an absorbing state. Fig. 1 summarizes all possible transitions in our SIRD-like model. 9
7 Note that recovered individuals do not become susceptible again (in finite time). Therefore, we can refer to the group of individuals that are no longer infected as "removed", in line with the literature.
8 We refer to point 2 in the discussion of modeling assumptions for a discussion on this assumption.
9 The figure includes all transition rates, which we will introduce later.
We shall proceed with formulating the full model description for our SIR-like benchmark model, for which there are only two modes and for which lockdown can only be imposed once. In our extended, SIRD-like, model, we reformulate where necessary. In addition, for simplicity, we first focus on the simplified case where $D_t = 0$ for all $t$. In that case we can think of the state space as $$E = \{(i, r) \in \mathbb{N}_N^2 : i + r \le N\}.$$ The interarrival times between state transitions are assumed to be exponentially distributed with mode- and state-dependent infection and recovery rates $\lambda_m(i, r)$ and $\mu_m(i, r)$, respectively. In order to stay close to standard SIR models, we assume for each mode $$\lambda_m(i, r) = \beta_m \frac{i\,(N - i - r)}{N} \quad \text{and} \quad \mu_m(i, r) = \gamma_m i,$$ respectively, for some $\beta_m, \gamma_m \in \mathbb{R}_{++}$. Note that the epidemic is over when the number of infectives is equal to 0, resulting in the process being absorbed. For our specification, the set of absorbing states of the chain is, therefore, given by $\{(i, r) \in E : i = 0\}$. In our extended model we consider the possibility of infections occurring as long as $S_t > 0$, i.e., even when $I_t = 0$. This could happen, for instance, when infections can take place from neighboring countries. In that case, the set of absorbing states is redefined as all states such that $i = 0$ and $s = 0$.
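To make the dynamics of the benchmark chain concrete, the following is a minimal simulation sketch of a single sample path. It assumes the standard SIR-type rates $\beta i s / N$ for infection and $\gamma i$ for recovery; the function name, parameter names, and numerical values are hypothetical and chosen for illustration only.

```python
import random


def simulate_sir_ctmc(n, i0, beta, gamma, seed=0):
    """Simulate one path of an SIR-like CTMC until absorption (i = 0).

    Assumed rates (an illustration, not the paper's calibration):
      infection: beta * i * s / n, with s = n - i - r
      recovery:  gamma * i
    Returns a list of (time, i, r) tuples, one per visited state.
    """
    rng = random.Random(seed)
    t, i, r = 0.0, i0, 0
    path = [(t, i, r)]
    while i > 0:
        s = n - i - r
        lam = beta * i * s / n        # infection rate in the current state
        mu = gamma * i                # recovery rate in the current state
        total = lam + mu
        t += rng.expovariate(total)   # exponentially distributed holding time
        if rng.random() < lam / total:
            i += 1                    # (i, r) -> (i + 1, r)
        else:
            i, r = i - 1, r + 1       # (i, r) -> (i - 1, r + 1)
        path.append((t, i, r))
    return path


# Hypothetical parameter values, for illustration only.
path = simulate_sir_ctmc(n=100, i0=1, beta=0.3, gamma=0.1, seed=1)
final_time, final_i, final_r = path[-1]
```

Each path ends in a state with no infectives, which mirrors the absorbing states of the benchmark specification.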
Let $X_t := (I_t, R_t) \in E$ represent the state of the system at any time $t \ge 0$, so that $X := (X_t)_{t \ge 0} = (I_t, R_t)_{t \ge 0}$ is our stochastic process. Note that if, at the outset, the process is in an absorbing state, it stays there forever and, therefore, lockdown will never be imposed. For that reason, we will only consider scenarios where $X_0$ is not an absorbing state.
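For each mode $m$ and states $x \ne y$, the generator entry $q^m_{xy}$ equals $\lambda_m(x)$ if $y - x = (1, 0)$, $\mu_m(x)$ if $y - x = (-1, 1)$, and $0$ otherwise, and $q^m_{xx} = -\sum_{y \ne x} q^m_{xy}$. Note that when $y - x = (1, 0)$ an infection occurs and when $y - x = (-1, 1)$ an infected member recovers. We then define the (infinitesimal) transition function $p^m$, for $m = 0, 1$, from these generator entries. Let $\mathcal{E}$ be the power set of $E$. From the Kolmogorov extension theorem it follows that there exists a measurable space $(\Omega, \mathcal{F})$ and, for each $x \in E$ and $m \in \{0, 1\}$, a unique probability measure $P^m_x$ under which, for all $0 \le t_1 < \dots < t_k$, $k \in \mathbb{N}$, and all $F_1, F_2, \dots, F_k \in \mathcal{E}$, the probabilities $P^m_x(X_{t_1} \in F_1, \dots, X_{t_k} \in F_k)$ are those determined by $p^m$ for a chain started at $x$. With each mode $m \in \{0, 1\}$, state $x \in E$, and probability measure $P^m_x$, we associate the expectation operator $E^m_x$. Throughout we will use the natural filtration $\mathbb{F} := (\mathcal{F}^X_t)_{t \ge 0}$. The set of $\mathbb{F}$-stopping times is denoted by $\mathcal{M}$.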
Switching from mode $m$ to mode $m'$, $m, m' \in \{0, 1\}$, $m \ne m'$, is associated with an immediate and sunk cost $K_{mm'} \ge 0$, which represents all costs that are incurred once and are associated with entering or exiting lockdown. Examples of costs of entering include a change in bureaucracy, hiring people to work on legislation, local governments' work on policies, shops making investments to accommodate new rules, home workers needing office chairs at home, etc. Exiting lockdown can be associated with costs around, e.g., hiring new staff, legislative changes, etc. 10 For each mode $m = 0, 1$, we define a bounded function $c_m : E \to \mathbb{R}_+$, which represents the running health and non-health costs related to the number of infected members and, for mode $m = 1$, additional economic and indirect health costs (e.g., due to missed diagnoses of other diseases) as a result of lockdown. Letting $\chi_{\{v\}}$ denote the characteristic function that is equal to 1 if $v$ is true and 0 otherwise, we build $c_m$ from an epidemic-related per-patient cost function $\delta_E : E \to \mathbb{R}_+$, with $\delta_E(0, r) = 0$, and a fixed lockdown cost flow $\delta_L > 0$ that is incurred only in mode $m = 1$. Although in our analyses the cost functions are chosen to be linear, our methodology can be applied to, e.g., convex specifications of the cost functions.
The social planner is assumed to discount costs at a constant rate $\rho > 0$. The decision problem (3) is modeled as a nested optimal stopping problem over the time $\tau_1$ at which lockdown is entered and the time $\tau_2$ at which it is exited. The first, second, and third terms of its objective represent the present values of the costs incurred before lockdown ($0 \le t < \tau_1$), during lockdown ($\tau_1 \le t < \tau_2$), and after lockdown ($t \ge \tau_2$), respectively.
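As a sketch only, an objective with the three cost terms described above could be written as follows. The placement of the sunk costs $K_{01}$ and $K_{10}$ and the use of a single expectation $E_x$ are assumptions made here for illustration; under this reading the law of $X$ follows mode 0 before $\tau_1$ and after $\tau_2$, and mode 1 in between.

```latex
\inf_{\tau_1 \le \tau_2} \; E_x\!\left[
    \int_0^{\tau_1} e^{-\rho t} c_0(X_t)\,dt
  + \int_{\tau_1}^{\tau_2} e^{-\rho t} c_1(X_t)\,dt
  + \int_{\tau_2}^{\infty} e^{-\rho t} c_0(X_t)\,dt
  + e^{-\rho \tau_1} K_{01} + e^{-\rho \tau_2} K_{10}
\right]
```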
Finally, we denote the basic reproduction number by $R^m_0$, for modes $m = 0, 1$. The basic reproduction number represents the average number of infectives emerging from the introduction of a single infectious member into a completely susceptible population. In our model, this is given by $R^m_0 = \beta_m / \gamma_m$.

Discussion of modeling assumptions
After having introduced our basic model, we now briefly discuss some of our assumptions before analysing the model.
1. The basic reproduction number, $R^m_0$, is observed without error. This assumption is reasonable when prevalence of the disease can be accurately observed even at low levels (Mutesa, Ndishimye, Butera et al., 2021).
2. The system is closed, i.e., we assume no births and no non-COVID deaths. Births can be interpreted in two ways: either they could represent migration into the population or they could represent newborns. For the former, the impact of migration can be studied separately, as we do in Section 4.5. The latter is not expected to drive any of our results and is, therefore, ignored.
3. We restrict the social planner's policy choice to two or three modes. While, theoretically, a social planner could consider an infinite set of modes, many governments have opted for a relatively simple traffic-light system in their responses to the COVID-19 pandemic. Hence, our model is reasonably close to observed practice. In fact, we show in Section 4.4 that when measures are added or taken away from the intermediate mode only the speed with which measures are phased out is impacted, but not the sequencing of the optimal policy.
4. The social planner can only impose new restrictions once mode $m = 0$ is reached. While this assumption looks restrictive, we assume throughout that switching from the intermediate mode to mode $m = 0$ is costless. Therefore, in principle, mode $m = 0$ can be skipped and stricter measures be introduced at any time.
5. The cost of infection is constant over the population. Throughout we use the weighted average of the cost per patient when in need of (hospital) treatment and the cost when no treatment is necessary. In the latter case, costs could be related to, e.g., self-isolation. Although this assumption might not be as realistic when $I_t$ is small, since the population in our model aims to represent a country's population (in our case the UK), small values of $I_t$ may still represent a large number of individuals. 11
6. The cost of being in lockdown is constant over time. This may not always be a realistic assumption, especially when measures are in place for a long time. Therefore, we relax this assumption in Appendix C.3 and show that our results do not change qualitatively.
7. Recovered members of the population are assumed to acquire immunity, and infection, recovery, and death rates are assumed to be known to the social planner. These are standard assumptions in SIR(D) models, which we further discuss in Section 5 where we point to avenues for future research to weaken these assumptions. In our model, the recovery rate incorporates the weighted sum of the average recovery times of patients with and without symptoms, including those receiving hospital treatment (see Section 3.2). The incubation time can be incorporated in a similar way.

Properties of the stochastic process
We first look at how $X$ behaves as a function of time in our benchmark model. Therefore, let us first study the change in $X_t = (I_t, R_t)$. One can easily see that this can be written as $$\lim_{h \downarrow 0} \frac{1}{h} E^m_x\left[X_h - x\right] = \sum_{y \ne x} q^m_{xy}\,(y - x) = \big(\lambda_m(x) - \mu_m(x),\ \mu_m(x)\big),$$ for all $x \in \mathrm{int}(E)$, $m = 0, 1$, where the last step follows from (1).
This gives the intuitive result that the number of infectives is expected to go up if, and only if, $\lambda_m(I_t, R_t) > \mu_m(I_t, R_t)$ and the number of recovered members is always expected to increase. If control measures in mode 1 lead to a decrease in the transmission rate, i.e., if $\lambda_1(\cdot) < \lambda_0(\cdot)$, then there exists a set of states in $E$ for which the number of infections is expected to go down in mode 1, whereas it is expected to go up for the same states in mode 0. For our SIR-like model, the expected change in $X_t$ per unit of time in state $(i, r)$ equals $$\big(\beta_m\, i\,(N - i - r)/N - \gamma_m i,\ \gamma_m i\big),$$ so that in expectation we obtain the dynamics of the traditional deterministic SIR model. One of the convenient features of the SIR model is that the state space, $E$, can equivalently be represented by a simple lattice structure; that is, states can be ordered to form a very simple tree describing all potential paths. Fig. 2 illustrates this lattice structure for the case $N = 4$. These lattice structures are useful in determining the value functions in all modes and determining the stopping sets of our optimal stopping problems.
11 An alternative way of modeling would be to consider $\delta_E$ as two branches where a fraction of $I_t$ receives hospital treatment. Since we already take into account that different members require different treatment (if at all), we do not expect that this more complicated version would lead to qualitatively different results.
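The sign condition on the expected change in the number of infectives can be checked numerically. The sketch below lists the states in which infections are expected to rise without measures but to fall under lockdown. It assumes the rates $\beta_m i s / N$ and $\gamma_m i$, a common recovery rate in both modes, and hypothetical function names and parameter values.

```python
def expected_change(i, r, n, beta, gamma):
    """Expected change per unit time in (infected, recovered) at state (i, r).

    Assumes infection rate beta * i * s / n and recovery rate gamma * i.
    """
    s = n - i - r
    lam = beta * i * s / n
    mu = gamma * i
    return (lam - mu, mu)


def states_where_lockdown_helps(n, beta0, beta1, gamma):
    """States where infections are expected to grow in mode 0 but shrink in mode 1."""
    flipped = []
    for i in range(1, n + 1):
        for r in range(0, n - i + 1):
            drift_mode0 = expected_change(i, r, n, beta0, gamma)[0]
            drift_mode1 = expected_change(i, r, n, beta1, gamma)[0]
            if drift_mode0 > 0 and drift_mode1 < 0:
                flipped.append((i, r))
    return flipped


# Hypothetical values: lockdown lowers the transmission parameter from 0.3 to 0.05.
flipped_states = states_where_lockdown_helps(n=10, beta0=0.3, beta1=0.05, gamma=0.1)
```

With these illustrative values the set contains states with many susceptibles left and excludes states in which no susceptibles remain, where infections cannot grow in either mode.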
The volatility of the process is expressed in terms of the component-wise multiplication of vectors. Note that when the function $\lambda_m(\cdot)$ and/or $\mu_m(\cdot)$ changes, there is a direct effect on the speed with which the process moves through the tree. When the volatility increases, costs are incurred over a shorter period of time (in expectation). Therefore, an increase in, e.g., $\beta_m$ has two effects: individuals are infected at a higher rate and, at the same time, the volatility of the process is also increased. The basic reproduction number is given by $R^m_0 = \beta_m / \gamma_m$, for modes $m = 0, 1$. Therefore, one can alternatively write $\beta_m = R^m_0 \gamma_m$ in the infection rate, as is more commonly done for SIR models.

Model solution
We solve the optimal stopping problem (3) by working backwards. So, we start with the situation where the social planner has come out of lockdown and there are, thus, no options left. We illustrate our findings with our benchmark model. In Section 4 the full model is analyzed. Note that then, the social planner has a new option to go into lockdown.

Value after lockdown
For our benchmark model, after lockdown has been entered and exited, the social planner has no more decisions to take, i.e., we remain in mode $m = 0$ perpetually. Recall that $X_t = (I_t, R_t)$ denotes the state of the process at any time $t$. For a given initial state $x \in E$, the expected present value of costs is $$G(x) := E^0_x\left[\int_0^\infty e^{-\rho t} c_0(X_t)\,dt\right]. \qquad (4)$$ The following proposition shows that $G(x)$ can be defined recursively, that is, it can be written as a function of all states the chain can transition to when leaving $x$.
The expression in (5) consists of two terms. The first term relates to the costs incurred while being in state $x$. After a transition to some state $y \neq x$, the costs after the transition are represented by $G(y)$. The present value of these costs, for all possible subsequent states, is given by the second term, in which each $G(y)$ is weighted by the entry $q^m_{xy}$ of the Markov generator, which relates to the transition probability. Thus, the second term is the expected present value of costs after the next transition.
Proposition 1 provides the solution to the general problem in (4) . Note that since in our formulation, see e.g.
We observe that it is not guaranteed that the recursive formulation has a solution. However, due to the nature of the lattice that corresponds to our problem, a sufficient condition would be that if the process starts in a state that is not in , i.e., the process is not absorbed, then the process will reach a state in a.s. at some future point in time.
Since the chain is absorbing, it is easy to see that G (x ) can be determined for all x by working recursively through the tree. First, (5) can be determined using a lattice method, finding

Value in lockdown
In mode 1, the social planner has lockdown in place. This ends when the social planner decides to switch back to mode m = 0 at sunk cost K 10 . After this switch, the value function is given by G , cf., (4) .
Therefore, in mode 1, the social planner is faced with the optimal stopping problem where $G$ is defined in (4). The integral captures the present value of costs while being in mode $m = 1$, i.e., until lockdown is exited at time $\tau$. The present value of costs incurred after $\tau$ is given by the first term.
The optimal stopping time is the first exit time of the set $C_1$, where Proposition 2 shows that in order to find the value, we need to distinguish states where it is decided to switch to mode 0 from states where it is decided to delay exiting lockdown. Thus, problem (7) has a solution that allows us to split the state space $E$ into a continuation set $C_1 \subset E$ and a stopping set $D_1 = E \setminus C_1 \subset E$ such that the social planner decides to stay in mode 1 for all $x \in C_1$ and to switch to mode 0 for all $x \in D_1$.
The first term on the right-hand side in (8) represents the value if the social planner decides to switch to mode m = 0 immediately, i.e., when x is in the stopping set. The second term is the value when x is in the continuation set so that the decision to come out of lockdown is "delayed": it is the sum of the costs while being in state x and the expected present value of costs after transition.
The stopping set, for mode $m = 1$, is then given by Since for all $x \in$ it holds that $q^m_{xy} = 0$ for all $y \in E$, $F$ in (8) For further reference, the first exit time of a set $C \subset E$ is defined as the stopping time Notice that, given the set $C_1 \subset E$, it follows that $F$ can be written as

Table 1 Model parameters and their baseline values. Our choice of $N$ can be interpreted as the size of the grid that we impose on the population. For example, if $N = 500$, then each change in, say, the number of infected from $i$ to $i + 1$ represents an increase in the infected population equal to 0.2% of the total population.

Table 1 summarizes our baseline parameterization. The corresponding basic reproduction numbers are $R_0^0 = 3$ and $R_0^1 = 1.5$ for modes 0 and 1, respectively. Starting with mode 0, the empirical literature is not unanimous in its estimates for the basic reproduction number $R_0^0$. However, as pointed out by, e.g., Liu, Gayle, Wilder-Smith, & Rocklöv (2020), the median and average estimates are roughly 3 for empirical studies, mostly on China. Toda (2020) estimates $R_0^0$ to be equal to 2.7 for the UK and 3.7 for the USA. An estimate close to 3 is also in line with empirical studies by, e.g., Roques, Klein, Papaïx, Sar, & Soubeyrand (2020). For mode 1, we follow, e.g., Garriga et al. (2020); Patterson-Lomba (2020), and Kruse & Strack (2020), in assuming that $R_0^1$ is roughly half of $R_0^0$ (for some empirical studies see, e.g., You et al., 2020), while assuming that the rate of recovery is not affected by the lockdown.
We assume that all costs are measured in thousands GBP. For the epidemic related cost parameter δ E we use the study by Bethune & Korinek (2020) and assume that the virus imposes a cost of $50k for each infective, so that, using the exchange rate . 84 for our population of size N, assuming the British population to be 63.7m.
We solve (8) employing a lattice method as described above. When starting with states in , it can be easily determined whether staying in lockdown perpetually is optimal or whether coming out of lockdown is optimal. When assuming D 1 and C 1 to be empty at the outset, states in can either be allocated to D 1 or C 1 . Next, states that transition to states in can be studied: the value as given by (8) can be determined and these states can be assigned to the appropriate set. Recursively, one can continue this process until all states have been treated.
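The recursive procedure described above can be sketched as follows. The snippet first computes the cost function $G$ (mode 0, no options left) and then the value $F$ in lockdown by comparing, in every state, the cost of stopping, $G(x) + K_{10}$, with the cost of continuing. It assumes the standard one-step formula for discounted costs of a continuous-time Markov chain, (running cost + $\sum_y q_{xy} W(y)$) / ($\rho + \sum_y q_{xy}$), a running cost of $\delta_E i$ in mode 0 and $\delta_E i + \delta_L$ in mode 1, and standard SIR rates. All numerical values are hypothetical and are not the Table 1 values.

```python
from functools import lru_cache

# Hypothetical illustration values, NOT the paper's Table 1.
N = 30                    # grid size (small, to keep the recursion shallow)
GAMMA = 0.1               # recovery rate
R0 = {0: 3.0, 1: 1.5}     # basic reproduction number per mode
RHO = 0.001               # discount rate (assumed)
DELTA_E = 1.0             # epidemic cost per infective per unit of time (assumed)
DELTA_L = 5.0             # lockdown cost per unit of time (assumed)
K10 = 2.0                 # sunk cost of leaving lockdown (assumed)


def rates(i, r, mode):
    """Assumed SIR rates: infection beta*i*s/N, recovery gamma*i."""
    beta = R0[mode] * GAMMA
    return beta * i * (N - i - r) / N, GAMMA * i


def continuation(i, r, mode, value):
    """Expected discounted cost of staying in `mode` until the next transition."""
    lam, mu = rates(i, r, mode)
    cost = DELTA_E * i + (DELTA_L if mode == 1 else 0.0)
    future = 0.0
    if lam > 0:
        future += lam * value(i + 1, r)      # next event is an infection
    if mu > 0:
        future += mu * value(i - 1, r + 1)   # next event is a recovery
    return (cost + future) / (RHO + lam + mu)


@lru_cache(maxsize=None)
def G(i, r):
    """Cost after lockdown: mode 0 forever."""
    return continuation(i, r, 0, G)


@lru_cache(maxsize=None)
def F(i, r):
    """Value in lockdown: stop (switch to mode 0) or continue in mode 1."""
    return min(G(i, r) + K10, continuation(i, r, 1, F))


def stopping_set():
    """States in which leaving lockdown immediately is optimal."""
    return {(i, r) for r in range(N + 1) for i in range(N + 1 - r)
            if F(i, r) == G(i, r) + K10}


if __name__ == "__main__":
    d1 = stopping_set()
    print("states in stopping set:", len(d1), "of", (N + 1) * (N + 2) // 2)
```

The memoized recursion terminates because the lattice has no cycles. For a grid as large as $N = 500$ an explicit loop over the states in reverse order would be preferable to recursion, but the logic is the same.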
Panel (a) of Fig. 3 illustrates the stopping set for this parameterization. Throughout this paper, we use orange to represent states where it is optimal to come out of (full) lockdown. Generally, the stopping set consists of two types of states: states that are in , i.e., all absorbing states, or are close to , and states $(i, r)$ such that $i + r$ is close to $N$. In the former case, the process is close to absorption or is absorbed, which means that lockdown-related costs outweigh any benefits of suppressing the epidemic. Therefore, it is optimal to lift lockdown. In the latter case, the number of susceptibles is relatively small. Since staying in mode 1 only keeps the infection rate low, and few individuals remain who can still be infected, staying in mode 1 is costly and, therefore, it is optimal to switch and lift lockdown.

Fig. 3 shows a cross-section of the value function $F$, as well as the cost function $G$, for the case where $R_t = 0$ at $t = 0$. Other cross-sections of the functions $F$ and $G$ look qualitatively the same. Panel (b) illustrates that the value function $F(X)$ in the continuation set exhibits lower total costs than $G(X) + K_{10}$, i.e., than the total costs if the social planner were to stop. To understand this, consider the total expected accumulated discounted costs if the social planner decides ex ante to always stay in mode 1. In that case, lockdown-related costs are incurred even when the number of infectives is low, but expected costs per unit of time are lower than in mode 0 when the number of infectives is high. The curve $F$ takes advantage of the lower expected cost per unit of time when the number of infectives is high, with the prospect of not incurring lockdown-related costs when the number of infectives is low. Comparing $G(X) + K_{10}$ to $F(X)$ illustrates that the option to switch has led to a lower present value of expected total costs in the continuation set. In the stopping set, $F(X) = G(X) + K_{10}$.

for modes 0 and 1, respectively. The total number of infectives and the resulting (health and economic) costs are lower in mode 1 but, in addition, lockdown costs are incurred. Regardless, there is a steep rise in infectives in mode 0 at $t = 0$ (a "second wave"), which makes it optimal for the social planner to delay the moment of exiting lockdown. Therefore, the optimal total costs are lower when remaining in mode 1 for a (short) period of time and then switching to mode 0.

12 The Times, April 6, 2020, https://www.thetimes.co.uk/edition/business/consumer-confidence-at-its-lowest-since-the-financial-crisis-p6phf7x3k.
Panel (b) depicts the incurred cost per unit of time after lockdown (black) and when starting while in lockdown (dotted), when considering the expected path of $(I_t, R_t)$ starting from $(1, 0)$. Note that the dashed line represents the expected point in time when switching to mode 0 is optimal. For the dotted curve, i.e., when starting in mode 1, notice that at $t = 0$ the curve exceeds the solid curve representing costs incurred when starting in mode 0. Even though in both cases $(I_t, R_t) = (1, 0)$, in mode 1 additional lockdown-related costs are incurred which are independent of the number of infectives. Clearly, the costs peak earlier for mode 0, as reflected by the solid curve, and at the same time the expected costs for all $t < 68$ are higher than the costs incurred when in mode 1. This also illustrates that lockdown is lifted only when the number of infectives is well past its peak, to prevent a (severe) second wave.

Value before lockdown
In mode 0, before lockdown has started, the social planner is faced with the optimal stopping problem where $F$ is defined in (7). The integral captures the present value of costs while being in mode $m = 0$, i.e., until lockdown is imposed at time $\tau$. The present value of costs incurred after $\tau$ is given by the first term.
The following proposition shows that also for the stopping problem in (9) the solution splits the state space into a stopping set and a continuation set.
The optimal stopping time is the first exit time of the set C 0 , where Proof. See Appendix A . Similar to Proposition 2 , Eq. (10) contains two terms: the first term giving the value when going into lockdown immediately and the second term representing the present value of costs when x is in the continuation set so that lockdown is delayed.
The stopping set, for mode $m = 0$, is then given by It is easy to see that (10) can be determined using an induction method as described for $G$ and $F$.

Illustration

We illustrate Proposition 3 for the parameterization of Section 3.2. Fig. 5 illustrates the stopping sets for both stopping problems: the blue and orange areas represent all states for which it is optimal to switch from mode 0 to mode 1 and vice versa, respectively. The gray area is the set of all states for which there is a positive probability that the blue area can be reached. So, when the process is in a state in the white area, it will never be optimal to enter lockdown. Going back to Fig. 4 (b), the difference between the two curves illustrates that instantaneous costs, when not in lockdown, increase more rapidly for small values of $t$. As such, an expected positive gain from switching to mode 1 vanishes if the social planner waits too long, demonstrating that switching in early stages of the process (when $I_t$ and $R_t$ are both "small") is optimal and that switching is not optimal in later stages of the process.
The value functions $V$ and $F$ as well as the total cost function $G$ are depicted in Fig. 5 (b). The dash-dotted curve represents the value in mode 0, $V$, at time $t = 0$. Note that the solid curve representing the graph of the function $G$, i.e., the value in mode 0 after the second switch, also represents the graph of the value function $V$ if it is optimal to stay in mode 0 from time $t = 0$ onward, i.e., when lockdown is never implemented.13 We can now observe the following for the two options. The continuous pasting principle is visible for the option to switch to mode 0 from mode 1: the dotted curve and the solid curve have approximately the same slope around the boundary of the stopping set. For the first switching option, instantaneous switching is optimal when $I_0$ is small, as Panel (a) illustrated. Therefore, $V(I_0, 0)$ coincides with $F(I_0, 0) + K_{01}$ for sufficiently small $I_0$, the latter represented by the upper dotted curve in Panel (b). Instantaneous switching remains optimal as long as the value of permanently staying in mode 0 exceeds $F + K_{01}$. Indeed, $V$ coincides with $G$ for high values of $I_0$, where switching to mode 1 is not optimal.
Nevertheless, for I 0 (very) close to zero, there is a positive value of waiting: it is optimal for the social planner to delay switching until D 0 is reached. For these values there is a positive probability that at the next transition, or in a small number of transitions, the process hits an absorbing state and in that case, the disease will not be able to spread further so that lockdown is redundant. Since lockdown is associated with a fixed switching cost, the expected total cost involved with entering lockdown is not outweighed by the expected gain. This is illustrated by Fig. 6 . Panel (a) shows for the same parameterization as before that, for I 0 < 3 and R 0 = 265 , waiting is optimal. The graphs of the resulting functions V , F , and G are shown in Panel (b).

Optimal policy
In this section we first study the social planner's optimal stopping strategy in our SIR-like base-case model further. Then, we expand our benchmark model in the following ways.
1. We first extend the number of classes individuals can be in. In addition to agents being infected (I), recovered (R), and susceptible (S), we allow for events where individuals are deceased (D). This model is the equivalent of a standard SIRD model.
2. Next, we relax the assumption that lockdown can only be imposed once.
3. Additional modes are introduced, allowing the social planner to phase lockdown in and/or out.
4. The policy regarding open borders is studied. Infections can occur even when $I_t = 0$.
5. Restrictions imposed by health care capacity are taken into account.
Finally, we consider an extension in which we study the stopping sets in a scenario where the lockdown-related costs increase over time. The analysis is concluded by a short summary of the comparative statics; the full analysis can be found in Appendix C.

SIR
Recall that the transition rates are given by for modes $m = 0, 1$, where $R_0^m = \beta_m / \gamma_m$. In this section, we parameterize the dynamics by $R_0^m$ and $\mu_m$, for modes $m = 0, 1$. Recall that the process has the following properties For the illustrations we use the same parameterization as before.

Fig. 7 depicts the expected evolution of the number of infected, susceptible, and removed individuals as a function of time $t$. For panels (a) and (b), the brown region represents the total number of active cases of infectives at each moment in time. The blue area depicts the number of susceptible individuals and the red area represents all remaining individuals, who are labeled as removed/recovered. At time $t = 0$, 490 individuals are susceptible and at time $t = 80$, in mode 0, a very large share of the host population has been infected and has subsequently recovered. This is different for mode 1, where a significant share of the host population has not been infected. This would apply if at time $t = 0$ the social planner had decided to immediately switch to mode 1. Notice that the curve of infectives in mode 1 is flatter than the curve in mode 0. Panel (c) juxtaposes the expected number of infectives at each point in time for modes 0 and 1.

Fig. 8 illustrates the stopping sets. In both panels the blue area is the set of all states where it is optimal to switch from mode 0 to mode 1. The orange area represents all states where it is optimal to switch back to mode 0 from mode 1. The panels also feature stream plots for the system This system represents the expected evolution of $I_t$ and $R_t$ for modes $m = 0, 1$ for any point in the grid that represents the state at time $t = 0$. The black affine curve represents all points $(i, r)$ such that the expected change in the number of infectives is 0 in the next instant of time, i.e., $\dot{I} = 0$. Panels (a) and (b) show the stream plot for modes 0 and 1, respectively.
The plot shows that, in expectation, the maximum number of infectives at any time $t$ is lower for mode 1, i.e., $\sup_t \mathbb{E}^{m=1}_x I_t < \sup_t \mathbb{E}^{m=0}_x I_t$. From the blue area in the figure, two implications for the optimal decision to impose lockdown stand out. First, in mode 0, it is optimal to take action as soon as the first infectives have been confirmed. The stopping set in mode 0, the blue area, covers the combinations of infectives and recovered individuals close to the origin of the plot, which indicates that switching is optimal during the early stages of a potential epidemic. Second, if the social planner waits too long or when the disease is discovered late, the process may start outside the blue and gray areas, implying that it will a.s. never be optimal to switch to mode 1. This could happen, for instance, due to a lack of testing in the early stages. If the epidemic is discovered when the state is in the gray area, then it might be optimal to impose a lockdown in the future, but not immediately. So, the social planner should act swiftly if the disease is detected early on. If the process leaves the blue area before switching has taken place, the expected gain from switching is not enough to compensate for the total expected lockdown costs and the sunk cost of entering lockdown.
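The expected paths discussed for Fig. 7 can be approximated with a short numerical sketch of the deterministic SIR dynamics. The snippet assumes the standard form $\dot{S} = -\beta S I / N$, $\dot{I} = \beta S I / N - \gamma I$, $\dot{R} = \gamma I$ with $\beta = R_0^m \gamma$, uses $N = 500$, $\gamma = 0.1$, $R_0^0 = 3$ and $R_0^1 = 1.5$ from the text, and starts from 490 susceptibles. The split of the remaining 10 individuals (all infective, none recovered) and the Euler step size are assumptions made for illustration.

```python
def expected_path(n, s0, i0, r0_number, gamma, horizon, dt=0.01):
    """Euler integration of the deterministic SIR equations.

    Assumes dS = -beta*S*I/n, dI = beta*S*I/n - gamma*I, dR = gamma*I,
    with beta = r0_number * gamma.
    """
    beta = r0_number * gamma
    s, i, r = float(s0), float(i0), float(n - s0 - i0)
    path = [(0.0, s, i, r)]
    steps = int(round(horizon / dt))
    for k in range(1, steps + 1):
        new_inf = beta * s * i / n * dt   # new infections in this step
        new_rec = gamma * i * dt          # new recoveries in this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        path.append((k * dt, s, i, r))
    return path


if __name__ == "__main__":
    for mode, r0_number in ((0, 3.0), (1, 1.5)):
        path = expected_path(500, 490, 10, r0_number, 0.1, 80)
        peak = max(p[2] for p in path)
        print(f"mode {mode}: peak infectives {peak:.1f}, "
              f"susceptibles at t = 80: {path[-1][1]:.1f}")
```

Under these assumptions the mode 1 curve of infectives is flatter and leaves a larger share of the population susceptible at $t = 80$, in line with the description of panels (a) to (c).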

SIRD
In our SIR-like model it is assumed that all infected individuals will recover. In order to study the impact of infection fatality rates on the optimal switching times, the model can be extended, as described in Section 2, to include an additional class of individuals: deceased (D). For our continuous-time Markov chain, $X_t = (I_t, R_t, D_t)$ represents the state of the system at each time, where $D_t$ is the number of deceased individuals. Denote by $\theta_m(I_t, R_t, D_t) = \frac{\phi_m}{1 - \phi_m} \gamma_m I_t$ the mortality rate, where $\phi_m$ denotes the death rate, i.e., the fraction of members that die due to an infection (also see, e.g., Bastos & Cajueiro, 2020). Then, the Markov generator is defined as We will use an estimate of 2% for the death rate.14 The average number of years lost per deceased individual is estimated to be 16 (Pifarré i Arolas et al., 2021) and, for the quality-adjusted life-year (QALY), we assume that the cost per QALY is equal to £6.8k.15 Thus, per time unit the cost per individual in D is $\delta_D = (16)(6.8)\rho$ in each mode. Note that this model can be solved recursively as well, using the same lattice method as described in Section 3.2.

14 We base ourselves on the data presented on https://coronavirus.data.gov.uk/ where one can check that the death rate is highly volatile and varies between roughly 0.4% and 3.5%.

15 To estimate the cost per QALY, we refer to a report presented by the UK's Department of Health and Social Care (DHSC) and Office for National Statistics (ONS), published on 9 September 2021, "Direct and Indirect health impacts of COVID-19 in England". The cost per QALY differs per region and we have chosen to use Yorkshire's estimate, which is below London's but above estimates for some other regions (e.g., North East). We will argue that a deviation from this estimate does not qualitatively change our results.

Results

What we find is that if in our SIR-like model a state $(i, r)$ is in a stopping set, then any state $(i, r', d) \in E$ such that $r' + d = r$ is in the equivalent stopping set for the SIRD model. Intuitively, since the cost associated with any deceased individual is sunk, further actions taken by the social planner cannot change outcomes that happened in the past. Rather, the social planner will need to take into account the cost of future deaths. Note that this result does not depend on our parameterization. In the SIRD-like model the total costs are higher compared to the SIR-like model because, in addition to the lockdown-related costs and the epidemic-related per-patient costs, on average 2% of the transitions out of I are associated with a positive shock of (16)(6.8)k to the present value of total costs. Since the costs are higher, we find that the blue area has slightly increased and the orange area has slightly shrunk, as illustrated by Fig. 9 a. Fig. 9 b then shows the impact of the mortality rate. Setting the rate as high as 5% allows us to observe the impact more clearly. In line with what we saw for Panel (a), a higher mortality rate leads to an expansion of the blue set, especially for states where $R_t + D_t$ is small. In addition, we observe a slight change in the orange area such that the social planner delays coming out of lockdown, predominantly for states where $R_t + D_t$ is small.
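A short numerical check may help to see why the mortality rate is scaled by $\phi_m / (1 - \phi_m)$. Assuming that recoveries occur at rate $\gamma_m I_t$, as in the SIR-like model, the share of exits from I that are deaths is $\theta_m / (\theta_m + \gamma_m I_t) = \phi_m$. The sketch also evaluates $\delta_D = (16)(6.8)\rho$; the function names and the value of $\rho$ used in the example are illustrative assumptions.

```python
def death_fraction(phi, gamma=0.1, infectives=1):
    """Share of exits from I that are deaths, for theta = phi/(1-phi)*gamma*I."""
    theta = phi / (1.0 - phi) * gamma * infectives   # mortality rate
    mu = gamma * infectives                          # assumed recovery rate
    return theta / (theta + mu)


def running_death_cost(rho, years_lost=16, cost_per_qaly=6.8):
    """delta_D = years_lost * cost_per_qaly * rho, in thousands of GBP."""
    return years_lost * cost_per_qaly * rho


if __name__ == "__main__":
    print("death share of exits from I:", death_fraction(0.02))
    rho = 0.01  # hypothetical discount rate, for illustration only
    delta_d = running_death_cost(rho)
    print("delta_D per unit of time:", delta_d)
    print("present value of a perpetual flow:", delta_d / rho)
```

The last line shows that a perpetual flow of $\delta_D$ discounted at rate $\rho$ has a present value of $(16)(6.8)$, independently of the value chosen for $\rho$.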
An alternative way to model the cost associated with D is by incurring a one-off cost at each transition, i.e., when an individual moves from I to D, a cost of (16)(6.8)k is incurred. This way of modeling would result in the same (qualitative) result.
For what follows, we continue by illustrating the stopping sets in a way as done in Fig. 9 , i.e., by adding up all individuals in R and D.
In a similar way other models can also be studied using our set-up, by extending the number of classes of individuals. This includes for instance, the SEIR model (see, e.g., Atkeson, 2020;Wang et al., 2020 ) where susceptible individuals become exposed before infected.

A sequence of lockdowns
Since costs associated with lockdown can be high, it might be optimal for the social planner to impose a sequence of shorter lockdowns, rather than imposing one long one. This means the social planner has the option to alternate between modes. In terms of our value function, for any lockdown $k = 1, \ldots, K$,16 we can denote $V_k$, $F_k$, and $G_k$ as the value functions equivalent to (9), (7), and (4), respectively. Then, $G_k$ coincides with $V_{k+1}$ and thus has embedded the option to go back into lockdown. This applies to any $k < K$; for $k = K$, $G_K$ is the value as defined in (4). We assume that the switching costs associated with the first time lockdown is imposed are not smaller than the costs assumed for the benchmark model. These costs include, e.g., investments undertaken by companies to provide a safe environment, such as the provision of sneeze guards, or the purchase of equipment and software to be able to work from home. However, for subsequent lockdowns these investments do not need to be undertaken again and, as such, the switching costs for later lockdowns are assumed to be a fraction of $K_{01}$.

Results

We start by assuming that the switching costs are equal to 1% of $K_{01}$. Fig. 10 shows the stopping set for our parameterization. Panel (a) shows that the stopping set for the first lockdown is not qualitatively different from what we have found before. The result that intervention is no longer optimal when the epidemic is discovered too late (i.e., when the number of infected individuals is sufficiently high) is also present here. Since the first lockdown involves substantial switching costs, the intuition found for the SIR-like model applies here as well: the time period during which there are gains from interventions is too short to weigh against the costs. In addition, we find again that it is optimal to go into lockdown immediately when the first infections occur. Panels (b) and (c) illustrate that, when switching costs are very low, subsequent lockdowns are always immediately imposed unless we have reached a state that lies in the stopping set for coming out of lockdown. In other words, lockdown ends when the number of infections or the number of susceptible individuals is low, as before, but is imposed again when infections are rising. Surprisingly, the orange area in Panel (a) is not substantially larger than the orange area in the SIR model. This means that, despite our initial intuition, it is not optimal to come out of lockdown temporarily: the increase in the infection rate, and thereby the increase in infections, is not outweighed by the savings from a pause in control measures. The only noticeable difference is for states where $I_t$ is close to 0. Then, for $R_t + D_t > 290$ we see that the orange area is somewhat thicker, which indicates that the option for subsequent lockdowns makes the social planner come out of lockdown sooner. Intuitively, these are the states where the increase in infection rate is least "harmful". The same applies to subsequent lockdowns. Fig. 11 shows the same panels for the case where switching costs for later lockdowns are 50% of $K_{01}$ and confirms what we found before. Although the blue area has shrunk for lockdowns $k \geq 2$, the social planner does not change its strategy for the first lockdown.

16 For illustration purposes we assume that the social planner can only impose $K \geq 2$ lockdowns. However, this may not be an unrealistic assumption when people are more reluctant to obey restrictions as more lockdowns are imposed.
In what follows, we analyze the stopping sets for the first lockdown as long as subsequent lockdowns are not qualitatively different from what we have discussed above.

Phased lockdown
So far, we have only considered two modes: mode $m = 0$, where no intervention takes place, and mode $m = 1$, where a lockdown is imposed. Our set-up allows for extensions where more modes can be considered. To this end, an intermediate mode is introduced. In this mode only mild measures are in place, such as social distancing. These come with a running cost that is lower than that of a full lockdown and higher than that in mode $m = 0$. Costs for the intermediate mode can be associated with, e.g., shops and restaurants allowing a lower number of consumers and investments made to be able to adhere to these measures.
We redefine and relabel the modes in the following way. In mode m = 0 there is no intervention, in mode m = 1 mild measures are in place, and mode m = 2 represents a full lockdown. In other words, what was previously called mode 1 is now mode 2.
Correspondingly, we can distinguish lockdown-related costs $\delta_L^m$ for modes $m = 1$ and $m = 2$. Our parameterization for illustration purposes for modes $m = 0$ and $m = 2$ is the same as before, i.e., $R_0^0 = 3$, $R_0^2 = 1.5$, and $\gamma_0 = \gamma_2 = 0.1$. For mode 1, we consider two cases: L . Finally, switching costs are $K_{01} = K_{02} = 2000$ and $K_{12} = K_{21} = K_{10} = K_{20} = 0$. Assuming $K_{12} = 0$ allows us to consider scenarios where the social planner decides never to stay in mode 2 for a positive period of time and therefore to switch to mode $m = 0$ from $m = 1$, thereby essentially skipping mode $m = 2$. Later we will study what happens if $K_{12} > 0$.

We will again use blue to denote states where it is optimal to leave mode 0 and, for consistency, we use orange for states where it is optimal to leave the mode with the lowest $R_0$, in this case mode 2. We introduce brown as the colour related to coming out of our new, intermediate mode. Fig. 12 illustrates the stopping sets for $R_0^1 = 2$ and $\delta_L^1 = \frac{1}{2} \delta_L^2$. Panel (a) illustrates the sets for the scenario with only 2 modes, as in the previous section. Panel (b) illustrates the stopping sets for the scenario with 3 modes. The brown area contains all states of the stopping set where it is optimal to switch from the intermediate mode to mode 0, and the orange area contains states that are part of the stopping set for switching from a full lockdown to social distancing. Although not visible, states of the brown set are also states of the stopping set for switching from full lockdown to social distancing: if $(I_0, R_0)$ is in the brown area, then it is optimal to switch from mode 2 immediately to mode 0. The orange set in Panel (a) contains fewer states than the orange (plus brown) region in Panel (b), but more than the brown region itself.
The blue area consists of all points where it is optimal to switch from mode 0 to mode 1. Panel (c) distinguishes the sets for switching to mode 1 (blue) from mode 0 and to mode 2 (blue and light blue) from mode 1. This panel illustrates that all states for which it is optimal to introduce social distancing form a subset of the area where switching to a full lockdown is optimal. Therefore, the only optimal policy is to implement a full lockdown. Numerical analysis shows that this result holds for a wide range of parameter values. Further, notice that there are two light blue areas in Panel (c). The set close to the line $i + r = N$ denotes all points such that, when currently in mode 1, it is optimal to switch to mode 0 without spending a positive amount of time in mode 2, since these states are also part of the orange and brown areas.

Fig. 13 illustrates what happens when changing $\delta_L^1$ and $R_0^1$. In all panels, the blue area is not qualitatively different, which does not come as a surprise since no time is spent in mode 1 after leaving mode 0. Panels (a) and (b) show the stopping sets for $R_0^1 = 2$ with $\delta_L^1$ , respectively. When staying in mode 1 is more expensive, it is no longer optimal to switch to mode 1 before switching to mode 0, when currently in mode 2. The opposite applies when the lockdown-related cost is smaller, so that the region where the social planner switches from mode 1 to mode 0 shrinks and the region where the social planner switches from mode 2 to mode 1 expands. For the cases where $R_0^1 = 2.25$, it is also less attractive to spend a positive amount of time in mode 1, as illustrated by Panel (c) and Panel (d).
Finally, Fig. 14 illustrates the effect of $K_{12}$, assuming that the social planner always switches to mode 2 before lifting control measures. As switching to mode 2 becomes more expensive, the set of states where it is optimal to switch shrinks, in line with the intuition for our main model. However, the light blue area has not shrunk considerably, which means that, once the planner has switched from mode 0 to mode 1, it remains optimal to immediately switch to mode 2 for almost all states.

Open border policies
This section studies the effect of a strict border policy on lockdown timing decisions. Thus far it has been assumed that no new infections can occur when states are reached where I t = 0 . We continue to make this assumption when in mode m = 2 , but not when in mode m = 0 . For mode m = 1 the social planner can choose whether or not to restrict entry at the border. Note that no infections occurring when I t = 0 is not only equivalent to a closed border policy but could also include, e.g., the requirement of individuals entering the country to quarantine in a hotel. In mode m = 1 , the social planner could opt to ask individuals to self-isolate so that the infection rate is somewhere between the infections rates of modes 0 and 2. We model this extension using the infection rate where ˜ i 2 = 0 and 0 ≤˜ i 1 ≤˜ i 0 are constants. 17 Results Fig. 15 illustrates how the stopping sets have changed when ˜ i 1 = ˜ i 2 = 0 . Panels (a) and (b) illustrate the case where ˜ i 0 = 0 . 02 , while Panels (c) and (d) depict the case ˜ i 0 = 0 . 04 . We first note that the blue area for the first lockdown has significantly shrunk, relative to a situation where ˜ i 0 = 0 . Our numerical exploration reveals that for any lockdown to ever be potentially optimal, ˜ i 0 should not be larger than 0.05, i.e., for any larger risks it is never optimal to impose restrictions. Intuitively, any increase in the infection rate leads to an increase in costs: it takes longer, on average, to bring the number of infectious individuals down and, in addition, more infections occur. This leads to our conclusion that because open borders heavily reduce the effectiveness from lockdown, any gains from lockdown cannot weigh against the (sunk) cost involved with imposing restrictions.
The blue areas in Panels (b) and (d) are very similar, i.e., the main impact of a change in the infection rate from abroad has an impact on strategy for the first lockdown only. Thirdly, the orange area has slightly expanded for the states where I t is close to zero. Recall that the orange area is the set of states for which it is optimal to leave the mode with the highest infection rate, in this case mode m = 2 . Because the increase in the infection rate has the largest impact on these states in particular, the net gain from staying in mode m = 2 is smaller when ˜ i 0 > 0 and the social planner prefers to switch to mode m = 1 . Note that the brown area, the area where it is optimal to come out m = 1 , has barely expanded so that it remains optimal to keep measures in place.
It is essential to notice, though, that although there exists a small brown region near the origin, it is unlikely that these states are reached. Because I t + R t + D t cannot decrease over time, 18 whenever the blue area is reached, for the process to reach this brown region subsequently we would require: a much higher than expected number of infections before the blue region is reached and a much higher than expected number of recoveries after lockdown is imposed while in the mean time close to nil infections occur.
Fourthly, we find that there are states where it is optimal, once switched to mode 1, to never switch to mode 2. These are states where the blue area overlaps with the orange area. In these states, the social planner switches immediately from mode 0 to mode 1, in order to switch back to mode 0 in the future. For these states the epidemic has evolved too much for a full lockdown to be beneficial.

17 In this extension, in order to be able to focus on the impact of ˜ i m , additional effects that stem from costs associated with restrictions at the border are not taken into account.
18 Note that this effectively means that, when the process is in a given state, the process can only either move to the east (when a new infection occurs) or northwest (when a recovery or death occurs).
Finally, Fig. 16 illustrates that a positive value for ˜ i 1 does not qualitatively change the social planner's decision and that an increase in ˜ i 1 has no significant impact on whether or not lockdown is optimal for positive ˜ i 0 . This is not surprising in light of earlier figures, where we showed that "phasing in" lockdowns by entering mode 1 before moving to mode 2 is not optimal.

Health-care imposed capacity constraints
Next we study the effect of constraints on health-care capacity, by generalizing δ E in the following way: Here, 150 is the assumed threshold that represents a limitation of the health-care system. It captures the fact that (health-related) costs go up once the health system has reached capacity. We need to keep in mind that only a fraction of infectious individuals need hospital treatment and therefore the number 150 should be interpreted as the equivalent number of total infections such that the fraction of infectious individuals that requires hospital treatment is equal to the imposed "critical care capacity". 19

Fig. 17 shows the results of adapting the baseline parameterization. This figure illustrates that the stopping set in mode 0 now includes states where either the number of infectives is above 150 or where the risk of reaching states with more than 150 infectives is sufficiently large. As illustrated in Panel (a), states in the extended region with I t < 150 are those where it is expected that the process will reach states where I t ≥ 150 when remaining in mode 0. This explains the shape of the border of the stopping set where there seems to be an "inaction" region. However, note that when taking into account the stream plots in Fig. 8 , we can see that the expected path, when starting around (0,0), does not go through any of the 'newly added' states. Thus, the extended area includes states where the decision maker wishes to bring down the number of infections rapidly when, against the odds, a spike of infections has occurred.
Also notice that the orange region has shrunk, especially for all states where I t > 150 : the social planner delays the moment to switch back to mode 0, so that the expected amount of time spent in states for which I t > 150 is minimized.
An alternative way to model hospital constraints is as follows. Governments may wish to avoid having a number of infectious individuals that is close to the constraint. Therefore, δ E could be modeled as an increasing function of I t when I t gets close to the constraint imposed by the critical care capacity. Appendix B.1 shows that, in that case, the figures look qualitatively the same.

[Fig. 17 caption, partly lost: There is a limit on hospital admissions that leads to an equivalent of a limit of 150 on infections (see (13)). Brown: switching from mode 1 to mode 0, orange: from mode 2 to mode 1, blue: from mode 0 to 2.]

Extension: time-dependent cost structures
In this section, we relax our assumption that lockdown-related costs are constant over time. One could argue that the length of lockdown affects both the associated costs and the willingness of individuals to comply. Especially for businesses, a longer lockdown has a long-lasting impact and causes more economic damage per unit of time than a shorter one. Here, we look at our SIRD-like model as presented in Section 4.2 with ˜ i 0 > 0 , but without health-care constraints. We consider the following cost structures, In this section we analyse the second case with a linear relationship; the convex and concave cases can be found in Appendix B.2 , where we conclude that the results are qualitatively the same. We start by assuming that the infection rate ˜ i 0 is low, to clearly identify the direct impact of the cost structure. We assume that ˜ i 0 = 0.002 and ˜ i 1 = 0 . To see the impact of an increase in costs, consider a lockdown that lasts, e.g., 80 days: then 0.01(80) = 0.8 , so that the daily costs increase by £15,072 for the linear specification. By the end of lockdown, this is equivalent to £2b per day for the UK population of 67.22m.

Fig. 18 illustrates the stopping sets for the linear case. The orange area in Fig. 18 (a) illustrates the stopping set for coming out of lockdown when t = τ C 0 , i.e., the moment when the social planner decides to impose restrictions. Fig. 18 (b) and (c) illustrate that the orange area grows larger as time goes by, i.e., the social planner comes out of lockdown earlier when costs are higher. However, despite the daily equivalent increase of £760m for (b) and £1.5b for (c), the increase does not significantly change the shape of the orange area. The blue area corresponds, as before, to the decision of going into lockdown, which does not depend on time and therefore remains constant.
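The arithmetic behind the quoted cost figures can be retraced with a short sketch. It assumes that the linear specification adds 0.01 t times a base daily lockdown cost; the base cost is backed out from the quoted increase (15,072 / 0.8), and the model population of 500 is an assumption chosen because it reproduces the quoted UK-scale amounts. Neither value is stated in this section, and the function names are hypothetical.

```python
BASE_DAILY_COST = 15_072 / 0.8   # assumption: backed out from the quoted increase
MODEL_POPULATION = 500           # assumption: not stated in this section
UK_POPULATION = 67.22e6          # population figure quoted in the text

def extra_daily_cost(days, slope=0.01):
    """Increase in daily lockdown cost after `days` under the linear rule."""
    return slope * days * BASE_DAILY_COST

def uk_equivalent(days):
    """Scale the model-level increase to the UK population."""
    return extra_daily_cost(days) * UK_POPULATION / MODEL_POPULATION

for days in (30, 60, 80):
    print(days, round(extra_daily_cost(days)), f"{uk_equivalent(days) / 1e9:.2f}b")
```

Under these assumptions, 80 days gives an increase of £15,072 per day at model scale and roughly £2.03b per day at UK scale. Durations of 30 and 60 days give about £0.76b and £1.52b, which match the amounts quoted for panels (b) and (c); this suggests, but does not confirm, that those panels correspond to these dates.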
Interestingly, an "inaction" region has appeared for states where both I t and R t + D t are small but positive. For these states, going into lockdown might mean that lockdown lasts too long, so that the social planner incurs large costs. Instead, the social planner waits until I t has grown sufficiently to limit the length of lockdown. However, when I t is very close to 0, it is optimal to keep I t low and impose lockdown. Since for these states, as the stream plots in Fig. 8 in Section 4.1 illustrated, lockdown may be in place for a short period of time, the increase in costs per unit of time does not affect the inclusion of these states in the stopping set. Hence, we see that the social planner comes out of lockdown slightly earlier and may delay lockdown only when the epidemic is discovered too late. Fig. 19 illustrates what happens when ˜ i 0 is increased. Then the inaction region grows but the impact on the orange sets is negligible.

Comparative statics
Appendix C covers an extensive comparative statics analysis for our parameters. Here we briefly summarize the results. First we look at an increase in the infection rate, which may result from, e.g., a mutation, and find that the blue area shrinks. This happens because, when the virus is more infectious, the expected interarrival times are shorter, which reduces the length of a potential lockdown but also makes lockdown less effective. The social planner therefore also comes out of lockdown earlier. We find the (intuitively clear) opposite effect when measures are more effective so that R 2 0 < 1 .
Next we consider how the recovery rate γ m impacts the social planner's strategy. There are mixed effects on the stopping sets when increasing γ m , where one effect generally dominates: the blue area shrinks when the recovery rate goes up. In addition, the social planner expands the orange area.
Finally, Appendix C.3 illustrates that increasing δ E , δ L , or K 01 has a similar impact: the blue area shrinks. The orange area expands for the first two cases. In all cases we find that the figures look qualitatively the same and we therefore conclude that our results are robust against (moderate) changes in parameter values.

Concluding remarks and future research
Motivated by the recent COVID-19 outbreak, we have developed a continuous-time Markov chain model to study the optimal timing of interventions in an SIRD-inspired epidemiological model of the evolution of a disease. A social planner has to decide when to enter "lockdown" and, subsequently, when to lift it. Although epidemiological models traditionally assume a deterministic evolution of the disease, there is still uncertainty about how the disease spreads, especially when the number of infected individuals is low. In such a scenario the disease could die out before it spreads. Nonetheless, surprisingly, we find that it is optimal to enter lockdown in the very early stages of the disease. Moreover, it is never optimal to introduce a lockdown when the prevalence of the disease is too high, i.e., a lockdown is optimally started only at low prevalence. In addition, despite the high economic cost, it is optimal for the social planner to wait with exiting the lockdown until either the fraction of susceptible members of the population or the fraction of infected members is close to zero. This holds even when the lockdown-related costs increase over time.
If there is a capacity constraint on the health system, which is modeled by a jump in the per-infective cost once the number of infectives exceeds a given threshold, then the region where never entering lockdown is optimal shrinks. In addition, lockdown should be kept in place for longer.
When we allow for a phased introduction and exiting of lockdown (e.g., when using a 3-tier traffic-light system), it is found that, while a phased exiting strategy is optimal, a phased introduction is not. Our analysis also shows that lockdowns are only optimal when entry into the country is restricted.
Our model has several limitations that present avenues for future research. First, we assume that the number of infected individuals is known. In reality this only holds by approximation, even when testing is done frequently and on a large scale. This provides scope for research on models where decisions and policies are made based on imperfect information. Since we find that immediate action is optimal when the first cases of infected individuals have been detected, we expect that such an extension does not lead to qualitatively different optimal policies. Nevertheless, it stresses the importance of testing, especially since we find that if the epidemic is detected too late, then intervention is no longer optimal.
Other potential extensions of our work include considering a heterogeneous host population, by extending the SIRD framework to include more classes of individuals (as discussed to some extent in Section 4.2 ). An interesting, but technically substantially more demanding, extension for future research would be to allow recovered agents to become susceptible again. Although herd immunity might be more cost effective than an infinite loop of lockdown measures, such a model would include the expectation that over time the virus becomes less dangerous and as such we expect that the decision to come out of lockdown is not qualitatively different. In addition, the result that lockdown is not optimal when the virus is discovered too late might still hold. Having said that, many governments, at least in many EU countries, currently seem willing to impose restrictions every winter for the foreseeable future, which is in line with how we have modeled the problem of the social planner. 20

Appendix A. Proofs
Proof of Proposition 1. For any measurable function f : E → R , x ∈ E and m = 0, 1 it holds that, Here we have used the fact that $q^m_{xx} = -\sum_{y \neq x} q^m_{xy}$. Hence, it follows that the Bellman equation can be written as Rewriting leads to (5).
Proof of Proposition 2. We first assume that ϕ solves (7). Let τ C denote the first exit time of any set C ∈ E. Note that (8) can be written as it follows that for all x ∈ E. Then, Fix T > 0 . From Dynkin's formula it follows that

20 A model inspired by SIS frameworks where transitions are modeled through a CTMC as well, similar to our work, is studied by Huberts & Thijssen (2021) who consider the investment problem of a monopolist in a market with network externalities. The underlying state process becomes mean reverting, which is in line with our intuition for the extension that is discussed here.
Now take any stopping time τ ∈ M and fix T ≥ 0 . Then Since T was chosen arbitrarily, the inequality also holds as T → ∞ .
Therefore, ϕ = F and τ C 1 is the optimal stopping time.
Existence and uniqueness follow from the fact that F is the solution to the fixed point problem Note that the function F can be thought of as a vector in R n , where n = |E| . It is easy to check that the operator of this fixed point problem satisfies Blackwell's conditions, so that it is a contraction mapping ( Stokey & Lucas, 1989 , Theorem 3.3). Existence and uniqueness of F then follow from the Banach fixed point theorem ( Stokey & Lucas, 1989 , Theorem 3.2).
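The fixed-point argument can be illustrated numerically. The sketch below iterates a Bellman-type operator for a discounted optimal stopping problem on a small finite-state CTMC until it converges. The operator form (the minimum of a stopping cost and the discounted one-jump continuation cost), the three-state generator, and all parameter values are illustrative assumptions and should not be read as the paper's equations (5) to (8).

```python
import numpy as np

def bellman(F, Q, run_cost, stop_cost, rho):
    """One application of the assumed operator for cost minimisation.

    Continuing from x costs (c(x) + sum_{y != x} q_xy F(y)) / (rho + q_x),
    where q_x = -q_xx is the total exit rate; stopping costs stop_cost(x).
    """
    exit_rate = -np.diag(Q)
    off_diag = Q - np.diag(np.diag(Q))
    cont = (run_cost + off_diag @ F) / (rho + exit_rate)
    return np.minimum(stop_cost, cont)

def solve(Q, run_cost, stop_cost, rho, tol=1e-10, max_iter=10_000):
    """Iterate the operator from zero until successive iterates agree."""
    F = np.zeros(len(run_cost))
    for _ in range(max_iter):
        F_new = bellman(F, Q, run_cost, stop_cost, rho)
        if np.max(np.abs(F_new - F)) < tol:
            return F_new
        F = F_new
    raise RuntimeError("value iteration did not converge")

# Hypothetical 3-state generator (rows sum to zero); state 2 is absorbing.
Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.0, 0.0, 0.0]])
run_cost = np.array([1.0, 4.0, 0.0])
stop_cost = np.array([5.0, 3.0, 5.0])
F = solve(Q, run_cost, stop_cost, rho=0.1)
stopping_set = np.flatnonzero(np.isclose(F, stop_cost))
```

With a positive discount rate, each continuation value weights future values by q_x / (rho + q_x) < 1, so the operator is monotone and discounts, which is the content of Blackwell's conditions. In this toy example the iteration settles on a value function under which stopping is optimal only in state 1.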
As a final note, since for all x ∈ it holds that q m xy = 0 for all y ∈ E, ϕ(x) is determined unambiguously. For x ∈ E\ , ϕ(x) is recursively determined as a function of the states the chain can transition to when leaving x . Hence, by induction, we establish that ϕ is unambiguously determined in (8) for all x ∈ E.
Proof of Proposition 3. The proof is analogous to the proof of Proposition 2.

Appendix B. Further analysis
In this Appendix, we provide some additional figures for Sections 4.6 and 4.7 .

B1. Critical care capacity constraints
Here we investigate how the stopping sets are affected when the epidemic-related costs increase in I t as I t gets close to the threshold of 150, rather than jumping at a cliff edge as in Section 4.6 , i.e., when if I t < 100 , Fig. 20 illustrates the corresponding stopping sets and we find that these look qualitatively the same. The most noticeable difference is for states such that I t ∈ (100, 150), since for these states the costs are higher compared to (13).

[Fig. 20 caption, partly lost: ... (14). Brown: switching from mode 1 to mode 0, orange: from mode 2 to mode 1, blue: from mode 0 to 2.]
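To make the difference between the two cost specifications concrete, the following sketch contrasts a cliff-edge cost with a gradually increasing one. The cost levels `base` and `high`, the linear shape of the ramp, and the convention that the jump applies from 150 infectives onwards are all hypothetical; only the thresholds 100 and 150 come from the text.

```python
def cost_cliff(i, base=1.0, high=2.0, threshold=150):
    """Per-infective cost with a jump at the capacity threshold (cliff edge)."""
    return high if i >= threshold else base

def cost_ramp(i, base=1.0, high=2.0, start=100, threshold=150):
    """Per-infective cost rising linearly between `start` and `threshold`."""
    if i < start:
        return base
    if i >= threshold:
        return high
    return base + (high - base) * (i - start) / (threshold - start)

# The two specifications differ only strictly between 100 and 150 infectives.
differing = [i for i in range(0, 301) if cost_ramp(i) != cost_cliff(i)]
```

In this sketch the ramp is never below the cliff-edge cost and is strictly above it for 100 < I t < 150, which mirrors the statement that costs are higher for these states than under the cliff-edge specification.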

B2. Time-dependent cost structures
Figs. 21 and 22 illustrate the stopping sets equivalent to those in Figs. 18 and 19 . These figures illustrate that when lockdown costs are convex in t or concave in t, the resulting stopping sets are qualitatively the same.

C1. Infection rate
Here we investigate how the optimal policy is affected by the discovery of a more infectious mutant virus, as well as the effect of a lower infection rate due to, e.g., effective measures.
A higher infection rate β m , m = 0, 1, 2, has two effects on the social planner's problem. First, infections occur more often, relative to a case with a low infection rate. This means that states with a high number of infectives are reached with a higher probability, leading to a bigger need for lockdown. Second, a higher infection rate reduces the expected interarrival times, so that the process moves through more states during the same time interval. Especially for the part of the state space with a high number of infectives, the benefits of staying in lockdown may no longer outweigh the costs. It becomes more attractive for the social planner to phase out restrictions for a larger set of states, with the potential consequence that less time is spent in lockdown. It is therefore, a priori , not clear what the net result will be on the (expected) switching times.
To illustrate these conflicting incentives, we compare our baseline parameterization to a case where R 2 0 = 5 . In line with the literature we again assume that R 0 0 = R 2 0 / 2 and we set R 1 0 = 3.5 . Although we appreciate that R 2 0 = 5 is high, this choice makes the net effects on the stopping sets easier to distinguish. Fig. 23 illustrates the stopping sets for this parameterization. For the first lockdown, the blue region has shrunk, indicating that switching to mode m = 2 is optimal only when very few people have been infected. Discovering the epidemic too late, and thereby never imposing lockdown, is more likely when the social planner is not proactive enough in detecting the virus. At the same time, the orange and brown areas have expanded. Although not visible, the brown area now entirely intersects with the orange area, which means that phasing out lockdown is no longer optimal. In short, lockdown is only optimal very early on in the epidemic and one comes out of lockdown entirely. For subsequent lockdowns, we also find that phasing out lockdown is not optimal. Note that in the blue area an "inaction" region appears again, as we found before in Section 4.7 . The intuition is the same as before: for states where I t is close to zero, a lockdown of short length can be imposed. However, when I t is sufficiently positive and R t is small, a small delay is optimal. This thus occurs even when costs are not increasing over time.
These illustrations confirm that, especially when considering situations where a state with a high number of infectives is reached, there may be a higher probability of lockdown duration being shorter in a scenario with a higher infection rate. The fact that this probability can increase underlines the importance of incorporating the stochastic nature of the development of the disease, i.e., effects such as these cannot be observed in deterministic specifications. The most interesting result is the shrinkage of the blue area, so that it becomes even more important to discover the epidemic early.

Fig. 24 illustrates the opposite case. Effective policies during the epidemic, such as social distancing across all states, may lead to a situation where the R 0 is effectively below 1. Therefore, we now look at a case where R m =2 0 = 0.9 . The blue area has now expanded and the orange area has shrunk, which is in line with the intuition above. Also, in the blue area for the first lockdown the inaction region appears again.

C2. Recovery rate
In this section we provide an illustration of the effect of the recovery rate γ m . Since $(\mathbb{E}^m_x\, dI_t)^2 = 0$ for all m = 0, 1, we obtain This means that increasing γ m while keeping R m 0 fixed leads the process to move more quickly.
For our illustration we multiply all γ m by a factor of 3 and reduce the sunk cost of entering lockdown to K 01 = 1500 . The latter change is needed because otherwise the stopping set would (almost) disappear and it would never be optimal to enter lockdown. To further illustrate the analysis in the main text, Fig. 25 reproduces Fig. 7 with the increased transition rates and illustrates that the shapes of the curves have remained qualitatively the same, but the time scale has changed. Also note that the share of the host population that remains susceptible throughout has not (significantly) changed. This confirms that, at least in expectation, the process moves faster for higher γ m without significantly changing the ex post numbers of infectives and susceptibles.
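The time-rescaling effect can be checked on a single state. The sketch below assumes mass-action rates with β = R0 · γ and a single removal rate γ that lumps recovery and death together; this is an illustrative simplification rather than the paper's exact parameterization, and all names and values are hypothetical. Under that assumption, multiplying γ by 3 while keeping R0 fixed multiplies every transition rate by 3, so jump probabilities are unchanged and expected holding times fall by a factor of 3.

```python
def rates(s, i, n_pop, r0, gamma):
    """Infection and removal rates in state (s, i), assuming beta = r0 * gamma."""
    beta = r0 * gamma
    return beta * s * i / n_pop, gamma * i

def jump_summary(s, i, n_pop, r0, gamma):
    """Return (probability the next event is an infection, expected holding time)."""
    inf, rem = rates(s, i, n_pop, r0, gamma)
    total = inf + rem
    return inf / total, 1.0 / total

# Same state and same R0; only gamma is tripled in the second call.
p_base, hold_base = jump_summary(s=450, i=20, n_pop=500, r0=2.5, gamma=0.1)
p_fast, hold_fast = jump_summary(s=450, i=20, n_pop=500, r0=2.5, gamma=0.3)
```

Since the embedded jump chain is identical in both cases, the distribution of visited states, and hence the final share of susceptibles, is the same; only the clock runs three times faster, consistent with the unchanged curve shapes and the compressed time scale described above.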
An increase in γ m has three consequences. First, an increase in the infection rate increases, in expectation, the number of infectives per period of time. Second, for any future point in time the standard deviation of susceptibles, infectives, and recovered goes up. These two reasons make the stopping set expand. On the other hand, the period over which costs are incurred is shorter, which should make switching to mode 1 less attractive as the investment required to realize a lockdown and consequently the economic damage might not be outweighed by the gain from "flattening the curve". Flattening the curve has two cost advantages. On the one hand the peak of infectives is reached later, which leads to a lower present value in absolute terms. At the same time, the number of total infectives is (in expectation) lower as well, which directly reduces cost. Fig. 26 shows that both effects are present but that they affect different parts of the stopping sets. First, the stopping set for mode 0 has considerably shrunk. The period over which costs are incurred is shorter, so that lockdown is optimal for a smaller range of states.
At the same time, the stopping set in mode 1 has expanded. Despite the increase in volatility, the speed with which the number of infectives goes to zero is sufficient to make it optimal to lift lockdown in a larger set of states. In addition, the period over which costs are incurred is shorter which makes switching more attractive.

C3. Lockdown-related costs
We now illustrate the sensitivity of the stopping sets to changes in parameter values associated with incurred costs, starting with the sunk cost of entering lockdown, K 01 , which is illustrated by Fig. 27 (a). Entering lockdown is optimal when the gain from flattening the curve outweighs the additional costs involved with entering lockdown. When K 01 increases, the benefits might no longer outweigh the costs for certain states and, therefore, the stopping set in mode 0 shrinks. Since K 01 has no effect on the value in mode 1, the stopping set in mode 1, i.e., the orange area, is not affected. Fig. 27 (b) illustrates what happens when δ L is increased. Since these costs are only incurred while being in mode 1, there are fewer cases where switching to mode 1 is optimal and more cases where switching from mode 1 back to mode 0 is optimal. Fig. 28 illustrates the effect of changes in (epidemic related) per-patient cost δ E (i, r) , incurred in all modes. Here, δ E is one-third lower than in the baseline parameterization. When the value of δ E is lower for all states, the gain of lockdown is lower and, therefore, there is less of a necessity for the social planner to switch to mode 1. This leads to the stopping set in mode 0 shrinking as illustrated in Panel (a). When δ E increases, the opposite applies. Panel (b) illustrates what happens to the stopping set if a lower cost per infective is combined with a jump at I t = 150 similar to (13) .