Economically optimal strategies for medium-term recovery after a major nuclear reactor accident. 63-76.

The dynamic process of ground contamination after a major nuclear accident is modelled, and the system is then extended to include the transient equations describing the three broad countermeasures: food bans, remediation and population movement (relocation and repopulation). Countermeasures are assumed to be applicable once the deposition period has ended and surface contamination measurements have stabilised. A value function is constructed to account for the major economic factors, including allowance for the detri-mental effect on human health of radiation exposure. The principle of optimality is then applied by requiring the value function to satisfy the Hamilton–Jacobi–Bellman partial differential equation, yielding an economically optimal combination of the countermeasures at any given moment of time within the recovery period. A classiﬁcation into Broad Strategies is made in order to explore the similarities in structure of optimal strategies for wide ranges of economic parameter values. Population relocation forms no part of any optimal strategy in the Base Case (or Case I) as parameters are varied over a wide range. Strategies incorporat-ing relocation have a low probability of being optimal even in the low-probability sensitivity studies of Case II, where relocation is imposed immediately the accident happens, and Case III, where the


Introduction
Probabilistic safety assessments (PSAs) have been developed extensively over the last 40 years to estimate the risks associated with a nuclear facility. These assessments are typically split into three "levels", with Level-1 PSAs aiming first to determine the various fault modes that can occur within a reactor and then to assign a probability to each of these events happening. Level-2 PSAs build on the results of those at Level-1 to estimate the quantities of radioactive materials (the "source term") released from the site under the various accident scenarios. A Level-3 PSA calculates off-site consequences using the results produced by a Level-2 PSA. the UK. After first specifying the source term and two countermeasure regimes and then applying historic probability distributions for meteorological conditions, it was possible to generate probability distributions for consequences. COCO-2's detailed data on the economy of southern England is able to convert these into corresponding distributions for economic cost, allowing representative values such as mean and median to be found.
A feature of such models is that they require the user to select one or a small number of particular strategies for which they can perform the relevant cost calculations, in considerable detail in the case of PACE/COCO-2. Finding the best strategy would require the programme suite to be run to evaluate each possible strategy. Since each strategy will involve several components, a great number of computer runs would be needed to give satisfactory coverage of the decision space, after which the strategy that was in some sense optimal might be selected.
The approach taken in this study is different and complementary.
This paper develops a tool for managing the aftermath of a major nuclear accident that is presumed to have occurred by dividing the potential countermeasures available to the authorities into just 3 broad categories: food bans, remediation and population movement, with the last covering both relocation and repopulation. By allowing for the dynamics of not only the radioactivity deposition and decay but also of each of the countermeasures, the application of optimal control techniques permits the determination of which strategy, out of the huge range of possible strategies, will be economically optimal. The strategy will thus involve not only the extent to which countermeasures should be pursued but also specify when each should be initiated or terminated.
The model is developed for a single location, which might be a town, a village or a specified area, and it is assumed that no countermeasure will be applied until the end of the deposition period. At this point it is assumed that the decision maker will have measurements of the level of surface contamination (Bq m −2 ) available, and will thus be in a position to make well-informed decisions. It is accepted that some precautionary decisions may have been made before the end of the deposition period, especially if deposition continues over some length of time. Such decisions are in principle reversible. As an example, Section 7 examines the case where a decision to relocate is made during the period of deposition; it will be shown that, for some cost combinations, the optimal policy requires the decision to be countermanded by initiating repopulation immediately after the deposition period ends.
The optimal strategy may be identified by invoking Bellman's principle of optimality, which states that an optimal sequence of decisions in a multistage decision process has the property that, whatever the initial state and decisions, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decisions (Bellman, 1956(Bellman, , 1957. Early applications were in the field of control engineering (Bellman, 1961), but the approach has now been adopted in the fields of mathematical finance (Dixit and Pindyck, 1994), operational research and economics. The principle of optimality is embodied in the Hamilton-Jacobi-Bellman (HJB) partial differential equation when it is applied to continuous-time optimization problems.
The paper begins by deriving a detailed model for the dynamics of the radioactive fallout at the single location, which leads to a first-order differential equation in the radioactivity concentration (Section 2). The state and control variables are presented in Section 3, while Section 4 introduces the value function and the associated HJB partial differential equation that constitutes a sufficient condition for optimality.
Section 5 describes the scaling of the equations and the generalisation of the solution using non-dimensional groups. Section 6 shows how the optimal strategies may be categorised into classes or "Broad Strategies" to bring out the commonalities. Then Section 7 presents the results for the case of a nuclear reactor accident distributing radionuclides into the environment on a scale comparable to the release caused by the accident at Fukushima Daiichi in 2011. Broad Strategies are described for ranges of costs and revenues for a Base Case (or Case I) and for the two variants used as sensitivity studies: (i) where pre-relocation is assumed to have occurred (i.e. relocation before the end of the deposition period) and (ii) where relocation to a different locality increases the economic productivity of those relocated. Section 8 lists and discusses the main modelling simplifications. Section 9 discusses the method and its findings, while Section 10 gives conclusions. Thomas (1999, Introduction) has suggested that:

Modelling
"The major task facing the control engineer working in the process industries is the detailed understanding of the physical processes occurring on the plant and the codification of this understanding into a consistent and complete set of descriptive equations." and it is clear that the same principle pertains when the control is to be applied not to a chemical plant but to the environment surrounding a nuclear power plant after a very large accident involving a significant release of nuclear contamination. This Section details the modelling that will underpin the design of the controls that will achieve the best outcome for the people living nearby. The underlying dynamic model is thus made transparent to the interested reader, who may then check its inherent limits. The physical modelling described in this section feeds into Yumashev and Johnson (2017), where their Eq. (1) is derived as Eq. (38) in this paper, while their Eq. (2) is derived here as Eq. (30). Meanwhile the results of the economic costing model described in Yumashev and Johnson (2017) are used in Section 4 of this paper. The two articles, which were written in parallel, may thus be seen as complementary companion papers.

The deposition of radioactive fallout and the harvesting of vegetation
Consider an agricultural surface of area, A (m 2 ), subject to a time-varying deposition rate, W Di (kg year −1 ), of radionuclide, i, of atomic weight, i , assumed spread evenly in a thin surface layer of mass, M Si (t), over the area. The system that needs to be considered is illustrated in Fig. 1.
After the radionuclide is deposited onto the surface layer, some of that radionuclide will be taken up by the vegetation growing through the surface. For simplicity, it will be assumed that the vegetation is in a state of equilibrium with the surface layer of the radionuclide at all times. Hence the mass fraction of the radionuclide in the vegetation, f Vi , may be taken to be proportional to the ratio of the mass of radionuclide, i, con-tained in the surface layer to the total mass, M Si + M E , of soil down to the depth of the roots, D: in which use has been made of Eq. (1) in the last step. a i (m 2 kg −1 ) may be interpreted as the ratio of the mass fraction of radionuclide i in the vegetation to the surface contamination with radionuclide i (kg m −2 ). Carrying out a mass balance on radionuclide, i, in the surface layer and in the vegetation gives: where W H is the harvesting rate of the vegetation (kg s −1 ) (including by animals feeding on the vegetation) and i M i is the rate of decay of radionuclide, i, (kg s −1 ). This formulation neglects by design the effects of ploughing, seen as a remedial measure, but it also neglects the diffusion of the radionuclide deeper into the soil through natural processes and the gradual dissolution and dispersion of soluble radionu-clide compounds. Hence it will tend to be pessimistic in the sense that the radioactivity will be assumed to stay longer in man's environment than may be the case in the real situation. For example Robison et al. (2003) and Paller et al. (2014) report that while the physical half-life of caesium-137 is 30 years, its half-life in the environment may be significantly lower. Their studies, based on measurements in the Marshall Islands in the one case and on the Savannah River site in the USA in the other, produces various estimates for the effective half-life depending on the environmental medium under consideration, from less than 9 years up to a maximum of 17 years. Work on Chernobyl post-dating  suggests that the effective half-life of 137 Cs for use in calculating external dose is 18.8 years, while the value to be used in calculations of internal dose should be 23 years in the case of mushrooms but 15 years for general agricultural products (Jacob et al., 2009). The radioactivity is disappearing from man's environment roughly twice as quickly as through radioactive delay alone.

Radioactive decay and the radioactivity in Becquerel
The decay coefficient, i , may be related to the half-life, t (i) 1/2 , of radionuclide, i, as follows. In the absence of deposition and harvesting, Eq. (7) would take the form: which may be integrated with respect to time to give, When t − t 0 = t (i) 1/2 , then ln M i /M i0 = ln (1/2) = − ln 2, and substituting these expressions into Eq. (9) gives the desired relationship: When a mass, ıM i (kg), of radionuclide, i, decays by nuclear disintegration, then ıM * i = ıM i / i kg-atoms are lost, where i is the atomic weight of radionuclide, i. Thus the number of atoms of radionuclide, i, that are lost by nuclear disintegration will be: where N Ak = 6.023 × 10 26 is Avogadro's constant ×1000, which is the number of atoms in a kilogram-atom, and the minus sign is used to convert the reduction in mass into a positive number. If these atoms are lost in a time, ıt, then the average rate of disintegration of atoms over that time will be: As ıt → 0, this expression will give the instantaneous rate of disintegration of radionuclide, i, ˇi(Bq), where 1 Bq = 1 disintegration per second anď

Relationship between radioactive mass and radioactivity in Bq
Eq. (13) may now be combined with Eq. (8) to give the radioactivity in Bq of the mass, M i (kg), of radionuclide, i, in the surface layer and in vegetation: Clearly this mass may be deduced if the radioactivity in Bq is known: Eq. (15) may be used to give another interpretation of the coefficient, a i , as a radioactive transfer coefficient from soil to food for radionuclide, i. Substituting Eq. (15) into Eq. (6) gives: where ˇV i /M V is the specific radioactivity in the vegetation (Bq kg −1 ), while ˇs i /A is the specific surface radioactivity of the surface deposition (Bq m −2 ). Eq. (15) may be extended by differentiation, where, since i , i and N Ak are all constants,

The permitted rate of harvesting of vegetation, W H
Harvesting is taken to have a pre-accident level, W H (kg s −1 ). The population occupying the area before the accident is p 0 , but the population at a later point will be p, which may be lower if relocation measures are introduced. As an approximation, the maximum harvesting rate after the deposition is taken to be proportional to the manpower available in the region at time, t, after the nuclear accident: while the actual harvesting rate needs to allow for possible agricultural restrictions. Let the Government permit a fraction of land, , in a restricted area to be used in agriculture, where 0 ≤ ≤ 1. The harvest rate, W H (t), will be similarly constrained by , so that, where the final step follows from Eq. (18).

The mass balance and radioactivity balance on the surface layer and its vegetation cover
Eq. (7) may be written: where where m 0 = W H /A is the mass flux (kg s −1 m −2 ) for harvested vegetation at the pre-accident level. Defining the specific agricultural extraction rate, ˛i, for radionuclide i as: it is clear that ˛i (s −1 ) is mapped into the range 0-˛i 0 = a i m 0 as moves from 0 (complete restriction) to 1 (no restriction). The Government restrictions could also be imposed by varying the levels of the agricultural yield, m, (the mass flux for harvested vegetation) between 0 and its maximum pre-accident value, m 0 , which has exactly the same effect on ˛i as the restrictions on .
Thus Eq. (20) may be rewritten as: Substituting from Eqs. (15) and (17) converts the mass balance into a radioactivity balance: Dividing throughout by i i N Ak , gives: The analysis to this point has been for an area of extent, A m 2 . If the specific surface radioactivity, B i (Bq m −2 ), is reasonably uniform over the area, then: Substituting for ˇi from Eq. (27) and its differential, dˇi/dt = AdB i /dt, into Eq. (26) gives: or: where q i = W Di /A is the deposition rate of radionuclide, i, per unit area (kg s −1 m −2 ). The instantaneous external or "ground-shine" dose, r i (t) (Sv s −1 ), for radionuclide, i, is proportional to the rate of disintegration: where c is a constant that depends on the energy of disintegration. The term, ˛iB i p/p 0 represents the transfer of radioactivity from unit agricultural area into the food chain. Assuming, conservatively, complete consumption by humans of all agricultural products, the collective internal dose rate, r (int ) i , may be calculated as ˛ieB i p/p 0 , where e is the ingestion coefficient, taking the value e = 1.3 × 10 −8 Sv Bq −1 when the radionuclide is 137 Cs (Field, 2011), the radionuclide that dominates doses after a few years. Using Eq. (30), the collective internal dose rate from radionuclide, i, may therefore be calculated in terms of the external dose, r i , as: We note that according to these equations, the collective internal dose rate received along the entire food chain is linked to the original and remaining population of the contaminated region where the food is produced.

Control variables, state variables and state equations
The radionuclides of principal interest, because of their contribution both to external and internal dose via the food chain, are the caesium isotopes, 134 Cs (t 1/2 = 2.1 year) and 137 Cs (t 1/2 = 30.2 year), with 137 Cs dominating after a few years. The approach taken in this section is to build a framework around a single radionuclide so that the subscript, i, used in Section 2 for generality, will be dropped henceforth. (It is relatively straightforward to model multiple radionuclides, as shown in Yumashev and Johnson (2017, additional material). Alternatively, a composite approximation to the two caesium isotopes, 134 Cs and 137 Cs, could be used to cover the first few years.) It is assumed that the Government in whose territory the nuclear reactor accident occurs will have 3 control variables at its disposal: • The target population level, p c , in the affected area. The range for p c is 0 ≤ p c ≤ p 0 where p 0 is the number of people living in the affected area when the accident happens. The population is assumed to follow this target with first-order dynamics: reflecting the assumption that the Government has the means both to relocate and also to return population.
• The remediation rate, * , where 0 ≤ * ≤ * 0 . * = 0 corresponds to no remediation, while a positive value of * : * = b > 0 implies that radioactive contamination is being removed from the surface layer at a rate proportional to the mass, M, of contaminant present, with the constant of proportionality being b. Meanwhile * 0 is the highest possible rate of remediation based on the available technology. Thus the rate of change of contaminant mass due to remediation, dM ( * ) /dt, is given by: By Eqs. (15), (17) and (27), this is equivalent to: Substituting Eqs. (35) and (30) into Eq. (29) gives, after omitting the subscript, i, the following first-order differential equation in dose rate, r: The duration of deposition, T D , will be small compared with the time constants associated with radioactive decay of the isotopes of principal interest and with the countermeasures, the substantial implementation of which may take months. For example the large areas and populations involved meant that the 1986 relocation after the Chernobyl accident took from April to September to complete. See Fig. 2, which is based on Table 20 of UNSCEAR (2000). Hence we may represent the deposition as happening rapidly enough to be accounted for by the initial condition at t = 0 for the dose rate, r: with Eq. (36) thus being transformed into the decay and removal process: The differential Eqs. (33) and (38) represent the dynamics of the population, p, and the dose rate, r, and describe the system that the Government will control by adjusting the control variables, p c , * and ˛. In the parlance of control engineering, the state vector, x, is given by: while the control vector, u, is given by: Eqs. (33) and (38) constitute the state equations. The derivation of parameter values is described in Yumashev and Johnson (2017).

The value function and optimal control
The total flow, F (£ year −1 m −2 ), of economic value from the affected area will depend on the population present and on the radiation dose and will vary with time: F = F (p, r, t). The function, F, has the form: where the coefficients, F z , z = r, d, ˛, ˇ± and * , are positive. Derivations of their values and that of F are given in Yumashev and Johnson (2017). The terms on the right-hand side of Eq. (41) represent, in order left to right: • The health costs from external dose (ground shine) and from contaminated produce, • The difference in productivity between the old (contaminated) and new locations for the displaced people, • The maximum disruption cost at the old location, • The revenue from the agricultural produce grown at the old location, • The relocation/repopulation costs, • The remediation costs.
The population differential, |dp/dt|ˇ, covers population movement following Government action, and hence will not include voluntary relocation.
It is desired to maximize the value coming from the affected area up to some time horizon, T, which is assumed to be set by the Government or regulators and may be defined as a significant fraction ( 1 ⁄3, ½ . . .) of the half-life of 137 Cs. For the medium term analysis of this paper, the time horizon, T, has been set as 15 years. The value, V, is then the integral of the economic flow, F, with future contributions reduced by a discount rate, ε: where is the final condition corresponding to the optimal perpetual state with fixed controls. Following the principle of optimality (Bellman, 1956), V, must satisfy the Hamilton-Jacobi-Bellman partial differential equation: The optimal paths, p * (t) and r * (t), for the population and radiation dose respectively may be found by integrating the state equations, Eqs. (33) and (38), from the relevant initial conditions and with the optimal solutions for the control variables, p c , * and ˛, in the state space (p, r, t) obtained from Eqs. (43) and (44).
The inherent difficulty of finding the optimal solutions for the three control variables is reduced considerably by the linear, first-order nature of the state equations, which means that the optimal control policy may be expected to be binary ("bang-bang" solutions). This means that the control variables must take extreme values if optimality is to be achieved. Hence * must take either the value 0 or else * 0 and similarly ˛ will be either 0 or else ˛0. Because of the possibility of both relocation and repopulation, the control variable, p c , may take one of 3 possible values: p c = 0, p c = p 0 and also the intermediate state, p c = p. The "bang-bang" condition means that, at any instant, there are only 12 possible combinations of the control variable values that can be optimal.
The optimal policies for the control variables, p c , * and may be found by integrating Eq. (44) back in time, but this requires its final condition, V t=T (p, r, t), to be found. Fortunately closed-form, approximate solutions for this final condition can be obtained for each of the 12 possible, optimal combinations of the control variable values, provided the state Eqs. (33) and (38) can be decoupled. This may be done by adopting the simplifying assumption that the dynamics of population movement will be very rapid compared with those characterizing changes in dose rate. Thus we may assume, as far as the radiation dose is concerned, that the population is in an evolving steady state at all times, a proposition that can be implemented by replacing p by p c in Eq. (38). Such an assumption is likely to be reasonable in most cases, since the time constant of population dynamics is likely to be measured in days or weeks, while the reduction in radiation, influenced heavily first by 134 Cs decay and later by 137 Cs decay, will occur over years and decades.
Demarcating the three possible values for p c by superscript i, the two possible values for * by superscript j, and the possibilities for ˛ by superscript k, the optimal value of V at time, T, is: which requires numerical evaluation for each pair (p, r) within the state space.
The procedure described above for finding V T (p, r) may be refined by applying direct numerical integration to Eq. (42) for times between T and T 2 = 3T (say) and then using analytical solutions to close the condition at T 2 : Here p (t ) and r(t ) are given by Eqs. (33) and (38), with the controls corresponding to the discrete state (i, j, k) integrated from the "initial" conditions, p (T) = p and r (T) = r. The second integral may be approximated by using the relevant simplified closed-form expressions based on the values p (T 2 ) and r (T 2 ) at the tail end of the first integral. This provides the appropriate closure beyond T 2 , thus speeding up the computations whilst retaining a high degree of accuracy for the final condition.

5.
Scaling and the use of non-dimensional groups The partial differential Eq. (44) may be scaled and nondimensional groups introduced in order to establish the minimal number of independent parameters that provide unique solutions. Based on the obvious mathematical and economic properties of the problem, the relevant upper bounds are used to scale p and r, while a characteristic time, , is introduced to scale the rates and times, leading to the following non-dimensional variables: where the characteristic time, , is defined as follows. Consider the maximum possible external dose, R T , that would have been received over the optimization interval, T, in the absence of relocation and if there had been no remedial measures ( * = 0) but a complete food ban (˛ = 0): The characteristic time, , and its limiting values are then: The non-dimensional scaling of population,p = p/p 0 , means that the solution will hold irrespective of whether p is regarded as the population of the affected area or the population density within the area.
We further introduce the following non-dimensional cost/revenue groups: where all the lambda parameters: , ˇ, ℵ and ˛a re scaled to characteristic health costs from external radiation over the period. In particular: • is a general measure of the economic difference between the old and new locations (including the respective relocation and repopulation costs); • ˇm easures the average cost of moving; • ℵ measures the cost of remediation; and • ˛m easures the amount of revenue coming from the agricultural produce grown at the affected location.
Finally, the scaled value function,V, is defined implicitly by, where F dd p 0 is the maximum annual disruption cost at the new location, F l p 0 is the maximum annual non-radiological infrastructure depreciation cost following full relocation, F˛˛0p 0 is the pre-accident annual revenue from agriculture produce, and F p is the accident-driven difference in annual productivity between the new and old locations..

Classification of optimal strategies
The optimal time behaviour of the desired in-situ population, p c , will correspond to one of 5 broad possibilities: • No relocation, • Partial relocation followed by full repopulation, • Full relocation followed by full repopulation, • Full relocation and then partial repopulation, • Full relocation and no repopulation.
This class of possibilities may be given the identifying variable, Reloc, which will take the values 1-5 respectively. A similar classification is possible for the optimal time behaviours of remediation rate, * , and of the specific agricultural extraction rate of radioactivity, ˛, where the identifying variables, Remed and Food, may be assigned. See Table 1.
Any joint optimal strategy for the control variables, p c , * and ˛, may be now classified as one combination of the broad strategic components listed in Table 1. It is clear that there will be 3 × 3 × 5 = 45 possible classes of optimal strategies. An integer index, S (x), may be assigned to each class of optimal strategy, where the index will take a value in the range 1-45, and will depend on the vector of values of the dimensionless cost/revenue groups and the dimensionless rates: that defines a unique setting of the optimisation problem. The assignment to the classes of the optimal strategies can be made according to: as shown in Table 2, where the term, "Broad Strategy", is used to signify the class of optimal strategies with index, S.
It is possible to classify reliably any particular optimal strategy into this system by running a sequence of binary checks.

The probability of occurrence of classes of optimal strategies
The cost structure of the recovery measures will differ depending on the country in which the accident has occurred, the location of the reactor relative to local towns or cities, the agricultural productivity of the surrounding land and so on. The different cost structures for a similar accident in different localities will be reflected in the dimensionless cost/revenue groups given in Eqs. (50)-(53), with the components of the parameters vector, x = , ˇ, * , ˛,¯− ,¯+, * 0 , ¯0 , taking different values for each accident. Since it is not possible to know in advance which, if any, of the world's more than 400 operating reactors is likely to be affected, it is reasonable to model each component in x with a probability distribution.
To illustrate the procedure for finding the probability of a given Broad Strategy, S, assume that the simplest probability density applies that is valid over the region defined by x qmin ≤ x q ≤ x qmax for each component, q = , ˇ, · · ·, ¯0, of the parameters vector, x, namely the joint uniform density. Now, The subset of parameters, x n , leading to the class of optimal strategy with the index S (x:x ∈ x n ) = n, may be picked out by: where ı K (i − j) is the discrete Kronecker delta function, with ı K (i − j) = 1 when i = j and zero otherwise. Combining Eqs. (57)-(59) gives the discrete probability of the class of optimal strategy characterised by S (x) = n as: If f (x) is modelled by non-uniform probability distributions, Eq. (60) will take the form,

Results
Consider a location near a nuclear power plant that has undergone a major accident and released radioactive material into the environment. It is possible to find a unique optimal strategy that satisfies the HJB equation by measuring the surface radioactivity concentration at the end of the deposition period and feeding in the resulting cost/revenue parameters as well as the characteristic rates needed to evaluate the dimensionless groups, x = , ˇ, * , ˛,¯− ,¯+, * 0 , ¯0 . In this Section, however, we will consider the more general problem of finding the optimal strategy when the surface activity concentration is given, but the cost structure is known only to the extent that the dimensionless cost/revenue parameters are constrained to lie within the ranges: x qmin ≤ x q ≤ x qmax , q = , ˇ, · · ·, ¯0, as discussed in Section 6.2. A representative surface activity is found by assuming the total masses of radionuclides deposited per square metre to be similar to those found in the vicinity of Fukushima Daiichi after the 2011 accident. This leads to specific surface activities at time, t = 0, corresponding to the end of the deposition period, of 200 kBq m −2 for both 134 Cs and 137 Cs and 720 kBq m −2 of 131 I, totalling 1120 kBq m −2 . The short half-life of 131 I means that the specific surface activity a year later will be dictated by the presence of the caesium isotopes only and will have dropped to 337 kBq m −2 .
Results are presented for the classes of optimal strategies derived for the medium term, taken to cover the period from the end of the deposition period out to T = 15 years. This time period is particularly important as decisions on relocation, the most disruptive and potentially most expensive of the possible countermeasures, are usually taken during this time.
The NREFS study (NREFS, 2017; Research Councils UK, 2017) distinguishes between "evacuation", taken to be short-term and followed by return, and "relocation", taken to be long-term or even permanent. Evacuation may be in force for only days or a week or so, allowing for time to establish the extent of the accident, after which a decision may be taken on whether or not to allow return. Temporary relocation is defined in Ashley et al. (2017) as implying an enforced period of absence of up to 3 months, with relocation regarded as permanent if a recommendation to return cannot be made 3 months after the accident. It would certainly seem that resistance to going back and problems with a large-scale return are likely to be encountered once people have stayed away for a year or more, as in the cases of the accidents at both Chernobyl and Fukushima Daiichi.
A further possibility included in this study is "prerelocation", where the decision is made before the end of the deposition period, not merely to evacuate people as a precaution, but also to move them permanently to a distant new home. This is considered in Section 7.2, and it is characterised by the following four alternatives for the population (instead of the five options introduced previously): • Immediate full repopulation, • Delayed full repopulation, • Partial repopulation, • No repopulation.
The time constants associated with relocation, repopulation and remediation are assumed to be: Repopulation is assumed to be a less urgent and hence slower process (in reality, it well might proceed in stages). Remediation is also assumed to go more slowly. Finally, the characteristic times associated with the process of radiation extraction through food production are in the range ˛= ln 2/˛0 = 2000 − 4000 years, corresponding to agricultural yields roughly between 5 and 10 ton ha −1 . These estimates are based on the available historic data and are described in detail in Yumashev and Johnson (2017).
The ranges chosen for the parameters are: , ˇ∈ [0.5,10] and * , ˛∈ [0.1, 2], values found after considering , Gillett et al. (2001), Jacob et al. (2009) andMunro (2012), and assuming the initial radiation levels in the range 5 mSv/year < r 0 < 20 mSv/year (see Yumashev and Johnson, 2017, for the full derivation). The different ranges reflect the fact that characteristic relocation costs tend to be several times greater than health costs, while remediation costs and revenues from agricultural produce tend to be smaller. These estimates are based on approximately two-fold variations in the cost parameter F ranging from their estimated minimum to the maximum values. The characteristic health cost F r , for example, varies roughly between £25,000 per manSv and £55,000 per manSv (central value £40,000), which is based on the Linear-No-Threshold hypothesis for the effects of radiation on the resulting losses in life expectancy, and the Willingness to Pay approach to translate these losses into costs (Yumashev and Johnson, 2017). For all the cases described below we perform 1000 Monte-Carlo experiments with uniform distributions for the four characteristic rates¯−,¯+,‫א‬ 0 , ¯0 and log-uniform distributions for the four cost/revenue groups , all defined using the ranges specified above. The log-uniform distributions are chosen according to the multiplicative nature of the cost/revenue groups.

Base Case (or Case I)
Economic productivity will normally fall as a result of a forced movement to a new location in the same country, certainly in the short to medium term, because of the disruption that people will experience. Relocated people are likely to find it hard to obtain the better paid jobs that would indicate higher economic productivity. In fact, if the people living in the old location believed that they could earn more in the new location, they might well have moved anyway, before being obliged to do so in the unlikely event of a nuclear accident. The fact that they have not done so suggests that their earnings and productivity would not be increased by moving. Hence in the Base Case it is assumed that economic productivity is reduced by moving to the new location, which implies a positive value for : > 0. For the ranges of the characteristic times and cost/revenue groups, , listed above, only two classes of optimal strategy emerged from the analysis, namely Broad Strategy 8, with a probability of 83.7%, and Broad Strategy 7, with a probability 16.3%. Both strategies warranted early remediation, with a temporary food ban being required by Broad Strategy 8 and no food ban required under Broad Strategy 7. Both Broad Strategies rejected relocation as a policy option.
The fact that relocation is not called for by any optimal strategy means that any policy calling for relocation for the Base Case (or Case I) would not be the best.

Pre-relocation is assumed to occur (Case II)
In this sensitivity study, the people are assumed to have been moved out at a very early stage so that the affected area is empty of people at t = 0: p 0 = 0. The relocation costs are now regarded as "sunk costs" that have been incurred already and so are not relevant to future decisions. As in the Base Case, the more likely situation for economic productivity at the new location is assumed to pertain, namely that it is reduced by moving to the new location: > 0. An incentive is thus provided for returning at least some of the population to the old location. 5 classes of optimal strategies are now possible, depending on the position of the parameters, , ˇ, ℵ , ˛,¯− ‫א,+¯,‬ 0 and 0 , within their ranges, which are kept the same as in the Base Case, although the relocation costs are not relevant in this case since we assume that the relocation had already taken place during the early stages.
The class of optimal strategies most likely to be appropriate when relocation costs are removed is Broad Strategy 26, which has a probability of 47%. This strategy corresponds to delayed full repopulation, early remediation and temporary food ban. The next most likely Broad Strategy, 8, has a probability of 31.6%, and calls for immediate full repopulation, early remediation and temporary food ban. Broad Strategy 7 is invoked 16.3% of the time, which calls for immediate full repopulation, early remediation and no food ban. Broad Strategy 23 (probability 2.9%) is similar to Broad Strategy 26 in that it requires delayed repopulation and temporary food ban, but now the remediation is also delayed.
Finally, Broad Strategy 39 (1.6%) calls for no repopulation, no remediation and a perpetual food ban, which implies completely abandoning the original location and justifies the original (not necessarily optimal) decision to relocate the people. This is the only strategy that does so, suggesting that in most cases it is not economically viable to relocate the peo- ple if this is going to result in lower productivity at the new location.

Economic productivity is increased by moving to the new location (Case III)
Although it seems prima facie to be unlikely that the productivity of those obliged to move and relocate within the same country would increase as a result, this sensitivity study examines that possibility for completeness. The lambda ranges stay the same as for the Base Case, except that now, since < 0, the limits are set as ∈ [−10, −0.5]. All the other parameter ranges are the same as before.
With this negative range of , it is found that the first two classes of optimal strategies, Broad Strategies 8 and 7, are appropriate for 81.8% and 16.3% of the cases, respectively, which is very similar to the Base Case. What is different for the cases with < 0 is that one additional strategy, Broad Strategy 39, is invoked in 1.9% of the cases, at the expense of Broad Strategy 8. As noted in Section 7.2, Broad Strategy 39 corresponds to complete abandonment of the area, with a permanent food ban, no remediation and relocation of everyone in the area with no repopulation at any time. Its slightly higher likelihood relative to the pre-relocated state with > 0 (1.9% instead of 1.6%) is due to the additional economic disincentive to move back. • Case II, where pre-relocation has occurred;

Summary of the results
• Case III, where economic productivity will be increased by a move to the new location.
The higher numbered Broad Strategies call for generally stronger recovery measures and longer periods when the population has to be displaced from the original location, albeit temporarily. The Broad Strategies appropriate for the Base Case (S = 7 and S = 8) exclude relocation as an optimal option. The effective net cost of relocation is lower for the two other cases examined, which means that relocation has a greater chance of being a favoured policy option. In the case of prerelocation, Case II, the probability of Broad Strategies involving immediate full repopulation, the closest equivalent to no relocation in this particular context, is around 48% (strategies S = 7 and S = 8 combined), compared with 100% no relocation for the same two strategies in the Base Case. Meanwhile Broad Strategy 26, which is based on a seemingly balanced and pragmatic approach of combining delayed full repopulation, early remediation and temporary food ban, becomes the dominant class of optimal strategies, invoked just under half (47%) of the time. Broad Strategies 23 (similar to 26 but with delayed remediation) and 39 (abandonment of the affected area) make up the balance (just under 5% of the cases in total).
Relocation will tend to be incentivised further in the case where economic productivity will increase after a move to the new location (Case III). Now the abandonment option, Broad Strategy 39, increases in probability to just under 2%, which is marginally higher than the probability of 1.6% obtained for the same Broad Strategy when there is pre-relocation but no additional economic incentive to move (Case II). It can be seen that decreasing the economic penalty and increasing the economic reward for relocation will influence strategic decisions away from repopulation and in the direction of abandonment of the affected area, although the net effect is relatively small.
Concerning the accuracy of the probabilities given for the Broad Strategies, it needs to be born in mind that these were calculated on the assumption that the non-dimensional cost/revenue and the non-dimensional rate variables obeyed log-uniform and uniform probability distributions, respectively. While this is a reasonable assumption in the absence of more detailed knowledge, lack of that knowledge means that the resulting probability figures for the optimal strategies should be regarded as indicative rather than definitive.
Given the choice of probability distributions (including the parameter ranges), capturing the appropriate optimal control strategies for each of the cases considered in Sections 7.1-7.3 within the list of Broad Strategies depends only on the number of realisations for the eight non-dimensional parameters , ˇ, ℵ , ˛,¯− ‫א,+¯,‬ 0 , ¯0 within their respective ranges. One thousand Monte-Carlo experiments are regarded as providing a sufficient sample size for the purposes of this study.
Hence, for example, the finding that relocation forms no part of any optimal strategy in the Base Case is likely to be valid unless the ratio of the characteristic relocation costs to the health costs turns out to be below the range for ˇu sed in the present calculations. All the optimal strategies found for this core case are likely to involve keeping the population in situ, applying early remediation and either imposing a temporary food ban or no food ban at all.

8.
Limitations of the modelling The system model combines a physical model (deposition of fallout, uptake of radioactivity by the vegetation, direct and indirect harvesting, external and internal dose, population movement) with an economic model (productivity differences before and after relocation, agricultural revenue, relocation costs, remediation costs and so on). Finkelstein (2006) notes that to "treat effectively the complexity of real systems, the models used are highly abstract, that is to say they idealise and omit detail.", and it is this philosophy that has been applied to the task in hand.
The main idealisations used in the paper are now listed, together with comments on their consequences: • The controls are applied after the deposition period has finished. This would imply in any practical realisation that the level of fallout would need to be measured soon after the fallout had ceased in order to set the initial condition for the model. There is, in principle, no problem in measuring the degree of fallout using either spatially distributed, static detectors or moveable measuring instruments (e.g. drones). Such a recommendation is contained in Waddington et al. (2017a). • Uniform spatial distributions are assumed within the region under examination for radioactivity, for people and for economic activity. Clearly the extent of the approximations will depend on the size of the region and on the mix of rural and urban life within the region. There would be no inherent difficulty, however, in employing several models to cover a large region. • The fallout is assumed subject to only two methods of removal from man's environment: via harvesting and via radioactive decay (Eq. (20)). This omits both remedial ploughing and also the natural effects that may reduce the effective half life by 50% or more, as noted at the end of Section 2.1. The effect is to make the physical model conservative -in real life the radioactivity in man's environment might decrease at a significantly faster rate. • Simple models are used to represent the processes of relocation, repopulation, remediation and the production of food. The dynamics might be described more precisely by nonlinear differential equations, for example, with more parameter values, perhaps more accurately defined. However, the relative simplicity of the system model makes it convenient to use as a management tool in its current form. As a general point, it is not desirable to make a model more complicated than its end-use warrants. • The models culminate in the representation of both the number of people in the region and the dose rate by 1st order differential equations, as described in Sections 2 and 3. This has the effect noted in Section 4 of rendering the optimal control actions binary or, in control engineering parlance, "bang-bang". This is a standard result in control theory (see, for example, Owens, 1981, Section 6.1.3).
In fact, such a control law is very reasonable for the management of a homogeneous, relatively small region, where the avoidance of ambiguity would require a Government body, or "Gold Commander" in the short term (National Policing Improvement Agency, 2009), to recommend either that people stay in situ or leave, for example. More complex recommendations could be envisaged (e.g. a recommendation that people under a specified age should move away) but this could well be seen as over-complicated and subject to individual interpretation in the practical world of real people with their own opinions. • The recommendations constituting the control actions are assumed to be binding on the population, and, by the same token, voluntary relocation is not allowed for. Some people might move away under their own volition, of course, and might not be prepared to return if and when advised to do so. Hence the assumption requires the Government body either to have the power to enforce its will or to be trusted by the population affected well enough for its recommendations to be followed by the overwhelming bulk of people.
• The region is assumed to be sufficiently independent of neighbouring regions for interdependencies (such as induced by commuting between regions for work) to have negligible effect. Non-negligible interdependencies would lead to an understatement of the economic disruption felt by the surrounding regions if the population of the region in question were relocated. Allowing for such dependencies would tend to increase the calculated incentive for keeping the population in situ, which would reinforce the major finding of the model that relocation should be used sparingly. • No sectoral detail is included in the economic model. In fact the use of average properties can be expected to give a good representation when, as here, the control recommendations are binary (on or off) in nature. • The average cost of moving, as measured by ˇ, has been set to one half of that used in Yumashev and Johnson (2017). This reduces the disincentive to relocate, but even so no optimal strategy for the Base Case involves relocation. • The results apply for initial dose rates lying between 5 and 20 mSv per year, corresponding approximately to the 10-20 mSv year −1 "medium radiation levels" used in Yumashev and Johnson (2017). For comparison, the 162,700 evacuees from Fukushima Daiichi would have experienced an average radiation dose of 16 mSv in the 12 months following the accident if they had stayed in place (Tables  11 and 12 of Waddington et al., 2017a). Within this group of 162,700 people, 107,200 would have received less than 20 mSv (average: 4 mSv), while 55,400 would have received between 25 mSv and 51 mSv (average: 39 mSv) in the first year if they had not moved. The higher doses affecting about a third of the population group lie between the "medium" and "higher" radiation ranges considered by Yumashev and Johnson (2017) -the latter consists of the range: 50-100 mSv year −1 . But those authors found that even at the higher radiation level, the most common optimal Broad Strategies would involve no relocation in either the Base Case (or Case I) or Case III.
While the predictions of economic models are known to be fallible (e.g. Harford, 2014), it is arguable that the very simplicity of the model, as just discussed, is an advantage. Nor is the model attempting to provide an absolute forecast of economic prosperity at some future time, but rather the difference between two scenarios. The output may then be seen as guidance to a decision maker, who will then be in a better position to weigh the harm associated with nuclear radiation against the harm the same people will experience if their economic well being is reduced. It has been shown conclusively (Waddington et al., 2017a) that the decisions taken by the authorities after both the world's major nuclear accidents were poor, apparently driven by fear of radiation exposure to the exclusion of all other considerations; use of the model advanced in this paper would provide the essential economic counterbalance. In this context, it is worth noting that life expectancy has been found to depend strongly on GDP per head (Thomas, 2017;Thomas and Waddington, 2017).
The main conclusion from the Monte Carlo simulation runs was that in the Base Case (or Case I), the recommendation was against relocation in every case. This is striking because the result finds close parallels in the outcomes of studies using both the J-value (Waddington et al., 2017a) and Public Health England's PACE-COCO2 program suite, as will be discussed further in the next section.

Discussion
The results for the Base Case (or Case I) show that after a major nuclear accident involving a significant release of radioactive material, it is not economically sensible to move people away from their homes on a semi-permanent or permanent basis. The sensitivity studies embodied in Cases II and III are inherently less plausible. In Case II it is assumed that people have been moved out of their homes to semi-permanent or permanent new housing very early on, before the deposition phase of the accident is complete, with the cost of the relocation discounted as far as future decisions are concerned. In reality such a rapid, large-scale movement of people is likely to be infeasible. Fig. 2, which records the experience at Chernobyl, suggests that such a movement of people may well take weeks. There is also the very sizeable problem of providing new semi-permanent or permanent accommodation so quickly. Case III is also unlikely, since it assumes that the economic conditions for the people after relocation will be better than those they experienced in their original area. This poses the question as to why, given these better conditions elsewhere, those people have not moved already. Essentially these cases have been included for completeness.
But even in the unlikely Cases II and III, where the disincentives to moving out have been artificially reduced, immediate full repopulation is called for in Case II about 50% of the time, while no relocation is ordered in Case III on 98% of occasions.
The message coming from the study is that permanent or semi-permanent movement of people away from areas affected by nuclear fallout is likely only rarely to be an economically optimal strategy, even after allowing for health costs. Early remediation is, by contrast, desirable in just about all cases. Food bans may well be needed, but they should be temporary.
These quantified results are based on a model that includes a good many simplifying assumptions, as documented in Section 8. But although they stand in stark contrast to the ways in which governments reacted after the big nuclear accidents at both Chernobyl and Fukushima Daiichi, the results are in strong agreement with the findings of two separate and diverse studies that formed part of the NREFS project (NREFS, 2017). J-value analysis has shown that the number of people, 335,000, relocated after Chernobyl was too high by a factor of between 5 and 10, and that it is difficult to justify relocating any of the 160,000 people moved out after Fukushima Daiichi (Waddington et al., 2017a). J-value analysis has also endorsed remediation measures instituted in the vicinity of both Fukushima Daiichi and Chernobyl, in the latter case even after 20 years (Waddington et al., 2017b). Meanwhile Public Health England's PACE-COCO2 program suite has been used to model an major accident at a fictional nuclear reactor situated on England's South Downs, about 2½ miles from Midhurst in Sussex (Ashley et al., 2017). Here it was found that only 620 people would be expected to need permanent relocation, even when a very strict return criterion was imposed. Applied three months into the accident, this criterion stipulated the radiation dose to be received over the following 12 months at the original location should be no more than 10 mSv, half the safe-return dose, 20 mSv year −1 announced by the Japanese Government after the Fukushima Daiichi accident (World Nuclear Association, 2017).
The results from this paper are also in line with much expert opinion over the decades, which has regarded many of the measures introduced after Chernobyl as excessive and probably counterproductive. See, for example, EC/IAEA/ WHO (1996), Discussion on Background Paper 6. Even earlier, the effectiveness of relocation post 1990 was the subject of a detailed study carried out by . See also . The work, which started in 1990, was carried out under contract for the Commission of the European Communities (CEC) within the framework of the IAEA International Chernobyl Project. The IAEA was itself responding to a request from the Soviet Government that an international group of experts should be organised to review and evaluate the measures being taken to assure safe living conditions for the people continuing to live in the affected areas. Lochard and Schneider recommended against a second mass relocation in 1990, but unfortunately this advice was not acted on.
The first lesson that the present study holds should be of value to future decision makers who may face the challenge of coping with a big nuclear accident. It is this: rather than allowing themselves to be influenced by the precedents of mass relocation following both the Chernobyl and Fukushima Daiichi accidents, such decision makers should realise that such a response is highly likely to constitute a poor strategy that will not serve the interests of the people it is intended to help. Remediation and temporary food bans are more likely to form part of the optimal policy.
Further development could see the model and the associated control recommendations applied to a specific accident. This would require a compilation and updating of economic data for the areas around nuclear plant, together with the availability of instrumentation systems to provide the necessary measurements of radioactivity levels. The goal would be to apply the overall system to provide real-time information and recommendations to the decision maker facing a big nuclear accident.

Conclusion
The principle has been demonstrated that advanced control techniques may be used to find the optimal mix and timing of countermeasures after a major nuclear accident, where "optimal" implies economic optimisation making due allowance for: • Health costs resulting from ground shine radiation and contaminated food produce, • Dfferences in productivity between the old (contaminated) and new locations for people who have been relocated, • Disruption in productivity as a result of relocation and repopulation, • Revenue from the agricultural produce, • Relocation/repopulation costs, • Remediation costs.
The method may be applied to a location in the vicinity of a stricken nuclear power station once the deposition period has ended and the measurements of surface contamination have stabilised. A good system for measuring the radioactivity levels over a wide area in the vicinity of the nuclear site is required. This may be either fixed or movable, possibly implemented using drone technology, but it needs to be established in advance of any accident.
The model from which the optimal strategy is found is relatively simple in structure, but requires an extensive set of parameters to characterise regional economic activity, both near the nuclear power plant and in the area envisaged for possible relocation. It would be sensible to have estimates of these parameters available on an ongoing basis for nuclear sites so that the method could be deployed in a timely fashion if and when needed.
More study could improve the representation of the dynamics of the possible countermeasures, which might be described more precisely by nonlinear differential equations, for example, with more accurately defined parameter values. However, the relative simplicity of the model makes it convenient to use as a management tool in its current form. As a general point, it is not desirable to make a model more complicated than its end use warrants.
The values of the parameters near any particular nuclear power station will obviously vary significantly with the country and region in which the reactor is situated and the geography of the surrounding area. The study made allowance for this by regarding the dimensionless economic groupings and rates as random variables spanning wide ranges. The choices made for the associated probability distributions are advanced as reasonable in the absence of further data, but it is accepted that there must be a degree of residual uncertainty here. Thus, while it is judged likely that all the possible optimal strategies have been identified for each of the scenarios considered in Section 7, the probability figures estimated for the Broad Strategies should be taken as indicative rather than precise estimates.
It is found that no optimal strategy for the Base Case involves any degree of relocation. Nor does permanent relocation figure in the large majority of the optimal strategies associated with the variant sensitivity scenarios considered, namely Cases II and III. It is concluded that relocation is almost certain to be a less than optimal response after many major nuclear accidents.