A modelling framework based on MDP to coordinate farmers' disease control decisions at a regional scale

The effectiveness of infectious disease control depends on the ability of health managers to act in a coordinated way. However, with regard to non-notifiable animal diseases, farmers individually decide whether or not to implement control measures, leading to positive and negative externalities for connected farms and possibly impairing disease control at a regional scale. Our objective was to facilitate the identification of optimal incentive schemes at a collective level, adaptive to the epidemiological situation, and minimizing the economic costs due to a disease and its control. We proposed a modelling framework based on Markov Decision Processes (MDP) to identify effective strategies to control Porcine Reproductive and Respiratory Syndrome (PRRS), a worldwide endemic infectious disease that significantly impacts pig farm productivity. Using a stochastic discrete-time compartmental model representing PRRS virus spread and control within a group of pig herds, we defined the associated MDP. Using a decision-tree framework, we translated the optimal policy into a limited number of rules providing actions to be performed per 6-month time-step according to the observed system state. We evaluated the effect of varying costs and transition probabilities on optimal policy and epidemiological results. We finally identified an adaptive policy that gave the best net financial benefit. The proposed framework is a tool for decision support as it allows decision-makers to identify the optimal policy and to assess its robustness to variations in the values of parameters representing the impact of incentives on farmers' decisions.


Introduction
The control of animal diseases is a major concern for the livestock sector. Animal diseases are an important source of vulnerability due to the diversity of their economic impacts [1]. They create substantial shortfalls for farms, by degrading their technical and economic performance (production losses), and lead, for some of them, to the loss of commercial opportunities. The control of animal diseases also implies the allocation of resources, both ex ante in terms of surveillance and prevention, and ex post to mitigate the sanitary and economic consequences if the disease occurs (e.g. curative expenditures, disinfection, carcass disposal). These shortfalls and costs induced by animal diseases weigh heavily on the economy of farms and have a wider effect on the competitiveness of animal production chains. Beyond these direct impacts on the livestock sector, animal diseases can have a broader impact on regional and national agricultural economies (animal feed, for example), as well as on firms engaged in the processing of animal products for food. For infectious diseases, the effectiveness of control measures often depends on the ability to act in a coordinated manner across a group of farms. However, for non-notifiable animal diseases such as Bovine Viral Diarrhoea or Porcine Reproductive and Respiratory Syndrome (most of them being endemic diseases), farmers individually decide whether or not to control the disease, weighing the benefit of implementing control measures within their own farm against that of not doing so (decentralized decision-making process). As a result, the proportion of farms under control may be too low to ensure disease control at a larger (e.g., regional) scale.
Indeed, contagious pathogens spread among farms through numerous transmission pathways such as: animal purchases (e.g., in paratuberculosis [2]), direct contacts between animals from neighbouring herds (e.g., in bovine viral diarrhoea [3]), environmental contamination (e.g., in Q Fever through airborne transmission; [4,5]), equipment shared between farms, animal vectors such as insects (e.g., in Bluetongue [6]), small mammals, wildlife (e.g., in tuberculosis; [7]), and movement of persons [8]. Therefore, decision-making at farm level gives rise to externalities that have sanitary and economic consequences for interconnected farms. A farmer who decides to protect his herd against a particular disease by vaccinating or by adopting strict biosecurity measures (e.g., hygiene, quarantine, etc.) creates a positive externality, in that his action benefits other farmers by lowering the risk of pathogen spread [9]. Conversely, a farmer could behave as a free rider, seeking to benefit from the efforts of his neighbours without bearing the costs [10]. This behaviour generates a negative externality since it contributes to maintaining the disease within a given geographic area. This results in strong interrelationships among individual decisions to control animal diseases at the regional scale. Furthermore, the regional epidemiological situation may vary only if a sufficient number of farmers implement a given control strategy. In a laissez-faire situation, it is likely that the observed outcome of a decentralized decision-making process is not the best outcome for the collective level (pursuit of self-interest does not lead to maximized utility on the aggregate level) [11,12,13]. Control decisions should also be understood in a dynamic perspective. Individual decisions regarding disease control are made over time. They vary according to the health statuses of herds to better account for disease spread.
Hence, the decision should be adaptive, a key challenge in designing effective schemes. Since the herd health status changes over time, incentives should also vary over time.
Modelling is a powerful tool to assess ex-ante adaptive strategies. Often, coordination scenarios are defined in a non-adaptive way (e.g., [12]). Scenarios are simulated and compared to identify the best ones regarding given criteria. More recently, adaptive approaches have been developed to construct a guideline with rules varying over time and optimizing given criteria. The scope of application encompasses herd management [14,15], species conservation problems [16,17], human health management [18,19,20], and animal health management [21,22]. Nevertheless, the issue of adaptive coordination has not been considered yet in animal health management.
Our objective was to facilitate the identification of optimal incentive schemes at a collective level, adaptive according to the epidemiological situation, and minimizing the economic costs to the community due to a disease and its control. We focused on the collective dimension (i.e., individual decisions are not modelled). We considered a social planner supervising the health management decisions of a group of farmers and proposing collective disease management devices. In the first part of the paper, the modelling framework based on Markov Decision Processes (MDP) is presented. In the second part, this framework is used to identify effective disease control strategies, with application to the control of Porcine Reproductive and Respiratory Syndrome (PRRS), an endemic infectious disease that significantly impacts the productivity of pig farms [23,24].

Description
We considered that farmers are facing the spread of a non-notifiable endemic animal disease. Farms differ according to the health status of the herd (e.g., virus-free, infected). For a given herd status, we assumed that all of the farmers are facing similar economic losses due to the disease and similar disease control costs. Farmers' individual decisions were not explicitly modelled. However, we integrated an epidemiological model describing at each time-step the proportion of herds moving from one health status to another depending on the epidemiological processes that characterize the disease, but also on control measures implemented by farmers.
At the collective level, we considered a social planner [25], whose objective was to improve the welfare of all of the participants in the primary production chain (i.e., the farmers and the social planner). By taking into account the herd statuses (which derived from epidemiological processes and implemented control measures), the social planner coordinates farmers' efforts as related to disease control. The social planner's objective was here to define a sequence of actions over $H$ time-steps $\{a_1, a_2, \ldots, a_H\}$ to improve the economic situation of the primary production chain over this given time horizon ($H$ can be finite or infinite). The impact of the actions on the proportion of herds moving from one herd status to another was assumed to be known by the social planner. This proportion is a function of farmers' compliance with advised measures as well as measure efficacy. These two factors were separated in the framework. Each action had a specific associated cost. The economic optimization criterion used in our model was to minimize, over the given time horizon, the sum of economic costs borne by farmers (i.e., economic losses due to the disease and disease control costs) and the costs of actions offered by the social planner.
In our framework, the epidemiological model thus dynamically interacts at each time-step with the decision process at the collective level. Herd statuses change over time according to the control measures implemented, and thus according to the incentives decided at a collective level. On the other hand, the collective actions are chosen based on farm statuses at each time.

Actions (A).
At each time-step, the social planner chooses one of the available actions $A = \{a_1, a_2, \ldots, a_k\}$, each consisting of a combination of recommendations and incentives. Each action has a given impact on the proportion of farmers implementing advised control measures.
Transitions (T). $T(t,s,s',a)$ is the probability of going from state $s \in S$ at time $t$ to state $s' \in S$ at time $t+1$ when using action $a \in A$. It depends on the stochastic epidemiological processes and on control measures implemented by farmers.
Rewards (R). The reward for the social planner $r(t,s,s',a)$ when using action $a$ at time $t$ is associated with a transition of the system from state $s$ at time $t$ to state $s'$ at time $t+1$. At each time-step, rewards (always positive, as they are costs) consist of losses due to the disease and control costs, both depending on the number of herds in each status. Moreover, they may include a cost associated with the action chosen by the social planner, unrelated to herd statuses (e.g., advising).

Resolution
Solving an MDP model consists in finding the optimal policy, denoted $\pi^*$. A policy is a function assigning an action to each possible state at each time-step ($\pi: s_t \to a_t$). Each policy is associated with a cumulative reward. The optimal policy $\pi^*$ is the policy $\pi$ that minimises the expected rewards (costs and losses) cumulated over the time horizon $H$: $E\left[\sum_{t=1}^{H} \rho^t\, r(t, s_t, s_{t+1}, a_t) \mid \pi\right]$, where $E[\cdot]$ is the expectation operator and $\rho$ the discount factor (Table 1). The discount factor captures the fact that the social planner gives more weight to immediate rewards than to delayed ones (the longer a reward is delayed, the more its value decreases). Given this optimal policy, the social planner knows at each time-step which action should be used according to the observed system state. To find $\pi^*$, the MDP can be solved using the Value Iteration algorithm [26]. Here we used a threshold of 0.1. For a given MDP, this optimal policy always exists [26].
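As an illustration of this resolution step, the minimal Python sketch below applies value iteration to a toy two-state problem in which rewards are costs to be minimised. The toy states, actions, costs, and the function name `value_iteration` are hypothetical, and reading the threshold as a bound on the largest change in the value function between two sweeps is an assumption, not a detail taken from the text.

```python
def q_value(T, R, V, rho, a, s):
    """Expected discounted cost of using action a in state s, given values V."""
    return sum(T[a][s][s2] * (R[a][s][s2] + rho * V[s2]) for s2 in range(len(V)))

def value_iteration(T, R, rho, threshold):
    """Return (V, policy) minimising expected discounted costs.

    T[a][s][s2] is the probability of moving from s to s2 under action a,
    R[a][s][s2] the cost attached to that transition.
    """
    n_actions, n_states = len(T), len(T[0])
    V = [0.0] * n_states
    while True:
        new_V = [min(q_value(T, R, V, rho, a, s) for a in range(n_actions))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < threshold:  # assumed stopping rule
            break
    policy = [min(range(n_actions), key=lambda a: q_value(T, R, V, rho, a, s))
              for s in range(n_states)]
    return V, policy

# Hypothetical toy problem: state 0 = virus-free, state 1 = infected.
# Action 0 = do nothing, action 1 = intervene (always leads to state 0).
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [1.0, 0.0]]]
R = [[[0.0, 0.0], [0.0, 10.0]],
     [[5.0, 0.0], [15.0, 0.0]]]
V, policy = value_iteration(T, R, rho=0.9, threshold=0.1)
print(V, policy)
```

In this toy case, intervening once in the infected state is cheaper than paying the loss at every step, so the computed policy intervenes only when the system is infected.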

Case study: Porcine Reproductive and Respiratory Syndrome
Porcine Reproductive and Respiratory Syndrome (PRRS) is a major issue for the swine industry in most producing countries. For example, annual costs due to PRRS virus have been estimated to be approximately 664 million dollars in the United States in 2011 [24]. Within- and between-herd spread of PRRS virus occurs through several transmission routes: purchase of infected pigs, infected semen, introduction from vehicles, people, equipment or supplies, transmission by insects or by air [28]. Depending on the intensity of within-herd virus spread, herds can be classified as negative, positive unstable during the acute phase of infection, or positive stable after stabilization [29]. Measures available to control PRRS in infected herds mostly include biosecurity and immunization through vaccination to limit within-herd spread [30]. To eradicate PRRS virus from the herd, a whole herd depopulation-repopulation is the most effective means but has a high financial cost [30]. Control and eradication programs have been implemented at national or regional levels [31,32]. Most often they are based on voluntary adherence by farmers and require collective organization with coordinated actions of producers and practitioners [30,32,33]. Depopulation-repopulation is rarely used in practice due to high associated costs.

Epidemiological model
A stochastic discrete-time compartmental model was developed to represent the spread and control of PRRS virus within a group of pig herds. Herds were classified into 5 mutually exclusive statuses combining infection states and individual control measures implemented within the herds: 2 statuses for virus-free herds (F and Fd) and 3 statuses for infected herds ($I$, $I_0$, and $I_C$; Fig 1). Throughout the paper, the same letters denote herd statuses and the number of herds in these statuses. F herds are virus-free (i.e., without any virus circulation). When farmers implement biosecurity measures to prevent virus introduction, virus-free herds F become Fd (virus-free with biosecurity). Both F and Fd herds can be infected and then become I (with virus spread without control). When control measures to limit virus spread are implemented, infected herds become $I_0$, with a lowered virus spread. When spread is controlled, $I_0$ herds become $I_C$ (controlled infected herds). We assumed that farmers will not stop control measures in $I_0$ herds until reaching status $I_C$. Farmers can then depopulate, $I_C$ herds thereby becoming Fd. As $I_C$ herds already experienced infection, we assumed that they will maintain biosecurity measures. On the other hand, $I_C$ herds can become I again when control measures are stopped without depopulation.
Virus transmission to F and Fd herds occurred from herds in $I$, $I_0$, and $I_C$ statuses. Because of the control measures implemented by farmers in $I_0$ and $I_C$ herds, virus transmission is reduced compared to transmission by I herds. The virus can also be introduced due to contacts with herds located outside of the modelled group of herds (external risk). Pig production systems are almost closed. However, the external risk is not nil and, even with a very low value, could impair disease control; it should thus be accounted for. Transitions from statuses F and Fd to status I were modelled using a frequency-dependent function [34,35] (Eq 1). We assumed that the transition rate from Fd to I is equal to the one from F to I, weighted by factor γ, representing the protection induced by biosecurity (Eq 2).
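With the symbols defined in the next paragraph, a frequency-dependent form consistent with this description can be written as follows. This is a reconstruction from the surrounding definitions; in particular, treating the external risk $\beta_{out}$ as an additive term is an assumption.

```latex
\begin{align}
\beta_F    &= \frac{\beta_I \, I + \beta_{I_0} \, I_0 + \beta_{I_C} \, I_C}{N} + \beta_{out} \tag{1} \\
\beta_{Fd} &= \gamma \, \beta_F \tag{2}
\end{align}
```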
where $N$ is the total number of herds, and $\beta_I$, $\beta_{I_0}$, $\beta_{I_C}$, and $\beta_{out}$ are the transmission rates to a herd F from a herd $I$, $I_0$, $I_C$, and from outside, respectively. The other transitions were defined as constant probabilities representing the proportion of herds in each status that implements the control measures (biosecurity, management practices, vaccination, and depopulation). Parameter values provided in Table 2 correspond to the case where the social planner was assumed not to influence measure implementation in farms. These values were chosen from expert opinion assuming a time-step of 6 months.
Actions. We modelled the consequences of the social planner's action on farmers' decisions by modifying the value of model parameters related to transitions between herd statuses (Table 3). Four actions were considered: None, Incent1, Incent2, and Incent3 (Table 3). Transitions between herd statuses (rows of Table 3) involved one or several measures, or their end, as defined in Table 2. Actions (columns of Table 3) correspond to a level of incentives to implement a set of measures according to herd status, incentives increasing from None to Incent3. When using action None, the social planner did not influence measure implementation on farms, but some measures were still assumed to be implemented. When using Incent1, there was a positive but low incentive to protect virus-free herds and to control infection in infected herds (fewer new infections and fewer returns to state I). When using Incent2, the same measures were advised but with a higher effect, corresponding either to an increased efficacy of measures or to a higher level of farmers' compliance. For realism, we nevertheless considered that farmers' compliance was not complete and that for each status, a given proportion of farmers did not implement control measures (we assumed that farmers with infected herds were less reluctant to implement the prescribed control measures than those with virus-free herds). When using Incent3, measure efficacy and farmers' compliance were much higher. In addition, depopulation-repopulation of infected herds was considered, with no possible return to state I from state $I_C$. With such an action, eradication is expected to be achievable.
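One time-step of the epidemiological model described in this section can be sketched in Python as follows. All parameter values are hypothetical, the name `tau` for the probability of moving from $I_0$ to $I_C$ is invented for the example, and both the additive external risk and the conditional draw for herds leaving F are assumptions.

```python
import random
from math import exp

# Hypothetical values for one action, for illustration only.
PAR = {"beta_I": 0.3, "beta_I0": 0.1, "beta_IC": 0.05, "beta_out": 0.01,
       "gamma": 0.5, "alpha": 0.1, "nu": 0.3, "tau": 0.4,
       "kappa": 0.05, "phi": 0.1}

def binomial(rng, n, p):
    """Number of successes among n independent trials of probability p."""
    return sum(rng.random() < p for _ in range(n))

def simulate_step(state, par, rng):
    """Draw herd counts (F, Fd, I, Io, Ic) at t+1 from the counts at t."""
    F, Fd, I, Io, Ic = state
    n = F + Fd + I + Io + Ic
    # Frequency-dependent infection pressure on F herds (assumed form).
    beta_F = (par["beta_I"] * I + par["beta_I0"] * Io
              + par["beta_IC"] * Ic) / n + par["beta_out"]
    p_F = 1.0 - exp(-beta_F)
    p_Fd = 1.0 - exp(-par["gamma"] * beta_F)
    w = binomial(rng, F, par["alpha"])                         # F  -> Fd
    l = binomial(rng, F - w, min(1.0, p_F / (1.0 - par["alpha"])))  # F -> I
    m = binomial(rng, Fd, p_Fd)                                # Fd -> I
    u = binomial(rng, I, par["nu"])                            # I  -> Io
    y = binomial(rng, Io, par["tau"])                          # Io -> Ic
    x = binomial(rng, Ic, par["kappa"])                        # Ic -> Fd
    z = binomial(rng, Ic - x, par["phi"] / (1.0 - par["kappa"]))  # Ic -> I
    return (F - w - l, Fd + w + x - m, I + l + m + z - u,
            Io + u - y, Ic + y - x - z)

rng = random.Random(42)
state = (5, 15, 5, 5, 20)
for _ in range(100):  # 100 time-steps of 6 months
    state = simulate_step(state, PAR, rng)
print(state)
```

Changing the action amounts to calling `simulate_step` with a different parameter dictionary, which mirrors the way actions modify transition parameters in Table 3.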

MDP model
Transitions. Transitions in the decision model were computed based on transitions defined in the epidemiological model (Fig 1). A transition probability given action $a$ from state $s_t = (F_t, Fd_t, I_t, Io_t, Ic_t)$ to state $s_{t+1} = (F_{t+1}, Fd_{t+1}, I_{t+1}, Io_{t+1}, Ic_{t+1})$ was defined based on the number of herds moving from one status to another. The probability was non-zero if, for all statuses, the number of herds at time $t+1$ was consistent with the one at time $t$ given the transitions between herd statuses (Fig 1). For example, as no herd can become F, the number of F herds at time $t+1$ should be lower than or equal to the one at $t$. Let us denote:
• w the number of herds moving from F to Fd at time t
• l the number of herds moving from F to I at time t
• m the number of herds moving from Fd to I at time t
• x the number of herds moving from $I_C$ to Fd at time t
• z the number of herds moving from $I_C$ to I at time t
• y the number of herds moving from $I_0$ to $I_C$ at time t
• u the number of herds moving from I to $I_0$ at time t
Therefore, knowing states $s_t$ and $s_{t+1}$, we only need to consider potential values of w, x, and y. Indeed, from w, we can deduce l; from x and w, we can deduce m; and from w, x, and y, we can deduce z and u.
The transition probability when using action $a$ was then given by a sum, over the possible values of x, w, and y, of a product of five terms in square brackets, with $P(F \to I) = 1 - \exp(-\beta_F)$ and $P(Fd \to I) = 1 - \exp(-\beta_{Fd})$. In this formula, the first term in square brackets corresponds to the product of two binomials. The first binomial accounts for the x herds chosen among $Ic_t$ to go to $Fd_{t+1}$, with probability $\binom{Ic_t}{x} \kappa(a)^{x} (1-\kappa(a))^{Ic_t - x}$. The second binomial accounts for the z herds chosen among the remaining ones ($Ic_t - x$) to go to $I_{t+1}$, conditionally on not going to Fd, with probability $\binom{Ic_t - x}{z} \left(\frac{\varphi(a)}{1-\kappa(a)}\right)^{z} \left(1 - \frac{\varphi(a)}{1-\kappa(a)}\right)^{Ic_t - x - z}$. Note that the probability for herds not to go to Fd, given that they go to I, is one. Once aggregated, these two binomials give the sub-formula in square brackets. Similarly, the second term in square brackets corresponds to the probability for w herds to go from $F_t$ to $Fd_{t+1}$ times the probability for l herds among the remaining ones to go from $F_t$ to $I_{t+1}$. The last three terms in square brackets are simpler and correspond respectively to m herds going from $Fd_t$ to $I_{t+1}$, y herds going from $Io_t$ to $Ic_{t+1}$, and u herds going from $I_t$ to $Io_{t+1}$. Then, we sum over all of the possible values for x, w, and y, accounting for constraints on these sums, each being limited by the number of herds that can change status (i.e., not more than available in the source status at time $t$, not more than needed at time $t+1$ in the receiving status, and not more than missing in the source status at time $t+1$ compared to time $t$).
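The summation described above can be sketched in Python as below, for one fixed action. This is an illustrative reading rather than the authors' Java implementation: parameter values are hypothetical, `tau` is an invented name for the probability of moving from $I_0$ to $I_C$, and the infection pressure and the conditional probability used for herds leaving F are assumptions. Out-of-range flows get probability zero, which plays the role of the constraints on the sums.

```python
from itertools import product
from math import comb, exp

# Hypothetical values for one action, for illustration only.
PAR = {"beta_I": 0.3, "beta_I0": 0.1, "beta_IC": 0.05, "beta_out": 0.01,
       "gamma": 0.5, "alpha": 0.1, "nu": 0.3, "tau": 0.4,
       "kappa": 0.05, "phi": 0.1}

def binom_pmf(k, n, p):
    """P(k successes among n trials); 0 when k is outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def transition_probability(s, s_next, par):
    """P(s_next | s) for states written as (F, Fd, I, Io, Ic)."""
    F, Fd, I, Io, Ic = s
    F2, Fd2, _, Io2, Ic2 = s_next  # I at t+1 follows from conservation
    n = sum(s)
    if sum(s_next) != n:
        return 0.0
    beta_F = (par["beta_I"] * I + par["beta_I0"] * Io
              + par["beta_IC"] * Ic) / n + par["beta_out"]
    p_F = 1.0 - exp(-beta_F)
    p_Fd = 1.0 - exp(-par["gamma"] * beta_F)
    total = 0.0
    for w, x, y in product(range(F + 1), range(Ic + 1), range(Io + 1)):
        l = F - F2 - w           # F  -> I, deduced from w
        m = Fd + w + x - Fd2     # Fd -> I, deduced from w and x
        z = Ic - x + y - Ic2     # Ic -> I, deduced from x and y
        u = Io2 - Io + y         # I  -> Io, deduced from y
        total += (binom_pmf(x, Ic, par["kappa"])
                  * binom_pmf(z, Ic - x, par["phi"] / (1.0 - par["kappa"]))
                  * binom_pmf(w, F, par["alpha"])
                  * binom_pmf(l, F - w, p_F / (1.0 - par["alpha"]))
                  * binom_pmf(m, Fd, p_Fd)
                  * binom_pmf(y, Io, par["tau"])
                  * binom_pmf(u, I, par["nu"]))
    return total

def all_states(n):
    """All distributions of n herds among the 5 statuses."""
    return [s for s in product(range(n + 1), repeat=5) if sum(s) == n]

row = [transition_probability((1, 1, 1, 0, 1), s2, PAR) for s2 in all_states(4)]
print(sum(row))
```

A useful sanity check on such a function is that, for any starting state, the probabilities over all possible next states add up to one.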
Rewards. For the social planner, the choice of any action $a \in A$ except for action None induced a fixed cost Cdiff(a). Moreover, at each time-step, the other control costs (C) and losses (L) due to the disease incurred by the social planner were expressed as functions of the number of herds in each herd status or transiting between statuses. We assumed that the magnitude of losses in controlled infected herds ($L_{Ic}$) was lower than that in other infected herds ($L_I$). Herds in statuses $I_0$ and $I_C$ had costs due to external biosecurity (Cbe) and internal biosecurity with vaccination (Cbi). Herds in status Fd had costs due to external biosecurity (Cbe). Finally, a cost related to depopulation (Cdep) concerned $I_C$ herds becoming Fd (transition flow between the two statuses). For the social planner in state $s_t = (F_t, Fd_t, I_t, Io_t, Ic_t)$ at time $t$ moving to state $s_{t+1} = (F_{t+1}, Fd_{t+1}, I_{t+1}, Io_{t+1}, Ic_{t+1})$ at time $t+1$ when using action $a$, the reward at time $t$ was the sum of these losses and costs. As herds were not individually modelled, we used the expected number computed using the transition probabilities given the initial and final states. The values of losses and costs are given in Table 4.
As we considered an endemic disease, we looked for optimality in the long run. For the MDP resolution, we fixed an infinite horizon with a discount factor $\rho$ of 0.975 per time-step (6 months).
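The cost components listed above can be combined as in the following Python sketch. The numerical values are hypothetical (they are not those of Table 4), and counting herds at time t as well as passing the number of depopulated herds directly are simplifying assumptions made for the example.

```python
# Hypothetical unit losses and costs per herd and per time-step.
COSTS = {
    "L_I": 10.0,    # loss in I and Io herds
    "L_Ic": 4.0,    # lower loss in controlled infected herds
    "Cbe": 1.0,     # external biosecurity (Fd, Io, Ic)
    "Cbi": 2.0,     # internal biosecurity with vaccination (Io, Ic)
    "Cdep": 50.0,   # depopulation, per herd moving from Ic to Fd
    "Cdiff": {"Incent1": 5.0, "Incent2": 15.0, "Incent3": 30.0},
}

def step_cost(state, depopulated, action, costs):
    """Cost borne at one time-step for state (F, Fd, I, Io, Ic)."""
    F, Fd, I, Io, Ic = state
    losses = costs["L_I"] * (I + Io) + costs["L_Ic"] * Ic
    biosecurity = costs["Cbe"] * (Fd + Io + Ic) + costs["Cbi"] * (Io + Ic)
    depopulation = costs["Cdep"] * depopulated
    fixed = costs["Cdiff"].get(action, 0.0)  # action None has no fixed cost
    return fixed + losses + biosecurity + depopulation

print(step_cost((5, 15, 5, 5, 20), 2, "Incent3", COSTS))
print(step_cost((5, 15, 5, 5, 20), 0, "None", COSTS))
```

In the MDP itself the depopulation flow would be replaced by its expected value given the initial and final states, as stated in the text.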

Model scenarios and analysis
Initial conditions. We considered a group of 50 herds. In order to reflect the endemic situation of the disease, an initial state was chosen corresponding to 40% of the herds in a virus-free status (5 in F and 15 in Fd), 40% of the herds in status $I_C$ (20 herds), and the remaining 20% in statuses $I$ (5 herds) and $I_0$ (5 herds). A program was developed in Java (S1 File) [36] and the model was simulated (5,000 replications) over 100 time-steps of 6 months (50 years) (S2 File).
Outputs. Two outputs were of interest to represent the simulation outcomes: the number of virus-free herds representing the clearance level (F + Fd) and the number of controlled infected herds ($I_C$). These outputs, corresponding to non-transient states, were regarded as relevant for an endemic disease.
Model behaviour when the social planner always chose action None. We first calibrated the epidemiological model so that it produced a realistic equilibrium situation over time if the social planner always chose action None. The objective was to represent the endemic situation of the disease in production areas. We then checked the effect of uncertainty in model parameters to identify the parameters that need to be accurately determined. In particular, we verified that the model behaviour was appropriately impacted by parameters influenced by actions. We also conducted a global variance-based sensitivity analysis using the FAST sampling design for varying parameters simultaneously ([37]; 100,000 scenarios), assuming a uniform and continuous distribution between minimal and maximal values (Table 2). For each parameter and each model output, we computed the first order and total sensitivity indices using the sensitivity package of the R software [38].
Computed optimal policy. To assess the advantage of using the optimal policy $\pi^*$, we compared the epidemiological results and the discounted cumulative rewards $\sum_{t=0}^{100} \rho^t\, r(t, s_t, s_{t+1}, a_t)$ obtained when using $\pi^*$ versus one action consistently (irrespective of the state).
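The discounted cumulative rewards used for this comparison can be computed along a simulated trajectory as in the short Python sketch below. The function name and the reward values are hypothetical; only the discounting rule comes from the text.

```python
def discounted_cumulative_rewards(rewards, rho):
    """Running sum of rho**t * r_t for t = 0, 1, 2, ..."""
    running, factor, out = 0.0, 1.0, []
    for r in rewards:
        running += factor * r
        out.append(running)
        factor *= rho
    return out

# Hypothetical constant cost of 100 per 6-month time-step.
print(discounted_cumulative_rewards([100.0, 100.0, 100.0], rho=0.5))
# With rho = 0.975, a cost paid 100 time-steps ahead keeps a weight of
# roughly 0.08, so late costs still count, but far less than early ones.
print(0.975 ** 100)
```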
Approximated policy. The optimal policy $\pi^*$ consists of a table of 316,251 lines (each line corresponding to a possible state of the system, the whole table thus describing the action to be performed for each of the possible distributions of the 50 herds among the 5 herd statuses). Such a table cannot be easily used under field conditions. To provide the social planner with a simpler but approximated policy made of a limited number of rules, $\pi^*$ was approximated using a decision tree into $\pi^*_{approx}$, providing the actions to be performed at each time-step according to the observed state of the system. Only 7.4% of the possible states can be reached from our initial state given our parameter values. The approximation of $\pi^*$ was done using only these states, hereafter called available states. A supervised classification approach was performed using the C4.5 algorithm [39,40] available as an R package (RWeka package). This method uses as its dataset the available states and their associated actions as known in the optimal policy, and generates errors corresponding to misclassifications (i.e., for a few states, an action other than the one given by $\pi^*$ can be predicted by $\pi^*_{approx}$). We evaluated the quality of $\pi^*_{approx}$ by calculating the percentage of misclassifications.
Scenario analysis. As model parameters were roughly estimated, particularly transition probabilities for which no observed data were available, we evaluated the effect on $\pi^*$ and epidemiological results of varying costs and losses (16 scenarios; Table 4), transition probabilities (19 scenarios; Table 3), and the level of protection conferred to virus-free herds by biosecurity implementation (γ; 2 scenarios; Table 2) one-at-a-time, resulting in a total of 37 scenarios. We restricted analyses to transitions impacted by the social planner's actions, i.e. all except None.
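The size of the policy table and the misclassification percentage can be checked with a few lines of Python. The state count is the standard number of ways to distribute identical herds among statuses; the two small policies are hypothetical and only illustrate how the percentage is obtained.

```python
from math import comb

def count_states(n_herds, n_statuses):
    """Number of ways to distribute n_herds among n_statuses."""
    return comb(n_herds + n_statuses - 1, n_statuses - 1)

def misclassification_rate(optimal, approx):
    """Percentage of states whose approximated action differs from the optimal one."""
    wrong = sum(1 for state, action in optimal.items() if approx[state] != action)
    return 100.0 * wrong / len(optimal)

print(count_states(50, 5))

# Hypothetical policies over four states written as (F, Fd, I, Io, Ic).
optimal = {(50, 0, 0, 0, 0): "None", (0, 46, 0, 0, 4): "Incent3",
           (0, 48, 1, 0, 1): "Incent2", (0, 49, 0, 0, 1): "Incent1"}
approx = dict(optimal)
approx[(0, 48, 1, 0, 1)] = "Incent3"
print(misclassification_rate(optimal, approx))
```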
As a result of the high computing time needed for each scenario (nearly 20 hours despite the use of multi-thread programming), we did not investigate interactions among factors. We compared $\pi^*_{approx}$ among scenarios and evaluated a weighted divergence index to account for differences between retained actions in the policy for the available states only (states which can be reached from the initial state). First, we evaluated the frequency of visits of each state for all of the scenarios including the reference one (with all nominal values), denoted $wg(s)$ for state $s$. The divergence index for scenario $i$ was the sum of the visit frequencies over all states for which the action implemented for state $s$ in scenario $i$ differed from the action implemented for the same state $s$ in the reference scenario:
$$D_i = \sum_{s} wg(s)\, \mathbb{1}\left[a_i(s) \neq a_{ref}(s)\right]$$
where $a_i(s)$ and $a_{ref}(s)$ denote the actions retained for state $s$ in scenario $i$ and in the reference scenario, respectively, and the sum runs over the available states.
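The divergence index defined in this paragraph translates directly into code. The Python sketch below uses hypothetical states, visit frequencies, and policies; only the summation rule is taken from the text.

```python
def divergence_index(visit_freq, policy_i, policy_ref):
    """Sum of visit frequencies over states where the two policies disagree."""
    return sum(freq for state, freq in visit_freq.items()
               if policy_i[state] != policy_ref[state])

# Hypothetical visit frequencies over three available states (F, Fd, I, Io, Ic).
visit_freq = {(0, 46, 0, 0, 4): 0.5, (0, 48, 1, 0, 1): 0.25, (0, 49, 0, 0, 1): 0.25}
reference = {(0, 46, 0, 0, 4): "Incent3", (0, 48, 1, 0, 1): "Incent2",
             (0, 49, 0, 0, 1): "Incent1"}
scenario = dict(reference)
scenario[(0, 48, 1, 0, 1)] = "Incent3"
print(divergence_index(visit_freq, scenario, reference))
```

Weighting by visit frequency means that a disagreement on a rarely visited state contributes little to the index, whereas a disagreement on a frequently visited state dominates it.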

When the social planner does not influence measures implemented on farms
A description of the evolution of (F + Fd) and $I_C$ numbers is provided in Fig 2A and 2C respectively. The median numbers of $I_C$ and of (F + Fd) after 50 years were impacted (Fig 2C) by the probability of transition from $I_C$ to Fd corresponding to the use of the individual measure of depopulation (κ), the protection due to biosecurity when in Fd (γ), the probability of transition from $I_C$ to I corresponding to virus reintroduction due to lower biosecurity and vaccination (φ), and the transmission rate due to I herds ($\beta_I$). The parameter γ impacted (F + Fd) more than $I_C$ as it was directly associated with the risk of infection (leaving status Fd) and indirectly with the control (for $I_C$). The probability of transition from I to $I_0$ corresponding to biosecurity and vaccination (ν) impacted mainly the mean number of $I_C$ (Fig 2C), which was expected as it concerns the control and not the virus clearance. Interactions barely influenced model output variations.
All of the parameters which can be impacted by actions of the social planner, except the transition from F to Fd (α), had a direct influence on both virus clearance and control within the system. Hence, if the social planner chose an action different from None, the prevalence of infected herds and of controlled herds should vary. The parameters that most impacted model outputs were all influenced by the social planner except the transmission rate due to I herds. As the median value of the number of (F + Fd) herds was near equilibrium (Fig 2A), we concluded that we had a good approximation of the value.

Optimal computed policy
Using $\pi^*$, the social planner always used action Incent3 during the first time-steps (Fig 3A). Then, according to the evolution of the system, the other actions were also used, action Incent2 being barely used (Fig 3A). The use of $\pi^*$ reduced the disease prevalence (Fig 3B and 3C). After 45 years, the discounted cumulated rewards obtained when using $\pi^*$ were lower than those obtained if the social planner systematically used any other action (Fig 3D). After 50 years, using $\pi^*$ reduced the rewards by 4% compared to the systematic use of action Incent3, and by 33% compared to the systematic use of action None. Such a result was expected as $\pi^*$ is defined as minimizing the discounted cumulative rewards. Nevertheless, the advantage of $\pi^*$ was observed after 18 years when comparing median values. Before 12 years, the discounted cumulative rewards were similar to those obtained when systematically using action Incent3 because this action was used almost systematically in the first time-steps (Fig 3A). Although $\pi^*$ was more expensive than other actions over the first years (Fig 3D), we observed an advantage after 30 years as the prevalence and the discounted cumulative rewards were lower than those obtained with other actions (Fig 3C).
Fig 2 (Table 2 for parameter definition). https://doi.org/10.1371/journal.pone.0197612.g002

Approximated policy
The computed optimal policy $\pi^*$ was very complex. Working only on available states, we obtained 8 simple rules (Fig 4A). For a number of $I_C$ herds higher than or equal to 4 (most states), Incent3 was used (Fig 4A, top leaf). Otherwise, when the number of I herds equalled 0, actions None, Incent1, and Incent2 were used with respect to the number of $I_C$ herds (Fig 4A, bottom leaf). If there was at least one I herd, actions Incent2 and Incent3 were used with respect to the number of S, $I_C$, and $I_0$ herds (Fig 4A, bottom leaf). It has to be noted that on the branch with $I \geq 1$ and $I_C \geq 2$ herds, it is not known how many infected herds there are. Hence, it is not possible to conclude on the actual level of infection in the system. Actions predicted by $\pi^*_{approx}$ were mostly the same as in $\pi^*$ (Fig 4B), and slightly differed only for states with actions Incent2 and Incent3, corresponding to less than 0.04% of misclassified actions among all of the available states. In addition, the epidemiological dynamics obtained following $\pi^*_{approx}$ was close to the one obtained following $\pi^*$ both in terms of number of virus-free herds over time (Fig 3B and 3C) and in terms of cumulated cost (Fig 3D). Hence, the decision tree $\pi^*_{approx}$ was a good approximation of the optimal policy $\pi^*$, while providing a much more practical tool for decision-makers.
Fig 4. (A) Representation as a decision tree; (B) Concordance between $\pi^*_{approx}$ and $\pi^*$ with $\pi^*$ as a reference (sum to 100 over a column). https://doi.org/10.1371/journal.pone.0197612.g004

Impact of uncertainty in parameter values on the model behavior
Only two parameter variations induced a divergence between computed policies (Fig 5A): a decrease in the loss due to infection in $I_C$ herds ($L_{Ic}$), and a decrease in the probability of transition from $I_C$ to Fd when action Incent3 was considered (denoted $\kappa(3)$). A decreased value of $L_{Ic}$ induced an increase in the cumulative use of action Incent1, a decrease in the cumulative use of action Incent3, and a decrease in the clearance level corresponding to an increase of the infection prevalence (Fig 5C). $L_{Ic}$ as a cost impacted the median total cost when both decreasing and increasing (Fig 5C). For the variation of the transition probability between $I_C$ and Fd ($\kappa(3)$), the use of each action and the total costs were impacted (Fig 5D), its increase particularly decreasing the use of Incent3 while increasing the use of None and Incent1.
Variations of other parameter values barely impacted $\pi^*$, resulting in only few variations of simulated results. Obviously, if a cost varied, the total cost was modified as expected. Parameters inducing no variation of $\pi^*$ on available states were the cost of external biosecurity (Cbe), the transition probability between F and Fd (α) irrespective of the action of the social planner, the transition probability between I and $I_0$ (ν) for action Incent1 only, the transition probability between $I_C$ and I (φ) for action Incent3 only, and the fixed cost (Cdiff) for action Incent1.

Discussion
In this paper, we proposed a framework which can be used by a decision-maker acting as a social planner to identify an adaptive strategy consisting of incentives in order to optimise the net financial benefit at the collective scale for a group of farmers. We applied this framework for the control of PRRS, an endemic non-notifiable disease. The computed policy providing better net financial benefit was adaptive. It illustrated the potential of adaptive approach to propose an optimal dynamic strategy. Even if some incentives were expensive, the optimisation over a long horizon took into account the incentive benefits to balance these high costs. This framework is a tool to help a social planner to define collective schemes, provided that the social planner is accurately informed about the health status of the herds.As an example, for PRRS, the Morrison's Swine Health Monitoring Project is conducted on a convenience sample of 910 herds for which diagnostic status is reported weekly [41]. To be used in field conditions (and thus as a tool for decision support), the framework should be specified according to the aim of the collective decision-maker by defining the corresponding objective function, the horizon for optimisation, the incentive levels, and the impact of farmers' response to incentives. As the framework can become verycomplex for use by a decision maker ("curse of dimensionality"), a compromise should be found between realism and simplification. In such a case, the decision-maker should be involved in the modelling phase to inform the model assumptions and interpret results. One perspective to this study is to implement a participative research project in which stakeholders would be involved in order to incentivize them to test and improve our framework.
The definition of the objective function is one of the most important components. Depending on the objective function, the MDP policies can differ, as shown in [20]. In our case study, we optimised the net financial benefit at the collective scale. Disease prevalence, which is a result of interest for epidemiologists, was introduced indirectly in the objective function, as each infected farm induced losses and control costs for the decision-maker. Even though the optimisation was based on losses and control costs, the computed policy in our case study had a positive impact on disease prevalence. On the other hand, decision-makers may be interested in optimising several objectives simultaneously (for instance decreasing both costs and disease prevalence). To optimise more than one criterion at a time, the framework has to be adapted. The method used here to compute the optimal policy does not allow the use of combined criteria. However, relevant algorithms to compute the optimal policy in such a case can be found in the literature [42,43].
It is not straightforward to anticipate when an adaptive policy will have a large impact on a given disease situation compared to a fixed one. Of course, this is expected to vary with the pathosystem, as well as with the considered control options and incentives. In our case study on PRRS virus, the difference between the adaptive policy and a fixed one is small in terms of euros saved (4%) but large in terms of acceptability of the collective scheme. Indeed, eradication can almost be achieved without having to implement a very constraining action (depopulation) forever. Health managers already know this action to be the main one that can impact PRRS herd prevalence. It is also known that eradication of such an endemic disease cannot be achieved in only a few years. However, when to stop a control action such as depopulation was unknown. We highlighted that, 10 years after the start of depopulation in detected infected herds, this action could be progressively stopped and replaced by lighter ones for the next 10 years without impairing the large reduction in prevalence, close to eradication, that would have been obtained with a fixed action over the same duration.
Another main component of the framework to be decided with the decision-maker is the horizon of optimisation. In our case study, we used an infinite time horizon. It was retained because PRRS virus is highly prevalent [28] and eradication was often not sought at first, leading us to assume that the disease may persist in the long run. The optimal policy computed with the infinite time horizon can be acceptable even for a finite horizon if the finite horizon is long enough and if an appropriate discount factor is used. In our case study, the total cost obtained with the optimal policy was better than the others from 25 years onward (Fig 3D). Nevertheless, the social planner may choose to reach an objective before a specified time (finite horizon). In that case, the computation of the optimal policy in our framework should be modified to optimise only over a finite horizon. However, when computing a policy over a finite horizon, it has to be assumed that the system does not exist after the end of the optimisation horizon, which may induce some drawbacks when the policy is used for a long-term system [27]. For example, some incentives may be avoided in the steps just before the end of the horizon if their impact can only be observed a few time-steps later. To avoid such drawbacks, a rolling horizon approach has been proposed [44], but optimality is not guaranteed [45]. The decision-maker should thus define the horizon with knowledge of its advantages and drawbacks. In our framework, we used the Value Iteration algorithm [26], which can compute policies for both finite and infinite horizons and can be adapted to a rolling horizon.
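As an illustration of the horizon issue discussed above, the following minimal sketch applies value iteration to a toy two-state MDP under a discounted infinite horizon, and backward induction under a finite horizon. It is not the Java code of S1 File and not the PRRS model: the states, actions, costs, transition probabilities and function names are all hypothetical placeholders chosen for readability.

```python
# Toy MDP, NOT the PRRS model: every name and number below is an
# illustrative assumption.
STATES = ["low_prevalence", "high_prevalence"]
ACTIONS = ["do_nothing", "incentive"]

# P[action][state] -> list of (next_state, probability)
P = {
    "do_nothing": {
        "low_prevalence": [("low_prevalence", 0.7), ("high_prevalence", 0.3)],
        "high_prevalence": [("low_prevalence", 0.1), ("high_prevalence", 0.9)],
    },
    "incentive": {
        "low_prevalence": [("low_prevalence", 0.95), ("high_prevalence", 0.05)],
        "high_prevalence": [("low_prevalence", 0.6), ("high_prevalence", 0.4)],
    },
}

# COST[action][state]: cost paid during one time-step (losses + incentive)
COST = {
    "do_nothing": {"low_prevalence": 1.0, "high_prevalence": 10.0},
    "incentive": {"low_prevalence": 3.0, "high_prevalence": 12.0},
}


def q_value(state, action, v, gamma):
    """Immediate cost plus discounted expected future cost."""
    future = sum(p * v[nxt] for nxt, p in P[action][state])
    return COST[action][state] + gamma * future


def greedy_policy(v, gamma):
    """Cost-minimising action in each state, given future values v."""
    return {s: min(ACTIONS, key=lambda a: q_value(s, a, v, gamma))
            for s in STATES}


def value_iteration(gamma=0.95, tol=1e-8, max_iter=10_000):
    """Discounted infinite horizon: iterate the Bellman operator to a fixed point."""
    v = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        new_v = {s: min(q_value(s, a, v, gamma) for a in ACTIONS)
                 for s in STATES}
        delta = max(abs(new_v[s] - v[s]) for s in STATES)
        v = new_v
        if delta < tol:
            break
    return v, greedy_policy(v, gamma)


def backward_induction(horizon, gamma=0.95):
    """Finite horizon: the system is assumed to stop after `horizon` steps."""
    v = {s: 0.0 for s in STATES}  # nothing is valued after the horizon
    rules = []
    for _ in range(horizon):
        rule = greedy_policy(v, gamma)
        v = {s: q_value(s, rule[s], v, gamma) for s in STATES}
        rules.append(rule)
    rules.reverse()  # rules[t] is the decision rule applied at time-step t
    return v, rules


if __name__ == "__main__":
    print(value_iteration()[1])
    print(backward_induction(horizon=5)[1])
```

With these placeholder numbers, the decision rule of the last finite-horizon step never pays for the incentive, because its benefit would only appear after the horizon. This is the end-of-horizon drawback described above, shown on a deliberately simplified example.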
Fig 5. [...] Tables 2-4). (A) Weighted divergence index, parameters having different values according to the social planner's action as shown by the number in parentheses (1 for Incent1, 2 for Incent2, and 3 for Incent3). (B) Reference spider plot provided for label definitions (no parameter variation): branches denote the considered outputs, curve position on a given branch describes the impact of varying the value of one parameter on this output. For each output, relative values are provided with respect to the reference scenario (thick line at 0.0). A value below (above) the thick line denotes a decrease (an increase) in the output (green '-' and red '+' areas, respectively). (C) Spider plot obtained when varying the cost of controlled infected herds (L Ic, Table 3). (D) Spider plot obtained when varying k(3), i.e. the transition probability from I C to Fd when action Incent3 (depopulation) is retained by the social planner (Table 4). https://doi.org/10.1371/journal.pone.0197612.g005
To render the policy usable in field conditions, we proposed a way to approximate the optimal computed policy, which was far too complex for practical purposes. For a decision-maker, having simple decision rules is important for practical use in field conditions, as pointed out by Pichancourt et al. [46]. It was possible to transform the policy into decision rules (if . . . then . . . else . . .), using for example the approach of Gil et al. [47]. However, due to the complexity of the optimal computed policy in our case study, the number of rules obtained with such an approach would have been far too high to be usable under field conditions. Therefore, we used a method developed in the area of data mining for the analysis of large volumes of data [39] and produced a decision tree having only eight rules. We showed that this approximation was relevant regarding retained actions.
In [48], an approximated policy was also computed to simplify a complex policy. As in our paper, the agreement was verified only in terms of actions, not in terms of the value of the objective function. Optimality is thus not guaranteed. Steimle and Denton [49] considered that proposing a simple approximated policy, although sub-optimal, is a way to enhance the application of the policy by decision-makers.
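The distinction between agreement on actions and agreement on the objective function can be made concrete with a small sketch. The toy model below (number of infected herds out of four, three hypothetical actions, made-up costs and probabilities) is not the PRRS model, and the hand-written `rule_based` function merely stands in for a decision tree. The code compares a rule-based policy with the optimal one, first action by action, then through the expected discounted cost of each policy.

```python
# Toy model, NOT the PRRS model: all names and numbers are assumptions.
N = 4                                   # hypothetical number of herds
STATES = list(range(N + 1))             # state = number of infected herds
ACTIONS = ["none", "light", "heavy"]
RECOVERY = {"none": 0.05, "light": 0.3, "heavy": 0.6}   # P(one herd recovers)
ACTION_COST = {"none": 0.0, "light": 4.0, "heavy": 15.0}
GAMMA = 0.95


def transitions(n, action):
    """Birth-death chain: at most one herd changes status per time-step."""
    up = 0.3 if n < N else 0.0
    down = RECOVERY[action] if n > 0 else 0.0
    out = [(n, 1.0 - up - down)]
    if up:
        out.append((n + 1, up))
    if down:
        out.append((n - 1, down))
    return out


def q(n, action, v):
    """One-step cost plus discounted expected future cost."""
    cost = 10.0 * n + ACTION_COST[action]
    return cost + GAMMA * sum(p * v[m] for m, p in transitions(n, action))


def fixed_point(update, tol=1e-9, max_iter=100_000):
    """Iterate v <- update(v) until the largest change falls below tol."""
    v = [0.0] * (N + 1)
    for _ in range(max_iter):
        new_v = update(v)
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


def optimal_policy():
    """Value iteration followed by greedy action selection."""
    v = fixed_point(lambda v: [min(q(n, a, v) for a in ACTIONS) for n in STATES])
    return {n: min(ACTIONS, key=lambda a: q(n, a, v)) for n in STATES}


def evaluate(policy):
    """Expected discounted cost of following a fixed policy from each state."""
    return fixed_point(lambda v: [q(n, policy[n], v) for n in STATES])


def rule_based(n):
    """Hand-written stand-in for a small decision tree (hypothetical rules)."""
    if n >= 3:
        return "heavy"
    if n >= 1:
        return "light"
    return "none"


def agreement(policy_a, policy_b):
    """Share of states in which both policies retain the same action."""
    return sum(policy_a[n] == policy_b[n] for n in STATES) / len(STATES)


if __name__ == "__main__":
    opt = optimal_policy()
    approx = {n: rule_based(n) for n in STATES}
    print("action agreement:", agreement(opt, approx))
    gaps = [a - o for a, o in zip(evaluate(approx), evaluate(opt))]
    print("extra expected cost per starting state:", gaps)
```

Reporting the cost gap alongside the action agreement would indicate how much sub-optimality a simplified policy introduces, which an action-level check alone cannot show.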
Sensitivity analysis is a crucial step in modelling work to assess the impact of parameter uncertainty on model outputs [37]. Performing sensitivity analyses helps the decision-maker to identify knowledge gaps impacting predictions. Here, two analyses were performed. The first one was based on the model without incentive (always doing nothing), which allows the decision-maker to identify possible policy instruments. The second one highlighted parameters impacting the predicted optimal policy, illustrating for the decision-maker the impact of parameter uncertainty on the policy. More precise values are needed for parameters corresponding to pathogen spread or to an economic value (for example in our case study, parameter L Ic corresponding to the losses for Ic herds). In addition, some identified parameters correspond to the impact of the farmer's response to a decision-maker action (incentive), such as the transition from I C to Fd, which corresponds to depopulation when the social planner retains action Incent3. For such parameters, knowledge of how farmers would respond to incentives is needed. To estimate farmers' responses, either data on previous use of incentives can be explored [50] or behaviour experiments can be proposed such as in [51]. Moreover, global sensitivity analysis, in which the values of all parameters vary simultaneously, can be done only for the first analysis. For the second one, the approach proposed by [52] evaluates the confidence in the results for both the policy and the objective function. In this approach, parameters would be considered either one by one or jointly. In our case study, we only performed a scenario analysis corresponding to a one-at-a-time (OAT) analysis with two values for each parameter, due to the high computing time per parameter set.
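The one-at-a-time design with two values per parameter can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis of the paper: the toy model, the parameter names in `REFERENCE`, the scaling factors and the plain (unweighted) divergence measure are all hypothetical. In particular, the divergence below is simply the share of states whose action changes, not the weighted divergence index used in the study.

```python
# Toy model, NOT the PRRS model: all names and numbers are assumptions.
N_HERDS = 4
GAMMA = 0.95
ACTIONS = ["none", "control"]

# Hypothetical reference parameter set
REFERENCE = {
    "p_infect": 0.3,         # P(one more herd infected per time-step)
    "p_recover_ctrl": 0.4,   # P(one herd recovers) when control is incentivised
    "loss_per_herd": 10.0,   # loss per infected herd and time-step
    "control_cost": 6.0,     # cost of the incentive per time-step
}


def solve(params, tol=1e-9, max_iter=100_000):
    """Value iteration; returns (policy, values), both indexed by state."""
    states = range(N_HERDS + 1)

    def q(n, action, v):
        up = params["p_infect"] if n < N_HERDS else 0.0
        down = 0.0
        if n > 0:
            down = params["p_recover_ctrl"] if action == "control" else 0.05
        cost = params["loss_per_herd"] * n
        if action == "control":
            cost += params["control_cost"]
        future = (1.0 - up - down) * v[n]
        if up:
            future += up * v[n + 1]
        if down:
            future += down * v[n - 1]
        return cost + GAMMA * future

    v = [0.0] * (N_HERDS + 1)
    for _ in range(max_iter):
        new_v = [min(q(n, a, v) for a in ACTIONS) for n in states]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    policy = [min(ACTIONS, key=lambda a: q(n, a, v)) for n in states]
    return policy, v


def one_at_a_time(factors=(0.5, 1.5)):
    """Vary each parameter alone; compare policy and cost to the reference."""
    ref_policy, ref_v = solve(REFERENCE)
    results = {}
    for name in REFERENCE:
        for factor in factors:
            params = dict(REFERENCE)
            params[name] = REFERENCE[name] * factor
            policy, v = solve(params)
            # Unweighted share of states whose retained action changed
            divergence = sum(a != b for a, b in zip(policy, ref_policy)) / len(ref_policy)
            # Relative change in expected discounted cost, summed over states
            rel_cost = (sum(v) - sum(ref_v)) / sum(ref_v)
            results[(name, factor)] = (divergence, rel_cost)
    return results


if __name__ == "__main__":
    for key, (divergence, rel_cost) in one_at_a_time().items():
        print(key, round(divergence, 2), round(rel_cost, 3))
```

Each parameter set requires a full re-optimisation, so the number of calls to `solve` grows with the number of parameters times the number of values tested. This is consistent with the computing-time constraint mentioned above.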
Our framework is a novel contribution to decision-making support that can be applied to animal disease control situations. In the special case of non-notifiable diseases, control decisions are made by farmers based on the risk of their herd being infected and on disease consequences, sometimes in interaction with a social planner. In the animal health literature, models have focused on individual decisions and on scenarios comparing collective incentive schemes [12,53]. These models did not consider an optimal decision of the social planner, but only the impact of incentives on farmers' decisions. Instead, we proposed to consider interactions between farmers' decisions and collective actions through incentives. Our framework provides a practical tool for decision-makers to evaluate their policy a priori under a variety of epidemiological situations and incentive levels.
Supporting information
S1 File. Code. This archive contains the data and the Java code required to run the MDP model.