Optimal environmental testing frequency for outbreak surveillance

Public health surveillance for pathogens presents an optimization problem: we require enough sampling to identify intervention-triggering shifts in pathogen epidemiology, such as new introductions or sudden increases in prevalence, but not so much that costs due to surveillance itself outweigh those from pathogen-associated illness. To determine this optimal sampling frequency, we developed a general mathematical model for the introduction of a new pathogen that, once introduced, increases in prevalence exponentially. Given the relative cost of infection vs. sampling, we derived equations for the expected combined cost per unit time of disease burden and surveillance for a specified sampling frequency, and thus the sampling frequency for which the expected total cost per unit time is lowest.


Surveillance cost
An important consideration for implementing environmental surveillance for pathogens is the frequency at which tests are performed.Environmental sampling and testing should be done frequently enough that an emerging pathogen is intercepted quickly, but not so frequently that surveillance costs outweigh the benefits of early detection.Here, we assume that whenever the environment is sampled and a test is conducted, a surveillance cost equal to c 1 is incurred.We also assume that surveillance costs are additive, so that if n tests are performed, then the total surveillance cost is equal to nc 1 .
We consider that environmental tests are performed with period T .Since the cost of a single test is equal to c 1 , and since the time between tests is equal to T , the surveillance cost per unit time is given by 2 Expected infection cost The costs incurred from the surveillance program itself must be considered in the context of infection-related costs.The costs due to an outbreak can vary depending on several factors: • When the pathogen first appears in relation to the first environmental test that is performed following its introduction • How the pathogen grows after it is introduced • The sensitivity of the environmental testing program for detecting the pathogen • The per-case infection cost In this section, we describe each of these points in detail.Considering the full stochastic dynamics of pathogen initiation, pathogen growth, and pathogen detection, we derive a solution for the expected size of an outbreak when it is detected.We then derive an approximation for the expected size of an outbreak by assuming deterministic growth of a pathogen after it is initiated.

Emergence of a pathogen
For determining the optimal testing frequency, we require knowledge of how new pathogens are introduced.We assume that the introduction of new pathogens follows a Poisson process.New pathogens are initiated independently and continuously in time at rate λ.

Growth of a pathogen
We also require knowledge of how a pathogen increases in abundance once it first appears.
Here, we assume that each instance of the pathogen makes new instances of the pathogen at rate r according to a Poisson process.Let x m,n (rt) denote the probability that there are n copies of the pathogen at time t, given that there are m copies of the pathogen at time 0. In this section, we present the steps for calculating x m,n (rt), beginning with the simplest cases and then progressing to the solution for any values of m and n, where m ≤ n.

m = 1, n = 1
Suppose we start with a single instance of the pathogen at time 0 (m = 1).x 1,1 (rt) gives the probability that the original instance of the pathogen has not produced any new instances of the pathogen up to time t (n = 1).x 1,1 (rt) is given by Next, consider x 1,2 (rt), which is the probability that the original instance of the pathogen has produced a single new instance of the pathogen by time t (n = 2).For this to occur, three things must happen: The original instance of the pathogen does not make any new instances of the pathogen between times 0 and t 1 , the original instance of the pathogen makes a new copy of itself at time t 1 , and neither of the two resulting instances of the pathogen make any new instances of the pathogen between times t 1 and t.We must integrate over all values of t 1 between 0 and t: Simplifying, we have Performing the integration, we get This then becomes Next, consider x 1,3 (rt), which is the probability that the original instance of the pathogen has led to two new instances of the pathogen by time t (n = 3).For this to occur, five things must happen: The original instance of the pathogen does not make any new instances of the pathogen between times 0 and t 2 , the original instance of the pathogen makes a new copy of itself at time t 2 , neither of the two resulting instances of the pathogen make any new instances of the pathogen between times t 2 and t 1 , one of the two instances of the pathogen makes a new copy of itself at time t 1 , and none of the three resulting instances of the pathogen make any new instances of the pathogen between times t 1 and t.We must integrate over all values of t 2 between 0 and t 1 , and we must integrate over all values of t 1 between 0 and t: e −rt 2 (r dt 2 )e −2r(t 1 −t 2 ) (2r dt 1 )e −3r(t−t 1 ) We can extend the range of the integration over t 2 from t 2 = 0 to t 2 = t if we also divide by 2: Simplifying, we have Performing the integration, we get This then becomes Next, consider x 1,4 (rt), which is the probability that the original instance of the pathogen has led to three new instances of the pathogen by time t (n = 4).For this to occur, seven things must happen: The original instance of the pathogen does not make any new instances of the pathogen between times 0 and t 3 , the original instance of the pathogen makes a new copy of itself at time t 3 , neither of the two resulting instances of the pathogen make any new instances of the pathogen between times t 3 and t 2 , one of the two instances of the pathogen makes a new copy of itself at time t 2 , none of the three resulting instances of the pathogen make any new instances of the pathogen between times t 2 and t 1 , one of the three instances of the pathogen makes a new copy of itself at time t 1 , and none of the four resulting instances of the pathogen make any new instances of the pathogen between times t 1 and t.We must integrate over all values of t 3 between 0 and t 2 , we must integrate over all values of t 2 between 0 and t 1 , and we must integrate over all values of t 1 between 0 and t: We can extend the range of the integration over t 3 from t 3 = 0 to t 3 = t and the range of the integration over t 2 from t 2 = 0 to t 2 = t if we also divide by 3!: e −rt 3 (r dt 3 )e −2r(t 2 −t 3 ) (2r dt 2 )e −3r(t 1 −t 2 ) (3r dt 1 )e −4r(t−t 1 ) Simplifying, we have Performing the integration, we get This then becomes x 1,4 (rt) = 1 − e −rt 3 e −rt 2.2.5 m = 1, any n We can generalize the calculation to arbitrary values of n: Changing the integration limits, we have Performing the integration, we get

Any m and n
Following the same procedure, we can calculate x m,n (rt): Changing the integration limits, we have This simplifies further: We can rewrite this as e −(n−j+1)r(t j−1 −t j ) (r dt j ) e −mrt n−m This becomes Performing the integration, we get

Detection of a pathogen
We further require an understanding of how the outbreak is detected.Consider that there are n instances of the pathogen within a particular lineage when the environment is tested.We assume that each instance of the pathogen is not detected independently with probability q.The outbreak is not detected if and only if no instance of the pathogen is detected, which occurs with probability q n .Therefore, the pathogen is detected with probability 1 − q n .We further assume that each lineage of the pathogen is detected independently of any other lineage.For example, suppose that two lineages of the pathogen are simultaneously present.Suppose that when the environment is tested, Lineage A contains n A copies of the pathogen, and Lineage B contains n B copies of the pathogen.In this case, Lineage A is detected with probability 1 − q n A , and Lineage B is detected with probability 1 − q n B .(If the rate of introduction of new pathogens, λ, is small, then simultaneous presence of two lineages would be a rare occurrence.Nonetheless, we describe this possibility so that the stochastic dynamics of pathogen initiation, pathogen growth, and pathogen detection are completely specified.)

Expected size of an outbreak when it is detected
Using the stochastic rules presented above, and using Equation (S3), we can derive a formula for the expected size of an outbreak when the pathogen is detected.For understanding the steps of the calculation, we define X i (a i ) to be the probability that there are i testing events following the appearance of the pathogen that fail to detect the pathogen, and that there are a i infections when the pathogen is detected.
We first consider the following question: What is the probability that the pathogen is detected in the first test following its appearance and that there are a 0 instances of the pathogen when it is detected?This probability, which we denote X 0 (a 0 ), is given by There are three components to this calculation: • The pathogen is initiated at time τ before the testing event that detects it occurs.If the pathogen emerges just before the test that detects it is performed, then τ is slightly greater than 0. If the pathogen emerges just after the previous test, then τ is slightly less that T .Therefore, we have 0 ≤ τ < T .Since new lineages appear independently and continuously in time, τ is equiprobably distributed between 0 and T , hence the integration T 0 dτ /T .
• The pathogen begins as a single infection, and it grows to a 0 infections at time τ since its appearance with probability x 1,a 0 (rτ ).
• At least one of the a 0 infections is detected with probability 1 − q a 0 .
Next, we can ask: What is the probability that the pathogen is detected in the second test following its appearance and that there are a 1 instances of the pathogen when it is detected?This probability, which we denote X 1 (a 1 ), is given by x 1,a 0 (rτ )q a 0 x a 0 ,a 1 (rT )(1 − q a 1 ) This calculation is understood as follows: The first test occurs at time τ after the pathogen appears, the pathogen grows to a 0 infections at time τ after its emergence, none of those a 0 infections are detected in the first test, the pathogen then grows to a 1 infections at time τ + T after its emergence, and at least one of those a 1 infections is detected in the second test.We must sum over all values of a 0 between 1 and a 1 .
We can further ask: What is the probability that the pathogen is detected in the third test following its appearance and that there are a 2 instances of the pathogen when it is detected?This probability, which we denote X 2 (a 2 ), is given by x 1,a 0 (rτ )q a 0 x a 0 ,a 1 (rT )q a 1 x a 1 ,a 2 (rT )(1 − q a 2 ) This calculation is understood as follows: The first test occurs at time τ after the pathogen appears, the pathogen grows to a 0 infections at time τ after its emergence, none of those a 0 infections are detected in the first test, the pathogen then grows to a 1 infections at time τ + T after its emergence, none of those a 1 infections are detected in the second test, the pathogen then grows to a 2 infections at time τ + 2T after its emergence, and at least one of those a 2 infections is detected in the third test.We must sum over all values of a 0 between 1 and a 1 and over all values of a 1 between 1 and a 2 .
The calculation of X 3 (a 3 ) follows in the same manner: To calculate the expected size of an outbreak, we sum X m (a m )a m over all possible sizes of the outbreak when the pathogen is detected (1 ≤ a m < ∞) and over all possible numbers of failed tests (0 ≤ m < ∞): Equation (S4) can be written as follows:

Approximation for n
Equation ( S5) is analytically unwieldy.To make progress, we derive an approximate solution for the expected size of an outbreak.

p = 1
To calculate a solution for n for p = 1, we use Equation (S2), and we integrate over all possible values of τ between 0 and T : Performing the integration, we obtain To calculate an approximate solution for n for small values of p, we assume that the pathogen grows deterministically after it is initiated (Figure S1).The first several steps of this process are as follows: For calculating an approximation for the expected size of an outbreak when it is detected, we can assume that the size of the outbreak grows deterministically.
• The first infection occurs, and at time τ after its emergence, a test is performed.The size of the outbreak when the first test is performed is equal to e rτ .
• If the pathogen is not detected in the first test, which occurs with probability q e rτ , then the pathogen grows until the second test is performed, and the size of the outbreak is equal to e r(τ +T ) .
• If the pathogen is not detected in the first test and the second test, which occurs with probability q e rτ q e r(τ +T ) , then the pathogen grows until the third test is performed, and the size of the outbreak is equal to e r(τ +2T ) .
• If the pathogen is not detected in the first test, the second test, and the third test, which occurs with probability q e rτ q e r(τ +T ) q e r(τ +2T ) , then the pathogen grows until the fourth test is performed, and the size of the outbreak is equal to e r(τ +3T ) .
This process continues until the pathogen is detected.We therefore have the following result for the expected size of an outbreak: + q e rτ e r(τ +T ) − e rτ + q e rτ q e r(τ +T ) e r(τ +2T ) − e r(τ +T ) + q e rτ q e r(τ +T ) q e r(τ +2T ) e r(τ +3T ) − e r(τ +2T ) Considering that p ≪ 1 and rT ≪ 1, this can be simplified: n p,rT ≪1 ≈ q e rT e 2rT − e rT + q e rT q e 2rT e 3rT − e 2rT + q e rT q e 2rT q e 3rT e 4rT − e 3rT + • • • More compactly: The process can also be considered by defining Here, p is the probability that a single infection is detected in a testing event, so that the probability that an outbreak of size n is detected in a testing event is given by 1 − (1 − p) n .Substituting Equation (S8) into Equation (S7), we have Next, we approximate k−1 x=0 e −xrT by 1 − e −rT −1 , and we approximate the summation over k by an integration over k: Performing the integration and simplifying, this becomes In the limit that p approaches 1, Equation (S9) approaches 0. In the limit that p approaches 0, Equation (S9) becomes arbitrarily large.Therefore, we can add Equations (S6) and (S9) together to obtain an approximation for n for any value of p: 3 Expected total cost per unit time For optimizing the testing frequency, the quantity of interest is the expected total cost per unit time.The surveillance cost per unit time, C 1 , is given by Equation (S1).Let C 2 denote an approximation for the expected infection cost per unit time, and let C denote an approximation for the expected total cost per unit time.We have For determining C 2 , we assume that each infection contributes a cost c 2 .If new lineages appear at rate λ, then C 2 is given by Substituting Equations ( S1), (S12), and (S10) into Equation (S11), we obtain

Optimal testing frequency
Equation (S13) specifies the expected total surveillance and pathogen cost per unit time.C is a function of the testing period, T , and we seek the value of T for which C is minimal.The first step is to show that C has a single minimum at a particular value of T .To do this, we differentiate C twice with respect to T : In Equation (S15), the quantity rT 3 /(λc 2 ) is necessarily positive.If the right-hand side of Equation ( S15) is positive for positive values of T , then d 2 C/dT 2 is necessarily positive.Note that lim We also have d dT From Equations ( S16) and (S17), it follows that the right-hand side of Equation ( S15) is necessarily positive.Therefore, We also have lim From Equations (S19), (S20), and (S18), it follows that there is a single value of T for which C is minimized.
To determine the optimal testing period, we set dC/dT = 0 and T = T * in Equation (S14).We arrive at an implicit solution for the optimal testing period, T * : The optimal testing frequency is given by 3.1.1Asymptotic behavior as p → 1 Taking the limit p → 1 in Equation (S21), we obtain the following equation for T * : Letting W 0 (x) denote the principal branch of the Lambert W function, and using Equation (S22), we obtain an explicit solution for the optimal testing frequency: Equation ( S23) specifies the optimal testing frequency if p = 1.

Distribution of pathogen-related parameters
The calculation of the expected infection cost per unit time, C 2 , assumes that, for each lineage that appears, the pathogen-specific parameters c 2 , r, and p are the same.The expected infection cost per unit time is then just the expected cost due to a single lineage multiplied by the rate, λ, at which those lineages arise.More generally, we can consider dc 2 dr dp λ ′ (c 2 , r, p) to be the (infinitesimal) rate at which lineages with pathogen-specific parameters c 2 , r, and p appear.In this generalized model, let C ′ 2 denote an approximation for the expected infection cost per unit time, and let C ′ denote an approximation for the expected total cost per unit time.We have With knowledge of the rate density function, λ ′ (c 2 , r, p), we are able to compute the expected infection cost per unit time by integrating over all possible values of c 2 , r, and p: Substituting Equations ( S1) and (S26) into Equation (S25), we obtain The optimal testing frequency is given by Equations ( S27) and (S28) can be solved numerically to determine the optimal testing frequency.Below, we consider several simple examples for which Equation (S27) can be solved analytically to show how the model works.

Example 1
As the simplest example of using Equation (S27), consider that only a single type of pathogen can emerge.The pathogen has per-case cost c ′ 2 , growth rate r ′ , and probability of detection p ′ , and new lineages are introduced at rate λ.The rate density function, λ ′ (c 2 , r, p), is given by Here, δ denotes the Dirac delta function.When this form for λ ′ (c 2 , r, p) is substituted into Equation (S27) and the integrations over c 2 , r, and p are performed, we obtain Thus, Equation (S27) reduces to Equation (S13) for the case where only a single type of pathogen with fixed parameters can emerge.

Example 2
Next, consider the possibility that two different types of pathogens can emerge.Pathogen 1 has parameters c ′ 2 , r ′ , and p ′ , while Pathogen 2 has parameters c ′′ 2 , r ′′ , and p ′′ .Lineages of Pathogen 1 are introduced at rate λ 1 , and lineages of Pathogen 2 are introduced at rate λ 2 .The corresponding rate density function is When this form for λ ′ (c 2 , r, p) is substituted into Equation (S27) and the integrations are performed, we obtain The expected total cost per unit time, C ′ , is therefore equal to the surveillance cost per unit time, plus the expected infection cost per unit time for Pathogen 1, plus the expected infection cost per unit time for Pathogen 2.

Example 3
These considerations can be extended to the case where many different types of pathogens can emerge.Let Pathogen n have per-case cost c 2,n , growth rate r n , and probability of detection p n .The rate density function, λ ′ (c 2 , r, p), is given by Substituting this into Equation (S27) and integrating yields The expected infection cost per unit time is therefore linear-i.e., we add together the expected infection costs per unit time for each of the n possible types of pathogens, and this sum equals the total expected infection cost per unit time.

Example 4
The possible parameter values that any new pathogen can have are not discrete.They are continuous.To show how this works, consider the following form for the rate density function: For this case, new pathogens have growth rate r ′ and probability of detection p ′ .New pathogens can, however, have any real value of c 2 that is nonnegative.For any lineage that is introduced, its value of c 2 is most likely to be close to zero, while larger values of c 2 occur more rarely.The parameter a controls the width of the probability density function for c 2 .For smaller values of a, this distribution has a longer tail, and the expected value of c 2 for any new pathogen increases.Substituting this form for the rate density function into Equation (S27) and integrating, we have

Example 5
For this example, we suppose that any new pathogen has per-case infection cost c ′ 2 and probability of detection p ′ , while the growth rate, r, can be any nonnegative real number.We use the following form for the rate density function: Substituting this into Equation (S27) and integrating, we have

Example 6
We can also model the case where new pathogens have per-case cost c ′ 2 and growth rate r ′ , while the probability of detection, p, can be any real number between 0 and 1. Suppose that the rate density function has the following form: Here, θ denotes the Heaviside step function.Substituting this into Equation (S27) and integrating, we obtain 5 Approximation for F * If we have excellent understanding of the stochastic dynamics and their associated parameters, then Equations (S27) and (S28) specify the sampling frequency for which the expected total cost per unit time is minimal.However, in real settings, understanding of the underlying dynamics and parameters would only be approximate.A useful result, then, is a simple equation that approximately specifies the optimal sampling frequency and can be easily solved.Accordingly, we approximate Equation (S27) for rT ≪ 1 as follows: For small values of p, this can be approximated further: Differentiating with respect to T , we have Setting dC ′ /dT = 0 yields an approximation for the optimal sampling frequency: 6 Extensions of the model In this section, we explore some simple extensions of the model.

Example 1
A realistic possibility is that the cost due to each test, c 1 , is not constant but is dependent on the sensitivity of the test, p.Our model readily incorporates this generalization if we set c 1 → c 1 (p) in our equations.If c 1 is an increasing function of p, then the inverse relationship between the optimal sampling frequency and p will be stronger than for the case where c 1 is constant.

Example 2
Our analytical calculation of the expected total cost per unit time is based on the assumption that the cost of an outbreak scales linearly with the number of infections.During the early stages of an outbreak, we believe this assumption to be reasonable.However, our model works the same for alternative assumptions about how the cost of an outbreak depends on the number of infections.To demonstrate this, we consider here a different possibility: that the cost of an outbreak scales quadratically with the number of infections.If we denote an approximation for the expected total cost per unit time for this scenario by C sq , then this quantity is equal to The task is to calculate the expected squared size of an outbreak, n 2 .The steps in the calculation are identical to the case of linear costs.
To calculate a solution for n 2 for p = 1, we use Equation (S2), and we integrate over all possible values of τ between 0 and T : Performing the integration, we obtain For p ≪ 1 and rT ≪ 1, we have the following result for the expected size of an outbreak: n 2 p,rT ≪1 ≈ q e rT e 4rT − e 2rT + q e rT q e 2rT e 6rT − e 4rT + q e rT q e 2rT q e 3rT e 8rT − e 6rT + • • • More compactly: Next, we approximate k−1 x=0 e −xrT by 1 − e −rT −1 , and we approximate the summation over k by an integration over k: This can be rewritten as Integrating by parts, we have Performing the integration and simplifying, this becomes In the limit that p approaches 1, the above expression approaches 0. In the limit that p approaches 0, the above expression becomes arbitrarily large.Therefore, we can add Equations (S30) and (S31) together to obtain an approximation for n 2 for any value of p: Equation (S33) specifies the expected total cost per unit time for the case where the cost due to a single outbreak scales quadratically with the number of infections.

Example 3
Our model can be adapted to the case where breakthrough infections are a possibility.As a simple example, we suppose that immediately after an outbreak is detected and intervention is applied, a single new infection is initiated with probability b, while with probability 1 − b, there are no follow-up infections.The approximate expected total cost per unit time in this scenario, which we denote C b , is given by n τ =T denotes the expected size of an outbreak, given that the first environmental test is performed at time T after the outbreak begins.The summation in square brackets is understood as follows: • With probability 1 − b, all infections of the original outbreak are controlled, and there are no further pathogen-related costs.The expected total cost related to the original outbreak is equal to c 2 n .
• With probability b(1−b), a single case of the original outbreak is not controlled.It then leads to a second outbreak, and the pathogen must again be detected and controlled.
The expected size of the second outbreak when it is detected is equal to n τ =T .The expected total cost related to the original outbreak and the second outbreak is equal to c 2 ( n + n τ =T ).
• With probability b 2 (1 − b), a single case of the original outbreak is not controlled.It then leads to a second outbreak, and the pathogen must again be detected and controlled.The expected size of the second outbreak when it is detected is equal to n τ =T However, a single case of the second outbreak is not controlled, leading to a third outbreak.The expected size of the third outbreak when it is detected is equal to n τ =T .The expected total cost related to the original outbreak, the second outbreak, and the third outbreak is equal to c 2 ( n + 2 n τ =T ).
The understanding is similar for higher-order terms in the summation in Equation (S34).This equation can be rewritten: This is equal to This becomes

Figure S1 :
Figure S1: Schematic showing deterministic growth of the pathogen.For calculating an approximation for the expected size of an outbreak when it is detected, we can assume that the size of the outbreak grows deterministically.