Moderate deviations and extinction of an epidemic

Consider an epidemic model with a constant flux of susceptibles, in a situation where the corresponding deterministic epidemic model has a unique stable endemic equilibrium. For the associated stochastic model, whose law of large numbers limit is the deterministic model, the disease-free equilibrium is an absorbing state, which is reached sooner or later by the process. However, for a large population size, i.e. when the stochastic model is close to its deterministic limit, the time needed for the stochastic perturbations to stop the epidemic may be enormous. In this paper, we discuss how the Central Limit Theorem, Moderate and Large Deviations allow us to give estimates of the extinction time of the epidemic.


Introduction
We consider epidemic models where there is a constant flux of susceptible individuals, either because the infected individuals become susceptible immediately after healing (SIS model), or after some time during which the individual is immune to the illness (SIRS model), or because there is a constant flux of newborn or immigrant susceptibles (SIR model with demography).
In the above three cases, for certain values of the parameters, there is an endemic equilibrium, which is a stable equilibrium of the associated deterministic epidemic model. The deterministic model can be considered as the Law of Large Numbers limit (as the size of the population tends to ∞) of a stochastic model, where infections, healings, births and deaths happen according to Poisson processes whose rates depend upon the numbers of individuals in each compartment.
Since the disease-free states are absorbing, it follows from an irreducibility property, which is clearly valid in our models, that the epidemic will stop sooner or later in the more realistic stochastic model.
The corresponding stochastic model is as follows. Let $S^N_t$ (resp. $I^N_t$) denote the proportion of susceptible (resp. of infectious) individuals in a population of total size $N$.
Here $P_{inf}(t)$ and $P_{rec}(t)$ are two mutually independent standard (i.e. rate 1) Poisson processes. Let us give some explanations, first concerning the modeling, then concerning the mathematical formulation.
Let $\mathcal{S}^N_t$ (resp. $\mathcal{I}^N_t$) denote the number of susceptible (resp. infectious) individuals in the population. The equations for those quantities are the above equations, multiplied by $N$. The argument of $P_{inf}$ reads $\lambda N^{-1}\int_0^t \mathcal{S}^N_r \mathcal{I}^N_r\,dr$. Note that $S^N_t = N^{-1}\mathcal{S}^N_t$ and $I^N_t = N^{-1}\mathcal{I}^N_t$, so that in particular all those processes live on the same time scale. The formulation of such a rate of infections can be explained as follows. Each infectious individual meets other individuals in the population at some rate $\beta$. The encounter results in a new infection with probability $p$ if the partner of the encounter is susceptible, which happens with probability $\mathcal{S}^N_t/N$, since we assume that each individual in the population has the same probability of being that partner, and with probability 0 if the partner is an infectious individual. Letting $\lambda = \beta p$ and summing over the infectious individuals at time $t$ gives the above rate. Concerning recovery, it is assumed that each infectious individual recovers at rate $\gamma$, independently of the others.
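The rates just described (infections at rate λ times the number of susceptibles times the number of infectious, divided by N, and recoveries at rate γ per infectious individual) can be simulated directly. The following is only a minimal sketch of a standard Gillespie-type simulation, not the construction used in the paper; the function name `simulate_sis` and its parameters are hypothetical, and γ > 0 is assumed.

```python
import random


def simulate_sis(n, lam, gamma, i0, t_max, seed=0):
    """Gillespie-type simulation of a stochastic SIS model (illustrative sketch).

    n     : total population size (constant in the SIS model)
    lam   : infection parameter (lambda in the text)
    gamma : recovery rate, assumed > 0
    i0    : initial number of infectious individuals
    Returns (path, extinction_time), where path is a list of
    (time, proportion of infectious) pairs and extinction_time is None
    if the infection is still present at time t_max.
    """
    rng = random.Random(seed)
    t, i = 0.0, i0
    path = [(t, i / n)]
    while i > 0:
        s = n - i
        rate_inf = lam * s * i / n   # total infection rate
        rate_rec = gamma * i         # total recovery rate
        total = rate_inf + rate_rec
        t += rng.expovariate(total)  # waiting time until the next event
        if t > t_max:
            return path, None
        if rng.random() * total < rate_inf:
            i += 1                   # one infection
        else:
            i -= 1                   # one recovery
        path.append((t, i / n))
    return path, t
```

For instance, `simulate_sis(1000, 2.0, 1.0, 10, 50.0)` returns a trajectory of the proportion of infectious individuals; for λ > γ and large N one would expect it to fluctuate around 1 − γ/λ for a long time before extinction.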

The SIRS model
In the SIRS model, contrary to the SIS model, an infectious individual who heals is first immune to the illness (he is "recovered"), and only after some time does he lose his immunity and become susceptible again. The deterministic SIRS model reads
$s'(t) = -\lambda s(t)i(t) + \rho r(t)$, $i'(t) = \lambda s(t)i(t) - \gamma i(t)$, $r'(t) = \gamma i(t) - \rho r(t)$,
while the stochastic SIRS model is obtained, as for the SIS model, by letting infections, recoveries and losses of immunity occur according to mutually independent Poisson processes with the corresponding rates. These two models could be reduced to two-dimensional models for $z(t) = (i(t), s(t))$ (resp. for $Z^N_t = (I^N_t, S^N_t)$).

The SIR model with demography
In this model, recovered individuals remain immune forever, but there is a flux of susceptibles by births at a given rate multiplied by $N$, while individuals from each of the three compartments die at rate $\mu$. Thus the deterministic model reads
$s'(t) = \mu - \lambda s(t)i(t) - \mu s(t)$, $i'(t) = \lambda s(t)i(t) - (\gamma+\mu) i(t)$, $r'(t) = \gamma i(t) - \mu r(t)$,
whose stochastic variant is built, as before, from mutually independent standard Poisson processes.
EJP 25 (2020), paper 25.
Remark 2.1. One may think that it would be more natural to decide that births happen at rate $\mu$ times the total population. The total population process would then be a critical branching process, which would go extinct in finite time a.s., which we do not want.
Next it might seem more natural to replace in the infection rate the ratio $\mathcal{S}^N_t/N$ (number of susceptibles over $N$) by $\mathcal{S}^N_t/(\mathcal{S}^N_t + \mathcal{I}^N_t + \mathcal{R}^N_t)$, which is the actual ratio of susceptibles in the population at time $t$. It is easy to show that $\mathcal{S}^N_t + \mathcal{I}^N_t + \mathcal{R}^N_t$ is close to $N$, so we choose the simplest formulation. Again, we can reduce these models to two-dimensional models for $z(t) = (i(t), s(t))$ (resp. for $Z^N_t = (I^N_t, S^N_t)$), by deleting the $r$ (resp. $R^N$) component.

The stochastic model
The three above stochastic models are of the following form.
In the case of the SIRS model, $d = 2$, $k = 3$. In the case of the SIR model with demography, we can restrict ourselves to $d = 2$. While the above expression has the advantage of being concise, we shall rather use the following equivalent formulation of (3.1). The joint law of $\{Z^N, N \ge 1\}$, as a sequence of random elements of the Skorohod space $D([0, T]; \mathbb{R}^d)$, is the same whether we use (3.1) or (3.2) for its definition.
Let us state the assumptions which we will need in section 4 below. Those are more than necessary for the results of the present section to hold, see [1] for the proofs.
Remark 3.1. In practice, in our models, either the process Z N t takes its values in a compact subset of R d (this is the case for all models with a constant population size), or else we restrict ourselves to such a situation, by stopping the process when the total population exceeds a given large value, see section 4.2.7 in [1]. As a consequence, the boundedness of β j is not a severe assumption, while (H.2) would follow from the fact that b is of class C 2 . It is possible that our results could be extended to b being C 1 , since it implies that ∇b is continuous, hence uniformly continuous since its argument remains in a compact set, but that would require a very careful check.
Concerning the initial condition, we assume that for some fixed $z$, $Z^N_0 = [Nz]/N$, where $[Nz] \in \mathbb{Z}^d_+$ is the vector whose $i$-th component is the integer part of the real number $N z_i$.

Law of Large Numbers
We have a Law of Large Numbers.
Theorem 3.2. Let $Z^N_t$ denote the solution of the SDE (3.1). Then $Z^N_t \to z_t$ a.s. locally uniformly in $t$, where $\{z_t, t \ge 0\}$ is the unique solution of the ODE $\frac{dz_t}{dt} = b(z_t)$, $z_0 = z$.
The main argument in the proof of the above theorem is the fact that, locally uniformly in $t$, $P(Nt)/N \to t$ a.s. as $N \to \infty$.

Central Limit Theorem
We also have a Central Limit Theorem. Let $U^N_t := \sqrt{N}(Z^N_t - z_t)$.
Theorem 3.3. As $N \to \infty$, $\{U^N_t, t \ge 0\}$ converges in law to $\{U_t, t \ge 0\}$ for the topology of locally uniform convergence, where $\{U_t, t \ge 0\}$ is a Gaussian process driven by $\{(B_1(t), B_2(t), \dots, B_k(t)), t \ge 0\}$, which are mutually independent standard Brownian motions.
The proof of this theorem can be found e.g. in [1]. It relies essentially upon the fact that if $P$ is a rate 1 Poisson process, then $(P(Nt) - Nt)/\sqrt{N}$ converges in law to a standard Brownian motion as $N \to \infty$.
Remark 3.4. One may wonder whether the above two theorems are compatible with the fact that for each $N$ fixed, as we shall see below, extinction happens in finite time a.s. However, the extinction time tends to $+\infty$ as $N \to \infty$, so that the deterministic limit of $Z^N$ is a function whose infectious component remains $> 0$ for all $t$, and for any fixed $t$ the fluctuations $\sqrt{N}(Z^N_t - z_t)$ tend to be Gaussian as $N \to \infty$.

Large deviations, and extinction of an epidemic
We denote by AC T,d the set of absolutely continuous functions from [0, T ] into R d . For any φ ∈ AC T,d , let A k (φ) denote the (possibly empty) set of functions c ∈ L 1 (0, T ; R k + ) such that c j (t) = 0 dt a.e. on the set {t, β j (φ t ) = 0} and We define the rate function where as usual the infimum over an empty set is +∞, and with g(ν, ω) = ν log(ν/ω) − ν + ω. We assume in the definition of g(ν, ω) that for all ν > 0, log(ν/0) = ∞ and 0 log(0/0) = 0 log(0) = 0. Given z ∈ R d , we shall denote by Z N,z N the solution of equation ( A proof of this result can be found in [5] and in [1]. A slight reinforcement of this theorem allows us to conclude Theorem 3.6 below. In what follows, we assume that the first component of Z N t (resp. of z(t)) is I N t (resp. i(t)). Assume that the deterministic ODE which appears in Theorem 3.2 has a unique stable equilibrium z * whose first component satisfies z * 1 > 0. We define Let now We have the
Theorem 3.6. Given any η > 0, for any z with z 1 > 0, Moreover, for all η > 0 and N large enough, We refer for the proof of this Theorem to [5] and [1].
It is important to evaluate the quantity $V$. Note that it is the value function of an optimal control problem. In case of the SIS model, which is one dimensional, one can solve this control problem explicitly with the help of Pontryagin's maximum principle, see [9], and deduce in that case that $V = \log\frac{\lambda}{\gamma} - 1 + \frac{\gamma}{\lambda}$. For other models, one can compute numerically a good approximation of the value of $V$ for each given value of the parameters.
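As a sanity check of the closed form V = log(λ/γ) − 1 + γ/λ for the SIS model, one can compare it with a numerical quadrature. The sketch below assumes the integral representation V = ∫_0^{i*} log(λ(1 − x)/γ) dx with i* = 1 − γ/λ, which is the usual expression of the quasi-potential of a one-dimensional birth and death chain; this representation is not stated in the text and is an assumption of the example, as are the function names.

```python
import math


def sis_quasipotential(lam, gamma):
    """Closed form V = log(lam/gamma) - 1 + gamma/lam (requires lam > gamma)."""
    return math.log(lam / gamma) - 1.0 + gamma / lam


def sis_quasipotential_numeric(lam, gamma, steps=100_000):
    """Midpoint-rule approximation of the integral over [0, i*] of
    log(lam * (1 - x) / gamma), where i* = 1 - gamma/lam.
    The integral representation is an assumption of this sketch."""
    i_star = 1.0 - gamma / lam
    dx = i_star / steps
    total = 0.0
    for n in range(steps):
        x = (n + 0.5) * dx
        total += math.log(lam * (1.0 - x) / gamma)
    return total * dx
```

With λ = 2 and γ = 1 both functions give a value close to log 2 − 1/2, and exp(N V) then indicates the order of magnitude of the extinction time suggested by the large deviations estimate.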

CLT and extinction of an epidemic
The discussion of this subsection, which motivates the moderate deviations approach of this paper, is taken from section 4.1 in [1]. Consider the SIR with demography.
We assume that $\lambda > \gamma + \mu$, in which case there is a unique stable endemic equilibrium, namely $z^* = (i^*, s^*) = \left(\frac{\mu}{\gamma+\mu} - \frac{\mu}{\lambda}, \frac{\gamma+\mu}{\lambda}\right)$. We can study the extinction of an epidemic in the above model using the CLT. We note that the basic reproduction number $R_0$ and the expected relative time of a life during which an individual is infected, $\varepsilon$, are given by $R_0 = \frac{\lambda}{\gamma+\mu}$ and $\varepsilon = \frac{\mu}{\gamma+\mu}$. The rate of recovery $\gamma$ is much larger than the death rate $\mu$ (52 compared to 1/75 for a one week infectious period and 75 year life length), so we use the approximations $R_0 \approx \lambda/\gamma$ and $\varepsilon \approx \mu/\gamma$. Denote again by $I^N_t$ the fraction of the population which is infectious in a population of size $N$. The law of large numbers tells us that for $N$ and $t$ large, $I^N_t$ is close to $i^*$. The central limit theorem tells us that $\sqrt{N}(I^N_t - i^*)$ converges to a Gaussian process $U_1(t)$ whose asymptotic variance can be shown to be well approximated by $R_0^{-1}$: for large $t$, the law of $U_1(t)$ is approximately $N(0, R_0^{-1})$. This suggests that for large $t$, the number of infectious individuals in the population is approximately Gaussian, with mean $N i^*$ and standard deviation $\sqrt{N/R_0}$. If $N i^*$ and $\sqrt{N/R_0}$ are of the same order, i.e. $N$ is of the same order as $\frac{1}{(i^*)^2 R_0}$, it is likely that the fluctuations described by the central limit theorem are enough for the epidemic to cease in time of order one. This gives a critical population size roughly of the order of $N_c \approx \frac{1}{(i^*)^2 R_0} \approx \frac{1}{\varepsilon^2 R_0}$, in fact probably a bit larger than that.
Consider measles prior to vaccination. In that case it is known that $R_0 \approx 15$, and with $\varepsilon \approx \frac{1/75}{1/(1/52) + 1/75} \approx 1/3750$ we arrive at $N_c \sim (3750)^2/15$, which is almost $10^6$. So, if the population is at most a million (or perhaps a couple of millions), we expect that the disease will go extinct quickly, whereas the disease will become endemic (for a rather long time) in a significantly larger population. This confirms the empirical observation that measles was continuously endemic in the UK whereas it died out quickly in Iceland (and was later reintroduced by infectious people visiting the country).
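The arithmetic behind the measles estimate can be reproduced in a few lines. This is only an illustrative sketch: the function names are hypothetical, and the formulas ε = µ/(γ + µ) and N_c ≈ 1/(ε² R_0) are the ones used in the heuristic discussion above.

```python
def relative_infectious_time(mu, gamma):
    """Expected fraction of a life spent infected: epsilon = mu / (gamma + mu)."""
    return mu / (gamma + mu)


def critical_population_size(epsilon, r0):
    """Heuristic CLT critical size N_c = 1 / (epsilon^2 * R_0)."""
    return 1.0 / (epsilon ** 2 * r0)


# Rounded value of epsilon used in the text, with R_0 = 15.
n_c = critical_population_size(1.0 / 3750.0, 15.0)   # equals 3750**2 / 15 = 937500

# Same computation without rounding epsilon (mu = 1/75, gamma = 52 per year).
n_c_unrounded = critical_population_size(relative_infectious_time(1.0 / 75.0, 52.0), 15.0)
```

Both values are of the order of a million, in line with the order of magnitude quoted for measles.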
Since the CLT allows us to predict extinction of an endemic disease for population sizes under a given threshold $N_c$, and Large Deviations give predictions for arbitrarily large population sizes, it is natural to look at Moderate Deviations, which describe ranges of fluctuations between those of the CLT and those of the LD.
The assumptions (H.1) and (H.2) are assumed to hold throughout this section.

The set-up and preliminary estimates
We shall use the general model written in the form (3.2). We assume that the limiting law of large numbers ODE has a unique stable endemic equilibrium $z^*$. For the sake of simplifying many formulas below, we change our coordinates, and let $z^* = 0$. The reader should be aware of the fact that there is a price to pay for that translation of the origin. Indeed, since in the original coordinate system the process $Z^N_t$ was living on the set of vectors whose coordinates are integer multiples of $N^{-1}$ (this is essential for the process to remain in the set where it makes sense, i.e. for proportions to remain between 0 and 1), the new origin generically does not belong to the set of points in $\mathbb{R}^d$ which our process $Z^N_t$ may visit. The grid on which $Z^N_t$ lives is translated accordingly. From now on 0 will be the endemic equilibrium (of course in the translated coordinate system), while $z^* \neq 0$ will denote that endemic equilibrium in the original coordinates (we shall need it for the formula of the initial condition of the SDE).
We want to study the moderate deviations at scale $N^\alpha$ of $Z^N_t$, where $0 < \alpha < 1/2$. Note that $\alpha = 0$ would correspond to the large deviations, and $\alpha = 1/2$ to the central limit theorem. We shall need below to consider the ODE starting from a point close to the endemic equilibrium. It is not hard to prove that, under our standing assumption (H.2) that $b$ is of class $C^1$ and $\nabla b$ is bounded, as $N \to \infty$, $z^N(t) \to z(t)$ uniformly for $0 \le t \le T$, where $z(t)$ solves the linearized ODE near the endemic equilibrium 0: $\dot z(t) = \nabla b(0) z(t)$. With these notations, the SDE for Z N,α This combined with Gronwall's Lemma yields From the boundedness and Lipschitz property of $\nabla b$, and the formula for $V^{N,\alpha}$, we deduce the following (here and in the rest of the paper, $x_N \lesssim y_N$ means that there exists a constant $C$ which is independent of $N$, such that $x_N \le C y_N$).
We deduce from the last three inequalities We will see below that the large deviations of $Z^{N,\alpha}$ will follow from those of $Y^{N,\alpha}$ by a variant of the contraction principle. We first consider the simpler processes $\bar{Y}^N$ and $\bar{Y}^{N,\alpha}$, which are similar to $Y^N$ and $Y^{N,\alpha}$, but with $Z^N_s$ replaced by 0.

The limiting logarithmic moment generating function of $\bar{Y}^{N,\alpha}$
We note that writing the integral over $[0, N\beta_j(0)]$ as the sum from $\ell = 1$ to $\ell = N$ of The processes $Q_1, Q_2, \dots, Q_N$ are i.i.d., and their law is that of Proof. We use in an essential way the above decomposition of Y which we will check below. From this it follows that the argument of the logarithm on the next-to-last line is greater than or equal to 1, at least for $N$ large enough, and the final conclusion follows easily from the fact that for any Let us now check (4.7). It follows from an exact Taylor formula that But $\nu(Q)$ is an affine combination of mutually independent Poisson random variables, so that (4.7) follows easily by an explicit computation.

The limiting logarithmic moment generating function of Y N,α
We want to study the large deviations of $Y^{N,\alpha}$. The main step will be to prove that Lemma 4.1 remains valid if we replace $\bar{Y}^{N,\alpha}$ (the simpler process in which $Z^N_s$ is replaced by 0) by $Y^{N,\alpha}$, which will follow from the next Proposition.
Proof. For any δ > 0, we deduce from Hölder's inequality so that, if we combine Lemma 4.1 and Proposition 4.2, we deduce that For the inequality in the other direction, we note that, by similar arguments, The remainder of this subsection will be devoted to the proof of Proposition 4.2. We note that Proposition 4.2 is a consequence of the following two Propositions.
Proof of Proposition 4.4. The exponents in the expressions entering (4.9) are sums over the indices 1 ≤ i ≤ d and 1 ≤ j ≤ k. Using repeatedly Schwarz's inequality, it is sufficient to prove the results with the sum replaced by each of the summands. Therefore in this proof we proceed as if d = 1, we fix 1 ≤ j ≤ k and for the sake of simplifying the notations, we drop the index j. We note that

M(ds, du)
It is not hard to see that one can treat each of the two terms on the right separately, and we treat only the first term, the treatment of the second one being quite similar. We note that there exists a compensated standard Poisson process $M(t)$ on $\mathbb{R}_+$ such that the factor of $N^{-\alpha}$ in this first term can be rewritten as We need to estimate $E \exp[C N^{-\alpha} \nu(W^N)]$. If we decompose the signed measure $\nu$ as the difference of two measures as follows $\nu = \nu_+ - \nu_-$, we again have two terms, and it suffices to treat one of them, say $\nu_+$. Of course it suffices to treat the case where $\nu_+ \neq 0$.
Since the positive constant C is arbitrary, we can w.l.o.g. assume that ν + is a probability measure on [0, T ]. It is then clear that We choose a new parameter 0 < γ < α, and we write the expression whose expectation needs to be estimated as a sum of two terms as follows. (4.11) We now estimate the first term on the right hand side of (4.11). For that sake, we define the stopping time Consequently the expectation of the first term on the right of (4.11) is bounded from above by where the first inequality follows from Proposition .14 in the Appendix below, and the second one exploits the Lipschitz property of β. Consider now the second term on the right hand side of (4.11).
, for some c, C > 0, where the second inequality follows from Proposition .14 and the boundedness of β. Estimating the second factor in the last expression amounts to estimating the two probabilities (with another c > 0) We estimate the first probability. For any a > 0, where the second inequality follows from Proposition .14 and the last inequality by optimizing over a > 0. One can easily convince oneself that a similar result holds for the second line of (4.12), making use of Proposition .14 with a negative a. Note also for further use that the same result also holds in case γ = 0. In that case, the probability on the second line of (4.12) is zero for large enough c, in which case the announced estimate is of course true. The expectation of the second term of the right hand side of (4.11) is thus dominated by (with c 1 and c 2 two positive constants) which establishes (4.9).
We now turn to the second proof.
Proof of Proposition 4.5. Recalling assumption (H.1), we now define, with $\bar\beta_j := \sup_{z \in \mathbb{R}^d} \beta_j(z)$, and the stopping time $\bar\tau_b$ where the constant $b > 0$ will be chosen below, and the constant $a > 0$ is arbitrary. From the estimate (4.3), We take the limit successively in the two terms of the above right hand side.
We first note that the arguments used in the proof of (4.13), in the particular case γ = 0, for some constant C > 0. We next estimate the product For the same reason as in the previous proof, we need only consider the case d = k = 1.
It follows from Proposition .14 that the first factor satisfies Finally there exist two positive constants C 1 and C 2 such that for N large enough. So a N log of the above tends to 0, as N → ∞.
Step 2: Estimate of (4.15). We first note that The first term on the right tends to 0 as N → ∞. It remains to take care of the second term. Since Y N t is a martingale, it is clear that the process exp a −1 Consider first the first factor on the right hand side of (4.17). We deduce from the definition of $\bar\tau_b$ that
Consequently the square of the first factor on the right of (4.17) is bounded from above by where we have used Doob's optional sampling theorem for submartingales. From the same argument as above, we proceed as if d = 1, note that and exploit Proposition 4.4 in order to conclude concerning a N log of the first factor on the right of (4.17). We next note that Hence the square of the second term on the right of (4.17) satisfies Consider first the second factor on the right of (4.18). We have Using the Cauchy-Schwarz inequality several times, it is clear that it is sufficient to do , where θ N ∼ Poi(aN ). We now choose b = a/3. We have

We have proved that the second factor on the right of (4.18) remains bounded, as N → ∞.
We next consider the first factor on the right of (4.18). We first note that It follows that the left hand side of (4.18) is bounded from above by a constant times where C 1 and C 2 are two positive constants. This last expression is bounded by 2, as soon as N is large enough. Finally a N log of the left-hand side of (4.18) tends to 0, as N → ∞.
Remark 4.6. We note that the full strength of (4.3) is necessary for the proof of Propo-

Large deviations of Y N,α
We first define the Fenchel-Legendre transform of where Q has been defined by hence for any δ > 0, for N large enough, This implies clearly We shall prove below the following.
Let us now turn to the proof of the above theorem.
Proof of Theorem 4.7. From (4.19), we deduce that as N → ∞. Consequently, again by the argument of Proposition 4.3, we deduce from that same Proposition that for any signed measure ν on [0, T ], as N → ∞, This, together with Proposition 4.9, allows us to apply Corollary 4.6.14 from [2], to conclude that the sequence Y
We now turn to the Proof of Proposition 4.9. Clearly it suffices to prove both that It follows from Doob's submartingale inequality and a combination of Lemma 4.1 and It remains to consider Y N,α,c . Define the modulus of continuity of an element $x \in C_0([0, T]; \mathbb{R}^d)$ as $w_x(\delta) = \sup_{0 \le s,t \le T,\ |s-t| \le \delta} \|x(t) - x(s)\|$. It follows from Ascoli's theorem that for any sequence $\{\delta_\ell, \ell \ge 1\}$ of positive numbers, the following is a compact subset Suppose that for each $\ell \ge 1$, $R > 0$, we can find $\delta_{R,\ell} > 0$ such that for all $N \ge 1$, From this we deduce that from which the result follows. A sufficient condition for (4.21) to be true is that for any b > 0, lim In turn a sufficient condition for this is that which we now prove. It is not hard to see that
where we have used Doob's submartingale inequality at the last step. Clearly Using repeatedly the Cauchy-Schwarz inequality, we see that it suffices to estimate for where $\bar\beta_j = \sup_z \beta_j(z)$, we have used Proposition .14 and the inequality $e^x - 1 - x \le x^2$, valid for $x \le \log(2)$, which we have applied with $x = 2CN^{-\alpha}\delta^{-1/2}$ and $x = -2CN^{-\alpha}\delta^{-1/2}$ (recall that we will first let $N \to \infty$). Putting together the last estimates, (4.22) follows, and the Proposition is proved.

Computation of the rate function Λ *
Let us compute Λ * in the three examples which we discussed above in section 2. Here we do not translate z * to the origin.
The supremum is achieved at the signed measure ν which makes the gradient with respect to ν of the above zero, if any. We first note that for such a ν to exist, we need that φ(0) = 0, unless Λ*(φ) = +∞. Now the optimal ν must satisfy So necessarily Substituting this signed measure ν in the above formula, we obtain that
$\Lambda^*(\varphi) = \frac{\lambda}{4\gamma(\lambda-\gamma)} \int_0^T |\varphi'(t)|^2\,dt$ if $\varphi(0) = 0$ and $\varphi$ is absolutely continuous; $\Lambda^*(\varphi) = +\infty$ otherwise.

Computation of Λ * for the SIRS model
In this model, $d = 2$ and $k = 3$. We have In the case $\lambda > \gamma$, there is a unique stable endemic equilibrium, namely $z^* = \left(\frac{\rho}{\gamma+\rho}\left(1 - \frac{\gamma}{\lambda}\right), \frac{\gamma}{\lambda}\right)$. In order to simplify the notations, we shall write $a = \beta_1(z^*)$, $b = \beta_2(z^*)$, $c = \beta_3(z^*)$ and $A = ab + ac + bc$. We have The functional to be maximized with respect to $\nu$ is Writing that the gradient w.r.t. $\nu_1$ and $\nu_2$ of this functional is zero leads to the identities This implies the identities
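The endemic equilibrium of the SIRS model can be checked numerically against the reduced deterministic system for (i, s), obtained from the SIRS equations with r = 1 − i − s. The sketch below is illustrative only; the function names and the numerical parameter values are hypothetical.

```python
def sirs_vector_field(i, s, lam, gamma, rho):
    """Right-hand side of the reduced deterministic SIRS model for (i, s),
    using r = 1 - i - s (the proportions sum to one)."""
    di = lam * s * i - gamma * i
    ds = -lam * s * i + rho * (1.0 - i - s)
    return di, ds


def sirs_endemic_equilibrium(lam, gamma, rho):
    """Endemic equilibrium z* = (i*, s*), which exists when lam > gamma."""
    i_star = rho / (gamma + rho) * (1.0 - gamma / lam)
    s_star = gamma / lam
    return i_star, s_star


# Hypothetical parameter values, chosen only for illustration.
lam, gamma, rho = 3.0, 1.0, 0.5
i_star, s_star = sirs_endemic_equilibrium(lam, gamma, rho)
di, ds = sirs_vector_field(i_star, s_star, lam, gamma, rho)
```

Both components of the vector field vanish at z* up to rounding errors, which confirms that the stated point is an equilibrium of the reduced system.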

Computation of Λ * for the SIR model with demography
In this case, $d = 2$, $k = 4$, In the case $\lambda > \gamma + \mu$, there is a unique stable endemic equilibrium, namely $z^* = \left(\frac{\mu}{\gamma+\mu} - \frac{\mu}{\lambda}, \frac{\gamma+\mu}{\lambda}\right)$. We shall use the notations $a = \beta_1(z^*)$, $b = \beta_2(z^*)$, $c = \beta_3(z^*) + \beta_4(z^*)$ and $A = ab + ac + bc$. We have We have Since the mapping $F_z$ has the nice property that $F_z(x)(t) - F_{z'}(x)(t) = \exp[\nabla b(0)t](z - z')$, it follows readily again from Corollary 4.2.21 in [2] that the above result can be extended to the following statement.
For any open set G ⊂ D([0, T ]; R d ), for any sequence z N → z, From this last Theorem, we can deduce, with the same proof as that of Corollary 5.6.15 in [2], the following Corollary.

Wentzell-Freidlin theory and extinction of an epidemic
We now define where a > 0, and we recall that we have translated the endemic equilibrium z * at the origin.
We can now state our main result. For any z ∈ R d such that z 1 > −a, and any η > 0, Given Corollary 4.12, the proof of the above result follows the exact same steps as that of Theorem 5.7.11 in [2], with some minor modifications, to adapt to the fact that our processes have discontinuous trajectories, see the proof of Theorem 7.14 in [5], or of Theorem 4.2.17 in [1].

Interpretation. The critical population size
Going back to the original coordinates, i.e. $z^* \neq 0$, we should interpret $Z^{N,\alpha}(t)$ as $Z^{N,\alpha}(t) = N^\alpha (Z^N(t) - z^*)$. So (dropping the index for the starting point in order to simplify our notations), $T^N_a$ is the first time when $Z^N_1(t) \le z^*_1 - aN^{-\alpha}$. For $T^N_a$ to be finite, we need to have $z^*_1 - aN^{-\alpha} \ge 0$, since $Z^N_1(t)$ cannot become negative. This is of course no problem for the limit theorem, since $aN^{-\alpha} \to 0$ as $N \to \infty$, while $z^*_1$ is fixed. However, a deviation of the order of $-aN^{-\alpha}$ is enough for $Z^N_1(t)$ to hit zero, if $z^*_1$ is of the order of $N^{-\alpha}$, which means that $N$ is of the order of $(z^*_1)^{-1/\alpha}$. $e^{N^{1-2\alpha} V_a}$ is the order of magnitude of the time needed for $Z^N_t - z_t$ to make a deviation of size $aN^{-\alpha}$. This is sufficient to extinguish an epidemic, provided $z^*_1$ is of the same order, so that the corresponding critical size is $N_\alpha \sim (1/z^*_1)^{1/\alpha}$, which is roughly the CLT critical population size raised to the power $1/(2\alpha)$.
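The relation between the moderate deviations critical size and the CLT critical size can be illustrated numerically. This is a minimal sketch with hypothetical function names; it simply evaluates (1/z*_1)^{1/α} and the exponent N^{1−2α} V_a of the extinction time scale, and ignores all multiplicative constants.

```python
import math


def md_critical_size(z1_star, alpha):
    """Order of magnitude of the critical size N_alpha = (1 / z1*)^(1/alpha)."""
    return (1.0 / z1_star) ** (1.0 / alpha)


def log_extinction_time_scale(n, alpha, v_a):
    """Exponent N^(1 - 2 alpha) * V_a of the time scale exp(N^(1 - 2 alpha) V_a)."""
    return n ** (1.0 - 2.0 * alpha) * v_a


z1 = 0.01                             # hypothetical endemic proportion of infectious
n_clt = md_critical_size(z1, 0.5)     # CLT scaling: (1/z1)^2
n_md = md_critical_size(z1, 0.25)     # moderate deviations scaling with alpha = 1/4
```

Here `n_md` coincides with `n_clt ** (1 / (2 * 0.25))`, i.e. the CLT critical size raised to the power 1/(2α).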

The value of V a in the SIS model
In the particular case of the SIS model, we can compute explicitly the value of the quasi-potential $V_a$. In this case, $d = 1$, the linearized ODE around the endemic equilibrium translated at 0 reads $\dot x = -(\lambda-\gamma)x + u$, and the cost functional to minimize is $\frac{\lambda}{4\gamma(\lambda-\gamma)} \int_0^T u(t)^2\,dt$. We are looking for the minimal cost for driving $x$ from 0 to $-a$. We now exploit the Pontryagin maximum principle, see [9]. The Hamiltonian reads $H(x, p, u) = p\,[-(\lambda-\gamma)x + u] - \frac{\lambda}{4\gamma(\lambda-\gamma)} u^2$. The optimal control $\hat u$ must maximize the Hamiltonian, so it satisfies $\hat u = \frac{2\gamma(\lambda-\gamma)}{\lambda} p$. Since the final time is free and the system is autonomous, the Hamiltonian vanishes along the optimal trajectory, so that along such a trajectory, either $p = 0$, in which case $\hat u = 0$, or else $x = \frac{\gamma}{\lambda} p$, hence $\hat u = 2(\lambda-\gamma)x$. Finally the pieces of optimal trajectory which move towards the origin correspond to $u \equiv 0$, those which move away from the origin (this is the case we are interested in) satisfy the time reversed ODE $\dot x = (\lambda-\gamma)x$. There is no optimal trajectory from $x = 0$ to $x = -a$. However, if we start from $x = -\varepsilon$, the optimal trajectory is $x(t) = -e^{(\lambda-\gamma)t}\varepsilon$, so $\hat u(t) = -2(\lambda-\gamma)e^{(\lambda-\gamma)t}\varepsilon$, the final state $-a$ is reached at time $(\lambda-\gamma)^{-1}\log(a/\varepsilon)$, and the optimal cost is $\frac{\lambda}{2\gamma}(a^2 - \varepsilon^2)$. A possible sub-optimal control starting from 0 is as follows. Choose $u = -1$ for a time of order $\varepsilon$, until $x(t)$ reaches $-\varepsilon$, whose cost is of the order of $\varepsilon$, and then choose the optimal feedback, until $-a$ is reached. Letting $\varepsilon \to 0$, the cost converges to $V_a = \frac{\lambda}{\gamma}\frac{a^2}{2}$.
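The value of the optimal cost can be verified by integrating the running cost along the trajectory x(t) = −ε e^{(λ−γ)t}. The sketch below assumes the quadratic running cost λ u² / (4γ(λ−γ)), which is the one consistent with the optimal control û = 2γ(λ−γ)p/λ given in the text; the function names and the parameter values in the test are hypothetical.

```python
import math


def cost_along_optimal_path(lam, gamma, a, eps, steps=200_000):
    """Midpoint-rule integral of lam * u(t)^2 / (4 gamma (lam - gamma)) along
    x(t) = -eps * exp((lam - gamma) t), with u(t) = 2 (lam - gamma) x(t),
    from x = -eps until x = -a.  The running cost is an assumption of this
    sketch, inferred from the stated optimal control."""
    k = lam - gamma
    t_final = math.log(a / eps) / k       # time at which x reaches -a
    dt = t_final / steps
    weight = lam / (4.0 * gamma * k)
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt
        u = -2.0 * k * eps * math.exp(k * t)
        total += weight * u * u * dt
    return total


def closed_form_cost(lam, gamma, a, eps):
    """Closed form lam * (a^2 - eps^2) / (2 gamma) stated in the text."""
    return lam * (a * a - eps * eps) / (2.0 * gamma)
```

As ε tends to 0, `closed_form_cost` tends to λ a² / (2γ), the value of V_a.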

Comparison between the CLT, MD and LD
We do that comparison in case of the SIS model, for which we have explicit expressions for the rate functions and the quasi-potentials. We still translate z * at the origin, and start our process at the origin: Z N 0 = 0. We fix a > 0 and want to compare (for t large) the upper bounds for P(N α Z N t ≥ a) in the three cases α = 1/2 (the central limit theorem), 0 < α < 1/2 (moderate deviations) and α = 0 (large deviations).
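We start with the central limit theorem. It is easy to see that $U_t = \lim_{N \to \infty} \sqrt{N} Z^N_t$ is an Ornstein-Uhlenbeck process (in particular it is a Gaussian process), which solves the SDE $dU_t = -(\lambda-\gamma)U_t\,dt + \sqrt{2\gamma(\lambda-\gamma)/\lambda}\,dB_t$, so that the asymptotic variance of $U_t$ is $\gamma/\lambda$. Consequently for $a > 0$ fixed and any $\eta > 0$, there exist $t$ and $N$ large enough such that we have the following upper bound for the probability of a positive deviation of $Z^N_t$: $P(\sqrt{N} Z^N_t \ge a) \le \exp\left(-\frac{\lambda a^2}{2\gamma} + \eta\right)$.
Consider next the moderate deviations. Theorem 4.10 combined with the computation from the last subsection indicates that for $0 < \alpha < 1/2$ and any $\eta > 0$, there exist $t$ and $N$ large enough such that $P(N^\alpha Z^N_t \ge a) \le \exp\left(-N^{1-2\alpha}\left[\frac{\lambda a^2}{2\gamma} - \eta\right]\right)$.
We finally consider the large deviations. Here we need to assume that $a < \gamma/\lambda$.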
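Consequently, from Theorem 3.5, for any $\eta > 0$, there exist $t$ and $N$ large enough such that $P(Z^N_t \ge a) \le \exp\left(-N\left[a + \left(\frac{\gamma}{\lambda} - a\right)\log\left(1 - a\frac{\lambda}{\gamma}\right) - \eta\right]\right)$.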
We note that Moderate Deviations resemble the CLT much more than Large Deviations do.
The fact that the discontinuity in the form of the rate function is exactly at α = 0 is typical of random variables with light tails. The situation would be quite different with heavy tails, see e.g. section VIII.4 in Petrov [8].
Note however that for small $a$, $a + \left(\frac{\gamma}{\lambda} - a\right)\log\left(1 - a\frac{\lambda}{\gamma}\right) \sim \frac{\lambda a^2}{2\gamma}$, which is not too surprising, and in a sense reconciles Large Deviations and Moderate Deviations.
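The small-a equivalence between the large deviations exponent and the quadratic moderate deviations exponent can be observed numerically. The following sketch, with hypothetical function names, evaluates both expressions for the SIS model and their ratio, which should approach 1 as a decreases.

```python
import math


def ld_exponent(a, lam, gamma):
    """Large deviations exponent a + (gamma/lam - a) * log(1 - a*lam/gamma),
    defined for 0 < a < gamma/lam."""
    return a + (gamma / lam - a) * math.log1p(-a * lam / gamma)


def md_exponent(a, lam, gamma):
    """Quadratic (moderate deviations / CLT) exponent lam * a^2 / (2 gamma)."""
    return lam * a * a / (2.0 * gamma)


lam, gamma = 2.0, 1.0                 # hypothetical parameter values
ratios = [ld_exponent(a, lam, gamma) / md_exponent(a, lam, gamma)
          for a in (0.1, 0.01, 0.001)]
```

The entries of `ratios` decrease towards 1, reflecting the fact that the large deviations rate is quadratic to leading order near the equilibrium.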

Appendix
In this Appendix, we establish the following technical result.
Proposition .14. Let $M$ be a standard Poisson random measure on $\mathbb{R}^2_+$, and $\bar M(dt, du) = M(dt, du) - dt\,du$ the associated compensated measure. If $\varphi$ is an $\mathbb{R}_+$-valued predictable process such that $\int_0^T \varphi_t\,dt$ has exponential moments of any order, and $a \in \mathbb{R}$, then for any $0 \le t \le T$, . If $2b = e^{2a} - 1 - 2a$, it follows from the previous argument that the first factor on the second right hand side is less than or equal to 1, hence the result follows.
In order to complete the proof of Proposition .14, we still need to establish
Lemma .15. The process $\varphi$ satisfying the same assumptions as in Proposition .14, and $X_t$ being given by (.24), $\mathcal{M}_t = \int_0^t \int_0^{\varphi_s} e^{X_{s-}}\,\bar M(ds, du)$ is a martingale.
Proof. It is plain that $\mathcal{M}_t$ is a local martingale, whose predictable quadratic variation is All we need to show is that the above quantity is integrable. It is clearly a consequence of the assumption in case $a < 0$. In case $a > 0$, the second factor of the right hand side has finite exponential moments, so is square integrable, and it remains to show that E exp 4a The same computation with $\varphi_s$ replaced by $\varphi^n_s = \varphi_s \wedge n$, and then $Y_s$ replaced by $Y^n_s$, would show that $Y^n_t$ is a martingale satisfying $E Y^n_t = 1$. But $0 \le Y^n_t \to Y_t$ a.s., hence Fatou's Lemma implies that $E Y_t \le 1$. Since