Large and moderate deviations for stochastic Volterra systems

We provide a unified treatment of pathwise large and moderate deviations principles for a general class of multidimensional stochastic Volterra equations with singular kernels, not necessarily of convolution form. Our methodology is based on the weak convergence approach by Budhiraja, Dupuis and Ellis. We show in particular how this framework encompasses most rough volatility models used in mathematical finance and generalises many recent results in the literature.


INTRODUCTION
This paper sheds new light on the asymptotic behaviour of the class of stochastic Volterra equations (SVEs) for some fixed time horizon T > 0, where X 0 ∈ R d , d ≥ 1, W is a multidimensional Brownian motion, K is a kernel that may be singular, and the coefficients are such that a unique pathwise solution exists. This class of models has been investigated in many fields, including nonlinear filtering [29] using fractional Brownian motion kernels, pharmacokinetic models [63] (Langevin equation driven by fractional Brownian motion), fluid turbulence [21], and turbulence modelling in atmospheric winds or energy prices [3,28] using Brownian semistationary processes. Mathematical finance has however been the most dynamic area by far in terms of applications of SVEs, and an in-depth study of (1.1) in the affine case with convolution kernels was recently carried out by Abi Jaber, Larsson and Pulido [1]. Following previous analyses supporting non-Markovian systems [2,23,24,26,25,47], the investigation of high-frequency data in [51] revealed the roughness, in the sense of low Hölder regularity, of the observed time series of the instantaneous volatility of stock price processes. This suggested that fractional Brownian motion (fBm) with small Hurst parameter (H ≈ 0.1) is an accurate driver for its dynamics. Since this seminal observation, more advanced results [39] have proposed that the drift and the diffusion coefficients should be state dependent, giving rise to the widespread development of (1.1) in quantitative finance.
For option pricing purposes, the asymptotic results in [2,7,48] showed that the short-maturity behaviour of option prices is captured much more accurately by these rough volatility models than by Markovian diffusions. Reconciling the stylised facts of the markets from both the statistical and the option pricing viewpoints is the tour de force that makes these models so important today. However, the loss in tractability compared to classical Itô diffusions is not negligible. The solution to (1.1) is in general neither a semimartingale nor a Markov process, preventing the use of Itô calculus or Feynman-Kac type formulas. Path-dependent versions of the latter are available in some cases, in particular for affine rough volatility models [1,31,41,52], but general results are scarce [74]. Rough path theory is a natural route but is not available for H ≤ 1/4, although a regularity structure approach was recently developed [5]. In this context, one could turn to numerical methods to understand the dynamics or to price options but, despite new advances based on Monte-Carlo methods [6,8,64], a rough Donsker theorem [57] or Fourier methods [41], the roughness and memory of the process seriously complicate the task.
Asymptotic methods have been used both to provide a clearer understanding of models in extreme parameter configurations and to act as proxies to numerical schemes. Large Deviations Principles (LDP), in particular, have been widely explored in mathematical finance, and we refer the interested reader to [69] for an overview. Let $\{X^\varepsilon\}_{\varepsilon>0}$ be a sequence of random variables in some Polish space $\mathcal{X}$, converging in probability to a deterministic limit $\overline{X}$ as $\varepsilon$ goes to zero. This sequence is said to satisfy an LDP with speed $\varepsilon^{-1}$ and rate function $I : \mathcal{X} \to [0, +\infty]$ if for all Borel subsets $B \subset \mathcal{X}$, the inequalities
$$-\inf_{x\in B^{\circ}} I(x) \le \liminf_{\varepsilon\downarrow 0} \varepsilon \log \mathbb{P}(X^\varepsilon\in B) \le \limsup_{\varepsilon\downarrow 0} \varepsilon \log \mathbb{P}(X^\varepsilon\in B) \le -\inf_{x\in\overline{B}} I(x)$$
hold, where $B^{\circ}$ and $\overline{B}$ denote the interior and the closure of $B$, and the level sets $\{x \in \mathcal{X} : I(x) \le N\}$ of $I$ are compact for all $N > 0$. This rate function encompasses in a (relatively) concise formula first-order information about the asymptotic behaviour of complex dynamical systems. If $X$ satisfies (1.1) and $\mathcal{X} = \mathbb{R}^d$, one can consider finite-dimensional LDP for $\{X_t\}_{t\ge 0}$ (also called small-time LDP if the limit takes place as $t$ goes to zero), or pathwise LDP for some rescaling of $X$ with $\mathcal{X} = C([0, T] : \mathbb{R}^d)$. The former is easily recovered from the latter by a projection argument. Moderate deviations, however, are concerned with deviations of a lower order than large deviations, and thus apply to 'less rare events'. We indeed say that $\{X^\varepsilon\}_{\varepsilon>0}$ satisfies a moderate deviations principle (MDP) if $\{\eta^\varepsilon\}_{\varepsilon>0}$ satisfies an LDP with speed $h_\varepsilon^2$, where
$$\eta^\varepsilon := \frac{X^\varepsilon - \overline{X}}{\sqrt{\varepsilon}\, h_\varepsilon},$$
with $\lim_{\varepsilon\downarrow 0} h_\varepsilon = +\infty$ and $\lim_{\varepsilon\downarrow 0} \sqrt{\varepsilon}\, h_\varepsilon = 0$. Since the rate at which $h_\varepsilon$ diverges is not fixed, an MDP essentially bridges the gap between the central limit regime where $h_\varepsilon = 1$ and the LDP regime where $h_\varepsilon = \varepsilon^{-1/2}$. An edifying example of the relevance of moderate deviations appears in [46] where, interested in option pricing asymptotics, the authors judiciously rescale the strikes with respect to time to expiry. Indeed, as time to expiry becomes smaller, the range of pertinent strikes naturally shrinks, and this 'moderately-out-of-the-money' regime becomes more realistic.
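The two scalings can be compared on a toy family for which everything is explicit. The sketch below takes the scalar family given by the square root of ε times a standard Gaussian Z (an illustrative choice that is not taken from the paper; the function names are hypothetical as well), with deterministic limit zero. It evaluates the large deviations quantity, ε times the logarithm of the tail probability, and the moderate deviations quantity with speed h squared. Both should approach −a²/2, the Gaussian rate function evaluated at the threshold a.

```python
import math


def log_gaussian_tail(x):
    """Return log P(Z > x) for a standard Gaussian Z, using erfc for accuracy in the tail."""
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))


def ldp_exponent(a, eps):
    """eps * log P(sqrt(eps) * Z > a): the large deviations scaling with speed 1/eps."""
    return eps * log_gaussian_tail(a / math.sqrt(eps))


def mdp_exponent(a, h):
    """h^-2 * log P(eta > a) with eta = sqrt(eps) * Z / (sqrt(eps) * h) = Z / h.

    In this toy example the rescaled variable no longer depends on eps,
    only on the (hypothetical) moderate deviations speed h.
    """
    return log_gaussian_tail(a * h) / h ** 2


if __name__ == "__main__":
    a = 1.0
    for eps in (1e-1, 1e-2, 1e-3):
        h = eps ** -0.25  # assumed choice: h -> infinity while sqrt(eps) * h -> 0
        print(eps, ldp_exponent(a, eps), mdp_exponent(a, h))
```

In this Gaussian case both regimes share the same quadratic rate function; for nonlinear systems only the moderate deviations rate function is expected to remain quadratic.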
Large deviations for SVEs were originally studied in [68,71] with regular kernels. In the context of rough volatility, Forde and Zhang [43] introduced the first finite-dimensional LDP where the log-volatility is modelled by a fractional Brownian motion, and refined versions followed in [5,7,45], while pathwise LDP for similar models were studied in [20,55]. Departing from regular conditions on the behaviour of the coefficients led to specific requirements, and finite-dimensional large deviations for the fractional Heston model were carried out in [42,54], while more elaborate pathwise LDPs were derived for the rough Stein-Stein model with random starting point [56], for the rough Bergomi model [58], and small-time LDPs for the multi-factor rough Bergomi appeared in [61]. We emphasise at this point that no pathwise MDP was previously known in the context of rough volatility.
The Gärtner-Ellis theorem [33, Theorem 2.3.6] is the main ingredient of a finite-dimensional LDP and depends on explicit computations of certain limits of the Laplace transform. This is only available, though, when the process is either Gaussian [43] or affine [42]. Pathwise LDPs, on the other hand, have mainly been derived using the Freidlin-Wentzell approach [44]: starting from known large deviations for the driving (Gaussian) process [34, Theorem 3.4.5], they follow from a combination of approximations and continuous mapping, keeping track of the rate function. While this methodology is clear, it requires a case-by-case tailored path for each model, and in general leads to a cumbersome rate function. Furthermore, pathwise moderate deviations are so far out of reach in this approach, partially explaining the small number of related results compared to LDP.
A radically different method, introduced by Dupuis and Ellis in the monograph [36] and developed further by Budhiraja and Dupuis [14], relies on the equivalence between the LDP and the Laplace principle. The family $\{X^\varepsilon\}_{\varepsilon>0}$ is said to satisfy the Laplace principle with speed $\varepsilon^{-1}$ and rate function $I : \mathcal{X} \to [0, +\infty]$ if for all continuous bounded maps $F : \mathcal{X} \to \mathbb{R}$,
$$\lim_{\varepsilon\downarrow 0} -\varepsilon \log \mathbb{E}\left[\exp\left\{-\frac{F(X^\varepsilon)}{\varepsilon}\right\}\right] = \inf_{x\in\mathcal{X}} \big\{ I(x) + F(x) \big\}. \tag{1.2}$$
This alternative, called the weak convergence approach, consists in proving a Laplace principle where the left-hand side pre-limit of (1.2) can be represented as a variational principle for expectations of functionals of Brownian motion [11, Theorem 3.1]:
Lemma 1.1 (Boué-Dupuis). Let $W$ be an $\mathbb{R}^m$-Brownian motion and $F$ be a bounded Borel-measurable function mapping $C([0, T] : \mathbb{R}^m)$ into $\mathbb{R}$. Then
$$-\log \mathbb{E}\left[\mathrm{e}^{-F(W)}\right] = \inf_{v\in\mathcal{A}} \mathbb{E}\left[\frac{1}{2}\int_0^T |v_s|^2\,\mathrm{d}s + F\left(W + \int_0^{\cdot} v_s\,\mathrm{d}s\right)\right], \tag{1.3}$$
where $\mathcal{A}$ denotes the space of progressively measurable $\mathbb{R}^m$-valued processes $v$ such that
$$\mathbb{E}\left[\int_0^T |v_s|^2\,\mathrm{d}s\right] < \infty. \tag{1.4}$$
The representation (1.3) contains in a single formula the usual tools used in the proof of an LDP. The first term on the right-hand side comes from the relative entropy between the Wiener measure and the measure shifted by $\int_0^{\cdot} v_s\,\mathrm{d}s$ via Girsanov's theorem, under which $W + \int_0^{\cdot} v_s\,\mathrm{d}s$ is a Brownian motion. It can be interpreted as the cost of deviating from the original path and clearly indicates where the form of the rate function comes from. In essence, this representation replaces the nonlinear analysis of the Freidlin-Wentzell approach with the linear theory of weak convergence. Instead of exponential estimates, only qualitative properties of the shifted process need to be established, such as strong existence and uniqueness, and tightness.
The extensive literature on the topic, summarised in [14] and the references therein, demonstrates the strength of this generic approach which can be applied to a variety of models without appealing to their particular features. It has been used to derive LDPs, in the continuous-time case, for diffusions [22], multiscale systems [16,37,72], SVEs with singular kernels and Lipschitz continuous coefficients [76], SDEs driven by infinite-dimensional Brownian motions [17], by Poisson random measures [12] or both [18], including stochastic PDEs. Contrary to the Freidlin-Wentzell approach, this method has also proved efficient to obtain MDPs. SVEs with Lipschitz continuous kernels [62], SDEs with jumps [15] and slow-fast systems [65] are a few relevant examples. The latter were then tailored to the setting of stochastic volatility models in [59], developing the first application of the weak convergence approach in mathematical finance, and extending the MDP results in [46] to a pathwise setting. A further appealing feature of moderate deviations is the simple form, often quadratic, of the rate function, as opposed to that provided by large deviations, thereby opening the gates to the use of importance sampling and variance reduction techniques [38,70,66].
Building on this powerful approach, we provide a unified treatment of (finite-dimensional and pathwise) large and moderate deviations in the general framework (1.1) by showing the weak convergence of a perturbed system. We relax the uniqueness requirement for the limiting Volterra equation, as in [27,35] for the diffusion case, allowing us to consider coefficients that are not Lipschitz continuous and do not necessarily have sublinear growth.
The paper is organised in the following way: Section 2 introduces the framework and useful definitions. In Section 3, we present abstract criteria for the validity of an LDP, extending the results by Budhiraja and Dupuis [13, Theorem 4.1]. Our main results, Theorem 3.11 for LDP and Theorem 3.20 for MDP, are then stated in the case of convolution kernels and extended to non-convolution kernels in Theorem 3.29. In Section 4, we show how these results apply to rough volatility models, and give precise formulae for the rough Stein-Stein, the (multi-factor) rough Bergomi and the rough Heston models. We finally gather technical proofs in the appendix.

GENERAL FRAMEWORK
2.1. Notations. We consider a fixed time horizon $T > 0$, and denote $\mathbb{T} := [0, T]$, $\mathbb{R}_+ := [0, +\infty)$ and $\overline{\mathbb{R}}_+ := [0, +\infty]$. For $d_1 \ge 1$, $d_2 \ge 1$, $|\cdot|$ denotes the Euclidean norm in $\mathbb{R}^{d_1}$ and the Frobenius norm in $\mathbb{R}^{d_1\times d_2}$, and $[\![ d_1, d_2 ]\!] := \{d_1, \dots, d_2\}$. For $p \ge 1$, $L^p$ stands short for $L^p(\mathbb{T})$, and $\|\cdot\|_2$ is the usual $L^2$ norm. Furthermore, for $d \ge 1$, $\mathcal{W}^d := C(\mathbb{T} : \mathbb{R}^d)$ represents the space of continuous functions from $\mathbb{T}$ to $\mathbb{R}^d$, equipped with the supremum norm $\|\phi\|_{\mathbb{T}} := \sup_{t\in\mathbb{T}} |\phi_t|$ for any $\phi \in \mathcal{W}^d$. Finally, for any Unless stated otherwise, constants will be denoted by $C$ (with possible subscript) and may be different from one proof to another. Every statement involving $\varepsilon$ stands for all $\varepsilon > 0$ small enough. A family of random variables will be called tight if the corresponding measures are tight [36, Appendix A]. We also use the classical convention that the infimum over an empty set is equal to infinity. Finally, we recall the following definitions for clarity and notations:
• It has linear growth if there exists
• if it is uniformly continuous, it admits a continuous and increasing modulus of continuity

2.2. Framework.
We consider small-noise convolution stochastic Volterra equations (SVE)
$$X^\varepsilon_t = X^\varepsilon_0 + \int_0^t K(t-s)\Big[ b_\varepsilon(s, X^\varepsilon_s)\,\mathrm{d}s + \vartheta_\varepsilon\, \sigma_\varepsilon(s, X^\varepsilon_s)\,\mathrm{d}W_s \Big], \qquad t \in \mathbb{T}, \tag{2.1}$$
where $\varepsilon > 0$, and $\vartheta_\varepsilon > 0$ tends to zero as $\varepsilon$ goes to zero. For each $\varepsilon > 0$, $b_\varepsilon : \mathbb{T}\times\mathbb{R}^d \to \mathbb{R}^d$ and $\sigma_\varepsilon : \mathbb{T}\times\mathbb{R}^d \to \mathbb{R}^{d\times m}$ are Borel-measurable functions, and $W$ is an $m$-dimensional Brownian motion on the filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\in\mathbb{T}}, \mathbb{P})$ satisfying the usual conditions. The kernel function $K : \mathbb{T} \to \mathbb{R}^{d\times d}$, of convolution type, is allowed to be singular, thus encompassing fractional processes, in particular the recent literature on rough volatility [2,6,39,51]. Components of the system are in general correlated, the correlation matrix being implicitly encoded in the diffusion coefficient $\sigma_\varepsilon$. General existence and uniqueness results for such stochastic Volterra equations are so far out of reach, and our conditions below are sufficient and general enough for most applications. In order to state them precisely, we first introduce several definitions and concepts:
Definition 2.2. For any $\varepsilon > 0$, a solution to (2.1) is an $\mathbb{R}^d$-valued progressively measurable stochastic process $X^\varepsilon$ satisfying (2.1) almost surely and such that We shall assume that the (singular) convolution kernel satisfies the following condition, which is essentially a multivariate version of the one given in [1, Condition (2.5)]:
Assumption 2.3. The kernel $K : \mathbb{T} \to \mathbb{R}^{d\times d}$ is an upper triangular matrix satisfying the following conditions: $K \in L^2(\mathbb{T} : \mathbb{R}^{d\times d})$ and there exists $\gamma \in (0, 2]$ such that, for $h$ small enough, We refer to [1, Example 2.3] for a broad range of kernels that satisfy this assumption. Of particular interest in mathematical finance is the Riemann-Liouville kernel $K(t) = t^{H-\frac{1}{2}}$, for $H \in (0, \frac{1}{2})$, implying $\gamma = 2H$. Moreover, if $K_1$ is locally Lipschitz and $K_2$ satisfies Assumption 2.3, then so does the product $K_1 K_2$; this includes the gamma and power-law kernels, which are related to the class of Brownian semistationary processes [4].
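The claim that the Riemann-Liouville kernel yields γ = 2H can be checked numerically. The sketch below assumes that Assumption 2.3, like [1, Condition (2.5)], controls both the small-time energy of K on [0, h] and the energy of the increments K(· + h) − K(·) by a multiple of h^γ; this reading, the choice T = 1 and all function names are assumptions made for illustration. The first quantity is available in closed form, and the second is approximated after a change of variables that removes the singularity at zero.

```python
import math


def small_time_energy(h, H):
    """Exact value of int_0^h K(t)^2 dt for K(t) = t^(H - 1/2), namely h^(2H) / (2H)."""
    return h ** (2.0 * H) / (2.0 * H)


def increment_energy(h, H, T=1.0, n=20000):
    """Midpoint-rule value of int_0^T (K(t+h) - K(t))^2 dt for K(t) = t^(H - 1/2)."""
    a = H - 0.5
    p = 1.0 / (2.0 * H)
    # Substituting t = h * u and then u = w^p makes the integrand bounded near zero:
    # ((u+1)^a - u^a)^2 * u^(1-2H) = (u+1)^(2a) u^(1-2H) - 2 (u+1)^a u^(1/2-H) + 1.
    w_max = (T / h) ** (2.0 * H)
    dw = w_max / n
    total = 0.0
    for i in range(n):
        u = ((i + 0.5) * dw) ** p
        total += ((u + 1.0) ** (2.0 * a) * u ** (1.0 - 2.0 * H)
                  - 2.0 * (u + 1.0) ** a * u ** (0.5 - H) + 1.0) * dw
    return h ** (2.0 * H) * p * total


def estimated_gamma(H, h1=1e-2, h2=1e-3):
    """Log-log slope of h -> increment_energy(h, H) between the step sizes h1 and h2."""
    e1, e2 = increment_energy(h1, H), increment_energy(h2, H)
    return math.log(e1 / e2) / math.log(h1 / h2)
```

For instance, `estimated_gamma(0.1)` should be close to 0.2, in line with γ = 2H; the smaller H is, the rougher the kernel and the smaller the exponent.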

Remark 2.4. This setup covers in particular the following two useful forms for the kernel:
• K = Diag(k 1 , · · · , k d ) is a diagonal matrix, where each k i : T → R satisfies Assumption 2.3.
• The drift and diffusion coefficients of any sub-system of (2.1) can be convoluted with different kernels. As an example, the one-dimensional SVE s is the first component of (2.1) with d = 2 and One could in fact use general matrices to eliminate the auxiliary state, such as K = (K 1 K 2 ) : T → R 1,2 in the example above. We stick to square matrices for consistency.
Volterra systems appearing in the literature, and in particular in the mathematical finance one, have a specific structure in the sense that only one component satisfies an SVE with (singular) kernel, and can be dealt with independently of the other component. This particular structure allows us to relax some conditions on the coefficients, and we shall leverage on it whenever needed. We make this more specific through the following two definitions: We define S Γ Υ as the set of functions f for which there exists a strictly positive constant C Υ such that, for all x ∈ R d , where |x| Υ c := i∈Υ c x (i) and x (Υ) := (x (i) ) i∈Υ .
Definition 2.6. The process X ε admits an autonomous S Γ Υ -subsystem {X ε,(l) } l∈Υ if for all 1 ≤ i, j ≤ d such that K ij = 0, b  ε (t, ·) satisfy the following for small enough ε and uniformly in t ∈ T: • if i ∈ Υ, they have linear growth and do not depend on X ε,(k) for k ∈ Υ c ; • if i ∈ Υ c , they belong to S Γ Υ . Example 2.7. The motivation for Definition 2.6 is to be able to handle (rough) stochastic volatility models, ubiquitous in mathematical finance, where linear growth of all the coefficients may not hold. Consider for example the rough Bergomi model [6]  where B and W are Brownian motions with correlation ρ ∈ (−1, 1). After dropping the dependence in ε, this fits into the setup of (2.1) with d = 3, y 0 ∈ R, a > 0, H ∈ (0, 1 2 ), ρ = 1 − ρ 2 and where the third component is meaningless but allows us to handle the two different kernels. Here X admits X (2) as autonomous subsystem with Υ = {2} and Γ(x 2 ) = 1 + e x2 .
The following set of assumptions, inspired by [22], completes our framework:
H1. $X^\varepsilon_0$ converges to $x_0 \in \mathbb{R}^d$ as $\varepsilon$ tends to zero.
H2. For all $\varepsilon > 0$ small enough, the coefficients $b_\varepsilon$ and $\sigma_\varepsilon$ are measurable maps on $\mathbb{T} \times \mathbb{R}^d$ and converge pointwise to $b$ and $\sigma$ as $\varepsilon$ goes to zero. Moreover, $b(t, \cdot)$ and $\sigma(t, \cdot)$ are continuous on $\mathbb{R}^d$, uniformly in $t \in \mathbb{T}$.
H3. Either a) or b) holds:
a) For all $\varepsilon > 0$ small enough, $b_\varepsilon$ and $\sigma_\varepsilon$ have linear growth uniformly in $\varepsilon$ and in $t \in \mathbb{T}$.
b) The process $X^\varepsilon$ admits an autonomous $S^\Gamma_\Upsilon$-subsystem.
H4. The SVE (2.1) is pathwise unique for small enough $\varepsilon > 0$.
H2 ensures that, on compact subsets of $\mathbb{T} \times \mathbb{R}^d$, the convergence of $b_\varepsilon$ and $\sigma_\varepsilon$ is uniform and that $b$ and $\sigma$ are uniformly continuous. H1, H2, H3a are standard and easily verifiable. H3b is unusual but includes a large number of functions; Assumption 3.9 will complete it to indicate the role of $\Gamma$ so as to include Example 2.7. Moreover, the growth conditions from H3 are uniform in $\varepsilon$ and therefore apply to the limits $b$ and $\sigma$. The main restrictions arise from H4, although the latter is satisfied if, for instance, the coefficients $b_\varepsilon$ and $\sigma_\varepsilon$ are locally Lipschitz continuous for small enough $\varepsilon > 0$. This condition was relaxed in [67] in the one-dimensional case where $K(t) = t^{-\alpha}$, for $\alpha \in (0, \frac{1}{2})$, and $\sigma(x) = x^\gamma$, for $\gamma \in \big(\frac{1}{2(1-\alpha)}, 1\big]$, which is clearly not Lipschitz continuous. Furthermore, to the best of our knowledge there currently exists no pathwise LDP for stochastic equations where pathwise uniqueness fails.
One can compare our setup with the SVE considered by Zhang in [76], where a pathwise LDP was derived under the assumptions of Lipschitz continuity and linear growth of the coefficients. We relax both these assumptions. Indeed, our framework covers Hölder-continuous diffusion coefficients, as mentioned in the previous paragraph, and functions with non-linear growth through the concept of autonomous S Γ Υ -subsystem, as in Example 2.7.

LARGE AND MODERATE DEVIATIONS
As discussed in the introduction, our goal is to provide pathwise large and moderate deviations for the general convolution stochastic Volterra system (2.1), and then extend these to nonconvolution kernels. The classical Freidlin-Wentzell approach, used in [44], has limitations regarding the behaviour of the coefficients, and the rate function is often rather cumbersome to write. We follow here instead the weak convergence approach developed by Dupuis and Ellis [36]. We first introduce the reader to their abstract setting, and refine the large deviations result by Budhiraja and Dupuis [13] to our general setup. We then show how this abstract framework applies to the small-noise stochastic Volterra system (2.1), first proving pathwise large deviations, and then the moderate deviations counterpart.
3.1. Weak convergence approach: the abstract setting. Given a family of Borel-measurable functions {G ε } ε>0 from W m to W d , we enquire about the large deviations behaviour of the family of random variables {G ε (W )} ε>0 as ε tends to zero, where W is a standard Brownian motion on the filtered probability space above. For each N > 0, the spaces of bounded deterministic and stochastic controls with A introduced in (1.4), are equipped with the weak topology on L 2 (T × Ω) such that they are closed and even compact (by Banach-Alaoglu-Bourbaki theorem). Budhiraja and Dupuis [13] assume, for any sequence {v ε } ε>0 in A N converging weakly to v ∈ A N , the existence of a limit in distribution of G ε W + 1 ε · 0 v ε s ds which is uniquely characterised by v. However, such uniqueness may fail when the coefficients of the system (1.1) (in particular the diffusion coefficient σ) are not locally Lipschitz, as is the case for the Feller diffusion for example (in this case without singular kernel, a dedicated analysis was carried out in [27,35] using the Freidlin-Wentzell approach). We relax here this uniqueness assumption by replacing the limiting trajectory by a perturbed version.
Then we define the functional I : W d → R + given by In particular, if there exists v ∈ L 2 which attains the infimum in (3.2) and G 0 v = {φ} then φ is uniquely characterised, because one can choose v n = v for all n ∈ N.
In the display (3.2) and Definition 3.2, φ, v, v n and v are all deterministic. Our abstract large deviations result is the following, extending [13,Theorem 4.4], at least when the underlying Hilbert space is L 2 (T : R m ), to the non-uniqueness case. We defer the proof to Appendix A.1; the lower bound can be tackled as in [13], and we therefore concentrate on the upper bound. The idea is that the Laplace principle (1.2) upper bound involves an infimum so deriving it only requires a δ-optimal path. Hence a perturbation will also do the trick, provided one knows how to handle the control associated to it. In [13,Theorem 4.4], unique characterisation of the limiting element in (i) is granted, and the set G 0 v is a singleton that takes the form G 0 ( · 0 v s ds), where they view G 0 as a map. In that case Assumption 3.3 is clearly satisfied since φ δ can be taken as φ itself.
3.2. Application to stochastic Volterra systems. We now show how the abstract setting developed above in Section 3.1 applies to the small-noise stochastic Volterra system (2.1) and why pathwise uniqueness is so fundamental. If H4 holds, define the functional G ε as the Borel-measurable map associating the multidimensional Brownian motion W to the solution of the stochastic Volterra equation (2.1), that is: G ε (W ) = X ε . For any control v ∈ A N , N > 0 (introduced in (3.1)) and any ε > 0, the process W : Hence the shifted version X ε,v := G ε ( W ) appearing in Theorem 3.5(i) is the strong unique solution of (2.1) under P, with X ε and W replaced by X ε,v and W . Because P and P are equivalent, X ε,v is also the unique strong solution, under P, of the controlled equation Under appropriate conditions, and using the notations set in H1, H2, we heuristically observe that taking ε to zero, the system (3.4) reduces to the deterministic Volterra equation We will show later that the set G 0 v corresponds to the set of solutions of (3.5). Example 3.8. To illustrate the need for a set G 0 v rather than a singleton, consider the Feller diffusion for t ∈ T, with x 0 , κ, θ > 0. Letting t → εt and denoting X ε t := X εt yields, by scaling, For v ∈ L 2 , taking limits as ε tends to zero in the corresponding controlled equation (3.4) yields (3.5), or

Uniqueness of this Volterra equation does not hold in general because of the non-Lipschitz coefficient, and thus
The function φ t : is clearly a solution, but so is ϕ equal to φ on [0, 2] and null on [2,4]. The square root function is indeed locally Lipschitz away from zero, and uniqueness can thus be guaranteed as long as the solution remains positive. The perturbation φ δ := φ + δt is now the unique solution to [35,Proposition 3.3] shows that φ δ satisfies Assumption 3.3.
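Non-uniqueness of this kind can be verified numerically. The sketch below assumes that the limiting equation reads φ_t = x_0 + ∫_0^t √φ_s v_s ds on [0, 4], with the illustrative choices x_0 = 1 and a control equal to −1 on [0, 2) and +1 on [2, 4]; these values, like the function names, are hypothetical and need not coincide with those of Example 3.8. Under these assumptions, the path (1 − t/2)² and the path that follows it up to time 2 and then stays at zero both solve the same equation.

```python
import math


def v(t):
    """Assumed control: pushes the path down to zero until time 2, then back up."""
    return -1.0 if t < 2.0 else 1.0


def phi(t):
    """Candidate solution that touches zero at t = 2 and leaves it again."""
    return (1.0 - t / 2.0) ** 2


def phi_stuck(t):
    """Candidate solution equal to phi on [0, 2] and null on [2, 4]."""
    return phi(t) if t <= 2.0 else 0.0


def residual(path, x0=1.0, T=4.0, n=4000):
    """Largest grid value of |path(t) - x0 - int_0^t sqrt(path(s)) v(s) ds| (midpoint rule)."""
    dt = T / n
    integral = 0.0
    worst = abs(path(0.0) - x0)
    for i in range(n):
        s = (i + 0.5) * dt
        integral += math.sqrt(path(s)) * v(s) * dt
        t = (i + 1) * dt
        worst = max(worst, abs(path(t) - (x0 + integral)))
    return worst
```

Both `residual(phi)` and `residual(phi_stuck)` come out at rounding level, although the two paths differ on (2, 4]. Adding δt to the first path keeps it strictly positive on (0, 4], away from the point where the square root fails to be Lipschitz.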
In [9], the authors were also confronted with a limiting equation with multiple solutions. Instead of perturbing the path φ, they perturb the control in such a way that the resulting equation has a unique solution which is precisely φ, i.e. $\mathcal{G}^0_{v^\delta} = \{\varphi\}$. This approach may seem more natural; however, it is not always obvious how to perturb the control while ensuring uniqueness of the ODE, whereas our formulation makes it more straightforward. Before stating the main large and moderate deviations results for small-noise stochastic Volterra equations, we introduce the following assumption, monitoring the moments of the controlled equation:
Assumption 3.9. Let $X^{\varepsilon,v}$ be the pathwise unique solution to (3.4). If H3a holds then the present assumption is satisfied. If instead H3b holds, then there exists $\varepsilon_0 > 0$ such that, for any $p \ge 1$ and $N > 0$,
Remark 3.10. In the following, H3b will always be complemented by Assumption 3.9.

3.3. Large Deviations.
Armed with the abstract setting in Section 3.1, and its application to the stochastic Volterra system (2.1) in Section 3.2, we can at last show large deviations for the latter: Remark 3.14. This bound also holds for any solution φ of (3.5) under the same assumptions, therefore there also exists The following lemma deals with 3.11(i) by showing tightness of

Remark 3.16.
This lemma entails that for all N > 0, v ∈ S N , any solution to (3.5) also has Hölder continuous paths of the same order.
The following lemma proves Theorem 3.5(ii) and its proof can be found in Appendix A.4. Leveraging on the above lemmas, the Large Deviations Principle (Theorem 3.11) is a direct consequence of Theorem 3.5. Lemma 3.17. For N ∈ N and v ∈ A N , we first need to identify G 0 v,N , defined in (3.1). Consider a subsequence {ε n } n∈N ⊂ R + with lim n↑∞ ε n = 0 and a sequence {v εn } n∈N ⊂ A N such that lim n↑∞ v εn = v in distribution, and assume that X n := X εn,v εn converges in distribution to some random variable φ with values in W d . We also denote v n , ε n , X n 0 , b n , σ n along this subsequence.

Proof of
By Skorohod representation theorem we can work with almost sure convergence for the purpose of identifying the limit. Hence X n , v n n≥0 converges almost surely in the product topology on W d × S N , and the limit is the W d × S N -valued random variable (φ, v). The convergence of the couple also takes place in distribution, so that we can follow the technique in [22] to identify the limit. For t ∈ T, define Φ t : Clearly, Φ t is bounded and we show that it is also continuous. Indeed, let ω n → ω in W d and f n → f in S N with respect to the weak topology. H2 implies the existence of continuous moduli of continuity ρ b and ρ σ for both coefficients on compact subsets (see Definition 2.1). Since the paths ω n , n ≥ 1 and ω are continuous, they are also uniformly bounded and hence these moduli are available. Then, using Cauchy-Schwarz inequality and the fact that |x ∧ 1 − y ∧ 1| ≤ |x − y| for all x, y > 0, Since K(t − ·) ∈ L 2 and f n tends to f weakly in L 2 then the last integral converges to zero as n goes to infinity. Moreover lim n↑∞ ω − ω n T = 0, f n 2 ≤ √ N for all n ≥ 0 and K 2 + σ(·, ω) T < ∞, which proves that Φ t is continuous, and therefore We now prove that the left-hand side is actually equal to zero. We start with the observation that, using BDG inequality, The bounds (A.2) and (A.4) show how to control the last term under H3a and H3b respectively, However the convergence of b n , σ n only occurs on compact subsets so we use a localisation argument. For all n ≥ 0, M > 0 we introduce The uniform (in n ∈ N) Hölder regularity of X n , encompassed by (3.9), entails the existence, for all for some 0 < α < γ/2 − 1/p and where X n 0 is uniformly bounded by 2 |x 0 | for n large enough. Markov's inequality then implies that Moreover, for all n ∈ N, ω ∈ Ω \ A M n , and t ∈ T, |X n t (ω)| is bounded by M , which means which tends to zero uniformly on T as n goes to infinity (and likewise for σ n ) from H2. 
Define now and observe that, using Jensen and Cauchy-Schwarz inequalities, the growth condition on the coefficients from H3 and the moment bounds on X n from (3.8), there exists C 4 > 0 independent of n such that Let us fix ǫ > 0 and choose M ǫ > 0 large enough such that sup n∈N P A Mǫ n ≤ ǫ 2 /C 4 ; this choice is possible because of (3.11). Therefore, using the bound (3.12) and Cauchy-Schwarz inequality to separate I n and ½ A Mε It follows from (3.10) that implies that φ satisfies (3.5) almost surely, for all t ∈ T. Since φ has continuous paths, it satisfies (3.5) for all t ∈ T, almost surely, which means G 0 v,N consists of all the solutions of (3.5). Since this definition is independent of N , it extends to G 0 v , which yields the claim.

3.4. Moderate Deviations.
Let h ε tend to infinity such that ϑ ε h ε tends to zero as ε goes to zero and define X to be the limit in law of X ε , which we identified in the previous subsection as a solution of the Volterra equation Then the MDP for {X ε } ε>0 is equivalent to the LDP for the family {η ε } ε>0 defined as where T ε : W m → W d is a Borel-measurable map for each ε > 0. Therefore η ε satisfies the following SVE for all ε > 0, and is its unique solution if H4 holds.
Similarly to the LDP case we are interested in a certain shift of the driving Brownian motion, controlled by v ∈ A. For all ε > 0, let This sequence satisfies the bound (3.8) and converges weakly towards X since ϑ ε h ε σ ε tends to zero as ε tends to zero. Finally the process defined by (3.14) satisfies For all v ∈ A, we define T 0 v to be the solution of the limiting equation The form of the limit equation is dramatically simpler than for the LDP and much easier to compute. Moreover T 0 v is well defined because the linearity of the equation and Assumption 2.3 grant uniqueness for free, provided ∇b exists. Hence we will need the following assumptions: H5. For each t ∈ T, the function b(t, ·) is continuously differentiable and b is Lipschitz continuous. H6. There exists δ > 0 such that σ(t, ·) is locally δ-Hölder continuous, uniformly for all t ∈ T.
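Because the limiting equation is linear in the unknown, it can be solved by a simple explicit scheme. The sketch below assumes the scalar form ψ_t = ∫_0^t K(t − s) [A(s) ψ_s + c(s)] ds, where A stands for ∇b and c for the product of σ with the control v, both evaluated along the limit path; this explicit form is our reading of the surrounding description, and the function and argument names are hypothetical. The kernel is integrated exactly over each cell through its primitive, so that a singularity at zero causes no difficulty.

```python
def solve_linear_volterra(kernel_primitive, A, c, T=1.0, n=400):
    """Explicit scheme for psi_t = int_0^t K(t - s) * (A(s) * psi_s + c(s)) ds, scalar case.

    kernel_primitive(t) must return int_0^t K(r) dr (with value 0 at t = 0),
    so the kernel is integrated exactly on each cell even if K blows up at zero.
    """
    dt = T / n
    grid = [i * dt for i in range(n + 1)]
    psi = [0.0] * (n + 1)
    for i in range(1, n + 1):
        acc = 0.0
        for j in range(i):
            # Exact weight int_{t_j}^{t_{j+1}} K(t_i - s) ds, with the integrand
            # A * psi + c frozen at the left end point t_j of the cell.
            weight = (kernel_primitive(grid[i] - grid[j])
                      - kernel_primitive(grid[i] - grid[j + 1]))
            acc += weight * (A(grid[j]) * psi[j] + c(grid[j]))
        psi[i] = acc
    return grid, psi
```

With K ≡ 1 the scheme reduces to the explicit Euler method for ψ' = Aψ + c, which gives a convenient check; with the Riemann-Liouville kernel one would pass the primitive t ↦ t^(H + 1/2) / (H + 1/2).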
The proof of the moderate deviations theorem follows a similar structure to that of Theorem 3.11, making use of Theorem 3.5. It will rely on moment bounds in   H2 -H5, H7, H8, Assumptions 2.3 and 3.9, for all p ≥ 2, N > 0, v ∈ A N and ε > 0 small enough, there exists c > 0 independent of ε, v, t such that (3.19) sup We recall that η ε,v ε = T ε W + h ε · 0 v s ds and ψ = T 0 v , hence the lemma above deals with Theorem 3.5 (i). The following one identifies the limit set as the unique solution to (3.16). It is thus more precise than in the LDP case, and justifies the form of the rate function (3.18). Lemma 3.23 (MDP weak convergence). Let N > 0, a family {v ε } ε>0 such that, for all ε > 0, v ε ∈ A N and v ε converges in distribution to v ∈ A N , and ψ the unique solution of (3.16). Under H2 -H8, Assumptions 2.3 and 3.9, η ε,v ε converges in distribution to ψ as ε goes to zero.
Item (ii) is dealt with in the following lemma.

Lemma 3.24 (MDP compactness). Under H2, H3, H5, Assumptions 2.3 and 3.9, the functional Λ defined by (3.18) has compact level sets.
Proof. Noticing that ∇b(t, X t ) and σ(t, X t ) are uniformly bounded on T by continuity, this lemma boils down to a particular case of Lemma 3.18. Theorem 3.5(iii) is immediate by uniqueness of (3.16), therefore all the conditions are met and Theorem 3.20 follows as a direct application of Theorem 3.5.
3.5. Extension to non-convolution kernels. The analysis undertaken in this paper is based, both for notational convenience and with a view towards application, on convolution kernels. Different assumptions were studied in the literature, in particular Decreusefond [32] considered the properties of the map f → · 0 K(·, s)f (s) ds in order to include the fractional Brownian motion in his setting.
3.5.1. Setting. We call a kernel a map K : T 2 → R for which both t 0 K(t, s) 2 ds and K(t, s) are finite for all t ∈ T and s = t. The associated space is defined as Hence, for all u ∈ K the stochastic integral is well defined for all t ∈ T in the Itô sense. For any α ∈ (0, 1), we denote the Riemann-Liouville integral I α and derivative D α as Given the space inclusions above, the following assumption implies precise Hölder regularity for the integral (3.22): Assumption 3.25. There exist χ ∈ (1, 2) and γ > 1/θ(χ) for which K is continuous from L 2 to I γ+ 1 2 ,2 and from L χ to I γ,θ(χ) . • The fractional Brownian motion kernel where F is the Gauss hypergeometric function, also satisfies this assumption with the same parameters as above [ From now on, we only consider the measurable version of the stochastic integral. Although this theorem was proved in a one-dimensional setting, it also covers multi-dimensional stochastic Volterra integrals by considering their components individually and summing them.

3.5.2. Large and moderate deviations.
For each ε > 0 consider the stochastic Volterra equation which was studied in [30] without the ε-dependence, and where the coefficients live in the same spaces as those from (2.1). To complete the non-convolution setup we also need the following condition.

APPLICATION TO ROUGH VOLATILITY
We now show how our results (Theorems 3.11, 3.20 and 3.29) apply to a large class of models recently developed in mathematical finance. Originally proposed by Comte and Renault [26] with financial econometrics applications in mind, rough volatility models were later rediscovered in the context of option pricing in [2,6,47,51], developed and extended widely, and have now become the new standard in volatility modelling. They usually take the following form: where both X and Y are one-dimensional, K_1, K_2 ∈ L^2(T : R_+) and B and W are two standard Brownian motions with d⟨B, W⟩_t = ρ dt, for some correlation parameter ρ ∈ (−1, 1). We further define $\bar\rho := \sqrt{1-\rho^2}$, and set X_0 = 0 without loss of generality. Here X denotes the logarithm of a stock price process, and Σ(Y) its instantaneous volatility. We adopt a slight abuse of notation, as X previously denoted the multidimensional system, but writing X for the log-stock price is consistent with the mathematical finance literature and should not create any confusion.

We summarise in Table 1 the most common rough volatility models used in mathematical finance, indicating where their asymptotic behaviours were covered, and where our framework not only encompasses those but also fills the remaining gaps. As discussed below, our application to the rough Heston model is conditional on the latter having a unique pathwise solution, a problem that remains open so far. The detailed analysis of these cases is then provided in Section 4.2 in the small-time case, and in Section 4.3 for their tail behaviours.

TABLE 1. Summary of rough volatility results and form of the rate functions (CF = closed-form; IF = integral form; OP = optimisation problem; shadowed cells are new contributions from this paper). σ corresponds to the implied volatility, defined precisely in Section 4.1.3. [Table body: column headers and row labels not recoverable; the two surviving rows read IF in all eleven columns and, for Y^ε, CF, CF, IF, IF, CF, CF, CF, CF, IF, CF, IF.]

Small-time rescaling (general).
In the small-time case, we need to assume some scaling behaviour for the kernel functions. We say that a function f : R → R is homogeneous of degree α ∈ R if f(λx) = λ^α f(x) holds for all x, λ ∈ R. Since K_1 is homogeneous of degree ϖ, then and so Assumption 2.3 is satisfied with γ = 1 + 2ϖ ∈ (0, 2], and likewise for K_2 with γ = 2H. Under this assumption, the rescalings $X^\varepsilon_t := \varepsilon^{H-\frac12} X_{\varepsilon t}$ and $Y^\varepsilon_t := Y_{\varepsilon t}$ turn (4.1) into so that we are precisely in the framework of (2.1) with d = 3, ϑ_ε = ε^H, where, similarly to Example 2.7, the additional dimension allows us to handle the two different kernels. Note that σ^ε does not depend on ε but encodes the correlation. The controlled equation (3.4) for the second component reads for each t ∈ T, ε > 0 and v ∈ A. Note that the dynamics of X^ε do not feed back into Y^ε and that Σ ∈ S |Σ| {2} in the sense of Definition 2.5. The following assumption stands throughout this section:
• Σ is either of linear growth or such that for all p ≥ 1, N > 0 and ε > 0 small enough,
• the equation for Y^ε in (4.2) is pathwise unique for small enough ε > 0.
The choice of kernel K 2 is a common setup in rough volatility models and allows for more explicit results. These conditions ensure that H2 holds with limit coefficients b ≡ (0, 0, 0) ⊤ and σ = σ ε . Furthermore, Y ε is an autonomous subsystem in the sense of Definition 2.6 and H3 and the bound (3.6) hold. A pathwise unique solution of the system (4.1) exists since X ε is explicit from Y ε , and H4 is satisfied.
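To make the role of homogeneity concrete, the following worked computation sketches how a kernel that is homogeneous of degree ϖ > −1/2 produces the exponent γ = 1 + 2ϖ. It only treats the local L² norm near the origin, which is an assumption about the relevant part of Assumption 2.3 (stated in Section 2 and not reproduced here); the constant c_1 is introduced purely for illustration, and the second line takes the Riemann-Liouville kernel as an example of K_2.

```latex
\begin{align*}
\int_0^h K_1(s)^2\,\mathrm{d}s
  &= \int_0^1 K_1(hr)^2\,h\,\mathrm{d}r
   = h^{1+2\varpi}\int_0^1 K_1(r)^2\,\mathrm{d}r
   =: c_1\,h^{1+2\varpi},
   \qquad h\in(0,T],\\
\int_0^h \left(\frac{s^{H-\frac12}}{\Gamma(H+\frac12)}\right)^{2}\mathrm{d}s
  &= \frac{h^{2H}}{2H\,\Gamma(H+\frac12)^{2}},
   \qquad H>0 .
\end{align*}
```

The first identity uses the substitution s = hr together with K_1(hr) = h^ϖ K_1(r); the second is the special case ϖ = H − 1/2, which gives γ = 2H.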

Large deviations.
For each control v ∈ S N with N > 0, the limit equation (3.5) of the volatility in the large deviations regime reads From the uniform bound on ϕ derived in Remark 3.14 and the continuity of Σ, we obtain that |Σ(ϕ t )| is uniformly bounded in t ∈ T and in v ∈ S N , hence (3.7) holds. Therefore Assumption 3.9 and H1 -H4 follow from Assumption 4.2. Mimicking the fractional integral notation from Section 3.5, we introduce for convenience the notations (f ), f ∈ L 1 , and the fractional derivative D is defined in (3.21). From now on, to simplify the statements, we write Z ε ∼ LDP(I, ε −1 ) to express that the family of random variables {Z ε } ε>0 satisfies an LDP with rate function I and speed ε −1 , as ε tends to zero.
if φ ∈ AC 0 and infinity otherwise; (L 1 ) and infinity otherwise; In this small-time behaviour case, we recover the same scaling as in [42,43].

Proof.
(L1) As discussed above, the assumptions of Theorem 3.11 are satisfied, so that the three-dimensional process (X ε , Y ε , Z ε ), where Z ε ≡ 0 for all ε > 0, satisfies an LDP with rate function is continuous so that the contraction principle [33, Theorem 4.2.1] yields an LDP for (X ε , Y ε ) with rate function inf{J(φ, ϕ, ψ) : ψ ≡ 0} = J(φ, ϕ, 0), which corresponds to I.
(L2) Since the map (X ε , Y ε ) → X ε is continuous, the claim follows from the contraction principle.
(L3) Projecting the pathwise large deviations (L2) onto the last coordinate point t = 1 is equivalent to applying the contraction principle, and the claim follows immediately.
(L4) A direct application of Theorem 3.11 yields an LDP with rate function Inverting it as above ends the proof.
(L5) This follows from the contraction principle.
We observe that in some special cases one can reach a more explicit expression for I.
if φ ∈ AC 0 and ϕ ∈ I (L 1 ), inverting the integral which defines φ gives because whenever Σ(ϕ t ) = 0, although u is not uniquely determined by φ, the optimal choice of control (the one minimising the cost) is u t = 0; see [27, Remark 2.3] for more details. In the uncorrelated case ρ = 0, the same reasoning for the equation that ϕ solves yields for all t ∈ T Furthermore, in the special case where ζ(ϕ t ) = 0 almost everywhere the equality above holds almost everywhere, which is sufficient for the optimisation problem (because of the correlation, v may not be equal to zero even if ζ(ϕ) is). Plugging these into the rate function yields the claim. The last condition holds because if φ ∉ AC 0 or ϕ ∉ $I^{H+\frac12}_{y_0}(L^1)$ then they cannot satisfy the equations, and therefore the infimum is taken over an empty set.
Notice that the first inequality is always satisfied if H ≤ 1/2. Therefore, Assumptions 4.1, 4.2, 4.5 imply H1-H8 and Assumptions 2.3 and 3.9. Similarly to the LDP case, and recalling the definition of MDP from the introduction, we write Z ε ∼ MDP β (Λ, l ε ) if in fact ε β−H (Z ε − Z) ∼ LDP(Λ, l ε ), for any l ε > 0 converging to zero as ε tends to zero, where Z is the limit in distribution of Z ε . We also denote the subset of W d of absolutely continuous functions by AC, and AC 0 := {φ ∈ AC, φ 0 = 0}, and refer to (3.21) for the definition of the Riemann-Liouville fractional derivative.
if φ ∈ AC 0 and ϕ ∈ I (L 1 ), and infinity otherwise; t dt, if φ ∈ AC 0 and infinity otherwise; , for x ∈ R; (L 1 ) and infinity otherwise;

(M1) As discussed above, the assumptions of Theorem 3.20 are satisfied, thus it yields an MDP with rate function and inverting it as in the LDP case gives the claim.
(M2) The contraction principle implies that an MDP for X ε holds with rate function Λ X (φ) = inf{Λ(φ, ϕ) : ϕ ∈ I }, then the rate function translates to which can be solved as a variational problem as in [59, Corollary 2.4]. The corresponding Euler-Lagrange equation reads ψ̇ = ρ φ̇ / Σ(y 0 ), hence ψ = ρ φ / Σ(y 0 ) because ψ 0 = 0 by definition. Plugging this into the above equation finishes the proof.
(M3) The rate function is given by the contraction principle as Setting T = 1, the optimal path under the constraint φ 1 = x is φ t = xt by the Euler-Lagrange equation. Again, plugging it into the rate function ends the proof.
(M4) Theorem 3.20 gives an MDP for Y ε with rate function Inverting it yields (M4).
(M5) By the contraction principle one obtains Λ Y 1 (y) = inf{Λ Y (ϕ) : ϕ 1 = y}. Then, setting ψ as in (M2), it boils down to the same optimisation problem as for (M3).
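The one-dimensional optimisation behind (M3) and (M5) can be checked by hand. As a sketch, assume the pathwise rate function restricted to the relevant component is quadratic, of the form $\frac{1}{2c}\int_0^1 \dot\varphi_t^2\,\mathrm{d}t$ for some constant c > 0; here c is a placeholder for the model-dependent constant and is not a quantity taken from the text.

```latex
\begin{align*}
x^2 = \Big(\int_0^1 \dot\varphi_t\,\mathrm{d}t\Big)^{2}
  &\le \int_0^1 \dot\varphi_t^{2}\,\mathrm{d}t
  \qquad\text{(Cauchy--Schwarz, for } \varphi\in AC_0,\ \varphi_1=x),\\
\text{with equality iff } \dot\varphi \text{ is constant, i.e. } \varphi_t = xt,
  &\quad\text{so that}\quad
  \inf\Big\{\frac{1}{2c}\int_0^1 \dot\varphi_t^{2}\,\mathrm{d}t \;:\;
  \varphi\in AC_0,\ \varphi_1 = x\Big\} = \frac{x^{2}}{2c}.
\end{align*}
```

This recovers the linear optimal path φ_t = xt obtained from the Euler-Lagrange equation, without solving the differential equation explicitly.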
As in the large deviations results, (M1), (M2) and (M4) correspond to pathwise statements, whereas (M3) and (M5) are finite-dimensional results about the marginal distributions. For the log-stock price, (M3) corresponds precisely to the moderately-out-of-the-money regime presented and justified in [46] (for diffusion volatility models), based on the observation that the range of observable strikes grows with maturity. Furthermore, one can always apply Theorem 3.20 in the degenerate case Σ(y 0 )ζ(y 0 ) = 0 although the rate functions take slightly different forms.

Implied volatility asymptotics.
We can easily deduce from the above results the asymptotic behaviour of the implied volatility, a standard convention for quoting option prices. For each maturity t ≥ 0 and log-moneyness k ∈ R, the implied volatility σ(t, k) is the unique non-negative solution to C BS (t, k, σ(t, k)) = C(t, k), where C BS corresponds to the price of a European Call option under the Black-Scholes model, and C is a given Call option price (for example in a rough volatility model). This notion is only well defined if the underlying stock price is a true martingale, which we have not assumed so far, and may require additional conditions on the coefficients. This will be the case in all our examples below but, at the current level of generality, we assume it: , if k < 0.
Meanwhile, in the Black-Scholes model with constant volatility σ > 0, the log-price process satisfies $X_t = -\frac{\sigma^2 t}{2} + \sigma B_t$ for all t ∈ T, and simple Gaussian computations yield the large deviations behaviour The claim then follows directly from [49, Corollary 7.1], and by symmetry for the case k < 0.
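The Gaussian computation mentioned above can be illustrated numerically. The sketch below assumes the classical small-time statement for the Black-Scholes log-price, namely that t log P(X_t ≥ k) tends to −k²/(2σ²) as t ↓ 0 for k > 0; the displayed limit of the text is not reproduced here, and the function names and parameter values are hypothetical choices made for illustration only.

```python
import math


def bs_log_tail(t, k, sigma):
    """log P(X_t >= k) for the Black-Scholes log-price X_t = -sigma^2 t / 2 + sigma B_t."""
    # X_t >= k  <=>  N(0,1) >= (k + sigma^2 t / 2) / (sigma sqrt(t))
    z = (k + 0.5 * sigma**2 * t) / (sigma * math.sqrt(t))
    # Standard normal upper tail: P(N >= z) = erfc(z / sqrt(2)) / 2
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def scaled_log_tail(t, k, sigma):
    """Small-time rescaled log-probability t * log P(X_t >= k)."""
    return t * bs_log_tail(t, k, sigma)


if __name__ == "__main__":
    k, sigma = 0.3, 0.5  # illustrative values only
    limit = -k**2 / (2.0 * sigma**2)
    for t in (1e-1, 1e-2, 1e-3):
        print(t, scaled_log_tail(t, k, sigma), limit)
```

The maturities are kept moderate on purpose: for much smaller t the tail probability underflows in double precision, and one would work with an asymptotic expansion of the log-tail instead.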
(MDP) Following the same arguments as above, we obtain lim t↓0 σ t, kt , if k > 0, Plugging in the expression of Λ X 1 from (M3) finishes the proof.
This concludes the presentation of the general results for rough volatility models. The next sections display the diversity of the models found in the literature and how large and moderate deviations principles apply to them.

Rough Stein-Stein. The model introduced in [56] is an extension of the classical Stein-Stein volatility model [73] to the fractional setting. It corresponds to (4.1) with K_1 ≡ 1 (hence ϖ = 0), $K_2(t) = t^{H-\frac12}/\Gamma(H+\frac12)$, H ∈ (0, 1/2), y_0 > 0, Σ(y) = y^2, b(y) = κ(θ − y), κ, θ > 0 and ζ(y) ≡ ξ > 0. The coefficients are Lipschitz continuous and well-behaved, so that Assumptions 4.2 and 4.5 are easily checked and the limit equation (4.4) has a unique solution; hence Propositions 4.3 and 4.6 apply. Note that because ζ is a positive constant, Corollary 4.4 gives the rate function I in integral form and one can solve (L5) in closed form using the Euler-Lagrange equation in a similar way as in the proof of Proposition 4.6. Furthermore, since Y is Gaussian its exponential moments are finite and Novikov's condition [60, Section 3.5.D] ensures that Assumption 4.7 holds. Therefore, Corollary 4.8 yields the small-time behaviour of the implied volatility. Notice that the LDP and MDP for this model still hold when replacing the Riemann-Liouville kernel with the standard fractional Brownian motion kernel, by virtue of Theorem 3.29. The pathwise LDP for this model was first derived in [56], albeit with the different scaling $X^\varepsilon_t := \varepsilon^{H-\frac12+2\beta} X_{\varepsilon t}$ and $Y^\varepsilon_t := \varepsilon^{\beta} Y_{\varepsilon t}$, for β > 0.
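For readers who wish to experiment with the volatility component of this model, here is a minimal simulation sketch. It assumes that the volatility equation has the Volterra form Y_t = y_0 + ∫_0^t K_2(t−s)(κ(θ − Y_s) ds + ξ dW_s) with the Riemann-Liouville kernel, which is our reading of (4.1) for these coefficients, and it uses a naive left-point Euler scheme that is not taken from the paper. The function name and all parameter values are hypothetical.

```python
import math
import random


def simulate_rl_volterra(y0, kappa, theta, xi, H, T=1.0, n=200, seed=0):
    """Left-point Euler scheme (an illustrative choice) for
    Y_t = y0 + int_0^t K(t-s) * (kappa*(theta - Y_s) ds + xi dW_s),
    with K(t) = t**(H - 1/2) / Gamma(H + 1/2)."""
    rng = random.Random(seed)
    dt = T / n
    gamma_const = math.gamma(H + 0.5)

    def kernel(u):
        # Only evaluated at u >= dt > 0, so the singularity at 0 is avoided.
        return u ** (H - 0.5) / gamma_const

    # Brownian increments over each step of the grid
    dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    Y = [y0]
    for i in range(1, n + 1):
        t_i = i * dt
        acc = y0
        for j in range(i):
            # The whole history is re-weighted at each step: the equation is non-Markovian.
            k = kernel(t_i - j * dt)
            acc += k * (kappa * (theta - Y[j]) * dt + xi * dW[j])
        Y.append(acc)
    return Y


if __name__ == "__main__":
    path = simulate_rl_volterra(y0=0.2, kappa=1.0, theta=0.2, xi=0.3, H=0.1, n=100)
    print(path[0], path[-1])
```

The double loop costs O(n²) operations, which reflects the memory of the kernel; faster schemes exist but are beyond the scope of this illustration.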

Rough Bergomi. The rough Bergomi model as presented in [6] reads with V 0 > 0 and a ∈ R. A pathwise LDP for this model first appeared in [58] using the Freidlin-Wentzell approach and a tailored proof. This case is quite intricate because the exponential does not satisfy the linear growth bound, but we circumvent this issue by introducing the notion of autonomous system, illustrated in Example 2.7 and completed by Assumption 3.9 and H3b. Not only does this framework unify the result of [58] with other rough volatility models, but it also leads to a pathwise MDP. With Y := log(V ), the system (X, Y ) fits into (4.1) where K 1 (t) = t 2H−1 , K 2 (t) = t H− 1 2 /Γ(H + 2 dW s is a Gaussian process with exponential moments bounded uniformly in t ∈ T, and for each v ∈ A N , N > 0: almost surely, by the Cauchy-Schwarz inequality. Therefore sup t∈T E exp Y ε,v t is finite, yielding the claim. Moreover, the volatility equation is explicit so we need not be concerned with uniqueness, and the rest of Assumptions 4.2 and 4.5 is straightforward to check. This implies that Propositions 4.3 and 4.6 apply, and so does Corollary 4.4. Again, Theorem 3.29 guarantees that the LDP and MDP still hold when K 2 is replaced with the non-convolution fractional Brownian motion kernel. Gassiat [50] showed that, if ρ ≤ 0, then the stock price process is a true martingale, ensuring that Assumption 4.7 holds, and implied volatility asymptotics thus follow from Corollary 4.8.

Rough Heston. As introduced in [41], the rough Heston model fits into the framework of (4.1) with $K_1(t) = K_2(t) = t^{H-\frac12}/\Gamma(H+\frac12)$, for H ∈ (0, 1/2), y 0 > 0, Σ(y) = y, b(y) = κ(θ − y), κ > 0, θ ≥ 0 and ζ(y) = ξ √ y, ξ ∈ R d . Linear growth and local Hölder continuity of the coefficients clearly hold.
Weak existence and uniqueness were proved in [1]; however, the square-root coefficient raises an issue for pathwise uniqueness of the SVE. We assume here that there exists a set U of coefficients (H, κ, θ, ξ, ρ, y 0 ) such that pathwise uniqueness indeed holds. The only known result so far is due to [75] in the smooth case H = 1/2. We also recall that pathwise uniqueness was proved for ζ(y) = y^γ where γ > 1/(2H+1) in [67], but this does not encompass the square-root case. Therefore Assumption 4.2 holds in those two cases.

On a heuristic note, remark that, even if pathwise uniqueness fails, there is a unique candidate for G ε since there exists a unique strong solution until the first hitting time of zero. The issue is that it may not satisfy the SVE after that time, but it should be consistent for the small-time LDP. Moreover, uniqueness of the limit equation (4.4) only holds up to the first hitting time of zero. Hence we will make use of the uniqueness relaxation presented in Section 3.1 and similar arguments as in Example 3.8 to prove that Assumption 3.3 holds. The suggested rate function (3.2) reads now We emphasise that ϕ above solves the Volterra equation , that ∆ h K * L is non-decreasing but in fact in this special (rough) case, it is strictly increasing. Indeed, the authors show that in the general case, for all 0 ≤ s ≤ t ≤ T , which is positive because K > 0 and L is decreasing. Furthermore K is decreasing and L > 0 thus where the equality holds by definition. Let $ϕ_t = y_0 + \int_0^t K(t-s)\,ξ\sqrt{ϕ_s}\,v_s\,\mathrm{d}s =: y_0 + (K * z)(t)$, where z is trivially a semimartingale, hence from [1, Equation (2.15)]: which is strictly positive because y 0 , ϕ ≥ 0 and by the two preceding displays. Now let us suppose there exists an interval [t, t + h] ⊂ T on which ϕ = 0. Then which is a contradiction. Hence no such interval exists and the claim follows.

Remark 4.10.
This argument works for any rough kernel but not for the diffusion case H = 1 2 . We refer to [35,Proposition 3.3] for the latter.
The previous lemma allows us to invert the integrals as shown in the proof of Corollary 4.4 and yields a more explicit form for I: if the integral is well defined, φ ∈ AC 0 , ϕ ∈ I (L 1 ), and I = +∞ otherwise. We can now prove the following:

Proof. Note that any solution ϕ of (4.5) is non-negative. Let (φ, ϕ) be such that I(φ, ϕ) is finite. Then, for each δ > 0, define $ϕ^δ_t := ϕ_t + δ\,t^{H+\frac12}$ such that ϕ^δ is strictly positive. Therefore from definition (3.21) we have: , which entails convergence as δ goes to zero, uniformly on T. Now define the control and v belongs to L 2 since I(φ, ϕ) is finite. Then for each δ > 0 the control v^δ defined as is also in L 2 because $\mathbb{1}_{\{ϕ_t>0\}} = 1$ almost everywhere. Furthermore, for all t ∈ D, $\lim_{δ\downarrow 0} (ϕ^δ_t)^{-1} = (ϕ_t)^{-1}$ and therefore $\lim_{δ\downarrow 0} v^δ_t = v_t$. Let P(ϕ) denote the term between brackets in (4.6) divided by 2ρ 2 , which is non-negative by design since it corresponds to ϕ(u^2 + v^2). Therefore, by Lemma 4.9, where the first integrand is smaller than P(ϕ)_t/ϕ_t for all t ∈ D and this upper bound belongs to L 1 by assumption. Hence the dominated convergence theorem implies that the first integral goes to zero. From the calculations above we deduce that P(ϕ)_t − P(ϕ^δ)_t tends to zero uniformly as δ goes to zero, hence the second integrand converges pointwise and, for δ small enough, is dominated by P(ϕ)/ϕ. A second application of the dominated convergence theorem yields convergence of the integral, and the claim follows.
Therefore the large and moderate deviations from Propositions 4.3 and 4.6 apply if the coefficients belong to U and H ∈ (0, 1/2). Observe that Proposition 4.6(M3) agrees with [42, Section 3.5], although the routes taken differ significantly. El Euch and Rosenbaum [40, Appendix B] showed that Assumption 4.7 is satisfied, and the implied volatility behaviour thus follows from Corollary 4.8.

4.2.4. Multi-factor rough Bergomi. Let W be an R^{m+1}-Brownian motion, Z := (Z^{(1)}, · · · , Z^{(m)})^⊤ where and we allow K^{(j)} ∈ L^2(T, R_+), j ∈ ⟦1, m⟧, to be homogeneous of different degrees H_j − 1/2 with H_j ∈ (0, 1]. Therefore, the variance of Z is proportional to for all t ∈ T. Assume without loss of generality that the H_j are ordered by increasing values; we then design the rescaling with speed ε^{−2H_1}. Denote m^⋆ := max{j ∈ ⟦1, m⟧ : H_j = H_1}. Let U and V be m-dimensional square matrices, and Y an m-dimensional process defined for all t ∈ T by The log-price reads As we will shift each Brownian motion by ε^{−H_1} v^{(j)}, we notice that ε^{H_j−H_1} U_{ij} goes to zero if H_j > H_1, i.e. if j > m^⋆. This means that the roughest component(s) (the one(s) with Hurst exponent H_1) will outweigh the others, and only the former will contribute to the rate function.
Although similar to its one-dimensional counterpart, this model does not fit into the framework of (4.1). Regarding the assumptions of Theorems 3.11 and 3.20, we only check H3b and Assumption 3.9 because the others are standard and similar to the one-dimensional case. Clearly (Y^{ε,(1)}, · · · , Y^{ε,(m)}) is an autonomous subsystem. As a Gaussian process, UZ has exponential moments of all orders and for all N > 0, j ∈ ⟦1, m⟧ and v^{(j)} ∈ A_N: thus exp Y^{ε,(i),v} ∈ L^p(Ω) for all p ≥ 1. Therefore the bound (3.6) and Assumption 3.9 are satisfied. This estimate also shows that (3.17), and thus H8, hold. Since ϑ_ε = ε^{H_1}, we define for the moderate deviations regime h_ε = ε^β, β ∈ (0, H_1).
where for all φ ∈ W and ϕ ∈ W m : where for all φ ∈ W and ϕ ∈ W m : Proof. The LDP is a direct application of Theorem 3.11 and the MDP of Theorem 3.20.
One can also recover the LDP and MDP for (X ε ) ε>0 as well as the small-time LDP and MDP by contraction principle, as in Propositions 4.3 and 4.6.
If U is lower triangular (i.e. U ij = 0 for all i < j), for instance if it arises from the Cholesky decomposition of a covariance matrix, then for all φ ∈ W, ϕ ∈ W m , one can derive the vector v recursively, followed by u. Note that if m ⋆ < m, ϕ may not be attainable by the restricted number of controls {v (j) , j ∈ ⟦1, m ⋆⟧}. .7):

Remark 4.14. We can similarly consider multidimensional versions of the other models presented in this section and derive large and moderate deviation principles. We only work out the computations for the multi-factor rough Bergomi model because it is the most relevant in the literature.

Tail rescaling.
We now investigate tail rescalings, which generally have the form X ε = εX, such that an LDP provides asymptotic estimates on P(X ε ≥ 1) = P(X ≥ ε −1 ). The MDP for the whole system is not available in this case because Y := lim ε↓0 Y ε ≡ 0 hence the limit equation for X ε , arising from (3.16), would be independent of the control. Note that the theory does not break down but the rate function is trivial (equals zero at zero and +∞ everywhere else). Furthermore, the exponential function prevents the study of such a rescaling in the rough Bergomi model.

Rough Stein-Stein.
This model was defined in Section 4.2.1, but with the rescaling Y ε t := εY t and X ε t := ε 2 X t , the system becomes where the coefficients are identical to the small-time case. Although the rescaling is different, Assumption 2.3 and H1 -H4 are easily satisfied in a similar way, the limit equation (3.5) has a unique solution, and therefore Theorem 3.11 applies.
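As a sanity check on this rescaling, the volatility component can be worked out by hand. The sketch below assumes that, for the rough Stein-Stein coefficients, the volatility equation has the Volterra form written in the first line (our reading of (4.1), which is not reproduced here); the log-price component is deliberately left out.

```latex
\begin{align*}
Y_t &= y_0 + \int_0^t K_2(t-s)\Big(\kappa(\theta - Y_s)\,\mathrm{d}s + \xi\,\mathrm{d}W_s\Big),\\
Y^\varepsilon_t := \varepsilon Y_t
    &= \varepsilon y_0 + \int_0^t K_2(t-s)\Big(\kappa(\varepsilon\theta - Y^\varepsilon_s)\,\mathrm{d}s
       + \varepsilon\,\xi\,\mathrm{d}W_s\Big).
\end{align*}
```

Under this assumption the rescaled drift is b^ε(y) = κ(εθ − y), the diffusion coefficient is the constant ξ, and the noise intensity is ϑ_ε = ε, in agreement with the quantities used in the moderate deviations discussion of this section.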
Corollary 4.15. The following hold: and infinity otherwise.
Proof. For (L1), Theorem 3.11 entails that the rate function is Inverting the integrals as in Corollary 4.4 to obtain the unique controls and using $D^{H+\frac12} I^{1} = I^{\frac12-H}$ yields the claim. Similarly to the small-time case, (L2) follows from the contraction principle, and one only needs to fix t ∈ T to prove (L3).
One can prove an LDP for Y ε in a similar way; a more interesting problem is the moderate deviations setting. Recall that an MDP for the couple (X ε , Y ε ) would have a trivial rate function because the limit equation of X ε is independent of the control. However, since the diffusion coefficient of Y ε is constant equal to ξ, one can obtain an MDP for Y ε . More surprisingly, the limit equations in the large deviations (3.5) and moderate deviations (3.16) regimes coincide, which leads to identical rate functions. Notice that ϑ ε = ε in this example, and therefore let h ε = ε −β where β ∈ (0, 1). Proof. From (4.8), b ε (y) = κ(εθ − y) converges to b(y) = −κy and the diffusion coefficient is constant, hence H2 -H6 are easily satisfied. Moreover, b ε − b ≡ εκθ and ε 1−(H−β) tends to zero therefore H7 and H8 also hold. Theorem 3.20 thus yields an MDP with rate function Inverting the integral yields the claim.

Rough Heston.
After the rescaling Y ε t := ε 2 Y t and X ε t := ε 2 X t , this model introduced in Section 4.2.3 takes the form Clearly H1 -H3 hold and we recall that U is the set of coefficients such that pathwise uniqueness, and hence H4, hold. We appeal to the uniqueness relaxation in the same way as the small-time case to prove the following result, which extends [27, Theorem 1.1] to the rough case.
if φ ∈ AC 0 and infinity otherwise.
The proof is similar to the small-time case. The potential rate function for the couple is The same arguments that were used to prove Lemmata 4.9 and 4.11 in the small-time case can be applied again here. They entail that Assumption 3.3 holds and hence Theorem 3.11 applies, and the form of the rate function in (L1) follows by inverting the relationships between (u, v) and (φ, ϕ). (L2) and (L3) follow from the same steps as in Corollary 4.15.

Implied volatility asymptotics.
We can also obtain implied volatility asymptotics since, by the same arguments as before, exp(X ε ) is a martingale in both the rough Stein-Stein and rough Heston models.
Corollary 4.18. In both the rough Stein-Stein and rough Heston models, for each t ∈ T, the implied volatility σ satisfies where I X t is the respective rate function, given in Corollaries 4.15(L3) and 4.17(L3).

Proof. Mapping ε 2 to 1/k we have from Corollaries 4.15 and 4.17 respectively that, for each t ∈ T, In the Black-Scholes model with constant volatility σ > 0, we can directly compute and, similarly to the small-time case, the proof follows from [

Theorem 3.5(i) entails the existence of φ ∈ G 0 v,N such that the function y : R + → R defined as y(ε) := inf α<ε x(α) has a subsequence converging to E [F (φ)]. By definition lim inf ε↓0 x ε = lim ε↓0 y ε , which implies that y has a limit in [−∞, +∞] and by uniqueness this limit must be E [F (φ)]. Therefore we deduce: which suggests the potential rate function I defined in (3.2) and concludes the proof of the lower bound.
Then we prove the Laplace principle upper bound, for all F ∈ C b (W d : R): Assume that the right-hand side is finite, otherwise there is nothing to prove. Fix ϵ > 0 and let φ ∈ W d be such that Since F is continuous at φ, there exists δ ∈ (0, ϵ) such that |F (φ) − F (ϕ)| ≤ ϵ for all ϕ ∈ W d such that ‖φ − ϕ‖ T ≤ δ. If φ is uniquely characterised then the proof is the same as in [13]. Otherwise, by Theorem 3.5(iii), we can choose φ δ uniquely characterised such that ‖φ − φ δ‖ T ≤ δ and I(φ) − I(φ δ ) ≤ δ, which implies I(φ) + F (φ) − I(φ δ ) − F (φ δ ) ≤ 2ϵ. Hence, combining inequalities we obtain Moreover, there exist {v n } n∈N in L 2 such that (3.3) is satisfied with φ δ and m ≥ 1/ϵ such that and therefore the remainder of the upper bound proof unfolds identically. Along the subsequence {ε n } n≥0 , $G^{\varepsilon_n}\big(W + \varepsilon_n^{-1}\int_0^{\cdot} v^m_t\,\mathrm{d}t\big)$ converges in distribution to φ δ by item (i). Using the variational representation formula (1.3) and the convergence we obtain The following lemma (Lemma A.1) yields a uniform bound in both n ∈ N and t ∈ T for f n t . Taking the limit as n goes to infinity and using Fatou's lemma concludes the first part of the proof.
where R is the (non-positive) resolvent of second kind of − K [1, Equation (2.11)], proving the lemma.
If only H3b and Assumption 3.9 hold with an autonomous sub-system Υ (see Definition 2.6), then by the previous calculations, for all l ∈ Υ, the components (X ε,v ) (l) satisfy the bound (3.8) because their coefficients have linear growth. We then turn our attention to the components (X ε,v ) (i) , i ∉ Υ. Using (3.6) and Hölder's inequality as in (A.2), we obtain that for all 1 ≤ j ≤ d such that K ij = 0 and for all 1 ≤ k ≤ m: for some C 2 > 0. Applying the same calculations to the other terms and summing all the coefficients we fall back on (A.3). Taking the limit and applying Fatou's lemma again concludes the proof.
A.3. LDP tightness: Proof of Lemma 3.15. Let us fix p > 2 ∨ 2/γ, N > 0, a family {v ε } ε>0 in A N and ε > 0. For clarity we write b u := b ε (u, X ε,v u ) and σ u := σ ε (u, X ε,v u ) for all u ∈ T. Then, for all 0 ≤ s < t ≤ T , using the Cauchy-Schwarz and BDG inequalities as in the previous proof we obtain: In a first step we assume that all the coefficients satisfy the linear growth condition H3a. ≤ C 2 (t − s) γp/2 , for some C 1 , C 2 > 0 independent of ε, v ε , s, t. Again, if there are components such that only H3b holds with Assumption 3.9, then following the example of (A.4) yields the same result. The Kolmogorov continuity theorem then asserts that X ε,v ε admits a version which is Hölder continuous on T of any order α < γ/2 − 1/p, uniformly in ε > 0 because C 2 does not depend on ε, and which satisfies (3.9). Furthermore, Aldous' theorem [10, Theorem 16.10] states that the sequence {X ε,v ε } ε>0 is tight.

$\int_0^T |v_s|^2\,\mathrm{d}s : v \in L^2,\ φ_t = x_0 + \int_0^t K(t-s)\big[b(s, φ_s) + σ(s, φ_s)v_s\big]\,\mathrm{d}s$ are compact. Fix N > 0 and consider an arbitrary sequence J := {φ n } n∈N ⊂ L N ; we will show that there exists a converging subsequence, the limit of which belongs to L N . Interestingly enough, the proof parallels, in a deterministic context, the proofs of the bound, Hölder continuity and convergence of X ε,v .

Relative compactness. According to the Arzelà-Ascoli theorem, the family J is relatively compact in W d if and only if {φ n t } is bounded uniformly in n ∈ N and in t ∈ T and J is equicontinuous.
Moreover, for all n ∈ N and all t ∈ T, there exists v n ∈ L 2 such that $\frac{1}{2}\int_0^T |v^n_t|^2\,\mathrm{d}t \le N$ and φ n ∈ G 0 v n , which means v n ∈ S 2N and $φ^n_t = x_0 + \int_0^t K(t-s)\big[b(s, φ^n_s) + σ(s, φ^n_s)v^n_s\big]\,\mathrm{d}s$.
Hence Remarks 3.14 and 3.16 grant the uniform bound and equicontinuity respectively. Therefore J is relatively compact, which entails that L N is relatively compact for any N > 0.

Closure. Let {φ n } n∈N be a converging sequence of L N and denote its limit by φ ∈ W d . The controls v n associated to φ n through (A.5) belong to S 2N , which is a compact space with respect to the weak topology. Hence there exists a subsequence {n k } k∈N such that v n k converges weakly in L 2 to a limit v ∈ S 2N and lim k↑∞ φ n k = φ. Now let us prove that φ ∈ L N . For clarity we replace n k by n from now on. The convergence as n goes to +∞ and the continuity of the paths entail sup n∈N sup t∈T |φ n t | + |φ t | < +∞, such that the paths lie in compact subsets of R d and H2 asserts that uniform continuity of the coefficients b and σ holds. Therefore they admit continuous moduli of continuity that we respectively name ρ b and ρ σ . Using the Cauchy-Schwarz inequality and H3 we get that for all t ∈ T:

Let p ≥ 2, N > 0, v ∈ A N , ε > 0 and t ∈ T. Starting from (3.15), we use the Cauchy-Schwarz and BDG inequalities to obtain

B.3. MDP weak convergence: Proof of Lemma 3.23. We have shown in Lemma 3.22 that for any subsequence {ε k } k∈N , {η ε k ,v ε k } k∈N and {v ε k } k∈N are tight as families of random variables with values in W d and S N respectively. By the Skorohod representation theorem we can work with almost sure convergence for the purpose of identifying the limit. Hence there exists a subsubsequence, denoted hereafter (η k , v k ), that converges almost surely in the product topology on W d × S N to some W d × S N -valued limit (η 0 , v) in a possibly different probability space (Ω 0 , F 0 , P 0 ) as k tends to +∞. We also denote ε k , b k , σ k , X k 0 , Θ k along this subsequence. The convergence of the couple also takes place in distribution, and we follow the same method as in the LDP case, which comes from [22].
For all t ∈ [0, T ], let Ψ t : S N × W d → R be such that The modulus of continuity of σ is only available on compact sets of T × R d , so we define a constant M > ‖X‖ T and introduce the following sets, for each k ∈ N: with the observation that lim k↑∞ P(E k ) = 1 thanks to the previous argument. Since X is uniformly bounded, Θ k t (ω) ≤ 2M for all t ∈ T, ω ∈ E k , k ≥ 0. Therefore, using the Cauchy-Schwarz inequality, where we will use the localisation to obtain convergence in the first term and Hölder continuity in the second.

Let us assume for the moment that linear growth H3a holds. We use that Θ k is uniformly bounded by 2M on E k and linear growth for both σ k and σ to obtain which tends to zero as k goes to infinity because of H2 for the first term and because P(Ω \ E k ) tends to zero for the second. Moreover, by H6, there exists δ > 0 such that σ is locally δ-Hölder continuous; thus there exists C 2M > 0 such that which tends to zero. Finally, linear growth leads to (B.7) sup s∈T E 𝟙_{E^c_k} σ(s, Θ k s ) − σ(s, X s ) which also tends to zero because P(Ω \ E k ) tends to zero. If only H3b with Assumption 3.9 holds, then a different bound depending on (2.2) would replace those in (B.6) and (B.7), by noticing that X = G 0 (0). In both cases the above estimates tend to zero as k tends to infinity, hence E [IV k ] converges towards zero. Finally, {V k } k∈N is uniformly bounded across k ≥ 0 as (B.3) shows, thus h −p ε k V k tends to zero.

We have proved that lim k↑∞ E Ψ t (v k , η k ) = 0, and this entails that the limit η 0 satisfies (3.16) P 0 -almost surely, for all t ∈ T. Since η 0 has continuous paths, this holds for all t ∈ T, P 0 -almost surely, and the solution is unique; therefore we conclude that η 0 = ψ. Every subsequence has a subsequence for which this convergence holds, therefore η ε,v ε converges weakly towards ψ as ε goes to zero.