Fluctuations for mean-field interacting age-dependent Hawkes processes

The propagation of chaos and the associated law of large numbers for mean-field interacting age-dependent Hawkes processes (when the number of processes $n$ goes to $+\infty$) being granted by the study performed in (Chevallier, 2015), the aim of the present paper is to prove the resulting functional central limit theorem. It involves the study of a measure-valued process describing the fluctuations (at scale $n^{-1/2}$) of the empirical measure of the ages around its limit value. This fluctuation process is proved to converge towards a limit process characterized by a limit system of stochastic differential equations driven by a Gaussian noise instead of Poisson (which occurs for the law of large numbers limit).


Introduction
In recent years, the self-exciting point process known as the Hawkes process [21] has been used in very diverse areas. First introduced to model earthquake aftershocks [25] or [33] (ETAS model), it has been used in criminology to model burglary [32], in genomic data analysis to model occurrences of genes [20,38], in social network analysis to model viewing or popularity [4,12], as well as in finance [2,3]. We refer to [26] or [46] for more extensive reviews on applications of Hawkes processes.
Part of our analysis finds its motivation in the use of Hawkes processes for modelling in neuroscience. They are used to describe spike trains associated with several neurons (see e.g. [11]). In that case, it is common to consider a multivariate framework: multivariate Hawkes processes consist of multivariate point processes $(N^1, \dots, N^n)$ whose intensities are respectively given, for $i = 1, \dots, n$, by
$$\lambda^i_t = \Phi\Big( \sum_{j=1}^n \int_0^{t-} h_{j\to i}(t - z)\, N^j(dz) \Big), \qquad (1.1)$$
where $\Phi : \mathbb{R} \to \mathbb{R}_+$ is called the intensity function and $h_{j\to i}$ is the interaction function describing the influence of each point of $N^j$ on the appearance of a new point of $N^i$, via its intensity $\lambda^i$. Notice that we implicitly assume here that there is no influence of the possible points of $N^j$ that are before time 0.
In the present paper, as in [9], we study a generalization of multivariate Hawkes processes by adding an age dependence.
Definition 1.1. For any point process $N$, we call predictable age process associated with $N$ the (non-negative) process defined by
$$S_{t-} := t - \sup\{T \in N,\ T < t\} = t - T_{N_{t-}}, \quad \text{for all } t > 0, \qquad (1.2)$$
and extended by continuity at $t = 0$. In particular, its value at $t = 0$ is entirely determined by $N \cap \mathbb{R}_-$ and is well-defined as soon as there is a point therein.
In comparison with the standard multivariate Hawkes processes (1.1), we add an age dependence, as it is done in [9], by assuming that the intensity function $\Phi$ in (1.1) (which is then denoted by $\Psi$ to avoid confusion) may also depend on the predictable age process $(S^i_{t-})_{t\ge0}$ associated with the point process $N^i$, like for instance
$$\lambda^i_t = \Psi\Big( S^i_{t-},\ \frac{1}{n} \sum_{j=1}^n \int_0^{t-} h(t - z)\, N^j(dz) \Big). \qquad (1.3)$$
We refer to [9] where the neurobiological motivation for such a form of intensity is given. Under suitable assumptions, it is shown in [9] that a multivariate point process satisfying (1.3) exists and we call it an age dependent Hawkes process (ADHP). Furthermore, ADHPs are well approximated, when the dimension $n$ goes to infinity, by i.i.d. limit point processes of the McKean-Vlasov type whose stochastic intensity depends on the time $t$ and on the age [9, Theorem 4.1]. More precisely, the intensity of the limit process associated with the framework (1.3), denoted by $N$, is given by the following implicit formula
$$\lambda_t = \Psi\Big( S_{t-},\ \int_0^t h(t - z)\, E[\lambda_z]\, dz \Big),$$
where $(S_{t-})_{t\ge0}$ is the predictable age process associated with $N$.
As usual with McKean-Vlasov dynamics, the asymptotic evolution (when $n$ goes to infinity) of the distribution of the population at hand can be described as the solution of a nonlinear partial differential equation (PDE). In our case, it is shown that, starting from a density, the distribution of the limit predictable age process $(S_{t-})_{t\ge0}$, denoted by $u_t$, admits a density (supported on $\mathbb{R}_+$) for all time $t \ge 0$, which is furthermore the unique solution of the nonlinear system
$$\begin{cases} \dfrac{\partial u(t,s)}{\partial t} + \dfrac{\partial u(t,s)}{\partial s} + \Psi(s, X(t))\, u(t,s) = 0, \\ u(t,0) = \displaystyle\int_{s \in \mathbb{R}_+} \Psi(s, X(t))\, u(t,s)\, ds, \end{cases} \qquad (1.4)$$
with initial condition $u(0,\cdot) = u_0$ (the initial density of the age at time 0), where, for all $t \ge 0$, $X(t) = \int_0^t h(t - z)\, u(z,0)\, dz$.
The relation between mean-field age dependent Hawkes processes and the PDE system (1.4) is completed by a law of large numbers (consequence of the functional law of large numbers [9, Corollary 4.5]): the following convergence holds in probability for random variables in $\mathcal{P}(\mathbb{R}_+)$,
$$\mu^n_{S_{t-}} \xrightarrow[n \to \infty]{} u_t, \qquad (1.5)$$
where $\mu^n_{S_{t-}}$ denotes the empirical measure of the ages. Moreover, the rate of this convergence is at least $n^{-1/2}$. In light of this bound on the rate of convergence, the fluctuation process defined, for all $t \ge 0$, by $\eta^n_t = \sqrt{n}\,(\mu^n_{S_{t-}} - u_t)$ is expected to describe, on the right scale, the second order term appearing in the expansion of the mean-field approximation, the first order term being given by the law of large numbers.
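To get a feeling for the system (1.4), one can discretize it along its characteristics. The following is only a minimal sketch, not the method of [9]: it assumes an explicit scheme with time step equal to the age step, and the functions `psi_example`, `h_example` and the uniform initial density are hypothetical choices made for illustration. With this scheme the total mass is preserved at each step, which mirrors the fact that $u(t, \cdot)$ remains a probability density.

```python
import math

def solve_age_pde(psi, h, u0, ds, n_steps):
    """Explicit scheme for (1.4) with time step dt = ds (illustrative sketch).

    u0[k] approximates the initial density at age k*ds. Returns the final
    density (the grid grows by one cell per step) and the list of boundary
    values, births[m] ~ u(m*ds, 0).
    """
    u = list(u0)
    births = []
    for step in range(n_steps):
        t = step * ds
        # X(t) = int_0^t h(t - z) u(z, 0) dz, left Riemann sum over past steps
        x = ds * sum(h(t - m * ds) * births[m] for m in range(step))
        rates = [psi(k * ds, x) for k in range(len(u))]
        # boundary condition of (1.4): u(t, 0) = int psi(s, X(t)) u(t, s) ds
        flux = ds * sum(r * v for r, v in zip(rates, u))
        births.append(flux)
        # ages grow by ds; a fraction psi*ds of each cell is reset to age 0
        u = [flux] + [v * (1.0 - r * ds) for r, v in zip(rates, u)]
    return u, births

def mass(u, ds):
    """Total mass of a gridded density."""
    return ds * sum(u)

# Hypothetical parameters, chosen only so that psi * ds stays below 1.
def psi_example(s, x):
    return min(5.0, max(0.0, 1.0 + x))

def h_example(t):
    return math.exp(-t)

ds = 0.01
u0 = [1.0] * 100                      # uniform density on [0, 1)
u, births = solve_age_pde(psi_example, h_example, u0, ds, 200)
```

The scheme requires `psi * ds <= 1` to keep the density non-negative, which is why a bounded intensity is used in the example.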
The study of the random fluctuations makes it possible to go beyond the first order mean-field limit and its main drawback: propagation of chaos. The latter means independence of the neurons' activities, which is unrealistic from the biological viewpoint [43,15]. Hence, the derivation of the second order term is of great importance for the modelling of neural networks since it gives an approximation of the fluctuations coming from the finiteness of the number of neurons $n$ (finite size effects) [7,8,28]. A partial but promising answer to this issue is given by highlighting a stochastic partial differential equation system which could be interpreted as an intermediate modelling scale between the microscopic scale given by ADHP and the macroscopic one given by (1.4).
Following the approach developed in [16,17], we prove in the present article that the fluctuations satisfy a functional central limit theorem (CLT) in a suitable distributional space: the limit of the normalized fluctuations is described by means of a stochastic differential equation in infinite dimension driven by a Gaussian noise, in comparison with the Poisson noise appearing in [9]. To do so, we regard the fluctuation process $\eta^n$ as taking values in a Hilbert space, namely the dual of some Sobolev space of test functions. The index of regularity of the dual space, in one-to-one correspondence with the regularity of the test functions in the Sobolev space, is prescribed by the tightness property we are able to establish for the sequence $(\eta^n)_{n\ge1}$ and by the form of the generator of the limiting McKean-Vlasov dynamics identified in [9]. Let us point out that this generator is the one associated with the renewal dynamics of the system (1.4), as highlighted by Proposition 2.4 given hereafter.
Although the choice of this index of regularity is rather constrained, the choice of the domain supporting the Sobolev space is somewhat larger. Indeed, two options are available, depending on the way we consider the process η n , either over a finite time horizon, namely (η n t ) 0≤t≤θ for some θ ≥ 0, or in infinite horizon, namely (η n t ) t≥0 . In the first case, we may use the fact that there exists a compact K θ (which is growing with θ) such that η n t is supported in K θ for all t in [0, θ]. Hence, one could regard, for all θ ≥ 0, the fluctuation process (η n t ) 0≤t≤θ as a process with values in the dual of a standard Sobolev space of functions with support in K θ . The main drawback of such an approach is that the space of trajectories within which the CLT takes place depends on the time horizon θ. To bypass this issue, one may be willing to work directly on the entire positive time line R + , but then, it is not possible anymore to find a compact subset K supporting the measures η n t , for all t ≥ 0, since ∪ θ≥0 K θ = R + . A convenient strategy to sidestep this fact is to use a Sobolev space supported by the entire R + . Yet, standard Sobolev spaces supported by R + fail to accommodate with our purpose, since, as made clear by the proof below, constant functions are required to belong to the space of test functions. Therefore, instead of a standard Sobolev space, we may use a weighted Sobolev space, provided that the weight satisfies suitable integrability properties. In order to state our CLT on the whole time interval, the second approach is preferred. Furthermore, the weights of the Sobolev spaces are chosen to be polynomial (see Section 4.1 below). This choice is quite convenient because Sobolev spaces with polynomial weights are well-documented in the literature. In particular, results on the connection between spaces weighted by different powers, Sobolev embedding theorems and Maurin's theorem, are well-known. 
It is worth noting that, provided that constant functions can be chosen as test functions, the precise value of the power in the polynomial weight of the Sobolev space does not really matter in our analysis: more generally, a different choice of family of weights would have been possible and, somehow, it would have led to a result equivalent to ours. In this regard, we stress, at the end of the paper, the fact that our result in infinite horizon is in fact equivalent to what we would have obtained by implementing the first of the two approaches mentioned above instead of the second one: roughly speaking, one can recover our result by sticking together the CLTs obtained on each finite interval of the form [0, θ], for θ ≥ 0; conversely, one can prove, from our statement, that, on any finite interval [0, θ], the CLT holds true in the dual space of a standard Sobolev space supported by K θ .
The Hilbertian approach used in this article has already been implemented in the diffusion processes framework [17,24,27,29]. Let us mention here the main differences between these earlier results and ours:
• Under general non-degeneracy conditions, the marginal laws of a diffusion process are not compactly supported. The unboundedness of the support imposes the choice of weighted Sobolev spaces even in finite time horizon. In this framework, Sobolev spaces with polynomial weights are especially adapted to carry solutions with moments that are finite up to some order only. In that case, the choice of the power in the weight is explicitly prescribed by the maximal order up to which the solution has a finite moment. As already mentioned, this differs from our case: in the present article, the particles (namely, the ages of the neurons) are compactly supported over any finite time interval and thus have finite moments of any order. Once again, this is the reason why the choice of the power, and more generally of the weight, in the Sobolev space is much wider.
• Unlike point processes, diffusion processes are time continuous. Also, their generator is both local and of second order, whereas the generator for the point process identified in the mean-field limit in [9] is both of first order and nonlocal. As a first consequence, the indices of regularity of the various Sobolev spaces used in this paper differ from those used in the diffusive framework. Also, the space of trajectories cannot be the same: although the limit process in our CLT has continuous trajectories, we must work with a space of càdlàg functions in order to accommodate the jumps of the fluctuation process. Surprisingly, jumps do not just affect the choice of the functional space used to state the CLT (namely space of càdlàg versus space of continuous functions): they also dictate the metric used to estimate the error in the Sznitman coupling between the age-dependent Hawkes process and its mean-field limit (which is also a point process). Indeed, the standard trick used for diffusion processes, which consists in getting stronger estimates for the Sznitman coupling by considering $L^p$-norms for $p > 2$, is not adapted to point processes. Therefore, we develop a specific approach by providing higher order estimates of the error in the Sznitman coupling in the total variation sense. To our knowledge, this argument is completely new.
Let us mention that the fluctuations of jump processes have been the object of previous publications [16,30,39,44]. However, the CLTs are established in the fluid limit, namely small jumps at high frequency so that the jumps vanish at the limit. The techniques developed in those articles cannot be used here since the framework of the present article does not fall into the fluid limit framework: in our case, the limit processes are also jump processes.

EJP 22 (2017), paper 42. http://www.imstat.org/ejp/
Finally, let us mention the PhD thesis of Tran [42] where a Markovian age-structured population model is approximated by a von Foerster-McKendrick PDE system in the large population limit and a functional central limit theorem is derived. There, the system is not mass conservative (the solution of the PDE is not a probability) which brings some technical difficulties and the lack of a canonical limit process. In this respect, the main contribution of the present paper is to get rid of the Markovian assumption by the use of Sznitman's coupling argument and estimates in total variation.
The present paper is organized as follows. The model is described in Section 2. Then, the main estimates required in this work are given in Section 3. These can be seen as the extension, to higher orders, of the estimates used in [9] to get the bound $n^{-1/2}$ on the rate of the convergence (1.5). These key estimates are used to prove tightness for the distribution $\eta^n$ in a Hilbert space that is the dual of some weighted Sobolev space. Under regularity assumptions on the intensity function $\Psi$ and the interaction function $h$, we finally prove in Section 5.2 the convergence of the fluctuation process, which states our CLT. Furthermore, its limit is characterized by a system of stochastic differential equations, driven by a Gaussian process with explicit covariance, and involving an auxiliary process with values in $\mathbb{R}$ (Theorem 5.12). Finally, the CLT is applied to give some justification to a stochastic partial differential equation which can be seen as a better approximation than the PDE system (1.4) in the mean-field limit.

General notations
• Statistical distributions are referred to as laws of random variables to avoid confusion with distributions in the analytical sense that are linear forms acting on some test function space.
• The space of bounded functions of class $C^k$, with bounded derivatives of each order less than $k$, is denoted by $C^k_b$.
• The space of càdlàg (right continuous with left limits) functions is denoted by $D$.
• For $\mu$ a measure on $E$ and $\varphi$ a function on $E$, we denote $\langle \mu, \varphi \rangle := \int_E \varphi(x)\, \mu(dx)$ when it makes sense.
• If a quantity $Q$ depends on the time variable $t$, then we most often use the notation $Q_t$ when it is a random process, in comparison with $Q(t)$ when it is a deterministic function.
• We say that the quantity $Q_n(\sigma)$, which depends on an integer $n$ and a parameter $\sigma \in \mathbb{R}^d$, is bounded up to a locally bounded function (which does not depend on $n$) by $f(n)$, denoted by $Q_n(\sigma) \lesssim_\sigma f(n)$, if there exists a locally bounded function $g : \mathbb{R}^d \to \mathbb{R}_+$ such that, for all $n$, $|Q_n(\sigma)| \le g(\sigma) f(n)$.
• Throughout this paper, $C$ denotes a constant that may change from line to line.

Definitions and propagation of chaos
In all the sequel, we focus on locally finite point processes, $N$, on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ that are random countable sets of points of $\mathbb{R}$ such that for any bounded measurable set $A \subset \mathbb{R}$, the number of points in $N \cap A$ is finite almost surely (a.s.). The associated points define an ordered sequence $(T_n)_{n \in \mathbb{Z}}$. For a measurable set $A$, $N(A)$ denotes the number of points of $N$ in $A$. We are interested in the behaviour of $N$ on $(0, +\infty)$ and we denote $t \in \mathbb{R}_+ \mapsto N_t := N((0, t])$ the associated counting process. Furthermore, the point measure associated with $N$ is denoted by $N(dt)$. In particular, for any non-negative measurable function $f$, $\int_{\mathbb{R}} f(t)\, N(dt) = \sum_{i \in \mathbb{Z}} f(T_i)$. For any point process $N$, we call age process associated with $N$ the process $(S_t)_{t\ge0}$ given by
$$S_t = t - \sup\{T \in N,\ T \le t\}, \quad \text{for all } t \ge 0. \qquad (2.1)$$
In comparison with the age process, we call predictable age process associated with $N$ the predictable process $(S_{t-})_{t\ge0}$ given by
$$S_{t-} = t - \sup\{T \in N,\ T < t\}, \quad \text{for all } t > 0, \qquad (2.2)$$
and extended by continuity at $t = 0$. Notice that these two processes take values in the state space $\mathbb{R}_+$.
We work on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, P)$ and suppose that the canonical filtration associated with $N$, namely $(\mathcal{F}^N_t)_{t\ge0}$, is contained in $\mathbb{F} := (\mathcal{F}_t)_{t\ge0}$. We call $\mathbb{F}$-intensity of $N$ any non-negative $\mathbb{F}$-predictable process $(\lambda_t)_{t\ge0}$ such that $(N_t - \int_0^t \lambda_s\, ds)_{t\ge0}$ is an $\mathbb{F}$-local martingale. Informally, $\lambda_t\, dt$ represents the probability that the process $N$ has a new point in $[t, t + dt]$ given $\mathcal{F}_{t-}$. Under some assumptions that are supposed here, this intensity process exists, is essentially unique and characterizes the point process (see [6] for more insights). In particular, since $N$ admits an intensity, for any $t \ge 0$, the probability that $t$ belongs to $N$ is null. Moreover, notice the following properties satisfied by the age processes:
• the two age processes are equal for all $t \ge 0$ except the positive times $T$ in $N$ (almost surely a set of null measure in $\mathbb{R}_+$),
• for any fixed $t \ge 0$, $S_{t-} = S_t$ almost surely (since $N$ admits an intensity),
• and the value $S_{0-} = S_0$ is entirely determined by $N \cap \mathbb{R}_-$ and is well-defined as soon as there is a point therein.
The exact behaviour of $N \cap \mathbb{R}_-$ is not of great interest in the present article. We only assume that there is a point in it almost surely, so that $S_{0-} = S_0$ is well-defined. Furthermore, we assume that the random variable $S_0$ admits $u_0$ as a probability density.
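The difference between the age process (2.1) and the predictable age process (2.2) is easiest to see on a concrete realisation. Here is a small sketch, assuming the point process is given as a sorted list of times that contains at least one non-positive point; the function names and the sample points are hypothetical.

```python
import bisect

def age(points, t):
    """S_t = t - sup{T in N, T <= t}, for a sorted list of points."""
    k = bisect.bisect_right(points, t)   # number of points <= t
    if k == 0:
        raise ValueError("no point at or before t: the age is undefined")
    return t - points[k - 1]

def predictable_age(points, t):
    """S_{t-} = t - sup{T in N, T < t}, for a sorted list of points."""
    k = bisect.bisect_left(points, t)    # number of points < t
    if k == 0:
        raise ValueError("no point strictly before t: the age is undefined")
    return t - points[k - 1]

# One point before time 0 (so that S_0 is well-defined), two points after.
points = [-0.5, 1.0, 2.5]
```

At a point of the process, say $t = 1$, the age has already been reset to 0 whereas the predictable age still equals 1.5; at any other time the two values coincide.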

Parameters and list of assumptions
The definition of an age dependent Hawkes process (ADHP) is given below, but let us first introduce the parameters of the model:
• a positive integer $n$ which is the number of particles (e.g. neurons) in the network (for $i = 1, \dots, n$, $N^i$ represents the occurrences of the events, e.g. spikes, associated with the particle $i$);
• a probability density $u_0$;
• an interaction function $h : \mathbb{R}_+ \to \mathbb{R}$;
• an intensity function $\Psi : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}_+$.
For the sake of simplicity, all the assumptions made on the parameters are gathered here:
• The probability density $u_0$ is uniformly bounded with compact support so that there exists a constant $C > 0$ such that $S_0 \le C$ almost surely (a.s.).
• For all $s \ge 0$, the function $\Psi_s : y \mapsto \Psi(s, y)$ is of class $C^2$. Furthermore, $\|\frac{\partial \Psi}{\partial y}\|_\infty := \sup_{s,y} |\frac{\partial \Psi}{\partial y}(s, y)| < +\infty$ and $\|\frac{\partial^2 \Psi}{\partial y^2}\|_\infty < +\infty$. The constant $\|\frac{\partial \Psi}{\partial y}\|_\infty$ is denoted by Lip($\Psi$).
• For all $y$ in $\mathbb{R}$, the function $s \mapsto \Psi(s, y)$ belongs to $C^4_b$ and $y \mapsto \|\Psi(\cdot, y)\|_{C^4_b}$ is locally bounded.
Remark 2.1. Note that: • the assumptions regarding the intensity function Ψ are rather technical, neverthe- These four assumptions also appear in [9], where they are used to prove propagation of chaos as

Already known results
Below is given the definition of an ADHP by providing its representation as a system of stochastic differential equations (SDE) driven by Poisson noise.
be a family of counting processes such that, for i = 1, .., n, and all t ≥ 0, where (S i t− ) t≥0 is the predictable age process associated with N i . Then, (N i ) i=1,..,n is an age dependent Hawkes process (ADHP) with parameters (n, h, Ψ, u 0 ).
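A representation driven by Poisson measures on $\mathbb{R}_+^2$ with intensity 1 suggests a direct simulation by thinning. The sketch below rests on two assumptions that should be checked against the definition: that the intensity has the mean-field form (1.3), and that $\Psi$ is bounded by a known constant `psi_max`. Only the atoms $(t, x)$ of the Poisson measures with $x < \texttt{psi\_max}$ can then be accepted, and they form a homogeneous Poisson process in time. All names and the example functions are hypothetical.

```python
import random

def simulate_adhp(n, psi, psi_max, h, ages0, horizon, seed=0):
    """Thinning simulation of n interacting particles on (0, horizon].

    ages0[i] is the initial age S^i_0. Returns one sorted list of event
    times per particle. Assumes 0 <= psi <= psi_max everywhere.
    """
    rng = random.Random(seed)
    last = [-a for a in ages0]           # last point of each particle
    events = [[] for _ in range(n)]
    t = 0.0
    while True:
        # next atom of the superposed Poisson measures restricted to x < psi_max
        t += rng.expovariate(n * psi_max)
        if t > horizon:
            return events
        i = rng.randrange(n)             # particle owning this atom
        x = psi_max * rng.random()       # second coordinate of the atom
        # mean-field interaction, built from points strictly before t
        interaction = sum(h(t - z) for ev in events for z in ev) / n
        lam = psi(t - last[i], interaction)
        # x < lam and x <= lam differ only on a null event
        if x < lam:
            events[i].append(t)
            last[i] = t

# Hypothetical example: refractory period of 0.1, intensity bounded by 2.
def psi_example(s, y):
    return 0.0 if s < 0.1 else min(2.0, 0.5 + max(y, 0.0))

def h_example(t):
    return 1.0 / (1.0 + t)

events = simulate_adhp(3, psi_example, 2.0, h_example, [0.1, 0.2, 0.3], 5.0, seed=1)
```

Recomputing the interaction from scratch at each candidate time is quadratic in the number of points; it is kept here for readability only.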
Here, we give a brief overview of the results obtained in [9] in order to set the context of the present article. We expect ADHPs to be well approximated, when n goes to infinity, by i.i.d. solutions of the following limit equation, where Π(dt , dx) is an F-Poisson measure on R 2 + with intensity 1 and (S t− ) t≥0 is the predictable age process associated with N where S 0 is distributed according to u 0 .
Under Assumption (A LLN ), [9, Proposition 3.7.] states existence and uniqueness of the limit process N . In particular, there exists a continuous function λ : R + → R (which depends on the parameters h, Ψ and u 0 ) such that if (N t ) t≥0 is a solution of (2.5) then E[N (dt)] = λ(t)dt. Let us define the deterministic function γ by, for all t ≥ 0, (2.5). Furthermore, the limit predictable age process (S t− ) t≥0 is closely related to the PDE system (1.4). Proposition 2.4 ([9, Proposition 3.9.]). Under Assumption (A LLN ), the unique solution u to the system (1.4) with initial condition that u 0 is such that u(t, ·) is the density of the age S t− (or S t since they are equal a.s.).
Once the limit equation is well-posed, following the ideas of Sznitman in [41], it is easy to construct a suitable coupling between ADHPs and i.i.d. solutions of the limit equation (2.5). More precisely, consider • a sequence (S i 0 ) i≥1 of i.i.d. random variables distributed according to u 0 ; • a sequence (Π i (dt , dx)) i≥1 of i.i.d. F-Poisson measures with intensity 1 on R 2 + .
Under Assumption (A LLN ), we have existence of both ADHPs and the limit process N .

Remark 2.5.
Notice that the coupling above is based on the sharing of common initial conditions (S i 0 ) i≥1 and a common underlying randomness, that are the F-Poisson measures (Π i (dt , dx)) i≥1 . Note also that the sequence of ADHPs is indexed by the size of the network n whereas the solutions of the limit equation which represent the behaviour under the mean field approximation are not.

What next? The purpose of the present paper
As a direct follow-up to the convergence of the empirical measure $\mu^n_{S_t}$, we are interested in the dynamics of the fluctuations of this empirical measure around its limit.
For any t ≥ 0, S
The analysis of the coupling (Equation (2.9)) gives a rate of convergence at least in $n^{-1/2}$, so we want to find the limit law of the fluctuation process defined, as in the introduction, for all $t \ge 0$, by $\eta^n_t := \sqrt{n}\,(\mu^n_{S_{t-}} - u_t)$. Notice that $\eta^n_t$ is a distribution in the functional analysis sense on the state space of the ages, i.e. $\mathbb{R}_+$, and is meant to be considered as a linear form acting on test functions $\varphi$ by means of $\langle \eta^n_t, \varphi \rangle$.
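In practice, the fluctuation process is only ever observed through its action on test functions. As a minimal sketch, assuming the ages of the $n$ particles at time $t$ and the limit value $\langle u_t, \varphi \rangle$ are both available (the function names below are hypothetical), the pairing $\langle \eta^n_t, \varphi \rangle$ can be evaluated as follows.

```python
import math

def empirical_mean(ages, phi):
    """<mu^n, phi>: average of the test function over the n ages."""
    return sum(phi(s) for s in ages) / len(ages)

def fluctuation(ages, phi, limit_value):
    """<eta^n_t, phi> = sqrt(n) * (<mu^n, phi> - <u_t, phi>).

    limit_value stands for <u_t, phi>, assumed to be known or computed
    separately (for instance from a numerical solution of the PDE).
    """
    n = len(ages)
    return math.sqrt(n) * (empirical_mean(ages, phi) - limit_value)
```

Since both the empirical measure and $u_t$ are probability measures, the pairing with a constant test function always vanishes: `fluctuation(ages, lambda s: 1.0, 1.0)` returns 0 whatever the ages are.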

Estimates in total variation norm
The bound ($n^{-1/2}$) on the rate of convergence, given by (2.9), is not sufficient to prove convergence or even tightness of the fluctuation process $\eta^n$. Some refined estimates are necessary. For instance, when dealing with diffusions, one looks for higher order moment estimates on the difference between the particles driven by the real dynamics and the limit particles (see [17,24,27,29] for instance). Here, we deal with pure jump processes and, to our knowledge, there is no reason why one could obtain better rates for higher order moments. A simple way to see this is to look at the coupling between the counting processes. Indeed, the difference between two counting processes, say δ n,i
In order to accommodate this fact, the key idea is to estimate the coupling (2.7)-(2.8) in the total variation distance. Hence, the estimates needed in the next section (and proved in the present section) are the analogues of higher order moments but with respect to the total variation norm, i.e. the probabilities
for all positive integers $k$ and real numbers $\theta \ge 0$.
The heuristics underlying the result stated below, in Proposition 3.1, rely on the asymptotic independence between the $k$ age processes (S n,k
if they were independent then we would have (recall (2.9)),
which is exactly the rate of convergence we find below.
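The independence heuristic can be written out in one line. The display below is only a sketch: the events $A_i$ are hypothetical notation for "the $i$-th pair of coupled age processes differs somewhere on $[0, \theta]$", and the bound $\mathbb{P}(A_i) \le C n^{-1/2}$ is assumed to be what the first order coupling estimate provides.

```latex
% Hypothetical notation: A_i is the event that the i-th pair of coupled
% age processes differs somewhere on [0, \theta]; C does not depend on n.
\[
  \mathbb{P}\Big(\bigcap_{i=1}^{k} A_i\Big)
  = \prod_{i=1}^{k} \mathbb{P}(A_i)
  \le \big(C\, n^{-1/2}\big)^{k}
  = C^{k}\, n^{-k/2}.
\]
```

The first equality is exactly where independence is used; the actual proof has to replace it by an argument based on exchangeability and a Grönwall-type lemma.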

Remark 3.2.
In addition to the explanation given at the beginning of this section, let us mention that the analogue of the higher moment estimates obtained for diffusions is obtained here for the difference between $\gamma^n_t$ and $\gamma(t)$. Indeed, as $k$ grows, the convergence of ξ
Denote by $A \Delta B$ the symmetric difference of the sets $A$ and $B$. Then, for any $i \le n$, let us define $\Delta^{n,i} := N^{n,i} \Delta N^i$, that is the set of points that are not common to $N^{n,i}$ and $N^i$.
Then, the intensity of the point process $\Delta^{n,i}$ is given by $\lambda^{\Delta,n,i}$
since counting processes take values in $\mathbb{N}$. For any positive integers $k$ and $p$, let us denote, for all $n \ge k$,
which will end the proof thanks to (3.2). First, note that the case $k = 1$ and $p = 1$ is already treated. Indeed, [9, Theorem 4.1] gives
Then, note that for any two positive integers $p$ and $q$,
This is due to the fact that counting processes take values in $\mathbb{N}$. The rest of the proof is divided into two steps: initialization and inductive step.
Step one. For $k = 1$ and $p$ a positive integer, it holds that
The right-hand side of (3.6) involves integrals of predictable processes, that are the $(\Delta^{n,1}_{t-})^{p}$, with respect to a point measure under which it is convenient to take expectation. More precisely, since $(\Delta^{n,1}_{t-})^{p'} \le (\Delta^{n,1}_{t-})^{p}$ as soon as $0 < p' \le p - 1$, it holds that
Yet the intensity $\lambda^{\Delta,n,1}_t$ is bounded by $\|\Psi\|_\infty$ and ε
Step two. For all integers $k \ge 2$ and $p \ge 1$, one can generalize the argument used to prove (3.6) in order to end up with
Hence, thanks to the exchangeability of the processes $(\Delta^{n,i})_{i=1,\dots,n}$ and the predictability of the integrated processes, we have
where we used that $(\Delta^{n,1}_{t-})^{p'} \le (\Delta^{n,1}_{t-})^{p}$ as soon as $0 < p' \le p - 1$. On the one hand, using that λ ∆,n,1
On the other hand, we use (A Ψ y,C 2 ) which gives the following bound on the intensity,
Hence the first expectation in (3.8) is bounded by
The second term of (3.9) is well suited to a Grönwall-type lemma. To deal with the first term, we use a trick involving the exchangeability of the particles. Indeed, using the exchangeability we can replace each of the
without modifying the value of the expectation since the sums are taken over disjoint indices. Hence, using for the second line a generalization of Hölder's inequality with k exponents equal to 1/k, we have
Yet, computations given in Section A.1 give the two following statements: there exists a constant $C(k)$ which does not depend on $n$ or $p$ such that
Gathering (3.8), (3.9), (3.10) and (3.13) gives (recall that ε
and so the Grönwall-type Lemma B.1 gives $\varepsilon^{(k,p)}_n(\theta) \lesssim_{(\theta,k,p)} n^{-k/2}$, which ends the proof thanks to (3.2).

Tightness
The aim of this section is to prove tightness of the sequence of the laws of (η n ) n≥1 regarded as stochastic processes (in time) with values in a suitable space of distributions.
Thus, we consider (η n t ) t≥0 as a random process with values in the dual space of some well-chosen space of test functions. In Section 4.1, we give the definition of these spaces of test functions. Following the Hilbertian approach developed in [17], we work with weighted Sobolev Hilbert spaces. Finally, the tightness result is stated in Theorem 4.11.
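The following study takes advantage of the Hilbert structure of the Sobolev spaces considered. Let us state here the Aldous tightness criterion for Hilbert space valued stochastic processes (cf. [23, p. 34-35]) used in the present paper. Let $H$ be a separable Hilbert space. A sequence of processes $(X^n)_{n\ge1}$ in $D(\mathbb{R}_+, H)$ defined on the respective filtered probability spaces $(\Omega^n, \mathcal{F}^n, (\mathcal{F}^n_t)_{t\ge0}, P^n)$ is tight if both conditions below hold true:
(A1): for every $t \ge 0$ and $\varepsilon > 0$, there exists a compact set $K \subset H$ such that $\sup_{n\ge1} P^n(X^n_t \notin K) \le \varepsilon$,
(A2): for every $\varepsilon_1, \varepsilon_2 > 0$ and $\theta \ge 0$, there exist $\delta^* > 0$ and an integer $n_0$ such that for all $(\mathcal{F}^n_t)_{t\ge0}$-stopping times $\tau_n \le \theta$, $\sup_{n \ge n_0} \sup_{\delta \le \delta^*} P^n(\|X^n_{\tau_n + \delta} - X^n_{\tau_n}\|_H \ge \varepsilon_1) \le \varepsilon_2$.
Note that (A1) is implied by the condition (A1') stated below, which is much easier to verify.
(A1'): there exists a Hilbert space $H_0$ such that $H_0 \hookrightarrow_K H$ and, for all $t \ge 0$, $\sup_{n\ge1} E^n[\|X^n_t\|^2_{H_0}] < +\infty$,
where the notation $\hookrightarrow_K$ means that the embedding is compact and $E^n$ denotes the expectation associated with the probability $P^n$.
The fact that (A1') implies (A1) is easily checked: by compactness of the embedding, closed balls in $H_0$ are compact in $H$, so Markov's inequality gives (A1).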

Preliminaries on weighted Sobolev spaces
Here are listed some definitions and technical results about the weighted Sobolev spaces used in the present article. To avoid confusion, let us stress the fact that the test functions we use are supported in the state space of the ages, namely $\mathbb{R}_+$. For any integer $k$ and any real $\alpha$ in $\mathbb{R}_+$, we denote by $W^{k,\alpha}_0 := W^{k,\alpha}_0(\mathbb{R}_+)$ the completion of the set of compactly supported (in $\mathbb{R}_+$) functions of class $C^\infty$ for the following norm
where the notation $\hookrightarrow$ means that the embedding is continuous.
Let $C^{k,\alpha}$ be the space of functions $f$ on $\mathbb{R}_+$ with continuous derivatives up to order $k$ such that, for all $k' \le k$,
Recall that $C^k_b$ is the space of bounded functions of class $C^k$ with bounded derivatives of every order less than $k$. Notice that $C^k_b = C^{k,0}$ as normed spaces. Denote by $C^{-k}_b$ its dual space. For any $\alpha > 1/2$ and any integer $k$ (so that
We recall the following Sobolev embeddings (see [17, Section 2.1]):
(i) Sobolev embedding theorem: $W^{m+k,\alpha}_0 \hookrightarrow C^{k,\alpha}$ for $m \ge 1$, $k \ge 0$ and $\alpha$ in $\mathbb{R}_+$, i.e. there exists a constant $C$ such that $\|f\|_{C^{k,\alpha}} \le C \|f\|_{m+k,\alpha}$.
(ii) Maurin's theorem: $W^{m+k,\alpha}_0 \hookrightarrow_{H.S.} W^{k,\alpha+\beta}_0$ for $m \ge 1$, $k \ge 0$, $\alpha$ in $\mathbb{R}_+$ and $\beta > 1/2$, where H.S. means that the embedding is of Hilbert-Schmidt type 4 . In particular, the embedding is compact and there exists a constant $C$ such that $\|f\|_{k,\alpha+\beta} \le C \|f\|_{k+m,\alpha}$.
Hence, the following dual embeddings hold true:
In some of the proofs given in the next section, we consider an orthonormal basis $(\varphi_j)_{j\ge1}$ of $W^{k,\alpha}_0$ composed of $C^\infty$ functions with compact support. The existence of such a basis follows from the fact that the functions of class $C^\infty$ with compact support are dense in $W^{k,\alpha}_0$. Furthermore, if $(\varphi_j)_{j\ge1}$ is an orthonormal basis of $W^{k,\alpha}_0$ and $w$ belongs to $W^{-k,\alpha}_0$, then $\|w\|^2_{-k,\alpha} = \sum_{j\ge1} \langle w, \varphi_j \rangle^2$ thanks to Parseval's identity. Let us point out that we stick with the notation $(\varphi_j)_{j\ge1}$ even if the space $W^{k,\alpha}_0$ (in particular the regularity $k$) may differ from page to page. The three lemmas below are useful throughout the analysis.
4 Here, it means that
Proof. The first assertion follows from the definition of $\|\cdot\|_{2,\alpha}$, and the second one follows from Leibniz's rule and the definition of $\|\cdot\|_{k,\alpha}$.
Let us denote R (for reset ) the linear mapping defined by Rϕ := ϕ(0) − ϕ(·) where ϕ is some test function. This mapping naturally appears in our problem since the age process jumps to the value 0 at each point of the underlying point process, as it appears below in Proposition 4.5.

Lemma 4.2.
For any integer k ≥ 1 and α > 1/2, the linear mapping R is continuous from W k,α 0 to itself.
Proof. The function $R\varphi$ only differs from $-\varphi$ by a constant, so the derivatives of $R\varphi$ are equal, up to the sign, to the derivatives of $\varphi$. Hence, using the convexity of the square function, we have
• we want to be able to consider functions of $C^k_b$ as test functions: indeed, $\Psi$ must be considered as a test function, in Equation (5.6) below for instance, yet we do not want $\Psi$ to be compactly supported with respect to the age $s$ or even to rapidly decrease when $s$ goes to infinity. The natural space to which $\Psi$ belongs is some $C^k_b$ space,
• in order to ensure criterion (A1'), a compact embedding is required but Maurin's theorem does not apply for standard Sobolev spaces on $\mathbb{R}_+$ (see [1, Theorem 6.37]).
In order to apply Lemma 4.2 and to satisfy the first point in the remark above, the weight α is assumed to be greater than 1/2 in all the next sections so that (4.2) holds true.
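The role of the threshold $\alpha > 1/2$ can be checked numerically on constant functions. The sketch below assumes that the weight in the norm is $1/(1 + |x|^{2\alpha})$, a usual choice for polynomially weighted Sobolev spaces; this is an assumption made for the illustration, and the function name is hypothetical. The squared weighted $L^2$ norm of the constant function 1 on $[0, L]$ then stays bounded as $L$ grows when $\alpha = 1$, and keeps growing when $\alpha = 1/4$.

```python
import math

def weighted_l2_norm_sq(f, alpha, upper, n_cells=100_000):
    """Midpoint rule for int_0^upper f(x)^2 / (1 + x^(2*alpha)) dx.

    The weight 1 / (1 + x^(2*alpha)) is an assumed form of the polynomial
    weight, used here for illustration only.
    """
    dx = upper / n_cells
    total = 0.0
    for k in range(n_cells):
        x = (k + 0.5) * dx
        total += f(x) ** 2 / (1.0 + x ** (2.0 * alpha))
    return total * dx

def one(x):
    return 1.0

bounded_100 = weighted_l2_norm_sq(one, 1.0, 100.0)
bounded_1000 = weighted_l2_norm_sq(one, 1.0, 1000.0)
growing_100 = weighted_l2_norm_sq(one, 0.25, 100.0)
growing_1000 = weighted_l2_norm_sq(one, 0.25, 1000.0)
```

For $\alpha = 1$ the integral over $[0, L]$ equals $\arctan(L)$, which is bounded by $\pi/2$; this gives a closed form against which the quadrature can be compared.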

Decomposition of the fluctuations
Here, we give a semi-martingale representation of η n used to simplify the study of tightness (recall that R is defined above in Lemma 4.2).
with $L_z \varphi(s) = \varphi'(s) + \Psi(s, \gamma(z))\, R\varphi(s)$ for all $z \ge 0$ and $s$ in $\mathbb{R}$, where $\gamma$ is defined by (2.6), $R\varphi(S^{n,i}_{z-}) \big( \lambda^{n,i}_z - \Psi(S^{n,i}_{z-}, \gamma(z)) \big)$.
Remark 4.6. To avoid confusion, let us mention that (4.8) defines $M^n_t$ and $A^n_z$ as distributions acting on test functions. More precisely, we show below that they can be seen as distributions in $W^{-2,\alpha}_0$ (Proposition 4.7). However, we do not use the notation for the dual action $\langle \cdot, \cdot \rangle$ to avoid tricky notation involving several angle brackets in (4.9) for instance.
The proof of Proposition 4.5 relies on the integrability properties of the stochastic intensity and is given in Appendix A.2.
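The operator $L_z$ combines the ageing of a particle (the derivative term) and its reset to age 0 at rate $\Psi$ (the term involving $R$). As a small sketch, assuming the form $L_z \varphi(s) = \varphi'(s) + \Psi(s, \gamma(z)) R\varphi(s)$ with $R\varphi = \varphi(0) - \varphi(\cdot)$, and approximating the derivative by a central difference (the function names and the step `eps` are hypothetical), one can evaluate it on simple test functions.

```python
def reset(phi):
    """R phi = phi(0) - phi(.)"""
    return lambda s: phi(0.0) - phi(s)

def generator(phi, psi, gamma_z, eps=1e-5):
    """L_z phi(s) = phi'(s) + psi(s, gamma(z)) * R phi(s).

    phi' is approximated by a central difference, so phi must be defined
    slightly to the left of s; gamma_z stands for the value gamma(z).
    """
    r_phi = reset(phi)

    def l_phi(s):
        dphi = (phi(s + eps) - phi(s - eps)) / (2.0 * eps)
        return dphi + psi(s, gamma_z) * r_phi(s)

    return l_phi

# Hypothetical intensity, affine in the interaction variable.
def psi_example(s, y):
    return 2.0 + y

l_const = generator(lambda s: 1.0, psi_example, 0.5)
l_ident = generator(lambda s: s, psi_example, 0.5)
```

Constant functions are sent to 0, which reflects the conservation of mass, and for $\varphi(s) = s$ one gets $L_z \varphi(s) = 1 - \Psi(s, \gamma(z))\, s$.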

Estimates in dual spaces
Below are stated estimates of the terms $\eta^n$, $A^n$ and $M^n$ appearing in (4. in [24] for instance). Usually, like in [17,24,27,29], the weight is linked to the maximal order of the moment estimates obtained on the positions of the particles. Here, the age processes are bounded in finite time horizon (recall (2.3)), so the weight $\alpha$ of the Sobolev space can be taken as large as wanted. The weighted Sobolev spaces are nevertheless interesting here since, in particular, the distribution $\eta^n_t$ belongs to $W^{-1,\alpha}_0$ for all $t \ge 0$ (see Proposition 4.7 below). We refer to the introductory discussion in Section 1 for complements on the usefulness of the weights.
We first give estimates in the smaller space W −1,α 0 . This is later used in order to prove tightness (remember condition (A 1 ) of the Aldous type criterion stated on page 13).   E ||A n t || 2 −2,α < +∞.  The proof of Proposition 4.7 is given in Appendix A.3 and mainly relies on the estimates given in Lemma 4.3. However, let us mention that: • the following expansion is used in the proof of (iii) as well as in Section 5.1: using that λ n,i t = Ψ(S n,i t− , γ n t ) and (A Ψ y,C 2 ), it follows from Taylor's inequality that for ϕ in W 2,α 0 , with the rests satisfying |r n,i t | ≤ sup s,y | ∂ 2 Ψ ∂y 2 (s, y)||γ n t − γ(t)| 2 /2. This upper-bound does not depend on ϕ. Let us denote Γ n t− := √ n(γ n t − γ(t)) and where L * z is the adjoint operator of L z .   Furthermore, the Doob-Meyer process (< < M n > > t ) t≥0 associated with the square integrable F-martingale (M n t ) t≥0 satisfies the following: for any t ≥ 0, < < M n > > t is the linear continuous mapping from W 2,α 0 to W −2,α 0 given, for all ϕ 1 , ϕ 2 in W 2,α 0 , by This last equation can be retrieved thanks to the polarization identity from (4.9).
Yet, to give sense to Equation (4.18), we need the lemma stated below. The first condition is immediate. The second one follows from the controls we have shown.
Indeed, on the one hand, it follows from Equation We deduce from Equation (4.13) that Hence, taking the expectation on both sides of the inequality above and applying Proposition 4.7 (recall (4.5)), we get (4.20). Starting from (4.18) and using that the integrals are continuous by Lemma 4.9 and that $M^n$ is càdlàg by Proposition 4.7-(ii), it follows that $\eta^n$ is càdlàg.

Tightness result
Using the estimates proved in Section 4.3, the tightness criterion stated on page 13 can be checked. On the one hand, condition (A 2 ) holds for $(M^n)_{n\ge1}$ as soon as it holds for the trace of the processes $(\langle\langle M^n\rangle\rangle)_{n\ge1}$ given below (4.18) [23, Rebolledo's theorem, p. 40]. Let $(\varphi_k)_{k\ge1}$ be an orthonormal basis of $W^{2,\alpha}_0$. Let $\theta\ge0$, $\delta^*>0$ and $\delta\le\delta^*$. Furthermore, let $\tau_n$ be an F-stopping time smaller than $\theta$.
This last bound is arbitrarily small for $\delta^*$ small enough, which gives condition (A 2 ) thanks to Markov's inequality.
On the other hand, using decomposition (4.18) and the fact that $(M^n)_{n\ge1}$ is tight, it suffices to show the tightness of the remaining terms $\big(R^n_t = \eta^n_0 + \int_0^t L^*_z\eta^n_z\,dz + \int_0^t A^n_z\,dz\big)_{n\ge1}$ in order to show tightness of $(\eta^n)_{n\ge1}$. Yet, using Equation (4.19), we have where $C$ depends on $\theta$ and $\delta^*$. Then, Proposition 4.7 implies that $\sup_{n\ge1}\mathbb{E}\big[\|R^n_{\tau_n+\delta}-R^n_{\tau_n}\|^2_{-2,\alpha}\big] \le C\delta^*$ for $\delta^*$ small enough. Finally, Markov's inequality gives condition (A 2 ) for $(R^n)_{n\ge1}$ and so the tightness of $(\eta^n)_{n\ge1}$.

Remark 4.12.
For any $\alpha > 1/2$, every limit (with respect to the convergence in law) where we used the fact that $(u_t)_{t\ge0}$ is continuous in $W^{-2,\alpha}_0$ (see Lemma B.2). Since almost surely no two of the point processes $(N^{n,i})_{i=1,\dots,n}$ have a common point, there is, almost surely, for all $t\ge0$, at most one of the indicators $\mathbf{1}_{t\in N^{n,i}}$ which is nonzero. Then, , which gives the desired convergence to 0.

Characterization of the limit
The aim of this section is to prove convergence of the sequence (η n ) n≥1 by identifying the limit fluctuation process η as the unique solution of a SDE in infinite dimension. We first prove, in Section 5.1, that every possible limit process η satisfies a certain SDE (Theorem 5.6). Then, we show, in Section 5.2, that this SDE uniquely characterizes the limit law, which completes the proof of the convergence in law of (η n ) n≥1 to η.

Candidate for the limit equation
In this section, the limit version of Equation (4.18) is stated. Apart from $\eta^n$, there are two random processes in (4.18), namely $A^n$ and $M^n$. The following notation encompasses the source of the stochasticity of both $A^n$ and $M^n$ and is mainly used in order to track the correlations between those two quantities: for all $n\ge1$, let $W^n$ be the $W^{-1,\alpha}_0$-valued martingale defined, for all $t\ge0$ and $\varphi$ in $W^{1,\alpha}_0$, by Notice that $M^n_t(\varphi) = W^n_t(R\varphi)$. Furthermore, as for $M^n$, the Doob-Meyer process $(\langle\langle W^n\rangle\rangle_t)_{t\ge0}$ associated with $(W^n_t)_{t\ge0}$ satisfies the following: for any $t\ge0$, $\langle\langle W^n\rangle\rangle_t$ is the linear continuous mapping from $W^{2,\alpha}_0$ to $W^{-2,\alpha}_0$ given, for all $\varphi_1$ and $\varphi_2$ in $W^{2,\alpha}_0$, by All the results given for $M^n$ in the previous section can be extended to $W^n$. In particular, the sequence $(W^n)_{n\ge1}$ is tight in $D(\mathbb{R}_+, W^{-2,\alpha}_0)$. (5.2)
Next, we prove that it converges towards the Gaussian process W defined below. where u is the unique solution of (1.4).

Remark 5.2.
We refer to the PhD manuscript of the author [10] for the existence and uniqueness in law of such a process $W$. Yet, let us mention here that the process $W$ defined above does not depend on the weight $\alpha$, in the sense that the definition is consistent with respect to the weights. Indeed, say $W^\alpha$ and $W^\beta$ are two processes in the sense of Denote by $\mathbf{1} : \mathbb{R}_+ \to \mathbb{R}$ the constant function equal to 1 (which belongs to $W^{2,\alpha}_0$ since we assume $\alpha > 1/2$) and note that $W^n_t(\mathbf{1})$ is the rescaled canonical martingale associated with the system of age-dependent Hawkes processes, namely Below, we use the fact that this remainder term converges to 0 in $L^1$ norm: indeed, recall that $|r^{n,i}_t| \lesssim |\gamma^n_t-\gamma(t)|^2$ (5.4) and, thanks to Proposition 3.1, Since $\Gamma^n_{t-}$ (as part of $A^n_t(\varphi)$) only appears in (4.18) as an integrand and is only discontinuous on a set of Lebesgue measure zero, we can replace it by its càdlàg version denoted by $\Gamma^n_t$. Let us consider the decomposition Γ n t = Υ 1 (λ n,i z − Ψ(S n,i z− , γ(z)))dz, where we used, in the last line, the fact that $\mu^n_{S_{z-}} = \mu^n_{S_z}$ for almost every $z$ in $\mathbb{R}_+$, and $\lambda(z) = \langle u_z, \Psi(\cdot,\gamma(z))\rangle$.
Based on Assumption $(A^{\Psi}_{y,C^2})$, as for Equation (4.14), one can give the Taylor expansion of the term On the one hand, gathering the decomposition (4.7) with (4.15) and, on the other hand, gathering $\Gamma^n_t = \Upsilon^1_t+\Upsilon^2_t+\Upsilon^3_t$ with the Taylor expansion of $\Upsilon^2_t$ give that $(\eta^n,\Gamma^n)$ satisfies the following closed system for all $\varphi$ in $W^{2,\alpha}_0$, where the remainder term $R^{n,(2)}_z$ is defined by Once again, notice that $\Gamma^n_{z-}$, which naturally appears in the first integral term of (5.6), is replaced by its càdlàg version $\Gamma^n_z$ since they are equal except on a set of measure zero.
Let us denote V n t := where R * denotes the adjoint of R.
The proof of Corollary 5.4 uses Billingsley tightness criterion for real-valued stochastic processes and is given in Appendix A.5.
Before taking the limit $n\to+\infty$ in the system (5.5)-(5.6), we state the tightness of $(\Gamma^n)_{n\ge1}$. Nevertheless, let us first mention that we use the following estimates: as a consequence of Proposition 3.1, for all $k\ge0$ and $\theta\ge0$, since $\sup_{t\in[0,\theta]}\mathbb{E}\big[|\Gamma^n_t|^k\big] = \sup_{t\in[0,\theta]}\mathbb{E}\big[|\Gamma^n_{t-}|^k\big]$ because the underlying point processes admit intensities so that there is almost surely no jump at time $\theta$.

Proposition 5.5. Under $(A_{\mathrm{TGN}})$ and $(A^h_{\text{Höl}})$, the sequence of the laws of $(\Gamma^n)_{n\ge1}$ is tight in $D(\mathbb{R}_+,\mathbb{R})$. Furthermore, the possible limit laws are supported in $C(\mathbb{R}_+,\mathbb{R})$ and satisfy, for all $k\ge0$, sup
The proof of Proposition 5.5 uses Aldous tightness criterion for real-valued stochastic processes and is given in Appendix A.6.
Both sequences (η n ) n≥1 and (Γ n ) n≥1 are tight with continuous limit trajectories. Tightness of (η n , Γ n ) n≥1 hence follows and we are now in position to give the system satisfied by any limit (η, Γ).

Theorem 5.6. Under $(A_{\mathrm{TGN}})$ and $(A^h_{\text{Höl}})$, for all $\alpha>1/2$, any limit $(\eta,\Gamma)$ of the sequence $(\eta^n,\Gamma^n)_{n\ge1}$ is a solution in $C(\mathbb{R}_+, W^{-2,\alpha}_0\times\mathbb{R})$ of the following system (formulated in
The proof of Theorem 5.6 consists in proving continuity properties in order to apply the continuous mapping theorem. It is given in Appendix A.7. we have the convergence of the real-valued random variables $\langle\eta^n_0,\varphi\rangle = \sqrt{n}\,\langle\mu^n_{S_0}-u_0,\varphi\rangle$ by applying the standard central limit theorem since the initial conditions are i.i.d.

Uniqueness of the limit law
The next step in order to prove convergence of the sequence $(\eta^n,\Gamma^n)_{n\ge1}$ is to prove uniqueness of the solutions of the limit system (5.9)-(5.10). Since the system is linear, the standard argument is to consider the system satisfied by the difference between two solutions and show that its unique solution is trivial. Let $(\eta,\Gamma)$ and $(\tilde\eta,\tilde\Gamma)$ be two solutions associated with the same "noise" $W$ and the same initial condition $\eta_0$. Denote by $\bar\eta := \eta-\tilde\eta$ and $\bar\Gamma := \Gamma-\tilde\Gamma$ the differences. Then, $(\bar\eta,\bar\Gamma)$ is a solution of the following The standard follow-up is to use Grönwall's lemma. Let us show here why it is not sufficient in our case. For instance, assume we want to prove that $\|\bar\eta\|_{-3,\alpha}=0$: heuristically, when applied to (5.12), Grönwall's argument gives that $|\bar\Gamma_t|$ is bounded by some use this bound for $\bar\Gamma$ in (5.11), Grönwall's argument cannot be applied since the term $\int_0^t\langle\bar\eta_z, L_z\varphi\rangle\,dz$ involves $\|\bar\eta_z\|_{-2,\alpha}$, which is greater than the desired norm $\|\bar\eta_z\|_{-3,\alpha}$. This problem cannot be bypassed by upgrading the regularity, as we have done before to deal with the fact that the operator $L_z$ reduces the regularity of the test functions.
Since the main limitation comes from the differential part of the operator $L_z$, let us consider $L_z$ as the sum of the first order differential operator plus a perturbation. More precisely, let $L : \varphi \mapsto \varphi'$ and $G_t : \varphi \mapsto \Psi(\cdot,\gamma(t))R\varphi$ so that $L_t = L + G_t$. Let us present here the heuristics behind the argument we use to bypass the issue induced by the differential operator $L$: instead of studying the time derivative $\frac{d}{dt}\langle\eta_t,\varphi\rangle$ in (5.11), the idea is to find some family of test functions $(\varphi_t)_{t\ge0}$ such that $\langle\eta_t,\frac{d}{dt}\varphi_t\rangle = -\langle\eta_t, L\varphi_t\rangle$; thus the differential operator $L$ vanishes in $\frac{d}{dt}\langle\eta_t,\varphi_t\rangle$ and Grönwall's argument can be applied.
Notice that these shift operators are linked with the method of characteristics applied to a transport equation with constant speed equal to 1 which is exactly the dynamics described by the differential operator L. Below are given some bounds for the operators L, G t and τ t when acting on the space C 4 b .
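The link between shifts and the transport dynamics can be checked numerically. The sketch below is only an illustration under an assumption that is not taken from the text: the time-dependent test function is chosen as the backward shift $\varphi_t(s) = \varphi(s-t)$ of a smooth bounded function (the names `phi`, `shifted`, `d_dt` and `L` are hypothetical). Along this family, $\frac{d}{dt}\varphi_t = -L\varphi_t$, which is the cancellation used in the heuristics.

```python
import math

def phi(s):
    """Smooth bounded test function (illustrative choice)."""
    return math.sin(s) * math.exp(-0.1 * s * s)

def shifted(t, s):
    """Assumed family phi_t(s) = phi(s - t): phi transported at speed 1."""
    return phi(s - t)

def d_dt(t, s, eps=1e-5):
    """Central finite difference of t -> phi_t(s)."""
    return (shifted(t + eps, s) - shifted(t - eps, s)) / (2 * eps)

def L(t, s, eps=1e-5):
    """L phi_t: derivative of phi_t with respect to the age variable s."""
    return (shifted(t, s + eps) - shifted(t, s - eps)) / (2 * eps)

def max_residual(ts, ss):
    """Largest value of |d/dt phi_t(s) + L phi_t(s)| over a grid."""
    return max(abs(d_dt(t, s) + L(t, s)) for t in ts for s in ss)

grid_t = [0.0, 0.5, 1.0, 2.0]
grid_s = [0.1 * i for i in range(50)]
residual = max_residual(grid_t, grid_s)
print(residual)  # should be at the level of floating-point rounding
```

The residual vanishes up to rounding because a function of $s-t$ is constant along the characteristics $s = s_0 + t$ of the transport equation with speed 1.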
is locally bounded. Then, is locally bounded.
Proof. The first two assertions follow from the definition of the norms $\|\cdot\|_{C^k_b}$. The third and last one follows from Leibniz rule.
Let $t' \ge t$ and $s$ in $\mathbb{R}$. Then, Moreover, since $\tau_t$ and $L$ commute, one has Yet, Lemma 5.9 gives that ||L(τ t−z ϕ) thus t t Lτ t−z ϕdz makes sense as a Bochner integral in $C^3_b$ as soon as $\varphi$ is in $C^4_b$. Hence, in the proof below we use the
Proof. Let $(\eta,\Gamma)$ and $(\tilde\eta,\tilde\Gamma)$ be two solutions of (5.9)-(5.10) in $C(\mathbb{R}_+, W^{-2,\alpha}_0\times\mathbb{R})$ associated with the same "noise" $W$ and the same initial condition $\eta_0$. Denote by $\bar\eta := \eta-\tilde\eta$ and $\bar\Gamma := \Gamma-\tilde\Gamma$ the differences. Since $\alpha>1/2$, we have W −2,α dz. Now, let $\varphi$ be in $C^4_b$ and use (5.13) and the fact that $\bar\eta$ is in W −2,α (5.14) The linearity of the operators allows us to write Then, the idea is to use Fubini's theorem to exchange the two integrals t 0 and t t . On the one hand, On the other hand, Now, for any $z$ in $[0,t]$, Equation (5.11) with $\varphi = L(\tau_{t-z}\varphi)$ (it is a valid test function since it belongs to $C^3_b \subset W^{3,\alpha}_0$) gives Gathering the equation above with (5.15) and (5.16) gives Hence, using the bound we proved on $\bar\Gamma$, we have for all $\varphi$ in $C^4_b$, We are now in position to conclude with the convergence of $(\eta^n,\Gamma^n)_{n\ge1}$.
Proof. Since $(\eta^n,\Gamma^n)_{n\ge1}$ is tight (Theorem 4.11 and Proposition 5.5), let $(\eta,\Gamma)$ be a limit point. According to Theorem 5.6, $(\eta,\Gamma)$ is a solution of the limit system (5.9)-(5.10) in $C(\mathbb{R}_+, W^{-2,\alpha}_0\times\mathbb{R})$. Finally, the law of $(\eta,\Gamma)$ is uniquely characterized by the limit system (Proposition 5.11 gives path-wise uniqueness and so Yamada-Watanabe theorem gives weak uniqueness by the same argument as [37, Theorem IX.1.7(i)]) and uniqueness of the limit law implies convergence of $(\eta^n,\Gamma^n)_{n\ge1}$.
Remark 5.13. As mentioned in the introduction, considering processes over finite time horizons would have led to equivalent results. This claim is based on the fact that the limit equation (5.9) is independent of the values of the test function $\varphi$ outside the support $K_t$ of $\eta^n_t$.
Indeed,
• on the one hand, the test function $\varphi$ appears in the drift term, more precisely $\int_0^t \big\langle u_z, \frac{\partial\Psi}{\partial y}(\cdot,\gamma(z))R\varphi\big\rangle\,\Gamma_z\,dz$, evaluated against the measure $u_z$ which is supported in $K_t$;
• on the other hand, the covariance structure of the Gaussian process $W$ implies this independence property for $W_t(R\varphi)$.
In that sense, the convergence stated for the whole positive time line R + in Theorem 5.12 implies that the central limit theorem also holds true for the process (η n t ) 0≤t≤θ as taking values in the dual of a standard Sobolev space of functions supported by K θ .
Conversely, the limit equation is consistent in time in the sense that one can recover our result by sticking together the CLTs obtained for the finite time horizon processes (η n t ) 0≤t≤θ .

Application to the "almost" derivation of an SPDE
This section focuses on a system of stochastic partial differential equations (SPDE), introduced and studied in [13] where some qualitative properties are discussed. The SPDE is a noisy version of the PDE system (1.4) and is expected to be a more precise approximation of the age-dependent Hawkes processes in a mean-field framework. The SPDE system associated with the system size $n$ is the following
$$\begin{cases} \dfrac{\partial\tilde u^n(t,s)}{\partial t} + \dfrac{\partial\tilde u^n(t,s)}{\partial s} + \Psi(s,X^n_t)\,\tilde u^n(t,s) + \sqrt{\dfrac{\Psi(s,X^n_t)\,\tilde u^n(t,s)}{n}}\,\zeta(t,s) = 0,\\[2mm] \tilde u^n(t,0) = \displaystyle\int_{s\in\mathbb{R}_+}\Big(\Psi(s,X^n_t)\,\tilde u^n(t,s) + \sqrt{\dfrac{\Psi(s,X^n_t)\,\tilde u^n(t,s)}{n}}\,\zeta(t,s)\Big)\,ds, \end{cases} \tag{6.1}$$
where for all $t\ge0$, $X^n_t = \int_0^t h(t-z)\,\tilde u^n(z,0)\,dz$ and $\zeta(t,s)$ is a Gaussian space-time white noise. The important thing to note about $\zeta$ is that the $W^{-2,\alpha}_0$-valued process defined by, for all $t\ge0$ and $\varphi$ in $W^{2,\alpha}_0$, $\int_0^t\int_0^{+\infty}\varphi(s)\sqrt{\Psi(s,\gamma(z))\,u(z,s)}\,\zeta(z,s)\,ds\,dz$, is a Gaussian process with the same law as $W$ defined in Definition 5.1.
Hence, at first sight, there are some similarities between the system above and the limit system obtained for the fluctuation process, i.e. (5.9)-(5.10). Let us give here some heuristics: assume that $u$, the solution of the PDE system (1.4), and $\tilde u^n$ are close to each other, and similarly for the auxiliary variables $X(t)$ and $X^n_t$, then
• the "non-noisy" spiking dynamics term $\Psi(s,X^n_t)\,\tilde u^n(t,s)$ is close to the mean-field spiking dynamics, appearing in the operator $L_t$ in the limit system (5.9)-(5.10), modulo an error term which is expected to appear, in a linear approximation, as the term involving the derivative $\frac{\partial\Psi}{\partial y}$ in (5.9);
• the covariance structure of the Gaussian process $W$, appearing in the limit system (5.9)-(5.10), is close to the covariance structure of the noise term appearing in the SPDE system above.
What is proposed in this section is to consider the second-order approximation of the empirical measure given by the central limit theorem, namely $\hat u^n_t := u_t + n^{-1/2}\eta_t \in W^{-2,\alpha}_0$ (6.2), where $u_t = u(t,\cdot)$ is the probability distribution solution of (1.4), and to show that it is an "almost" solution of the SPDE system in a sense defined below. To our knowledge, this kind of result is novel and deserves to be developed in this article. Let us recall that this section is devoted to an application of the CLT, so Assumption $(A_{\mathrm{CLT}})$ is supposed to hold true below. Furthermore, the stronger assumption that $\Psi$ is in $C^4_b$ is made.
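The statement about the integral against $\zeta$ can be illustrated on a grid. The following is a minimal sketch under loudly stated assumptions: `psi` and `u` are toy stand-ins (not the paper's $\Psi(\cdot,\gamma(z))$ and $u$), the age variable is truncated at a hypothetical `s_max`, and the white noise is replaced by independent centred Gaussian variables whose variance is the cell area. Under these assumptions, the variance of the discretised integral of $\varphi\sqrt{\Psi u}$ against the noise is the Riemann sum of $\varphi^2\,\Psi\,u$, which is the discrete analogue of the covariance of $W$ with $\varphi_1=\varphi_2=\varphi$.

```python
import math
import random
import statistics

def psi(s):
    """Toy intensity (hypothetical), bounded and positive."""
    return 1.0 + 0.5 * math.sin(s)

def u(z, s):
    """Toy age density (hypothetical): exponential in the age s, constant in time z."""
    return math.exp(-s)

def phi(s):
    """Test function (illustrative choice)."""
    return math.cos(s)

def cells(t, s_max, nz, ns):
    """Midpoints (z, s) of a regular grid on [0, t] x [0, s_max], and the cell area."""
    dz, ds = t / nz, s_max / ns
    points = [((i + 0.5) * dz, (j + 0.5) * ds) for i in range(nz) for j in range(ns)]
    return points, dz * ds

def target_variance(t=1.0, s_max=5.0, nz=15, ns=30):
    """Riemann sum of phi^2 * psi * u over the grid."""
    points, area = cells(t, s_max, nz, ns)
    return sum(phi(s) ** 2 * psi(s) * u(z, s) * area for z, s in points)

def sample_integral(rng, t=1.0, s_max=5.0, nz=15, ns=30):
    """One draw of the discretised integral of phi * sqrt(psi * u) against white noise."""
    points, area = cells(t, s_max, nz, ns)
    # White noise integrated over a cell of area `area` is modelled as N(0, area).
    return sum(
        phi(s) * math.sqrt(psi(s) * u(z, s)) * rng.gauss(0.0, math.sqrt(area))
        for z, s in points
    )

rng = random.Random(1)
draws = [sample_integral(rng) for _ in range(500)]
empirical = statistics.variance(draws)
expected = target_variance()
print(empirical, expected)
```

The two printed numbers should be close, up to Monte Carlo error; the agreement only reflects the discretised toy model and says nothing about the well-posedness of (6.1).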

Theoretical frame for the SPDE
To our knowledge, there is no well-established theoretical framework for the SPDE system (6.1). A pathwise notion of solution seems to be hard to handle because of the square root term appearing in front of the Gaussian white noise. In particular, it is not trivial to show that the argument of the square root remains nonnegative. That is why we consider the following weak variational sense of solutions.
Definition 6.1. The measure-valued process $(\tilde u^n_t)_{t\ge0}$ is a solution of (6.1) if it satisfies: where $\tilde W^n$ is a Gaussian process with Doob-Meyer process given by $\big\langle\langle\langle\tilde W^n\rangle\rangle_t(\varphi_1),\varphi_2\big\rangle = \frac{1}{n}\int_0^t\langle\tilde u^n_z,\varphi_1\varphi_2\Psi(\cdot,\tilde\gamma^n_z)\rangle\,dz$,
The well-posedness of such a definition is not addressed here. We only stress the fact that $\hat u^n$ is an "almost" solution in some sense related to Definition 6.1.

Weak sense dynamics for the second-order approximation
To capture the dynamics of $\hat u^n$, we somehow want to add up the dynamics of $u$, given by the PDE system (1.4), and the dynamics of $\eta$, given by the limit system. These two are formulated by different means, the main difference being that the PDE formulation involves (in the weak sense) bivariate test functions whereas the formulation for $\eta$ involves univariate test functions: we turn to the second one in order to get a system like (6.3)-(6.5).
On the one hand, the dynamics of $\eta$ is given by the limit equation (5.9) that we recall here: for all $\varphi$ in $W^{3,\alpha}_0$, On the other hand, the dynamics of $u$ is given (see [9, Theorem 3.5]) by the weak sense formulation of the PDE system (1.4) (which is driven by the generator $L_t$): for all where the test function space $C^\infty_{c,b}(\mathbb{R}^2_+)$ is defined as follows. The function $\phi$ belongs to $C^\infty_{c,b}(\mathbb{R}^2_+)$ if
• $\phi$ is continuous and uniformly bounded,
• $\phi$ has uniformly bounded derivatives of every order,
• there exists $T>0$ such that $\phi(t,s)=0$ for all $t>T$ and $s\ge0$.
Then, taking $\phi(t,s)$ that converges to a product function of the form $\varphi(s)\mathbf{1}_{t\le T}$, we get that for all $\varphi$ in $C^\infty_b(\mathbb{R}_+)$, Combining (6.7) with the limit equation (5.9), we prove that $\hat u^n$ satisfies: for all $\varphi$ in This last equation can be rewritten, with nothing new but some notation, as a system in the flavour of (6.3)-(6.5) as stated in the proposition below.
Proposition 6.2. The process $\hat u^n$ defined by (6.2) satisfies: for all $\varphi$ in $C^\infty_b$,
$\langle\hat u^n_t,\varphi\rangle - \langle\hat u^n_0,\varphi\rangle = \int_0^t\langle\hat u^n_z,\hat L^n_z\varphi\rangle\,dz + \hat W^n_t(R\varphi) + r^n_t(\varphi)$, (6.8)
where $\hat W^n$ is a Gaussian process with Doob-Meyer process given by
$DM_t(\varphi_1,\varphi_2) := \big\langle\langle\langle\hat W^n\rangle\rangle_t(\varphi_1),\varphi_2\big\rangle = \frac{1}{n}\int_0^t\langle u_z,\varphi_1\varphi_2\Psi(\cdot,\gamma(z))\rangle\,dz$. (6.10)
The following notation is used above: $\hat W^n := n^{-1/2}W$ where $W$ is the Gaussian process of Definition 5.1 and
Comparing (6.3)-(6.5) with (6.8)-(6.10), the only differences between the two systems are the additional term $r^n_t(\varphi)$ in (6.8) and the substitution of the Doob-Meyer process associated with the noise term: it should be given by
$\widehat{DM}_t(\varphi_1,\varphi_2) := \frac{1}{n}\int_0^t\langle\hat u^n_z,\varphi_1\varphi_2\Psi(\cdot,\hat\gamma^n_z)\rangle\,dz$. (6.12)
As stated below, these two differences are negligible, as $n\to+\infty$, with respect to the other terms of the system, which are at least of order $n^{-1/2}$.
Proposition 6.4. The distribution of the process $\hat u^n$ is an "almost" solution of (6.3)-(6.5) in the sense that: the remainder term in (6.8) is negligible since $\mathbb{E}[|r^n_t(\varphi)|] \lesssim_t n^{-1}\|\varphi\|_{2,1}$, and the covariance structures are "almost" the same since
The proof of Proposition 6.4 relies on some refined versions of already established estimates and is given in Appendix A.8. The main difficulty and difference in comparison with the preceding sections is that $\hat u^n$
Lemma 6.5. For any $\varphi$ in We mainly use this estimate with the intensity function $\Psi$: for all $y_1, y_2$, The second line of (6.13) is very useful to obtain some kind of Lipschitz control even when integrating with respect to $\hat u^n_t$, which is not as direct as in the case when the integration is done with respect to a probability measure.
Proof. The first assertion is a direct consequence of the definition of $\hat u^n$ and the embedding (4.2). Since $\Psi$ is in $C^4_b(\mathbb{R}_+\times\mathbb{R})$, the next ones are direct from
To conclude, let us recall that we have shown how a second-order approximation of the empirical measure $\mu^n_t$, namely $\hat u^n_t$ since $\sqrt{n}(\mu^n_t-\hat u^n_t)$ goes to 0 as a consequence of Theorem 5.12, can be considered as an "almost" solution of the SPDE (6.1). The next step would be to prove that any solution of the SPDE is a second-order approximation of $\mu^n_t$. To address such a question, we would first need a suitable theoretical framework to treat the well-posedness of the SPDE. This is the subject of a future work.

A.1 Proofs linked with Proposition 3.1
Proof of (3.11) For simplicity, we show that, for every $m\le n$, there exists a constant $C$ which is independent of $n$, $p$ and $m$ such that Let us recall the multinomial formula using multi-indices $q=(q_1,\dots,q_m)$, Denote by $k(q)$ the number of strictly positive indices in $q$. Since the $q_i$'s are integers, $|q|=k$ implies $k(q)\le k$. First, let us remark that, for all $k'=1,\dots,k$, the number of multi-indices $q$ such that $k(q)=k'$ and $|q|=k$ is bounded by $p(k',k)\,m^{k'}$ with $p(k',k) := \binom{k-1}{k'-1}$ being the number of ordered partitions of $k$ into exactly $k'$ positive parts. Indeed, the vector consisting of the $k'$ strictly positive indices forms such a partition of $k$ and there are at most $m^{k'}$ ways to complete it by $m-k'$ zeros to build a vector of length $m$.
Then, using the exchangeability of the processes $\Delta^{n,j}$, we have
• if $k(q)=k$, then all the positive $q_i$'s are equal to one and E[
• if $k(q)<k$, we can bound all the positive $q_i$'s by $k$ so that E[
Hence, using that $\binom{k}{q}\le k!$, (A.1) holds with $C = \max_{k'=1,\dots,k} p(k',k)\,k!$ for instance.
Proof of (3.12) Let us first recall that $\xi^{(k)}_n(t) = \mathbb{E}\big[|\gamma^n_t-\gamma(t)|^k\big]$ where $\gamma^n_t$ and $\gamma(t)$ are respectively defined below (2.7) and in (2.6). By convexity of the function $x\mapsto|x|^k$ (recall that $k\ge2$), let us consider the decomposition $(N^{n,j}(dz)-\lambda^{n,j}_z\,dz)$.
Using the convexity of the power function (since $k/2\ge1$) and exchangeability, one has
- Study of $B^n(t)$. Here, we use the fact that $S^{n,j}_{z-}=S^j_{z-}$ with high probability and more precisely we recover the quantities ε where the last line comes from the fact that the $\varepsilon^{(k',k)}_n$'s are non-decreasing functions of $t$. Hence, B n t (t,k)
- Study of $C^n(t)$. Using the Lipschitz continuity of $\Psi$, Assumption $(A^h_\infty)$, one has Yet, the $\lambda^j_z$'s are i.i.d. with mean $\lambda(z)$ and they are bounded by $\|\Psi\|_\infty$. Hence, Rosenthal inequality [31] gives the existence of a constant $C(k)$ which depends only on $k$ and $\|\Psi\|_\infty$ such that It then follows that $D^n(t)\lesssim_{(t,k)} n^{-k/2}$. One deduces from the decomposition (A.2) and the four bounds on $A^n$, $B^n$, $C^n$ and $D^n$ that and so Lemma B.1 below gives the desired bound.
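The counting bound can be checked by brute force for small parameters. The sketch below is an illustration only (the function names are hypothetical): it enumerates the multi-indices $q$ of length $m$ with $|q|=k$ and exactly $k'$ positive entries, and compares the count with the closed form $\binom{m}{k'}\binom{k-1}{k'-1}$ (choice of the positive slots, then an ordered split of $k$ into $k'$ positive parts) and with the bound $p(k',k)\,m^{k'}$.

```python
from itertools import product
from math import comb

def count_multi_indices(m, k, k_pos):
    """Brute-force count of q in N^m with |q| = k and exactly k_pos positive entries."""
    return sum(
        1
        for q in product(range(k + 1), repeat=m)
        if sum(q) == k and sum(1 for x in q if x > 0) == k_pos
    )

def closed_form(m, k, k_pos):
    """Choose the k_pos positive slots, then split k into k_pos ordered positive parts."""
    return comb(m, k_pos) * comb(k - 1, k_pos - 1)

def bound(m, k, k_pos):
    """Upper bound p(k', k) * m^{k'} with p(k', k) = C(k - 1, k' - 1)."""
    return comb(k - 1, k_pos - 1) * m ** k_pos

m, k = 4, 5
for k_pos in range(1, k + 1):
    exact = count_multi_indices(m, k, k_pos)
    print(k_pos, exact, closed_form(m, k, k_pos), bound(m, k, k_pos))
```

The bound is not sharp since $\binom{m}{k'}\le m^{k'}$, but only its polynomial growth in $m$ matters for the argument.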

A.2 Proof of Proposition 4.5
By definition of $\eta^n$ (Equation (2.10)), Since, for all $i=1,\dots,n$, the age process $(S^{n,i}_t)_{t\ge0}$ is piecewise continuous, increasing with rate 1 and jumps from $S^{n,i}_{t-}$ to 0 when $N^{n,i}_t-N^{n,i}_{t-}=1$, we have and, by definition of $(u_t)_{t\ge0}$, On the other hand, expanding the square and using exchangeability of the age Since the ages $S^{n,1}_t$, $S^1_t$, $S^{n,2}_t$ and $S^2_t$ are upper bounded by $M_{S_0}+\theta$ and $(\varphi_k(x_1)-\varphi_k(x_2))(\varphi_k(y_1)-\varphi_k(y_2))=0$ as soon as $x_1=x_2$ or $y_1=y_2$, we have where χ and it follows from Proposition 3.1 that $\sup_{n\ge1}\sup_{t\in[0,\theta]}\mathbb{E}\big[\sum_{k\ge1}S^n_t(\varphi_k)^2\big]<+\infty$. Finally, by convexity of the square function, $\|\eta^n_t\|^2_{-1,\alpha}\le 2\sum_{k\ge1}\big(S^n_t(\varphi_k)^2+T^n_t(\varphi_k)^2\big)$ so that (4.10) follows from the two steps above.
Proof of (ii) We first show (4.11) and then use it in order to prove that $(M^n_t)_{t\ge0}$ is càdlàg. Let $(\varphi_k)_{k\ge1}$ be an orthonormal basis of $W^{1,\alpha}_0$ composed of $C^\infty$ functions with compact support. For all $k\ge1$, the test function $\varphi_k$ belongs to $C^1_b$ so that $(M^n_t(\varphi_k))_{t\ge0}$ is an F-martingale (Proposition 4.5). Using Doob's inequality for real-valued martingales [ where the last inequality comes from exchangeability and boundedness of the intensity.
Noticing that $R\varphi_k(S^{n,1}_{z-}) = D_{0,S^{n,1}_{z-}}(\varphi_k)$ and then using Lemma 4.3 as we have done in the proof of (i), it follows that which does not depend on $n$ and gives (4.11). Moreover, gathering the integrability property given by (4.11) and the fact that, for all $k\ge1$, the process $(M^n_t(\varphi_k))_{t\ge0}$ is an F-martingale, we have that $M^n$ is a $W^{-1,\alpha}_0$-valued F-martingale. It remains to show that $(M^n_t)_{t\ge0}$ is càdlàg. First remark that for any $k$, the F-martingale $(M^n_t(\varphi_k))_{t\ge0}$ is càdlàg. Let $\varepsilon>0$ and $t_0>0$. For any $n\ge1$, $\langle M^n_t(\omega),\varphi_k\rangle^2 < +\infty$.
Once $\omega$ is fixed in Ω n , there exists an integer $k_0$ (which depends on $\omega$) such that $\sum_{k>k_0}\sup_{t\in[0,t_0+1]}\langle M^n_t(\omega),\varphi_k\rangle^2<\varepsilon$. Let $t$ be such that $t_0<t\le t_0+1$, using the right continuity of $t\mapsto\langle M^n_t(\omega),\varphi_k\rangle$, we have, dropping $\omega$ for simplicity of notation, ε + 4ε = (k 0 + 4)ε, as soon as $|t-t_0|$ is small enough. Hence, $t\mapsto M^n_t(\omega)$ is right continuous with values in $W^{-1,\alpha}_0$. In the same way, let $(t_m)_{m\ge1}$ be a sequence such that $t_m<t_0$ and $t_m\to t_0$. For any integers $m$ and $\ell$, we have, dropping $\omega$ for simplicity of notation, $(M^n_{t_m}(\varphi_k)-M^n_{t_\ell}(\varphi_k))^2+4\varepsilon$.

A.4 Proof of Proposition 5.3
As stated in Equation (5.2), the sequence $(W^n)_{n\ge1}$ is tight. Then, let us consider the following decomposition, for any $\varphi_1$ and $\varphi_2$ in $W^{2,\alpha}_0$, $\big\langle\langle\langle W^n\rangle\rangle_t(\varphi_1),\varphi_2\big\rangle - \int_0^t\langle u_z,\varphi_1\varphi_2\Psi(\cdot,\gamma(z))\rangle\,dz = B^n_t + C^n_t$, $\varphi_1(S^{n,i}_{z-})\varphi_2(S^{n,i}_{z-})\big(\lambda^{n,i}_z-\Psi(S^{n,i}_{z-},\gamma(z))\big)\,dz$, $C^n_t := \int_0^t\langle\mu^n_{S_z}-u_z,\varphi_1\varphi_2\Psi(\cdot,\gamma(z))\rangle\,dz$, where we used the fact that, almost surely, $\mu^n_{S_{z-}}=\mu^n_{S_z}$ for almost every $z$ in $\mathbb{R}_+$. The first term $B^n$ converges in $L^1$ to 0 by using the Lipschitz continuity of $\Psi$ and the convergence of $\gamma^n$ to $\gamma$ given by Proposition 3.1. From the convergence 1 n . Then, dominated convergence implies that the second term $C^n$ converges in expectation to 0. Hence, the bracket of $W^n$ (5.1) converges to the covariance (5.3) for $t=t'$. Furthermore, as for $M^n$ (see the proof of Remark 4.12), the maximum jump size of $W^n$ converges to 0. Hence, Rebolledo's central limit theorem for local martingales [36] gives, for every $\varphi_1,\dots,\varphi_k$ in $W^{2,\alpha}_0$ and $t_1,\dots,t_k\ge0$, the convergence of $(W^n_{t_1}(\varphi_1),\dots,W^n_{t_k}(\varphi_k))$ to a Gaussian vector with the prescribed covariance (5.3). The limit law of $(W^n)_{n\ge1}$ is then characterized as the law of a continuous Gaussian process with covariance (5.3).
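The mechanism of this proof (bracket converging to a deterministic limit, jumps of size $n^{-1/2}$ vanishing) can be observed on a toy case. The sketch below is an assumption-laden illustration, not the interacting system of the paper: it uses $n$ independent homogeneous Poisson processes with a constant, hypothetical `rate`, so that the rescaled compensated count plays the role of $W^n_t(\mathbf{1})$ and the limiting variance is `rate * horizon`.

```python
import random
import statistics

def poisson_count(rate, horizon, rng):
    """Number of points of a homogeneous Poisson process of given rate on [0, horizon]."""
    count, time = 0, rng.expovariate(rate)
    while time <= horizon:
        count += 1
        time += rng.expovariate(rate)
    return count

def rescaled_martingale(n, rate, horizon, rng):
    """n^{-1/2} * sum_i (N^i_t - rate * t) for n independent Poisson processes."""
    total = sum(poisson_count(rate, horizon, rng) for _ in range(n))
    return (total - n * rate * horizon) / n ** 0.5

def simulate(n=100, rate=2.0, horizon=1.0, replications=500, seed=0):
    """Independent replications of the rescaled compensated count at time `horizon`."""
    rng = random.Random(seed)
    return [rescaled_martingale(n, rate, horizon, rng) for _ in range(replications)]

samples = simulate()
print(statistics.mean(samples), statistics.variance(samples))
```

With the default parameters, the empirical mean should be near 0 and the empirical variance near 2, up to Monte Carlo error; in the paper's setting the constant rate is replaced by the intensity $\Psi(\cdot,\gamma(z))$ integrated against $u_z$.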

A.5 Proof of Corollary 5.4
First, the tightness (and convergence) of (R * W n ) n≥1 comes from the continuity of R * as a mapping from W −2,α 0 to W −2,α 0 which comes from the continuity of R as a mapping from W 2,α 0 to W 2,α 0 (Lemma 4.2). Then, let us show that (V n ) n≥1 is tight in D(R + , R). Assume that h(0) = 0 and extend the function h to the whole real line by the value 0 on the negative real numbers.
-(i) For all n ≥ 1, V n 0 = 0 a.s. so (V n 0 ) n≥1 is clearly tight. For any t > r ≥ 0, since h(r − z) = 0 as soon as z ≥ r, one has V n t − V n r = Let us denote, for all x ≥ 0, V n r,t (x) = x 0 [h (t − z) − h (r − z)] dW n z (1). It is a martingale with respect to x. Burkholder-Davis-Gundy inequality [40, p. 894] gives the existence of a universal constant C p such that Yet, the quadratic variation of V n r,t is given by  |c n (t + δ) − c n (t)| → 0 as δ → 0.
Then, Ascoli-Arzela theorem implies that the sequence (c n ) n≥1 is relatively compact. It only remains to identify the limit for all t ≥ 0. Yet, as a consequence of the dominated convergence and the fact that for almost every z, g n (z) → g(z), we have