Weighted dependency graphs

The theory of dependency graphs is a powerful toolbox to prove asymptotic normality of sums of random variables. In this article, we introduce a more general notion of weighted dependency graphs and give normality criteria in this context. We also provide generic tools to prove that some weighted graph is a weighted dependency graph for a given family of random variables. To illustrate the power of the theory, we give applications to the following objects: uniform random pair partitions, the random graph model $G(n,M)$, uniform random permutations, the symmetric simple exclusion process and multilinear statistics on Markov chains. The application to random permutations gives a bivariate extension of a functional central limit theorem of Janson and Barbour. On Markov chains, we answer positively an open question of Bourdon and Vall\'ee on the asymptotic normality of subword counts in random texts generated by a Markovian source.


Background: dependency graphs
The central limit theorem is one of the most famous results in probability theory: it states that suitably renormalized sums of independent identically distributed random variables with finite variance converge towards a standard Gaussian variable.
It is rather easy to relax the identically distributed assumption. The Lindeberg criterion, see e.g. [11, Chapter 27], gives a sufficient (and almost necessary) condition for a sum of independent random variables to converge towards a Gaussian law (after suitable renormalization).
Relaxing independence is more delicate and there is no universal theory to do it. One of the ways, among many others, is given by the theory of dependency graphs. A dependency graph encodes the dependency structure in a family of random variables: roughly we take a vertex for each variable in the family and connect dependent random variables by edges. The idea is that, if the degrees in a sequence of dependency graphs do not grow too fast, then the corresponding variables behave as if independent and the sum of the corresponding variables is asymptotically normal. Precise normality criteria using dependency graphs have been given by Petrovskaya/Leontovich, Janson, Baldi/Rinott and Mikhailov [58,42,6,53].
These results are powerful black boxes to prove asymptotic normality of sums of partially dependent variables and can be applied in many different contexts. The original motivation of Petrovskaya and Leontovich comes from the mathematical modelling of cell populations [58]. On the other hand, Janson was interested in random graph theory: dependency graphs are used to prove central limit theorems for some statistics, such as subgraph counts, in G(n, p) [6,42,46]; see also [55] for applications to geometric random graphs. The theory then found a field of application in geometric probability, where central limit theorems have been proven for various statistics on random point configurations: the lengths of the nearest-neighbour graph, of the Delaunay triangulation and of the Voronoi diagram of these random points [5,56], or the area of their convex hull [7]. More recently it has been used to prove asymptotic normality of pattern counts in random permutations [13,38]. Dependency graphs also generalize the notion of m-dependent random variables.

Random permutations
The study of uniform random permutations is a wide subject in probability theory and, as for random graphs, it would be hopeless to try and give a comprehensive presentation of it. Relevant to this paper, Hoeffding [39] has given a central limit theorem for what can be called simply indexed permutation statistics. The latter is a statistic of the form $\sum_{i=1}^{n} a^{(n)}_{i,\pi(i)}$, where $\pi$ is a uniform random permutation of size $n$ and $(a^{(n)})$ a sequence of real matrices satisfying appropriate conditions.
Hoeffding's result has been extended and refined in many directions, including the following ones.
• In [12], Bolthausen used Stein's method to give an upper bound for the speed of convergence in Hoeffding's central limit theorem.
• This work has then been extended to doubly indexed permutation statistics (called DIPS for short) by Zhao, Bai, Chao and Liang [67]. Barbour and Chen [8] have then given new bounds on the speed of convergence, that are sharper in many situations. DIPS have been used in various contexts in statistics; we refer the reader to [67,8] and references therein for background on these objects.
• In another direction, Barbour and Janson have established a functional central limit theorem for simply indexed permutation statistics [9].
Using weighted dependency graphs, we provide a functional central limit theorem for doubly indexed permutation statistics; see Theorem 8.7. This can be seen as an extension of Barbour and Janson's theorem or as a functional version of Zhao, Bai, Chao and Liang's result (note, however, that in the simply indexed case our hypotheses are slightly stronger than those of Barbour and Janson, and that we cannot provide a speed of convergence). There is a priori no obstruction to obtaining an extension for k-indexed permutation statistics, except maybe that the general statement and the computation of covariance limits in specific examples may quickly become cumbersome.

Stationary configuration of SSEP
The symmetric simple exclusion process (SSEP) is a classical model of statistical physics that represents a system outside equilibrium. Its success in the physics literature is mainly due to the fact that it is mathematically tractable and displays phase transition phenomena. We refer the reader to [25] for a survey of results on SSEP and related models from a mathematical physics viewpoint. The description of the invariant measure, or steady state, of SSEP (and more generally of the asymmetric version ASEP) has also attracted the interest of the combinatorics community in recent years. This question is indeed connected to the hierarchy of orthogonal polynomials and has led to the study of new combinatorial objects, such as permutation tableaux and staircase tableaux [22,21].
In this paper we prove that the indicator random variables of the presence of particles at given locations in the steady state have a natural weighted dependency graph structure. As an application, we give a functional central limit theorem for the particle distribution function in the steady state, Theorem 9.4. An analogous result for the density function, which is roughly the derivative of the particle distribution function, has been given by Derrida, Enaud, Landim and Olla [26]. Their result holds in the more general setting of ASEP and it would be interesting to generalize our approach to ASEP as well.

Markov chains
Our last application deals with the number of occurrences of a given subword in a text generated by a Markov source. More precisely, let $(M_k)_{k \ge 0}$ be an aperiodic irreducible Markov chain on a finite state space $S$. Assume that $M_0$ is distributed according to the stationary distribution $\pi$ of the chain and denote $w_n = (M_0, M_1, \dots, M_n)$. We are interested in the number of times $X_n$ that a given word $v = s_1 \cdots s_m$ occurs as a subword of $w_n$, possibly adding some additional constraints, such as adjacency of some letters of $v$ in $w_n$. This problem, motivated by intrusion detection in computer science and identifying meaningful bits of DNA in molecular biology, has attracted the attention of the analysis-of-algorithms community in the nineties; we refer the reader to [37] for detailed motivations and references on the subject.
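To make the statistic concrete, here is a minimal Python sketch (not taken from the paper): it samples a text from a hypothetical two-state Markov chain, with the initial letter drawn from the stationary distribution, and counts the occurrences of a word as a (not necessarily contiguous) subword by dynamic programming. The transition matrix, the alphabet and the pattern are illustrative choices.

```python
import numpy as np

def sample_markov_text(P, states, n, rng):
    """Sample (M_0, ..., M_n); M_0 is drawn from the stationary distribution of P."""
    P = np.asarray(P, dtype=float)
    evals, evecs = np.linalg.eig(P.T)                 # left eigenvectors of P
    pi = np.abs(np.real(evecs[:, np.argmin(np.abs(evals - 1))]))
    pi = pi / pi.sum()                                # stationary distribution
    idx = [rng.choice(len(states), p=pi)]
    for _ in range(n):
        idx.append(rng.choice(len(states), p=P[idx[-1]]))
    return [states[i] for i in idx]

def count_subword_occurrences(text, v):
    """Number of increasing index tuples at which v occurs as a subword of text."""
    m = len(v)
    dp = [1] + [0] * m                   # dp[j] = occurrences of the prefix v[:j] so far
    for c in text:
        for j in range(m, 0, -1):
            if c == v[j - 1]:
                dp[j] += dp[j - 1]
    return dp[m]

rng = np.random.default_rng(0)
P = [[0.7, 0.3], [0.4, 0.6]]             # hypothetical transition matrix on states {a, b}
w = sample_markov_text(P, "ab", 2000, rng)
print(count_subword_occurrences(w, "aba"))   # the statistic X_n for the word v = "aba"
```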
A central limit theorem for $X_n$ was obtained in some particular cases:
• when we are only counting consecutive occurrences of $v$, i.e. the number of factors of $w_n$ that are equal to $v$ (see Régnier and Szpankowski [59], or Bourdon and Vallée [15] for an extension to probabilistic dynamical sources);
• or when the letters $M_1, M_2, \dots, M_n$ of $w_n$ are independent (see Flajolet, Szpankowski and Vallée [37]).
• Another related result is a central limit theorem by Nicodème, Salvy and Flajolet [54] for the number of occurrence positions, i.e. positions where an occurrence of the pattern terminates. This statistic is quite different from the number of occurrences itself, since the number of occurrence positions is always bounded by the length of the word.
Despite all these results, the number of occurrences in the general subword case with a Markov source was left open by these authors; see [14, Section 4.4]. Using weighted dependency graphs, we are able to fill this gap; see Theorem 10.5. Note that there is a rich literature on central limit theorems for linear statistics on Markov chains $(M_n)_{n \ge 0}$, that is, statistics of the form $S_N^f := \sum_{n=0}^{N} f(M_n)$ for a function $f$ on the state space. We refer the reader to [47] and references therein for numerous results in this direction, in particular on infinite state spaces. In [61], the authors study through cumulants linear statistics on mixing sequences (including Markov chains; Chapter 4) and multilinear statistics on independent identically distributed random variables (Chapter 5). It seems however that there is a lack of tools to study multilinear statistics on Markov chains, such as the subword count statistics considered above. The theory of weighted dependency graphs introduced here is such a tool.

Homogeneity versus spatial structure
It is worth noticing that the previous examples have various structures. The first three are homogeneous, in the sense that a group of automorphisms acts transitively on the model. This is reflected in the corresponding weighted dependency graphs, in which all weights are equal.
In comparison, the last two examples have a linear structure: particles in SSEP live on a line and a Markov chain is canonically indexed by N. For Markov chains, this is reflected in the corresponding weighted dependency graph, since the weights decrease exponentially with the distance. On the contrary, SSEP has a homogeneous weighted dependency graph (all weights are equal to 1/n), which came as a surprise to the author and indicates a dependency structure quite different from the Markov chain setting.
The possibility to cover models with various dependency structures is, in the author's opinion, a nice feature of weighted dependency graphs.

Finding weighted dependency graphs
The proof of our normality criterion (Theorem 4.11) is quite elementary and easy. Therefore, one could argue that the difficulty of proving a central limit theorem has only been shifted to the difficulty of finding an appropriate weighted dependency graph. Indeed, proving that a given weighted graph $\widetilde L$ is a weighted dependency graph for a given family of random variables $\{Y_\alpha, \alpha \in A\}$ consists in establishing bounds on all joint cumulants $\kappa(Y_\alpha; \alpha \in B)$, where $B$ is a multiset of elements of $A$. We refer to this problem as proving the correctness of the weighted dependency graph $\widetilde L$. Attacking it head-on is rather challenging. (The definition of joint cumulants is given in Eq. (2.2); the precise bound that should be proved can be found in Eq. (4.3), but is not relevant for the discussion here.) To avoid this difficulty, we give in Section 5 three general results that help proving the correctness of a weighted dependency graph. These results make the application of our normality criterion much easier in general, and almost immediate in some cases.
Before describing these three tools, let us observe that proving the correctness of a usual dependency graph L is usually straightforward; it is most of the time an immediate consequence of the definition of the model we are working on. Therefore the existing literature does not provide any tool for that.
1. Our first tool (Theorem 5.2) is an equivalence of the definition with a slightly different set of inequalities involving cumulants of products of random variables.
When the random variables $Y_\alpha$ are Bernoulli random variables, we can then use the trivial fact $Y_\alpha^m = Y_\alpha$ to reduce (most of the time significantly) the number of inequalities to establish.
2. The second tool (Theorem 5.8) shows the equivalence of bounds on cumulants and bounds on an auxiliary quantity, defined in Section 5.2. At first sight, one might think that this new expression is not simpler to bound than cumulants, but its advantage is that it is multiplicative: if the moments $\mathbb E\bigl[\prod_{i \in \delta} Y_{\alpha_i}\bigr]$ have a natural factorization, then the quantity $P$ factorizes accordingly and we can bound each factor separately.
3. The third tool (Theorem 5.11) is a stability property of weighted dependency graphs under products. Namely, if we prove that some basic variables admit a weighted dependency graph, we obtain for free a weighted dependency graph for monomials in these basic variables. A typical example of application is the following: in the random graph setting, we prove that the indicator variables corresponding to the presence of edges have a weighted dependency graph, and we automatically obtain a similar result for the presence of triangles or of copies of any given fixed graph.
Items 1 and 3 are both used in all applications described in Section 1.2 and reduce the proof of the correctness of the relevant weighted dependency graph to bounding specific simple cumulants. For random pair partitions, random permutations and random graphs, this bound directly follows from an easy computation of joint moments and item 2 above. In summary, the proof of correctness of the weighted dependency graph is rather immediate in these cases. For SSEP, we also make use of an induction relation for joint cumulants obtained by Derrida, Lebowitz and Speer [28] (joint cumulants are called truncated correlation functions in this context). The Markov chain setting uses linear algebra considerations and a recent expression of joint cumulants in terms of the so-called boolean cumulants, due to Arizmendi, Hasebe, Lehner and Vargas [2] (see also [61, Lemma 1.1]). Boolean cumulants have been introduced in non-commutative probability theory [64,49] and their appearance here is rather intriguing.
To conclude this section, let us mention that in each case, the proof of correctness of the weighted dependency graph relies on some expression for the joint moments of the variables $Y_\alpha$. This expression might be of various forms: explicit expressions in the first three cases, an induction relation in the case of SSEP or a matrix expression for Markov chains, but we need such an expression. In other words, weighted dependency graphs can be used to study what could be called locally integrable systems, that is systems in which the joint moments of the basic variables $Y_\alpha$ can be computed. Such systems are not necessarily integrable in the sense that there is no tractable expression for the generating function or the moments of $X = \sum_{\alpha \in A} Y_\alpha$, so that classical asymptotic methods can a priori not be used. In particular, in all the examples above, it seems hopeless to analyse the moments $\mathbb E[X^r]$ by expanding them directly in terms of joint moments.

Usual dependency graphs: beyond the central limit theorem
We have focused so far on the question of asymptotic normality. However, usual dependency graphs can be used to establish other kinds of results. The first family of such results consists in refinements of central limit theorems.
• In their original paper [6] Baldi and Rinott have combined dependency graphs with Stein's method. In addition to providing a central limit theorem, this approach yields precise estimates for the Kolmogorov distance between a renormalized version of X n and the Gaussian distribution. For more general and in some cases sharper bounds, we also refer the reader to [16]. An alternate approach to Stein's method, based on mod-Gaussian convergence and Fourier analysis, can also be used to establish sharp bounds in Kolmogorov distance in the context of dependency graphs, see [34].
• Another direction, addressed in [29,35], is the validity domain of the central limit theorem.
The Gaussian law is not the only limit law that is accessible with the dependency graph approach. Convergence to Poisson distribution can also be proved this way, as demonstrated in [4]; again, this result has found applications, e.g., in the theory of random geometric graphs [55].
We now leave convergence in distribution to discuss probabilities of rare events:
• In [45], S. Janson has established some large deviation upper bound involving the fractional chromatic number of the dependency graph.
• Another important use of dependency graphs, historically the first one, is the Lovász local lemma [31,65]. The goal here is to find a lower bound for the probability that $X_n = 0$ when the $Y_{n,i}$ are indicator random variables, that is, the probability that none of the $Y_{n,i}$ is equal to 1. This inequality has found a large range of applications: it is used to prove, by probabilistic arguments, the existence of an object (often a graph) with given properties; this is known as the probabilistic method, see [1, Chapter 5].

Future work
We believe that weighted dependency graphs may be useful in a number of different models and that they are worth studying further. An application of weighted dependency graphs to the d-dimensional Ising model is given in a joint paper with Dousse [30]. In a work in progress, we also use them to study statistics in uniform set-partitions and obtain a far-reaching generalization of a result of Chern, Diaconis, Kane and Rhoades [18].
Proving the correctness of these weighted dependency graphs again uses the tools from Section 5 of this paper. In the case of the Ising model, we also need the theory of cluster expansions.
Another source of examples of weighted dependency graphs is given by determinantal point processes (see, e.g., [41, Chapter 4]): indeed, for such processes, it has been observed by Soshnikov that cumulants have rather nice expressions [63, Lemma 1]. This fits in the framework of weighted dependency graphs, and the stability under taking monomials in the initial variables may make it possible to study multilinear statistics on such models. This is a direction that we plan to investigate in future work.
The results of the present article also invite us to consider the following models.
• Uniform d-regular graphs: the weighted dependency graph for pair partitions presented in Section 6 gives bounds on joint cumulants in the configuration model. It would be interesting to have similar bounds for uniform d-regular graphs, especially when d tends to infinity, in which case the graph given by the configuration model is simple with probability tending to 0. The fact that joint moments of presence of edges have no simple expression for d-regular graphs is an important source of difficulty here.
• The asymmetric version of SSEP, called ASEP: finding a weighted dependency graph for this statistical mechanics model is closely related to the conjecture made in [28], on the scaling limit of the truncated correlation functions. approach to do this would be to use recent results on mod-Gaussian convergence [35,34]. Unfortunately, this requires uniform bounds on cumulants of the sum X n , which are at the moment out of reach for weighted dependency graphs in general.

Outline of the paper
The paper is organized as follows.
• Standard notation and definitions are given in Section 2.
• Section 3 gives some background about maximum spanning trees, a notion used in our bounds for cumulants.
• The definition of weighted dependency graphs and the associated normality criterion are given in Section 4.
• Section 5 provides tools to prove the correctness of weighted dependency graphs.
• The next five sections (from 6 to 10) are devoted to the applications described in Section 1.2.
• Appendices give a technical proof, some variance estimations and adequate tightness criteria for the functional central limit theorems, respectively.

Set partitions
The combinatorics of set partitions is central in the theory of cumulants and is important in this article. We recall here some well-known facts about them.
A set partition of a set S is a (non-ordered) family of non-empty disjoint subsets of S (called blocks of the partition), whose union is S. We denote by #(π) the number of blocks of π.
Denote P(S) the set of set partitions of a given set S. Then P(S) may be endowed with a natural partial order: the refinement order. We say that $\pi$ is finer than $\pi'$, or $\pi'$ coarser than $\pi$ (and write $\pi \le \pi'$), if every part of $\pi$ is included in a part of $\pi'$.
Endowed with this order, P(S) is a complete lattice, which means that each family F of set partitions admits a join (the finest set partition which is coarser than all set partitions in F, denoted with $\vee$) and a meet (the coarsest set partition which is finer than all set partitions in F, denoted with $\wedge$). In particular, there is a maximal element $\{S\}$ (the partition into only one part) and a minimal element $\{\{x\}, x \in S\}$ (the partition into singletons).
Lastly, denote by µ the Möbius function of the partition lattice P(S). In this paper, we only use evaluations of µ at pairs $(\pi, \{S\})$, i.e. where the second argument is the maximum element of P(S). In this case, the value of the Möbius function is given by $\mu(\pi, \{S\}) = (-1)^{\#(\pi)-1}\,(\#(\pi)-1)!$.
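As a small, self-contained illustration of these lattice operations (not part of the original text), the following Python sketch computes the join of a family of set partitions via union-find and evaluates the Möbius value $\mu(\pi, \{S\})$ recalled above.

```python
from math import factorial

def join(partitions, ground_set):
    """Join (finest common coarsening) of a family of set partitions, via union-find."""
    parent = {x: x for x in ground_set}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for pi in partitions:
        for block in pi:
            block = list(block)
            for y in block[1:]:
                parent[find(block[0])] = find(y)      # merge elements of the same block
    blocks = {}
    for x in ground_set:
        blocks.setdefault(find(x), set()).add(x)
    return [frozenset(b) for b in blocks.values()]

def mobius_to_max(pi):
    """Mobius value mu(pi, {S}) = (-1)^(#pi - 1) (#pi - 1)! in the partition lattice."""
    k = len(pi)
    return (-1) ** (k - 1) * factorial(k - 1)

pi1 = [{1, 2}, {3}, {4, 5}]
pi2 = [{1}, {2, 3}, {4}, {5}]
print(join([pi1, pi2], {1, 2, 3, 4, 5}))   # [{1, 2, 3}, {4, 5}]
print(mobius_to_max(pi1))                  # (-1)^2 * 2! = 2
```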

Joint cumulants
For random variables $X_1, \dots, X_r$ with finite moments living in the same probability space (with expectation denoted $\mathbb E$), we define their joint cumulant (or mixed cumulant) as $\kappa(X_1, \dots, X_r) = [t_1 \cdots t_r] \log \mathbb E\bigl[\exp(t_1 X_1 + \cdots + t_r X_r)\bigr]$, that is the coefficient of $t_1 \cdots t_r$ in the expansion of $\log \mathbb E[\exp(t_1 X_1 + \cdots + t_r X_r)]$, which is analytic around $t_1 = \cdots = t_r = 0$. If all random variables $X_1, \dots, X_r$ are equal to the same variable $X$, we denote $\kappa_r(X) = \kappa(X, \dots, X)$ and this is the usual cumulant of a single random variable. Joint cumulants have a long history in statistics and theoretical physics and it is rather hard to give a reference for their first appearance. Their most useful properties are summarized in [46, Proposition 6.16]; see also [50].
• It is a symmetric multilinear functional.
• If the set of variables $\{X_1, \dots, X_r\}$ can be split into two mutually independent sets of variables, then the joint cumulant vanishes.
• Cumulants can be expressed in terms of joint moments and vice versa, as follows: $\mathbb E[X_1 \cdots X_r] = \sum_{\pi \in P([r])} \prod_{B \in \pi} \kappa(X_i;\, i \in B)$ and $\kappa(X_1, \dots, X_r) = \sum_{\pi \in P([r])} \mu(\pi, \{[r]\}) \prod_{B \in \pi} \mathbb E\bigl[\prod_{i \in B} X_i\bigr]$. Hence, knowing all joint cumulants amounts to knowing all joint moments.
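The second formula can be turned directly into code. The following Python sketch (illustrative, not from the paper) estimates a joint cumulant from samples by summing over all set partitions of the index set, weighted by the Möbius value recalled above; the checks at the end use the vanishing property for independent variables and the fact that $\kappa(X, X)$ is the variance.

```python
from math import factorial
import numpy as np

def set_partitions(items):
    """All set partitions of a list, generated recursively."""
    if len(items) <= 1:
        yield [list(items)]
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def joint_cumulant(samples):
    """Empirical joint cumulant kappa(X_1, ..., X_r); samples has shape (N, r)."""
    samples = np.asarray(samples, dtype=float)
    r = samples.shape[1]
    total = 0.0
    for pi in set_partitions(list(range(r))):
        mobius = (-1) ** (len(pi) - 1) * factorial(len(pi) - 1)   # mu(pi, maximal element)
        prod = 1.0
        for block in pi:
            prod *= samples[:, block].prod(axis=1).mean()          # joint moment of the block
        total += mobius * prod
    return total

rng = np.random.default_rng(1)
X, Y, Z = rng.normal(size=(3, 200_000))
print(joint_cumulant(np.column_stack([X, Y, Z])))        # ~0: independent variables
print(joint_cumulant(np.column_stack([X, X])), X.var())  # kappa(X, X) recovers Var(X) ~ 1
```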
Because of the symmetry, it is natural to consider joint cumulants of multisets of random variables.
The second property above has a converse. Since we have not been able to find it in the literature, we provide it with a proof.

Multisets
As mentioned above it is natural to consider joint cumulants of multisets of random variables, so let us fix some terminology.
For a multiset $B$, we denote by $|B|$ the total number of elements (i.e. counted with multiplicities) and $\#(B)$ the number of distinct elements. Furthermore, $B_1 \uplus B_2$ is by definition the disjoint union of the multisets $B_1$ and $B_2$, i.e. the multiplicity of an element in $B_1 \uplus B_2$ is the sum of its multiplicities in $B_1$ and $B_2$.
The set of multisets of elements of $A$ is denoted by $\mathrm{MSet}(A)$, while $\mathrm{MSet}_{\le m}(A)$ is the subset of multisets $B$ with $|B| \le m$.

Graphs
A graph $L = (V, E)$ is given by its vertex set $V$ and its edge set $E$. Elements of $E$ are 2-element subsets of $V$ (our graphs are simple loopless graphs). All graphs considered in this paper are finite.
We denote by CC(L) the partition of the vertex set of a graph L into connected components. Consequently, | CC(L)| is the number of connected components of L.
Two types of graphs appear here: dependency graphs throughout the paper and random graphs in Section 7. The former are tools to prove central limit theorems, while the latter are the objects of study, and they should not be confused. Following [46], we use the letter L for dependency graphs, and we reserve the more classical G for random graphs.
If B is a multiset of vertices of L, we can consider the graph L[B] induced by L on B and defined as follows: the vertices of L[B] correspond to elements of B (if B contains an element with multiplicity m, then m vertices correspond to this element), and there is an edge between two vertices if the corresponding vertices of L are equal or connected by an edge in L.
Finally we say that two subsets (or multisets) A 1 and A 2 of vertices of L are disconnected if they are disjoint and there is no edge in L that has an extremity in A 1 and an extremity in A 2 .

Weighted graphs
An edge-weighted graph $\widetilde L$, or weighted graph for short, is a graph $L$ in which each edge $e$ is assigned a weight $w_e$. In this article we restrict ourselves to weights $w_e \in [0, 1]$. Edges not in the graph can be thought of as edges of weight 0; all our definitions are consistent with this convention.
The induced graph of a weighted graph $\widetilde L$ on a multiset $B$ has a natural weighted graph structure. We put on each edge of $\widetilde L[B]$ the weight of the corresponding edge in $\widetilde L$; if the edge connects two copies of the same vertex of $\widetilde L$, there is no corresponding edge in $\widetilde L$ and we put weight 1.

Asymptotic notation
We use the symbol $u_n \asymp v_n$ (resp. $u_n \ll v_n$, $u_n \gg v_n$) to say that $\lim_{n \to \infty} u_n/v_n$ is a nonzero constant (resp. $0$, $+\infty$). In particular, $v_n$ should be nonzero for $n$ sufficiently large.

Spanning trees
As we shall see in the next section, our definition of weighted dependency graphs involves the maximal weight of a spanning tree of a given weighted graph. In this section, we recall this notion and prove a few lemmas that we use later in the paper.

Maximum spanning tree
Recall that a spanning tree of a graph $L = (V, E)$ is a subset $T$ of $E$ such that $(V, T)$ is a tree. More generally, we say that a subset $E'$ of $E$ forms a spanning subgraph of $L$ if $(V, E')$ is connected.
If $\widetilde L$ is a weighted graph, the weight $w(T)$ of a spanning tree $T$ of $\widetilde L$ is defined as the product of the weights of the edges in $T$. The maximum weight of a spanning tree of $\widetilde L$ is denoted $M_{\widetilde L}$. This parameter is central in our work.
If $\widetilde L$ is disconnected, we set $M_{\widetilde L} = 0$ for convenience.

Example 3.2.
An easy case which appears a few times in the paper is that of a connected graph $\widetilde L$ with $r$ vertices and all weights equal to the same value, say $\varepsilon$. Then all spanning trees have weight $\varepsilon^{r-1}$, so that $M_{\widetilde L} = \varepsilon^{r-1}$.
For a less trivial example, consider the weighted graph of Fig. 1. The red edges form a spanning tree of weight $\varepsilon^2 \cdot \varepsilon^2 = \varepsilon^4$. It is easy to check that there is no spanning tree with a bigger weight, so that $M_{\widetilde L} = \varepsilon^4$ in this case. Finding a spanning tree with maximum weight is a well-studied question in the algorithmics literature: see [19, Chapter 23] (the usual convention is to define the weight of a spanning tree as the sum of the weights of its edges and to look for a spanning tree of minimal weight, but this is of course equivalent, up to replacing weights with the logarithms of their inverses).

Prim's algorithm and the reordering lemma
There are several classical algorithms to find a spanning tree with maximum weight. We describe here Prim's algorithm, which is useful for our work.
Assume $\widetilde L$ is a connected weighted graph. Choose arbitrarily a vertex $v$ in the graph and set initially $A = \{v\}$ and $T = \emptyset$. We iterate the following procedure: find the edge with maximum weight connecting a vertex $u$ in $A$ with a vertex $w$ outside $A$ (since $\widetilde L$ is connected, there is at least one such edge), then add $w$ to $A$ and $\{u, w\}$ to $T$. It is easy to check that, at each step, $T$ is a tree with vertex set $A$, and a general result ensures that at each step $T$ is included in a spanning tree of maximum weight of $\widetilde L$ [19, Corollary 23.2]. Note also that the weight of the edge $\{u, w\}$ is equal to $W(\{w\}; A)$. We stop the iteration when $A$ is the vertex set of $\widetilde L$, and $T$ is then a spanning tree of maximum weight.
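The following Python sketch (an illustration, not code from the paper) implements Prim's algorithm with multiplicative weights and returns $M_{\widetilde L}$, using the convention $M_{\widetilde L} = 0$ for disconnected graphs; the final check reproduces the all-weights-equal computation of Example 3.2.

```python
def max_weight_spanning_tree(weights, vertices):
    """Prim's algorithm for a spanning tree of maximum (multiplicative) weight.

    `weights` maps frozenset({u, v}) to a weight in [0, 1]; missing edges count as 0.
    Returns (tree_edges, M) where M is the product of the tree's edge weights.
    """
    vertices = list(vertices)
    def w(u, v):
        return weights.get(frozenset({u, v}), 0.0)
    in_tree = {vertices[0]}
    tree, M = [], 1.0
    while len(in_tree) < len(vertices):
        best = max(((u, v) for u in in_tree for v in vertices if v not in in_tree),
                   key=lambda e: w(*e))
        if w(*best) == 0.0:
            return [], 0.0            # disconnected graph: M = 0 by convention
        tree.append(best)
        M *= w(*best)
        in_tree.add(best[1])
    return tree, M

# Connected graph with r vertices, all weights eps: maximal tree weight is eps**(r-1).
eps, r = 0.1, 4
weights = {frozenset({i, j}): eps for i in range(r) for j in range(i + 1, r)}
print(max_weight_spanning_tree(weights, range(r))[1])   # 0.1**3 = 1e-3
```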
The correctness of this algorithm implies the following lemma (Theorem 3.3): the vertices of a weighted graph $\widetilde L$ can be ordered as $\beta_1, \dots, \beta_r$ in such a way that $\prod_{j=1}^{r-1} W(\{\beta_{j+1}\}; \{\beta_1, \dots, \beta_j\}) = M_{\widetilde L}$ (Eq. (3.1)), where $W(B_1; B_2)$ denotes the maximal weight of an edge of $\widetilde L$ with one extremity in $B_1$ and the other in $B_2$ (by convention 0 if there is no such edge). Proof. Adding edges of weight 0 to the graph does not change either side of the above equality, so we can assume that $\widetilde L$ is connected.
We apply Prim's algorithm, as described above, and we denote the vertices of $\widetilde L$ by $\beta_1, \dots, \beta_r$ in the order in which they are added to the set $A$. Then $W(\{\beta_{j+1}\}; \{\beta_1, \dots, \beta_j\})$ is the weight of the edge added in the $j$-th iteration of the algorithm. Therefore the LHS of Eq. (3.1) is the weight of the spanning tree constructed by Prim's algorithm. Since this is a spanning tree of maximum weight, this weight is $M_{\widetilde L}$.

Remark 3.4.
In the special case where $\widetilde L$ has only edges of weight 1, the lemma states the following: if $\widetilde L$ is connected, there exists an ordering $(\beta_1, \dots, \beta_r)$ of its vertices such that each $\beta_\ell$ is in the neighbourhood of $\{\beta_1, \dots, \beta_{\ell-1}\}$. This easy particular case is used in the dependency graph literature, but with weighted dependency graphs, we need Theorem 3.3 in its full generality.

Inequalities on maximal weights of spanning trees
We now state some inequalities on maximal weights, that are useful in the sequel. We first introduce some notation.
Finally, edges of weight 1 will play a somewhat special role in weighted dependency graphs. We therefore denoteL 1 the subgraph formed by edges with weight 1.  that the edge set S forms a spanning subgraph ofL. Therefore we can extract from it a spanning tree T . Then But, since T is a spanning tree ofL, we have w(T ) ≤ M L , which completes the proof.
Our next lemma uses the notion of m-th power of a weighted graph, which was defined in Section 2.5.
Lemma 3.6. Let I 1 , · · · , I r be multisets of vertices of a weighted graphL. We consider a partition π of I 1 · · · I r such that π ∨ I 1 , · · · , I r = I 1 · · · I r . Then we have whereL m is the m-th power ofL.
Proof. The multiset B := I 1 · · · I r can be explicitly represented by Let π i be a part of π and consider a spanning tree T i of minimum weight ofL[π i ]. Edges of T i are pairs {(v, j), (v , j )}. For such an edge e with j = j , we can consider the corresponding edgeē = {I j , I j } inL m . By definition of power graphs,ē has at least the same weight as e. Doing so for each edge of T i with j = j , we get a set S i of edges iñ L m such that As in the proof of the previous lemma, we now consider the union S of the S i 's. The condition (3.2) ensures that S forms a spanning subgraph ofL m [{I 1 , · · · , I r }] and hence we can extract from it a spanning tree T . Then which concludes the proof.

Usual dependency graphs
Consider a family of random variables $\{Y_\alpha, \alpha \in A\}$. A dependency graph for this family is an encoding of the dependency relations between the variables $Y_\alpha$ in a graph structure. We take here the definition given by Janson [42]; see also papers of Malyshev [51] and Petrovskaya/Leontovich [58] for earlier appearances of the notion under slightly different names. A graph $L$ is a dependency graph for the family if:
1. the vertex set of $L$ is $A$;
2. if $A_1$ and $A_2$ are disconnected subsets in $L$, then $\{Y_\alpha, \alpha \in A_1\}$ and $\{Y_\alpha, \alpha \in A_2\}$ are independent.
A trivial example is that any family of independent variables {Y α , α ∈ A} admits the graph with vertex-set A and no edges as a dependency graph. A more interesting example is the following.

Example 4.2.
Consider the Erdős-Rényi random graph model $G(n, p_n)$, that is, $G$ has vertex set $[n] := \{1, \dots, n\}$ and it has an edge between $i$ and $j$ with probability $p_n$, all these events being independent of each other. Let $A$ be the set of 3-element subsets of $[n]$ and, if $\alpha = \{i, j, k\} \in A$, let $Y_\alpha$ be the indicator function of the event "the graph $G$ contains the triangle with vertices $i$, $j$ and $k$".
Let L be the graph with vertex set A and the following edge set: α and β are linked if |α ∩ β| = 2 (that is, if the corresponding triangles share an edge in G). Then L is a dependency graph for the family {Y α , α ∈ A}.
Note also that the complete graph on A is a dependency graph for any family of variables indexed by A. In particular, given a family of variables, it may admit several dependency graphs. The fewer edges a dependency graph has, the more information it encodes and, thus, the more interesting it is. It would be tempting to consider the dependency graph with fewest edges, but such a graph is not always uniquely defined.
As said in the introduction, dependency graphs are a valuable toolbox to prove central limit theorems for sums of partially dependent variables. Denote by $N(0,1)$ a standard normal random variable. The following theorem is due to Janson [42, Theorem 2].
Theorem 4.3 (Janson's normality criterion). Suppose that, for each $n$, $\{Y_{n,i}, 1 \le i \le N_n\}$ is a family of bounded random variables; $|Y_{n,i}| < M_n$ a.s. Suppose further that $L_n$ is a dependency graph for this family and let $\Delta_n - 1$ be the maximal degree of $L_n$. Let $X_n = \sum_{i=1}^{N_n} Y_{n,i}$ and $\sigma_n^2 = \operatorname{Var}(X_n)$. Assume that there exists an integer $s$ such that
$\left(\frac{N_n}{\Delta_n}\right)^{1/s} \frac{\Delta_n M_n}{\sigma_n} \to 0$ as $n \to \infty$. (4.1)
Then $(X_n - \mathbb E[X_n])/\sigma_n$ converges in distribution to $N(0,1)$.
Example 4.4. We use the same model and notation as in Theorem 4.2. Assume to simplify that $p_n$ is bounded away from 1. Then one has $N_n \asymp n^3$, $\Delta_n \asymp n$ and $M_n = 1$. An easy computation (see, e.g., [46, Lemma 3.5]) gives $\sigma_n^2 \asymp \max(n^3 p_n^3, n^4 p_n^5)$. Thus the hypothesis (4.1) in Janson's theorem is fulfilled if $p_n \ge n^{-1/3+\varepsilon}$ for some $\varepsilon > 0$. When this holds, Theorem 4.3 implies that, after rescaling, the number $X_n$ of triangles in $G(n, p_n)$ is asymptotically normal. The latter is in fact true under the less restrictive hypothesis $p_n \gg n^{-1}$, as proved by Ruciński [60], but this cannot be obtained from Theorem 4.3.
To finish this section, let us mention a stronger normality criterion, due to Mikhailov [53]. Roughly, he replaces the number of vertices $N_n$ and the degree $\Delta_n$ by some quantities defined using conditional expectations of variables. If (4.1) holds with these new quantities, then we can also conclude that one has Gaussian fluctuations. His theorem has a larger range of applications than Janson's: e.g., for triangles in random graphs, it proves asymptotic normality in its whole range of validity, that is if $p_n \gg n^{-1}$ and $1 - p_n \gg n^{-2}$; see [46, Example 6.19].
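As a quick numerical illustration of Example 4.4 (with arbitrary, illustrative parameter values, not taken from the paper), the following Python sketch samples triangle counts in $G(n, p)$ and checks that the standardized counts have third and fourth moments close to the Gaussian values 0 and 3.

```python
import numpy as np

def sample_gnp_adj(n, p, rng):
    """Adjacency matrix of G(n, p): independent edges above the diagonal, then symmetrized."""
    A = np.triu((rng.random((n, n)) < p).astype(int), 1)
    return A + A.T

def triangle_count(A):
    """Number of triangles, via trace(A^3) = 6 * (#triangles)."""
    return int(np.trace(A @ A @ A)) // 6

rng = np.random.default_rng(2)
n, p, reps = 60, 0.3, 2000                 # illustrative values, p well above n**(-1/3)
counts = np.array([triangle_count(sample_gnp_adj(n, p, rng)) for _ in range(reps)])
Z = (counts - counts.mean()) / counts.std()
print((Z ** 3).mean(), (Z ** 4).mean())    # close to 0 and 3 if the Gaussian limit holds
```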

Definition of weighted dependency graphs
The goal of the present article is to relax the independence hypothesis in the definition of dependency graphs. As we shall see in the next sections, this enables us to include many more examples.
As above, {Y α , α ∈ A} is a family of random variables defined on the same probability space. We suggest the following definition. Definition 4.5. Let C = (C 1 , C 2 , · · · ) be a sequence of positive real numbers. Let Ψ be a function on multisets of elements of A.
A weighted graph $\widetilde L$ is a $(\Psi, C)$ weighted dependency graph for $\{Y_\alpha, \alpha \in A\}$ if, for any multiset $B = \{\alpha_1, \dots, \alpha_r\}$ of elements of $A$, one has
$\bigl|\kappa(Y_{\alpha_1}, \dots, Y_{\alpha_r})\bigr| \le C_r \, \Psi(B)\, M_{\widetilde L[B]}$. (4.3)
Our definition implies in particular that all cumulants, or equivalently all moments, of the $Y_\alpha$ are finite. This might seem restrictive but, in most applications, the $Y_\alpha$ are Bernoulli random variables. Note also that we already have this restriction in Janson's and Mikhailov's normality criteria.
and $\widetilde L$ the complete graph on $A$ with weight 1 on each edge. Then $\widetilde L$ is trivially a $(\Psi, C)$ weighted dependency graph for $\{Y_\alpha; \alpha \in A\}$. But this type of example does not yield interesting results.
We are interested in constructing examples where:
• $C_r$ may depend on $r$, but is constant along a sequence of weighted dependency graphs;
• $\Psi$ has a rather simple form, such as $p^{\#(B)}$ for some $p$ (the case $\Psi \equiv 1$ gives a good intuition);
• edge weights also have a very simple expression and most of them tend to 0 along a sequence of weighted dependency graphs.
Intuitively, Eq. (4.3) then expresses that variables connected only by edges of small weight are close to independent.
Example 4.7. Consider the Erdős-Rényi random graph model $G(n, m_n)$, i.e. $G$ is a graph with vertex set $[n]$ and an edge set $E$ of size $m_n$, chosen uniformly at random among all possible edge sets of size $m_n$.
If we set $p_n = m_n / \binom{n}{2}$, then each edge $\{i, j\}$ belongs to $E$ with probability $p_n$, but the corresponding events are not independent anymore. Indeed, since the total number of edges is fixed, if we know that one given edge is in $G$, it is less likely that another given edge is also in $G$. As in Theorem 4.2, let $A$ be the set of 3-element subsets of $[n]$ and, if $\alpha = \{i, j, k\} \in A$, let $Y_\alpha$ be the indicator function of the event "the graph $G$ contains the triangle with vertices $i$, $j$ and $k$". Since presences of edges are no longer independent events, neither are presences of edge-disjoint triangles, and the only dependency graph of this family in the classical sense is the complete graph on $A$.
Consider the complete graph $\widetilde L$ with vertex set $A$ and weights on the edges determined as follows:
• if $|\alpha \cap \beta| \ge 2$ (that is, if the corresponding triangles share an edge in $G$), then the edge $\{\alpha, \beta\}$ in $\widetilde L$ has weight 1;
• if $|\alpha \cap \beta| \le 1$, then the edge $\{\alpha, \beta\}$ in $\widetilde L$ has weight $1/m_n$.
We will prove in Section 7 that $\widetilde L$ is a $(\Psi_n, C)$ weighted dependency graph with $\Psi_n(B) = p_n^{e(B)}$, where $e(B)$ is the total number of distinct edges in $B$ (recall that $B$ is here a multiset of triangles) and the sequence $C = (C_r)$ does not depend on $n$.
Intuitively, this means that presences of edge-disjoint triangles are almost independent events. Moreover, the weight 1/m n quantifies this almost-independence. This is rather logical: the bigger m n is, the less knowing that a given edge is in G influences the probability that another given edge is also in G (and hence the same holds for presence of edge-disjoint triangles).

A criterion for asymptotic normality
Let $\widetilde L$ be a $(\Psi, C)$ weighted dependency graph for a family of variables $\{Y_\alpha, \alpha \in A\}$. We introduce the following parameters (for $\ell \ge 1$):

Remark 4.9.
Let us consider the special case where $\Psi$ is the constant function equal to 1. One has:
• $R = |A|$, which is the number of vertices of $L$;
• using an easy observation, note that $\Delta - 1$ is the maximal weighted degree in $\widetilde L$ (the weighted degree of a vertex $\alpha$ is $\sum_{\beta \in A,\, \beta \neq \alpha} w_{\{\beta,\alpha\}}$; the condition $\beta \neq \alpha$ in the summation index explains the shift by $-1$). In particular, each $T_\ell$ has the same order of magnitude as $\Delta$.
In general, $R$ and $T_\ell$ should be thought of as deformations of the number of vertices and the maximal weighted degree. Considering $R$ and $T_\ell$ rather than simply $|A|$ and $\Delta$ leads to a more general normality criterion, in a similar way that Mikhailov's criterion extends Janson's.
The following lemma bounds cumulants in terms of the two above defined quantities.  LetL be a (Ψ, C) weighted dependency graph for a family of variables {Y α , α ∈ A}. Define R and T (for ≥ 1) as above. Then, for r ≥ 1, Proof. By multilinearity Applying the triangular inequality and Eq. (4.3), ..,αr is invariant by permutation of the indices).
Together with Eq. (4.8), this ends the proof of the lemma.
We can now give an asymptotic normality criterion, using weighted dependency graphs.
Theorem 4.11. Suppose that, for each $n$, $\{Y_{n,i}, 1 \le i \le N_n\}$ is a family of random variables with finite moments defined on the same probability space. For each $n$, let $\Psi_n$ be a function on multisets of elements of $[N_n]$. We also fix a sequence $C = (C_r)_{r \ge 1}$, not depending on $n$.
Let $X_n = \sum_{i=1}^{N_n} Y_{n,i}$ and $\sigma_n^2 = \operatorname{Var}(X_n)$. Assume that there exist numbers $D_r$ and $Q_n$ and an integer $s \ge 3$ such that
$\left(\frac{R_n}{Q_n}\right)^{1/s} \frac{Q_n}{\sigma_n} \to 0$ as $n \to \infty$; (4.10)
then, in distribution, $(X_n - \mathbb E[X_n])/\sigma_n \to N(0,1)$.
Proof. From Theorem 4.10, we know that, for $r \ge 2$, a bound (4.12) on $|\kappa_r(X_n)|$ holds. Setting $\widetilde C_r = C_r\, r!\, D_1 \cdots D_{r-1}$ and $\widetilde X_n = (X_n - \mathbb E X_n)/\sigma_n$, we get a corresponding bound on $\kappa_r(\widetilde X_n)$ for $r \ge s$. Eq. (4.12) for $r = 2$ ensures that the last factor is bounded, while the middle factor tends to 0 by our hypothesis (4.10). We conclude that $\kappa_r(\widetilde X_n)$ tends to 0 for $r \ge s$. The convergence towards a normal law then follows from [42, Theorem 1].

Remark 4.12.
Continuing Theorem 4.9, when $\Psi$ is constant equal to 1, one can choose $D_r = r$ and $Q_n = \Delta_n$, where $\Delta_n$ is the maximal weighted degree in $\widetilde L_n$. Then hypothesis (4.10) says that the quotient $\Delta_n/\sigma_n$ tends to 0 reasonably fast (faster than some power of $R_n/\Delta_n$). Roughly, one has a central limit theorem as soon as the weighted degree is smaller than the standard deviation. (In particular, except in pathological cases, the standard deviation should tend to infinity.)

Remark 4.13.
In most examples of application, $R_n$ is immediate to evaluate, while a good upper bound for $T_{\ell,n}$, and thus a sequence $Q_n$ as in the theorem, can be found by a relatively easy combinatorial case analysis. The most difficult part in applying the theorem is to find a lower bound for $\sigma_n$ (Theorem 4.10 gives a usually sharp upper bound).
In this sense, the weighted dependency graph structure, once uncovered, reduces the central limit theorem to a variance estimation. Remark 4.15. Except for Theorem 3.3 (see Theorem 3.4), the proof of our normality criterion is largely inspired by the case of usual dependency graphs. The difficulty here was to find a good definition of weighted dependency graphs, not to adapt the theorem to this new setting.

Multidimensional convergence and bounds for joint cumulants
Bounds on cumulants, and thus weighted dependency graphs, can also be used to obtain the convergence of a random vector towards a multidimensional Gaussian vector or the convergence of a random function towards a Gaussian process.
To avoid a heavily technical theorem, we do not state a general result, but refer the reader to examples in Sections 8.2, 8.3 and 9.3. We nevertheless give here a useful bound on joint cumulants, whose proof is a straightforward adaptation of the one of Theorem 4.10.

Lemma 4.16.
Let $\widetilde L$ be a $(\Psi, C)$ weighted dependency graph for a family of variables $\{Y_\alpha, \alpha \in A\}$. Consider subsets $A_1, \dots, A_r$ of $A$. Then, with the notation of the previous section,

Remark 4.17.
It is also possible in the above bound to replace R by and/or the product T 1 · · · T r−1 by T 2 ≤r−1 · · · T r ≤r−1 , where The maximum over in the equation above comes from the reordering argument, that is the use of Theorem 3.3 in the proof of Theorem 4.10. We do not know what is the index of the element taken from A i in the reordered sequence (β 1 , · · · , β r ). The only thing we can ensure is that β 1 = α 1 (since we can choose arbitrarily the first vertex in Prim's algorithm; see the proof of Theorem 3.3), which allows us to use R 1 instead of R.
This slight improvement of the bound is not used in the applications given in this paper. It could however be useful if we wanted to prove, say, a multivariate convergence result for numbers of copies of subgraphs of different sizes in G(n, m); see Section 7 for the corresponding univariate statement.
Note that, with this improvement, the bound given for the joint cumulant is not symmetric in A 1 ,. . . ,A r , while the quantity to bound obviously is.

Comparison between usual and weighted dependency graphs
In this section, we compare at a formal level the notions of weighted dependency graphs and of usual dependency graphs. The results of this section are not needed in the rest of the paper, and it can safely be skipped.
The  For the next proposition, we need to introduce some terminology. Let {Y α , α ∈ A} be a family of random variables defined on the same probability space. We say that a function Ψ on multisets of A dominates joint moments, if for any multiset B and multiset partition π of B: Examples include: • Assume that the variables {Y α , α ∈ A} are uniformly bounded by a constant M , i.e. , for any α, one has |Y α | ≤ M a.s. Then for any multiset B and multiset partition π of B, one has In other terms, the function Ψ defined by Ψ(B) = M |B| dominates joint moments.
• More generally, a repetitive use of Hölder inequality, together with the monotonicity of the r-th norm yields the following: for any multiset B and multiset partition π of B, one has In other terms, the function Ψ defined by Ψ(B) = α∈B E |Y α | |B| 1/|B| dominates joint moments.
• As a more concrete example, consider triangles in random graphs, as in Theo-  Let {Y α , α ∈ A} be a family of random variables defined on the same probability space, with a dependency graph L. Set C r = (r!) 2 and consider a function Ψ on multisets of A that dominates joint moments. Consider also the weighted graphL, obtained by assigning weight 1 to each edge.
Proof. We have to check that the inequality (4.3) holds for any multiset B. Consider two cases: this implies that the set of variables {Y α , α ∈ A} can be split into two mutually independent sets of variables and κ(Y α ; α ∈ B) = 0, as wanted.
• Otherwise,L contains at least one spanning tree, and since all edges have weight 1, all spanning trees have weight 1. Thus M L [B] = 1 and we should prove: Conversely, the unweighted version of a (Ψ, C) weighted dependency graph is also a usual dependency graph, as soon as each variable Y α is determined by its moments, as shown by the following proposition.
Proposition 4.19. Let {Y α , α ∈ A} be a family of random variables with finite moments defined on the same probability space, such that each Y α is determined by its moments. Let C and Ψ be arbitrary and assume that we have a (Ψ, C) weighted dependency graph L for the family {Y α , α ∈ A}. Denote L the unweighted version ofL. Then L is a usual dependency graph for the family {Y α , α ∈ A}.
Proof. Let A 1 and A 2 be disconnected subsets of A in L. We should prove that {Y α , α ∈ Let B be a multiset of elements of A 1 A 2 that contains elements in both A 1 and A 2 .
Then the induced weighted graphL[B] has at least two connected component because We can now argue that Theorem 4.11 contains Janson's normality criterion. For each Define R n and T ,n as in Section 4.3. If ∆ n − 1 is the maximal degree in L n , then R n = M n N n and T ,n ≤ M n (∆ n ) forL n . In particular we can choose Q n = M n ∆ n and condition (4.10) in our normality criterion reduces to (4.1) in Janson's.
On the other hand our theorem does not contain formally Mikhailov normality criterion [53]. But it contains classical examples. Again, one should see the dependency graph in each example as a weighted dependency graph with weight 1 on each edge and choose Ψ as follows: • in the example at the end of Mikhailov's paper [53] In each case, we leave details to the reader.

Finding weighted dependency graphs
In general, the main difficulty in applying Theorem 4.11 is to check that $\widetilde L_n$ is indeed a weighted dependency graph for the family $\{Y_{n,i}, 1 \le i \le N_n\}$ of random variables. Indeed, one should establish the bound (4.3), which may be quite cumbersome. In this section, we give a few lemmas and propositions that help with this task in different contexts.

An alternate formulation
In this section, we will see that instead of (4.3), one can show a slightly different set of inequalities. Intuitively, this set of inequalities puts an emphasis on edges of weight 1, which, in most applications, relate incompatible events.
We require an extra assumption on the function $\Psi$. Definition 5.1. Let $A$ be a set and $\Psi$ a function on multisets of elements of $A$. Then $\Psi$ is called super-multiplicative if, for any multisets $B_1$ and $B_2$, $\Psi(B_1 \uplus B_2) \ge \Psi(B_1)\, \Psi(B_2)$. Proposition 5.2. Let $\{Y_\alpha, \alpha \in A\}$ be a family of random variables defined on the same probability space. Consider a weighted graph $\widetilde L$ with vertex set $A$, a super-multiplicative function $\Psi$ on multisets of elements of $A$ and a sequence $D = (D_r)_{r \ge 1}$.
Assume that, for any multiset $B$ of elements of $A$, one has the bound (5.1). Then $\widetilde L$ is a $(\Psi, C)$ weighted dependency graph for the family $\{Y_\alpha, \alpha \in A\}$, for some sequence $C$ that depends only on $D$.
Proof. We have to check that the inequality (4.3) holds for any multiset B. We proceed by induction on the size r of the multiset B.
Consider the case r = 1. From Eq. (5.1), we know that, for any α ∈ A, one has: so that, if we set C 1 = D 1 , Eq.
where the sum runs over multiset partitions π = {π 1 , · · · , π s } of B such that π ∨ {B 1 , · · · , B } = {B}; we denote this condition by π ⊥ B (for a discussion on multiset partitions, see Theorem 5.3 at the end of the proof). We isolate the term corresponding to π = {B} on the right hand-side and rewrites this as: Moreover, the induction hypothesis asserts that if π i is a strict subset of B, one has If π is a set partition of B different from {B}, all its parts are strict subsets of B and we From the super-multiplicativity, the middle factor is at most Ψ(B). Moreover, under the hypothesis π ⊥ B, the last factor is at most M L [B] , as proved in Theorem 3.5 (for the graphL[B] with ∆ i = π i ). Finally, from Eqs. (5.2) to (5.4), we get: This ends the proof of (4.3) by setting observe that the right-hand side depends indeed only on the size r of B, and not on B itself. With this convention, Leonov and Shiryaev formula clearly holds with cumulants of multisets. Indeed the case with equal variables can be obtained from specialization of the generic case and this does not change the summation set.
Remark 5.4. We will see in Section 5.3 a converse of Theorem 5.2: for any weighted dependency graph with a super-multiplicative function Ψ, Eq. (5.1) holds. In fact, a more general bound for cumulants of products of the Y α holds; see Eq. (5.13) and Theorem 5.12.
Remark 5.5. Theorem 5.2 is particularly useful when $\widetilde L$ has no edges of weight 1 and the $Y_\alpha$ are Bernoulli variables. In this case, each connected component where the last equality comes from the assumption that $Y_{\beta_i}$ is a Bernoulli variable. Therefore, to prove that $\widetilde L$ is a $(\Psi, C)$ weighted dependency graph for the family $\{Y_\alpha, \alpha \in A\}$, it is enough to bound $\kappa(Y_\alpha, \alpha \in B)$ for subsets $B$ of $A$ (and not all multisets).

Small cumulants and quasi-factorization
Let $\ell \ge 1$ and let $u = (u_\Delta)_{\Delta \subseteq [\ell]}$ be a family of real numbers indexed by subsets of $[\ell]$. We shall always assume $u_\emptyset \neq 0$. Typically, the $u_\Delta$ are the joint moments $\mathbb E\bigl[\prod_{j \in \Delta} Y_j\bigr]$ of a family $(Y_1, \dots, Y_\ell)$ of random variables, but it is convenient not to assume this.
If $u$ is the family of joint moments of $(Y_1, \dots, Y_\ell)$, then $u_\emptyset = 1$ and $\kappa_\Delta(u)$ is simply the joint cumulant of the subfamily $\{Y_j, j \in \Delta\}$. We say that $(u^{(n)})_{n \ge 1}$ has the $\widetilde L_n$ small cumulant property if, for any subset $\Delta \subseteq [\ell]$ of size at least 2, the bound (5.6) holds. Note that Eq. (5.6) is similar to Eq. (4.3), so that we are interested in establishing the small cumulant property. We will see that it is equivalent to another property, which we call the quasi-factorization property and which is in some cases easier to establish.
We now assume that, for any $\Delta \subseteq [\ell]$, one has $u_\Delta \neq 0$. Then we also introduce the auxiliary quantity $P_\Delta(u)$, implicitly defined by the property given in Eq. (5.7) below; in particular, we always have $P_\emptyset(u) = 1$ and $P_{\{i\}}(u) = u_{\{i\}}/u_\emptyset$. Using Möbius inversion on the boolean lattice, one obtains an explicit expression of $P_\Delta(u)$ for any subset $\Delta \subseteq [\ell]$ with $\Delta \neq \emptyset$. The following proposition, generalizing [33, Lemma 2.2], is used repeatedly in this article. It says that the two above properties are equivalent.
If $(u^{(n)})_{n \ge n_0}$ has the $\widetilde L_n$ quasi-factorization property, then it also has the $\widetilde L_n$ small cumulant property. Assume moreover that the maximal weight of $\widetilde L_n$ tends to 0. Then the converse also holds: $(u^{(n)})_{n \ge n_0}$ has the $\widetilde L_n$ small cumulant property if and only if it has the $\widetilde L_n$ quasi-factorization property.
Proof. The proof is an adaptation of the one of [33, Lemma 2.2].
We first assume that u ∅ = 1 and u {i} = 1 for all i in [ ], so that the product in Eq. (5.7) can be taken over subsets δ with |δ| ≥ 2. n≥n0 , indexed by subsets of ∆. Fix a set partition π ∈ P( ). For a block B of π, one has, expanding the product in where the sum runs over all finite sets of (distinct) subsets of B of size at least 2 (in particular, the size m of the set is not fixed). Therefore, where the sum runs over all finite sets of (distinct) subsets of [ ] of size at least 2 such that each ∆ i is contained in a block of π. In other terms, for each i ∈ [m], π must be coarser than the partition Π(∆ i ), which, by definition, has ∆ i and singletons as blocks.
In other words, all non-zero summands in (5.9) are O(M L n ). Since the summation index set in (5.9) does not depend on n, we conclude that κ [ ] (u (n) ) = O M L n , which ends the proof of the first implication.
Let us now consider the converse statement. We proceed by induction on and we assume that, for all smaller than a given ≥ 2, theL n small cumulant property implies theL n quasi factorization property.
Back to the proof, we have to establish that Thanks to the estimates above for u (n) ∆ , this is equivalent to the fact that Define now an auxiliary family (v (n) ) n≥n0 defined by: Clearly, P ∆ (v) = P ∆ (u) for ∆ [ ] and P [ ] (v) = 1, so that the family v has theL n quasi-factorization property. Thus, using the first part of the proof, it also has theL n small cumulant property. In particular: But, by hypothesis As v ∆ = u ∆ for ∆ [ ], one has: which proves (5.10).
The general case follows directly from the case u ∅ = u {i} = 1 by considering the family w (n) Indeed, for |∆| ≥ 2, When the maximal weight inL n tends to zero, we write " u (n) n≥n0 has theL n SC/QF property" (since the two properties are equivalent in this case). Furthermore, wheñ L n is a complete graph with weight ε n on each edge, we say that " u (n) n≥n0 has the ε n SC/QF property" (instead of the "L n SC/QF property"). In the following lemma, we collect a few easy facts on the SC/QF property. 2. If, for each n, (u (n) ) is multiplicative, that is u {i} , then (u (n) ) n≥1 has the 0-SC/QF property.
3. Let (L n ) n≥n0 and (K n ) n≥n0 two sequences of weighted graphs with maximal weight tending to 0 and assume that the weight of {i, j} inL n is always smaller than or equal to the corresponding weight inK n .
If a sequence (u (n) ) n≥1 has theL n -SC/QF property, then it also has theK n -SC/QF property.
4. Consider two sequences u (n) n≥n0 and v (n) n≥n0 , both with theL n SC/QF property. Then their entry-wise product u (n) ·v (n) and their entry-wise quotient u (n) /v (n) both have theL n SC/QF property.
5. Moreover, if u ∅ = v ∅ , then any linear combination λu (n) + µv (n) with only non-zero terms for n sufficiently large also has theL n -SC/QF property.
We end this section by a family of examples, for which the SC/QF property holds.
Let $(X_n)_{n \ge 1}$ be a sequence of integers such that $X_n \ge 1$ (for all $n \ge 1$) and $\lim_{n \to \infty} X_n = +\infty$. Fix $\ell \ge 1$ and nonnegative integers $a_1, \dots, a_\ell$. We consider the factorial sequences $u^{(n)}_\Delta(a_1, \dots, a_\ell) = \bigl(X_n - \sum_{i \in \Delta} a_i\bigr)!$. For $n$ sufficiently large, say $n \ge n_0$, the integer $X_n - \sum_{i=1}^{\ell} a_i$ is non-negative and the truncated family $\bigl(u^{(n)}(a_1, \dots, a_\ell)\bigr)_{n \ge n_0}$ is well-defined. Proposition 5.10. We use the notation above and set $\varepsilon_n = 1/X_n$. Then the family $\bigl(u^{(n)}(a_1, \dots, a_\ell)\bigr)_{n \ge n_0}$ has the $\varepsilon_n$ SC/QF property.
The proof is a combination of easy but technical inductions. It is given in Appendix A. Combining this result with Theorem 5.9 (item 4), we get that products and quotients of these factorial sequences have the SC/QF property. Therefore, if the joint moments of some random variables are of this form, we get bounds on their joint cumulants without any computation. This is used in Sections 6 to 8.

Powers of weighted dependency graphs
The propositions and lemmas of the two previous sections help to establish that a family of random variables admits a given weighted dependency graph. In this section, we shall see that when we have a weighted dependency graph for a family {Y α , α ∈ A}, we can automatically construct a new one for monomials Y I = α∈I Y α in the original variables Y α (here, the index I is a multiset of elements of A). Ψ({I 1 , · · · , I r }) = Ψ(I 1 · · · I r ) (5.11) and D m,r depends only on m, r, C 1 , . . . , C mr .
On the other hand, from Theorem 3.6, when (5.12) is satisfied, one has Bringing everything together, we have , · · · , I r }] Ψ(I 1 · · · I r ).  Observe that, by definition for each integer i in [2n], there is a unique j = i such {i, j} is in H. We call j the partner of i.
We are interested in the uniform model on pair partitions of [2n]. A uniform random pair partition of [2n] can be constructed as follows. Take i 1 arbitrarily (e.g. i 1 = 1) and choose its partner j 1 uniformly at random among numbers different for i 1 (i.e. each number different from i 1 is taken with probability 1/(2n − 1)); then take i 2 arbitrarily different from i 1 and j 1 and choose its partner j 2 uniformly at random among numbers different from i 1 , j 1 and i 2 (each such number is taken with probability 1/(2n − 3)); and so on, until all pairs are created. In particular, given distinct numbers i 1 , · · · , i t and j 1 , · · · , j t , the probability that all pairs ({i s , j s }) s≤t belong to a uniform random pair This simple observation is the key to find a weighted dependency graph associated to uniform random pair partitions.
To illustrate the use of this weighted dependency graph, we study a classical statistic on pair partitions, the number of crossings; see, e.g., [17] and references therein for enumerative results on this statistic.

A weighted dependency graphs for random pair partitions
Let A n be the set of two element subsets of [2n]. For {i, j} ∈ A n , we define a random variable Y i,j such that Y i,j = 1 if {i, j} belongs to the random pair partition H n , and 0 otherwise. Proposition 6.3. Consider the weighted graphL on vertex set A n defined as follows: • if two pairs α 1 and α 2 in A n have an element in common, then they are linked inL by an edge of weight 1; • if two pairs α 1 and α 2 in A n are disjoint, then they are linked inL by an edge of weight 1/n. Proof. Clearly Ψ n is super-multiplicative. From Theorem 5.2, it is enough to prove that, for any multiset B of elements of A n of size r, one has , and for some D r that does not depend on n. But, if α 1 and α 2 are different and linked by an edge of weight 1, the product Y α1 Y α2 is identically equal to 0. Therefore the left-hand side of Eq. (6.1) is 0 unless each B i contains only one element (possibly with multiplicity m). Since the Y α take value in {0, 1}, we have Y m α = Y α and the multiplicity does not play any role. Finally, it is enough to prove that for disjoint pairs α 1 , · · · , α r in A n , we have: Assume n ≥ r, otherwise the above statement is vacuous. From the discussion in Section 6.1, we have that, for any subset ∆ of [r], Note that it does not depend on α 1 , . . . , α r . From Theorem 5.9 (items 1, 2 and 3) and But κ [r] (M (n) ) = κ(Y α1 , · · · , Y αr ) and, for each i, one has M (n) {i} = 1 2n−1 , so that Eq. (6.2) is proved.

Asymptotic normality of the number of crossings
Let A_n be the set of quadruples (i, j, k, l) of elements of [2n] with i < j < k < l. For (i, j, k, l) in A_n, we set $Y_{i,j,k,l} = Y_{i,k}\, Y_{j,l}$. Equivalently, Y_{i,j,k,l} = 1 if (i, j, k, l) is a crossing in the random pair partition H_n and 0 otherwise. We also consider
$$\mathrm{Cr}_n = \sum_{(i,j,k,l) \in A_n} Y_{i,j,k,l},$$
which is the number of crossings in the random pair partition H_n. We will prove the asymptotic normality of Cr_n, using the weighted dependency graph of the previous section.
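For concreteness, here is a short sketch counting crossings of a given pair partition (the function name is ours); combined with the sampler above, it allows a quick numerical check of the asymptotic normality of Cr_n.

from itertools import combinations

def number_of_crossings(pairs):
    """Count quadruples i < j < k < l such that {i, k} and {j, l} are both blocks
    of the pair partition, i.e. pairs of blocks whose endpoints interleave."""
    blocks = {frozenset(p) for p in pairs}
    count = 0
    for p, q in combinations(blocks, 2):
        a, b = sorted(p)
        c, d = sorted(q)
        if a < c < b < d or c < a < d < b:
            count += 1
    return count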
First, we use Theorem 5.11 to find a weighted dependency graph for the variables Y_{i,j,k,l}. For a multiset $B = \{(i_t, j_t, k_t, l_t),\, 1 \le t \le |B|\}$ of elements of A_n, we define pairs(B) as the number of distinct variables Y_{i,k} that appear in the products $(Y_\alpha)_{\alpha \in B}$.

Proposition 6.4. Let L be the complete graph on A_n with the following weights:
• if two quadruples α_1 and α_2 have a non-empty intersection, they are linked by an edge of weight 1;
• if they are disjoint, then they are linked by an edge of weight 1/n.
Then L is a (Ψ_n, C) weighted dependency graph for the family {Y_{i,j,k,l}, (i, j, k, l) ∈ A_n}, where $\Psi_n(B) = n^{-\mathrm{pairs}(B)}$ and C = (C_r)_{r≥1} is a sequence that does not depend on n.
Proof. This is a direct application of Theorem 5.11 to the weighted dependency graph given in Theorem 6.3.
The number of such terms is obviously bounded by O(n^4) (which bounds the total number of terms in A_n), so that the total contribution of this case is O(n).
Finally, we see that, for any α_1, ..., α_ℓ in A_n, the quantity (6.4) is O(n), with a constant in the O symbol depending on ℓ, but not on α_1, ..., α_ℓ. Thus T_{ℓ,n} is O(n) and we can choose Q_n = n in Theorem 4.11. The variance of Cr_n is computed in Appendix B.1 and we see that σ_n is of order n^{3/2}. Therefore (4.10) is fulfilled for s = 3 and we infer from Theorem 4.11 the asymptotic normality of Cr_n.

The model
For each n, let m_n be an integer between 0 and $\binom{n}{2}$. As in Theorem 4.7, we consider the Erdős–Rényi random graph model G(n, m_n), i.e. G is a graph with vertex set V := [n] and an edge set E of size m_n, chosen uniformly at random among all possible edge sets of size m_n. Set $p_n = m_n / \binom{n}{2}$. For any 2-element subset {i, j} of V, we define a random variable Y_{i,j} such that Y_{i,j} = 1 if the edge {i, j} belongs to the random graph G, and 0 otherwise. Clearly, Y_{i,j} = 1 with probability p_n. However, unlike in G(n, p_n), these random variables are not independent. We can nevertheless compute their joint moments: if α_1, ..., α_r are distinct 2-element subsets of V, then
$$\mathbb{E}[Y_{\alpha_1} \cdots Y_{\alpha_r}] = \frac{\binom{E_n - r}{m_n - r}}{\binom{E_n}{m_n}}, \qquad \text{where } E_n = \binom{n}{2}.$$
Indeed, the numerator is the number of graphs with vertex set [n] and m_n edges containing α_1, ..., α_r, while the denominator is the total number of graphs with vertex set [n] and m_n edges. This simple explicit formula for joint moments is the starting point to find a weighted dependency graph in G(n, m), as we shall do in Section 7.2.
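The joint-moment formula is easy to evaluate and to test numerically. Below is a small sketch (function names are ours) that computes it and compares it with a Monte Carlo estimate over uniform m-edge graphs; it is only meant as an illustration of the formula, under the conventions stated in the comments.

from math import comb
import random

def joint_moment_gnm(n, m, r):
    """E[Y_{a_1} ... Y_{a_r}] for r distinct potential edges in G(n, m):
    the fraction of m-edge graphs on [n] that contain the r given edges."""
    E = comb(n, 2)
    if m < r:
        return 0.0
    return comb(E - r, m - r) / comb(E, m)

def monte_carlo_check(n, m, edges, trials=100_000, rng=random):
    """Empirical probability that all edges in `edges` occur in a uniform G(n, m)."""
    all_edges = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
    target = set(map(frozenset, edges))
    hits = 0
    for _ in range(trials):
        sample = set(map(frozenset, rng.sample(all_edges, m)))
        hits += target <= sample
    return hits / trials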
We then use this dependency graph structure to give a new proof of Janson's central limit theorem for subgraph count statistics in G(n, m n ); see Section 7.3.

A weighted dependency graph in G(n, m).
Proposition 7.1. Let A_n be the set of two-element subsets of [n]. Then the complete graph on A_n with weight 1/m_n on each edge is a (Ψ_n, C) weighted dependency graph for the family {Y_{i,j}, {i, j} ∈ A_n}, where $\Psi_n(B) = p_n^{|B|}$ and C = (C_r)_{r≥1} is a sequence that does not depend on n.
Proof. Clearly Ψ_n is super-multiplicative. From Theorem 5.2 (see also Theorem 5.5), it is enough to prove that, for any distinct α_1, ..., α_r, one has
$$|\kappa(Y_{\alpha_1}, \dots, Y_{\alpha_r})| \le C_r \left(\frac{1}{m_n}\right)^{r-1} p_n^{\,r}, \qquad (7.1)$$
for some C_r that does not depend on n.
Recall from the previous section that the joint moments have an explicit expression: for any subset ∆ of [r], $M^{(n)}_\Delta := \mathbb{E}\big[\prod_{t \in \Delta} Y_{\alpha_t}\big] = \binom{E_n - |\Delta|}{m_n - |\Delta|}\big/\binom{E_n}{m_n}$. Note that it does not depend on α_1, ..., α_r. Moreover, as soon as m_n ≥ r, which happens for n big enough, say n ≥ n_0, one can write the corresponding bound for the cumulants of this factorial sequence. But $\kappa_{[r]}(M^{(n)}) = \kappa(Y_{\alpha_1}, \dots, Y_{\alpha_r})$ and, for each i, one has $M^{(n)}_{\{i\}} = \frac{m_n}{E_n} = p_n$, so that Eq. (7.1) is proved.

A CLT for subgraph counts in G(n, m n )
Fix some graph H with at least one edge. Let A H n be the set of subgraphs H of the complete graph K n on vertex set [n] that are isomorphic to H: there are n(n − 1) · · · (n − v H + 1)/ Aut(H) such subgraphs, where Aut(H) is the number of automorphisms of H.
As before, let G be a random graph with the distribution of the model G(n, m_n). For H in $A^H_n$, we denote by $Y_H = \prod_{e \in E_H} Y_e$ the indicator that all edges of H belong to G. Then the random variable $X^H_n = \sum_{H \in A^H_n} Y_H$ counts the number of subgraphs of G that are isomorphic to H. This is called the subgraph count statistic and is a classical object of study in random graph theory - see, e.g., [46, Sections 3 and 6]. The goal of this section is to prove the asymptotic normality of this statistic, using weighted dependency graphs.
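As an illustration of the statistic X^H_n, the following brute-force sketch counts the copies of a small graph H in a graph G (function and variable names are ours); it enumerates injective edge-preserving maps and divides by the number of automorphisms of H, so it is only practical for small graphs.

from itertools import permutations

def count_copies(H_vertices, H_edges, G_vertices, G_edges):
    """Number of subgraphs of G isomorphic to H (not necessarily induced)."""
    H_edges = [tuple(e) for e in H_edges]
    G_edge_set = {frozenset(e) for e in G_edges}
    H_edge_set = {frozenset(e) for e in H_edges}

    def injective_homs(vertices, edge_set):
        # injective maps V(H) -> vertices sending every edge of H to an edge
        count = 0
        for image in permutations(vertices, len(H_vertices)):
            phi = dict(zip(H_vertices, image))
            if all(frozenset((phi[u], phi[v])) in edge_set for u, v in H_edges):
                count += 1
        return count

    aut_H = injective_homs(H_vertices, H_edge_set)     # number of automorphisms of H
    return injective_homs(G_vertices, G_edge_set) // aut_H

For instance, with H a triangle, count_copies([1, 2, 3], [(1, 2), (2, 3), (1, 3)], list(G_vertices), list(G_edges)) returns the number of triangles of G.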
We first observe that the above-defined family {Y_H, H ∈ A^H_n} admits a weighted dependency graph. To do that, if B = {H_1, ..., H_r} is a multiset of elements of A^H_n, we define e(B) as the total number of edges in this multiset. Consider the complete graph with vertex set A^H_n and assign weights on edges as follows:
• if two copies H_1 and H_2 of H have an edge in common (as subgraphs of K_n), then the edge (H_1, H_2) gets weight 1;
• otherwise, the edge (H_1, H_2) gets weight 1/m_n.
We denote the resulting weighted graph L^H.
Then L^H is a (Ψ_n, C) weighted dependency graph for the family {Y_H, H ∈ A^H_n}, for some sequence C = (C_r)_{r≥1} that does not depend on n (but depends on H).
Proof. Indeed, L^H is a subgraph of the e_H-th power of the weighted dependency graph L given in Theorem 7.1 - see Theorem 5.11.
We use the notation of Theorem 4.11. Then we have $R_n = O(n^{v_H} p_n^{e_H})$. Estimates of $T_{\ell,n}$ and of the variance Var(X^H_n) are given in Theorem 7.3 below. Let us introduce the notation involved in these estimates.
• As in [46], we denote
$$\Phi_H = \min_{K \subseteq H,\; e_K \ge 1} n^{v_K} p_n^{e_K}. \qquad (7.4)$$
In particular, $\Phi_H \le n^2 p_n$: indeed, H has at least one subgraph K with two vertices and one edge. In the following, we assume that Φ_H tends to infinity.
• We also consider the following quantity:
$$\Phi'_H = \min_{K \subseteq H,\; e_K \ge 2} n^{v_K} p_n^{e_K}.$$
Note that, unlike in the definition of Φ_H, the minimum is taken over subgraphs K with at least 2 edges. In the following, we assume that the graph L_2 with three vertices and two edges is included in H - see a discussion of this hypothesis at the end of the section. In particular, this implies that $\Phi_H, \Phi'_H \le n^3 p_n^2$ and $n^3 p_n^2 \to \infty$ (since Φ_H → ∞).

Theorem 7.3. With the above notation, for every ℓ ≥ 1,
$$T_{\ell,n} \le C_{H,\ell}\, \frac{n^{v_H}\, p_n^{e_H}}{\Phi_H}, \qquad (7.5)$$
for some constant C_{H,ℓ} depending on H and ℓ, but not on n. Assume furthermore $n (1-p_n)^2 \gg 1$. Then we have the following estimate for the variance:
$$\mathrm{Var}(X^H_n) \ge C\, (1-p_n)^2\, \frac{n^{2 v_H}\, p_n^{2 e_H}}{\Phi'_H} \qquad (7.6)$$
for some constant C > 0 and n sufficiently large.
Remark 7.4. Note in particular that, in many cases (e.g. p_n = p constant), the variance Var(X^H_n) has a different order of magnitude than in the independent model G(n, p_n).
This phenomenon has already been observed by Janson [43].
Proof. We prove here only Eq. (7.5). The proof of Eq. (7.6) is postponed to Appendix B.2.
We denote by $\Lambda = L^H_1$ the subgraph of L^H formed by the edges of weight 1. Since L^H has only edges of weight 1 and 1/m_n, we can split the sum defining $T_{\ell,n}$ according to these two types of edges. On the other hand, for a fixed K, the number of graphs H' with $V_{H'} \subset [n]$, which are isomorphic to H and whose intersection with $\bigcup_{i=1}^{\ell} H_i$ is given by K, is bounded by a constant (depending on H and ℓ) times $n^{v_H - v_K}$. We can now establish the following central limit theorem, originally proved by Janson [43,44].
Theorem 7.5 ([44, Theorem 19]). Let m_n be an integer sequence tending to infinity with $m_n \le \binom{n}{2}$. Set $p_n = m_n / \binom{n}{2}$ and consider a random graph G taken with Erdős–Rényi distribution G(n, m_n).
Fix some graph H that contains L_2. Assume that Φ_H tends to infinity and that, for some ε > 0, we have $n^{1-\varepsilon} (1-p_n)^2 \gg 1$. Denote by $X^H_n$ the number of copies of H in the random graph G. Then, in distribution,
$$\frac{X^H_n - \mathbb{E}[X^H_n]}{\sqrt{\mathrm{Var}(X^H_n)}} \longrightarrow \mathcal{N}(0, 1).$$

Proof. We apply Theorem 4.11: $T_{\ell,n}$ is bounded by Eq. (7.5), which allows us to take $Q_n$ of order $n^{v_H} p_n^{e_H} / \Phi_H$, while $\sigma_n^2$ is bounded from below by Eq. (7.6). Note also that $R_n / Q_n$ is of order $\Phi_H \le n^2 p_n$. We distinguish two cases.
• If the minimum in (7.4) (the definition of Φ_H) is achieved by the graph with two vertices and one edge, then $\Phi_H = n^2 p_n$ and we use the inequality $\Phi'_H \le n^3 p_n^2$. A short computation then shows that (4.10) is fulfilled for any integer s ≥ 4/ε.
• Otherwise, one has $\Phi_H = \Phi'_H$. We also know that p_n tends to 0 (otherwise $n^2 p_n$ clearly minimizes (7.4)), so that, since Φ_H tends to infinity, (4.10) is fulfilled for s = 3.
Remark 7.6 (Discussion of the hypotheses). The hypothesis "Φ H → ∞" is clearly necessary for asymptotic normality: otherwise, with probability not tending to zero, G(n, m n ) does not contain any copy of H [46, Section 3.1], which rules out the possibility that X H n satisfies a central limit theorem.
Janson also describes the limit distributions of induced subgraph counts [44, Theorems 21 and 23]. Some of these results could also be derived with weighted dependency graphs, but certainly not all of them, since the limit law is not always Gaussian.
The method presented in this article nevertheless has an important advantage: it can be applied to other combinatorial objects for which a coupling with an independent model is not available, as illustrated in the other sections of this article.

Random permutations

A weighted dependency graph for random permutations
We consider in this section a uniform random permutation Π_n of size n. Let A_n be the set [n]^2. For (i, l) ∈ A_n, we denote by Y_{i,l} the indicator of the event Π_n(i) = l. Joint moments of these variables have simple expressions. If either i = j or l = k, but not both, then Y_{i,l} and Y_{j,k} are incompatible, i.e. Y_{i,l} Y_{j,k} = 0. Moreover, if we consider distinct integers i_1, ..., i_r and l_1, ..., l_r, then
$$\mathbb{E}[Y_{i_1, l_1} \cdots Y_{i_r, l_r}] = \frac{(n-r)!}{n!}. \qquad (8.1)$$
Proposition 8.1. Consider the weighted graph L on vertex set A_n defined as follows:
• if two pairs α_1 = (i_1, l_1) and α_2 = (i_2, l_2) in A_n satisfy either i_1 = i_2 or l_1 = l_2, then they are linked in L by an edge of weight 1;
• otherwise, they are linked in L by an edge of weight 1/n.
Then L is a (Ψ_n, C) weighted dependency graph for the family {Y_{i,l}, (i, l) ∈ A_n}, for a suitable function Ψ_n on multisets of elements of A_n and a sequence C = (C_r)_{r≥1} that does not depend on n.

Proof. It suffices to establish the cumulant bound (8.2). This inequality is proved exactly as in Theorem 6.3, using the explicit expression Eq. (8.1) for joint moments.
Using Theorem 5.11, we also have dependency graphs for monomials in the variables Y_{i,l}. In particular, in Section 8.3, we consider degree 2 monomials Y_{i,j} Y_{k,l}. Following Section 5.3, we consider the set of multisets of size 2 of elements of A_n, and:
• L_2 is the complete graph on this set such that the weight of the edge between {α_1, α_2} and {β_1, β_2} is 1 if some α_i shares its first coordinate or its second coordinate with some β_j, and 1/n otherwise.
By Theorem 5.11, the family {Y_{i,k} Y_{j,l}, (i, k), (j, l) ∈ A_n} admits L_2 as a (Ψ, D) weighted dependency graph (Proposition 8.2).

Remark 8.3.
This weighted dependency graph and its powers (see Theorem 5.11) correspond to the bounds on cumulants given in [33, Theorem 1.4]. Thanks to the results of this article, proving these bounds on cumulants is now easier (in particular we do not need to consider truncated cumulants anymore as in [33, Section 2.4]). Yet, some ideas of that article, which is dedicated to random permutations, are crucial here to build the general theory of weighted dependency graphs.
Remark 8.4. When Π_n is distributed with the Ewens distribution - see, e.g., [3] for background on this measure - the family {Y_{i,l}, (i, l) ∈ A_n} still admits a weighted dependency graph. The only difference is that Y_{i,l} and Y_{j,k} share an edge of weight 1 as soon as {i, l} ∩ {j, k} ≠ ∅. Nevertheless, most central limit theorems for the Ewens distribution can be inferred from a corresponding central limit theorem for uniform random permutations using a coupling argument (the Chinese restaurant process yields a coupling between Ewens-distributed permutations and uniform permutations, in which only O_p(ln(n)) values differ). Therefore we have decided to restrict here to the uniform model.

A functional central limit theorem for simply indexed permutation statistics
In this section, we prove a weaker version of a functional central limit theorem, due to Barbour and Janson [9].
Let $(a^{(n)}_0(i, l))_{i, l \le n}$ (n ≥ 1) be a sequence of real matrices. Take t in [0, 1], an integer n and a permutation π of size n. If nt is an integer, then we define
$$X^\pi_n(t) = \sum_{i=1}^{nt} a^{(n)}_0(i, \pi(i)).$$
We then extend X^π_n to a continuous function on [0, 1], by requiring that X^π_n is affine on each interval [j/n, (j + 1)/n] (for 0 ≤ j ≤ n − 1). More explicitly we set, for t in [0, 1],
$$X^\pi_n(t) = \sum_{i=1}^{\lfloor nt \rfloor} a^{(n)}_0(i, \pi(i)) + (nt - \lfloor nt \rfloor)\, a^{(n)}_0\big(\lfloor nt \rfloor + 1, \pi(\lfloor nt \rfloor + 1)\big),$$
where $\lfloor x \rfloor$ denotes, as usual, the integer part of x (the last term being zero when nt is an integer). Consider now a uniform random permutation Π of size n and set $X_n = X^\Pi_n$. Then X_n is a random continuous function on [0, 1] and we want to study its asymptotics.
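As a quick illustration, here is a minimal Python sketch (names and conventions are ours: 0-indexed arrays, numpy for the partial sums) building the piecewise-affine function X^π_n from a matrix a_0 and a permutation π.

import numpy as np

def permutation_process(a0, pi):
    """Return the function t -> X_n^pi(t), the piecewise-affine interpolation of the
    partial sums sum_{i <= nt} a0[i, pi(i)]  (a0 is an n x n numpy array, pi a list
    with pi[i] = pi(i), both 0-indexed)."""
    n = len(pi)
    increments = np.array([a0[i, pi[i]] for i in range(n)])
    partial = np.concatenate(([0.0], np.cumsum(increments)))   # values at the mesh points j/n

    def X(t):
        j = min(int(np.floor(n * t)), n - 1)                   # mesh interval containing t
        return partial[j] + (n * t - j) * increments[j]
    return X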
The quantity $X_n(1) = \sum_{i=1}^{n} a^{(n)}_0(i, \Pi(i))$ is a classical combinatorial statistic on permutations, originally introduced by Hoeffding [39], while the process X_n is a slight deformation of the one considered by Barbour and Janson in [9] (theirs is a step function, while ours is continuous piecewise-affine).
We now perform a centering by defining
$$a^{(n)}(i, l) = a^{(n)}_0(i, l) - \frac{1}{n} \sum_{k=1}^{n} a^{(n)}_0(i, k).$$
Then, for all i and n, $\sum_{k=1}^{n} a^{(n)}(i, k) = 0$ and, for t in [0, 1] such that nt is an integer, the recentered and rescaled process is
$$X_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{nt} \sum_{l=1}^{n} a^{(n)}(i, l)\, Y_{i,l},$$
again extended to [0, 1] by affine interpolation. We assume that:
• the entries of the matrices a^{(n)} are uniformly bounded by a constant M;
• the functions f_n and g_n, defined in terms of the centered entries a^{(n)}(i, l), have pointwise limits f and g.
Note that these hypotheses are in particular fulfilled when $a^{(n)}(i, l) = \alpha(i/n, l/n)$ for some fixed piecewise continuous function α : [0, 1]^2 → R independent of n. The latter is a natural hypothesis to get a limit for a renormalized version of X_n. We consider convergence in the space C[0, 1] of real-valued continuous functions on [0, 1], endowed with the uniform metric. Denote t ∧ u = min(t, u).

Theorem 8.5. We use the notation and assumptions above. Then there exists a zero-mean continuous Gaussian process Z on [0, 1] whose covariance function is expressed in terms of f and g, and, in distribution in C[0, 1], we have $X_n \to Z$.

Proof. The first step is to prove the convergence of the finite-dimensional laws (note that this step does not require the existence of Z). We do that by proving the convergence of joint cumulants; since a multidimensional Gaussian vector is determined by its joint moments, this is enough to establish convergence in distribution. Both sides are centered so that there is nothing to prove for the expectation.
For covariances, first write, for t, u ∈ [0, 1] such that nt and nu are integers, the expansion (8.5) of Cov(X_n(t), X_n(u)) in terms of the entries a^{(n)}(i, l) and of the joint moments E(Y_{i,l} Y_{j,k}). If i = j, then E(Y_{i,l} Y_{j,k}) = 1/n if l = k and 0 otherwise. Thus the expression in the bracket reduces to $n^{-1} \sum_{l=1}^{n} a^{(n)}(i, l)^2$ and the total contribution of terms with i = j in (8.5) is f_n(t ∧ u).
On the other hand, if i ≠ j then E(Y_{i,l} Y_{j,k}) = 0 if l = k and $\frac{1}{n(n-1)}$ otherwise. Thus the bracket becomes $\frac{1}{n(n-1)} \sum_{1 \le l, k \le n,\, l \ne k} a^{(n)}(i, l)\, a^{(n)}(j, k)$. Since a^{(n)} is centered, the same sum without the restriction l ≠ k equals 0. Thus, the sum with condition l ≠ k is the opposite of the sum with condition l = k and, if i ≠ j, one has
$$\sum_{\substack{1 \le l, k \le n \\ l \ne k}} a^{(n)}(i, l)\, a^{(n)}(j, k) = - \sum_{l=1}^{n} a^{(n)}(i, l)\, a^{(n)}(j, l).$$
As a consequence, the total contribution of terms with i ≠ j in (8.5) can be expressed through g_n, and we get the pointwise limit as wanted.
Let us now consider higher order cumulants. Recall that the family {Y_{(i,l)}, (i, l) ∈ A_n} admits L as a (Ψ_n, C) weighted dependency graph, where L, Ψ_n and C are defined in Theorem 8.1. Since a^{(n)}(i, l) is uniformly bounded by M, the family $\{a^{(n)}(i, l)\, Y_{(i,l)},\, (i, l) \in A_n\}$ has the same dependency graph, replacing simply Ψ_n by $\Psi'_n(B) := M^{|B|}\, \Psi_n(B)$.
For this dependency graph, using the notation of Section 4.3, one has $R_n = \sum_{(i,l) \in A_n} \frac{M}{n} = M n$.
To bound T_{r,n}, we fix α_1, ..., α_r in A_n and examine the possible positions of the extra index β:
• if β shares no coordinate with the α_i's, the weight factor is 1/n and the quotient of Ψ'_n is M/n; since there are O(n^2) such terms, the total contribution of these terms is O(1);
• if β shares exactly one coordinate with some α_i, the weight factor is 1 and the quotient of Ψ'_n is again M/n; there are O(n) such terms, which gives a total contribution of O(1);
• finally, if β coincides with one of the α_i's, the number of such terms is bounded by r, so that their total contribution is also O(1).
The right hand side tends to 0 so that κ r X n (t 1 ), · · · , X n (t r ) tends to 0. This proves the convergence of the finite-dimensional laws towards Gaussian vectors.
It remains now to prove that the sequence of random functions X n is tight in C[0, 1]. This will prove the existence of the continuous Gaussian process Z, and the convergence of X n towards Z as well.
To do this, we use a moment criterion that can be found in a book of Kallenberg [48, Corollary 16.9 for d = 1]: a sufficient condition for X_n to be tight is that X_n(0) is tight and that, for some positive constants a, b and λ,
$$\mathbb{E}\, |X_n(s) - X_n(t)|^a \le \lambda\, |s - t|^{1+b} \qquad (8.7)$$
for all s, t in [0, 1] and n ≥ 1. In our case, X_n(0) is identically equal to 0 so that only the inequality (8.7) needs to be checked. Moreover, since X_n is affine on each interval [j/n, (j + 1)/n], it is in fact sufficient to prove this inequality when nt and ns are integers; see Appendix C (this reduction needs a ≥ 1 + b, which is the case in what follows).
Let n ≥ 1 be an integer and s and t in [0, 1] such that ns and nt are integers. Assume t < s. We consider the case a = 4, that is, the fourth moment of X_n(s) − X_n(t). Since X_n(s) − X_n(t) is centered, from the moment–cumulant formula (2.3), we get
$$\mathbb{E}\big[(X_n(s) - X_n(t))^4\big] = \kappa_4\big(X_n(s) - X_n(t)\big) + 3\, \kappa_2\big(X_n(s) - X_n(t)\big)^2.$$
But $n^{1/2}\big(X_n(s) - X_n(t)\big) = \sum_{i = nt+1}^{ns} \sum_{l=1}^{n} a^{(n)}(i, l)\, Y_{i,l}$ and its cumulants can be bounded by Theorem 4.10. Note that we consider here the restriction of the dependency graph above to the family $\{a^{(n)}(i, l)\, Y_{i,l},\ nt < i \le ns,\ 1 \le l \le n\}$, whose parameter R_n(s, t) is at most M n (s − t). On the other hand, the parameter $T_{\ell,n}(s, t)$ associated to this restricted graph is bounded by the same bound as in the non-restricted case above: $T_{\ell,n}(s, t) = O(1)$. Therefore, from Theorem 4.10, we have
$$\big| n\, \kappa_2\big(X_n(s) - X_n(t)\big) \big| \le D_2\, n (s - t), \qquad \big| n^2\, \kappa_4\big(X_n(s) - X_n(t)\big) \big| \le D_4\, n (s - t),$$
so that $|\kappa_2(X_n(s)-X_n(t))| \le D_2 (s-t)$ and $|\kappa_4(X_n(s)-X_n(t))| \le D_4 (s-t)/n \le D_4 (s-t)^2$, where the last inequality comes from the fact that $s - t \ge n^{-1}$ since ns and nt are distinct integers. Thus (8.7) is proved for a = 4, b = 1 and $\lambda = D_4 + 3 D_2^2$, which ends the proof of the theorem.
To illustrate this theorem, we use the same example as Barbour and Janson [9]: take $a^{(n)}_0(i, l) = [l \ge i]$, that is, 1 if l ≥ i and 0 otherwise. Then X_n(t) is the number of weak exceedances of Π of index at most nt.
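Concretely, this statistic is straightforward to simulate; the following self-contained snippet (our own naming and 1-indexed convention) computes the weak-exceedance counts at the mesh points for one uniform random permutation.

import random

n = 500
pi = list(range(1, n + 1))
random.shuffle(pi)                          # a uniform random permutation of [n]

def weak_exceedances_up_to(t):
    """Number of indices i <= n*t with pi(i) >= i, i.e. the (uncentered) X_n^pi(t) at a mesh point t."""
    return sum(1 for i in range(1, int(n * t) + 1) if pi[i - 1] >= i)

print(weak_exceedances_up_to(0.5), weak_exceedances_up_to(1.0))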
After centering, we have $a^{(n)}(i, l) = [l \ge i] - (n - i + 1)/n$, which is obviously uniformly bounded. As explained in [9], if 0 ≤ t ≤ u ≤ 1, then the limiting covariance can be computed explicitly. All our hypotheses are fulfilled and we obtain that there exists a continuous Gaussian process Z such that X_n converges towards Z in distribution in C[0, 1].

Remark 8.6. The hypotheses given here are stronger than the ones of Barbour and Janson [9], who use a bound on the Lyapounov ratio instead of our uniform boundedness assumption. However, as seen above, the example of exceedances, which motivated their work, also fits in our framework. Note also that Barbour and Janson give a bound on the speed of convergence, which we cannot achieve. Another difference between their theorem and ours is that they consider convergence in the Skorohod space D[0, 1], while we work in C[0, 1], but since the limit is continuous, this is just a matter of taste.

A functional central limit theorem for doubly indexed permutation statistics
An advantage of the method of the previous section is that it can be easily adapted to more involved permutation statistics, such as doubly indexed permutation statistics (DIPS). By definition, a DIPS is a statistic of the following form: let $\big(\zeta^{(n)}_0(i, j, k, l)\big)_{i,j,k,l \in [n]}$ be a sequence of multi-indexed real numbers; then, for a permutation π of size n, we set
$$X_n(\pi) = \sum_{1 \le i, j \le n} \zeta^{(n)}_0(i, j, \pi(i), \pi(j)).$$
A central limit theorem for DIPS with control on the speed of convergence is given in [67]. In this section, we provide a functional CLT for this class of statistics.
To this end, let us associate with a DIPS and a permutation π a continuous function on [0, 1]^2 as follows. If nt_1 and nt_2 are integers, then
$$X^\pi_n(t_1, t_2) = \sum_{i=1}^{n t_1} \sum_{j=1}^{n t_2} \zeta^{(n)}_0(i, j, \pi(i), \pi(j)).$$
The function X^π_n is then extended to [0, 1]^2 by requiring that, for any pair (i, j) with 0 ≤ i, j ≤ n − 1, the function X^π_n is affine on the square [i/n, (i + 1)/n] × [j/n, (j + 1)/n]. We now consider a uniform random permutation Π of size n and the associated random function $X_n := X^\Pi_n$. We perform a centering, replacing $\zeta^{(n)}_0$ by a centered version $\zeta^{(n)}$. With this definition, if nt_1 and nt_2 are integers, we have
$$X_n(t_1, t_2) = \sum_{i=1}^{n t_1} \sum_{j=1}^{n t_2} \zeta^{(n)}(i, j, \Pi(i), \Pi(j)).$$
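The values of this process at mesh points are plain two-dimensional partial sums and are easy to compute; the sketch below (our own naming, 0-indexed numpy arrays; we only evaluate at mesh points and leave the affine extension aside) illustrates the definition.

import numpy as np

def dips_process_grid(zeta, pi):
    """grid[j1, j2] = sum_{i <= j1, j <= j2} zeta[i, j, pi(i), pi(j)], i.e. the value of
    X_n^pi at the mesh point (j1/n, j2/n); zeta is an n x n x n x n array, pi a list."""
    n = len(pi)
    increments = np.array([[zeta[i, j, pi[i], pi[j]] for j in range(n)] for i in range(n)])
    grid = np.zeros((n + 1, n + 1))
    grid[1:, 1:] = increments.cumsum(axis=0).cumsum(axis=1)
    return grid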
We assume that:
• the real numbers ζ^{(n)}(i, j, k, l) (n ≥ 1, i, j, k, l ≤ n) are uniformly bounded by a constant M;
• the rescaled covariance $n^{-3}\, \mathrm{Cov}\big(X_n(t_1, t_2), X_n(u_1, u_2)\big)$ has a pointwise limit that we denote σ(t_1, t_2; u_1, u_2).

Theorem 8.7. We use the notation and assumptions above. Then there exists a zero-mean continuous Gaussian process Z on [0, 1]^2 with covariance function given by $\mathrm{Cov}\big(Z(t_1, t_2), Z(u_1, u_2)\big) = \sigma(t_1, t_2; u_1, u_2)$ and, in distribution in C([0, 1]^2), we have $n^{-3/2} X_n \to Z$.

Proof. The structure of the proof is the same as for simply indexed permutation statistics. We first prove the convergence of finite-dimensional laws by controlling joint cumulants. Both sides are centered and we have assumed the convergence of covariances, so that we can focus on joint cumulants of order at least 3.
Note that, if nt_1 and nt_2 are integers, we can rewrite X_n(t_1, t_2) as
$$X_n(t_1, t_2) = \sum_{i=1}^{n t_1} \sum_{j=1}^{n t_2} \sum_{1 \le k, l \le n} \zeta^{(n)}(i, j, k, l)\, Y_{i,k}\, Y_{j,l}.$$
Recall from Theorem 8.2 that the family {Y_{i,k} Y_{j,l}, (i, k), (j, l) ∈ A_n} admits L_2 as a (Ψ, D) weighted dependency graph. Since ζ^{(n)}(i, j, k, l) is uniformly bounded by M, the family $\{\zeta^{(n)}(i, j, k, l)\, Y_{i,k}\, Y_{j,l}\}$ has the same dependency graph, replacing Ψ by $\Psi'(B) := M^{|B|}\, \Psi(B)$. For this dependency graph, we have $R_n = M n^2$. A case analysis similar to the one above shows that T_{r,n} = O(n) (with a constant depending on r). We sketch the argument briefly. Recall that we want to bound, for fixed α_1, ..., α_r, the sum
$$\sum_{\beta} W\big(\{\beta\}, \{\alpha_1, \dots, \alpha_r\}\big)\, \frac{\Psi'\big(\{\alpha_1, \dots, \alpha_r, \beta\}\big)}{\Psi'\big(\{\alpha_1, \dots, \alpha_r\}\big)},$$
where β runs over the vertex set of L_2.
• If β does not share any element with α_1, ..., α_r, then the quotient of Ψ' is M/n^2 and the W factor is equal to 1/n. Since there are fewer than n^4 such terms, their total contribution is O(n).
• If β shares an element, but no pair, with one of the α_i, then the quotient of Ψ' is also M/n^2, while the W factor is 1. But there are O(n^3) such terms, so that the total contribution of these terms is also O(n).
• If β has exactly one pair in common with one of the α_i, then the quotient of Ψ' is M/n and the W factor is also 1. There are O(n^2) such terms, so that the total contribution of these terms is also O(n).
• Finally, if both pairs in β already appear among the α_i, then the quotient of Ψ' is M and the W factor is also 1. But this implies that the pairs of β are chosen within a finite family, so that there is only a constant number of such terms and their total contribution is O(1).
The right hand side tends to 0 so that all joint cumulants of the family (X n (t 1 , t 2 )) (t1,t2)∈[0,1] 2 of order at least 3 tend to 0. This proves the convergence of the finite-dimensional laws towards Gaussian vectors.
We now prove the tightness of the rescaled random functions $(n^{-3/2} X_n)_{n \ge 1}$ in the space C([0, 1]^2); to lighten notation, we still denote this rescaled process by X_n in the remainder of the proof. We again use the moment criterion [48, Corollary 16.9], but this time for d = 2. Since X_n(0, 0) is tight (it is identically equal to 0, for all n), we should prove that there exist positive constants a, b and λ such that
$$\mathbb{E}\, \big| X_n(s_1, s_2) - X_n(t_1, t_2) \big|^a \le \lambda\, \big( |s_1 - t_1| + |s_2 - t_2| \big)^{2+b} \qquad (8.9)$$
for all (s_1, s_2), (t_1, t_2) in [0, 1]^2 and n ≥ 1. As in dimension 1, since X_n is affine on each square [i/n, (i + 1)/n] × [j/n, (j + 1)/n] (0 ≤ i, j ≤ n − 1), it is enough to prove (8.9) when ns_1, ns_2, nt_1 and nt_2 are integers; see Appendix C.
Let us first give bounds, depending on (s_1, s_2) and (t_1, t_2), for the cumulants of the difference X_n(s_1, s_2) − X_n(t_1, t_2). If t_1 < s_1 and t_2 < s_2, then this difference is (up to the rescaling) a sum of terms ζ^{(n)}(i, j, Π(i), Π(j)), where the sum runs over pairs (i, j) such that i ≤ ns_1, j ≤ ns_2 and either nt_1 < i ≤ ns_1 or nt_2 < j ≤ ns_2. There are fewer than $n^2 (s_1 - t_1 + s_2 - t_2)$ such pairs (i, j), so that, by the same argument as in the one-dimensional case (restricting the dependency graph), we have
$$\Big| \kappa_r\big( X_n(s_1, s_2) - X_n(t_1, t_2) \big) \Big| \le D_r\, \big( |s_1 - t_1| + |s_2 - t_2| \big)\, n^{1 - r/2}, \qquad (8.10)$$
for some constant D_r that depends only on r. The same bound obviously holds without the assumption t_1 < s_1 and t_2 < s_2.
for some constant D. For the last inequality, note that since ns_1, ns_2, nt_1 and nt_2 are integers, we have $n^{-1} \le \delta := |s_1 - t_1| + |s_2 - t_2|$ (we can assume that either s_1 ≠ t_1 or s_2 ≠ t_2, otherwise (8.9) is trivial). This ends the proof of (8.9) (for a = 6 and b = 1) and hence of the theorem.
As an example, we consider positive alignments in random permutations. A positive alignment in a permutation π is a pair (i, j) such that $j < i \le \pi(i) < \pi(j)$. This statistic somehow mixes the classical notions of inversions and exceedances: it is studied together with many similar statistics in [20]. Let us set $\zeta^{(n)}_0(i, j, k, l) = [\,j < i \le k < l\,]$ (i.e. 1 if j < i ≤ k < l and 0 otherwise) and define the associated random function X^Π_n in C([0, 1]^2) as above. In particular, X^Π_n(1, 1) is the number of positive alignments in the uniform random permutation Π.
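For completeness, here is a direct (quadratic-time) count of positive alignments, which coincides with X^Π_n(1, 1) for the ζ^{(n)}_0 above; the function name is ours.

def positive_alignments(pi):
    """Number of pairs (i, j) with j < i <= pi(i) < pi(j); `pi` is a 1-indexed
    permutation given as a list with pi[i-1] = pi(i)."""
    n = len(pi)
    return sum(1 for i in range(1, n + 1) for j in range(1, n + 1)
               if j < i <= pi[i - 1] < pi[j - 1])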
It is clear that $\zeta^{(n)}_0(i, j, k, l)$, and hence $\zeta^{(n)}(i, j, k, l)$, is uniformly bounded. Besides, an easy adaptation of Theorem B.1 shows that $\mathrm{Cov}\big(X_n(t_1, t_2), X_n(u_1, u_2)\big)$ is a polynomial in n, t_1, t_2, u_1, u_2. Moreover, from the same arguments as in the proof above to bound general joint cumulants, we know that, for fixed t_1, t_2, u_1, u_2, it behaves as O(n^3). Thus, for any t_1, t_2, u_1, u_2 in [0, 1], the rescaled covariance $n^{-3}\, \mathrm{Cov}\big(X_n(t_1, t_2), X_n(u_1, u_2)\big)$ indeed has a limit.
Thus, our theorem applies and $n^{-3/2} X_n$ converges in distribution in C([0, 1]^2) towards a zero-mean continuous Gaussian process on [0, 1]^2. It is possible to compute the covariances of the limiting process, but it would be a lengthy computation.

The symmetric simple exclusion process

We now consider the symmetric simple exclusion process (SSEP) on N sites: a configuration is an element τ of {0, 1}^N recording the positions of the occupied sites. The system evolves as follows:
• each particle has an exponential clock with rate 1. When it rings, the particle jumps to the right if it is not in the right-most site and if the site at its right is empty. Otherwise, the jump is suppressed.
• Similarly, each particle has another exponential clock with rate 1 and attempts to jump to its left when it rings (with similar rules as above).
• if the left-most (resp. right-most) site is empty, an exponential clock with rate α (resp. δ) is associated with it. When it rings, a particle is added to the left-most (resp. right-most) site. Symmetrically, particles may leave the system from the boundary sites (at rates denoted γ and β). We consider the system under its stationary measure and denote by τ_i the indicator that site i is occupied. The joint cumulants of the τ_i satisfy an induction formula due to Derrida, Lebowitz and Speer. To state it, we first need to introduce the discrete difference operator ∆: if f is a function on positive integers, we set ∆f(N) = f(N) − f(N − 1). Note that ∆f is not defined for N = 1, but this is irrelevant as we shall make N tend to infinity, while ∆ is applied only a fixed number of times.
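The dynamics just described are easy to simulate. The sketch below is a minimal continuous-time (Gillespie-type) simulation under our reading of the boundary rates (entry rates α, δ as above; the assignment of the exit rates γ and β to the left and right boundaries respectively is an assumption on our part); running it for a long time gives approximate samples from the steady state used below.

import random

def ssep_step(tau, alpha, beta, gamma, delta, rng=random):
    """Perform one jump of the open SSEP and return (new_state, elapsed_time).
    tau is a list of 0/1; bulk particles jump left or right onto empty sites at rate 1;
    alpha/gamma: entry/exit at the left-most site, delta/beta: entry/exit at the right-most site.
    Assumes at least one move has positive rate."""
    N = len(tau)
    moves = []                                           # list of (rate, move)
    for i in range(N - 1):                               # nearest-neighbour exchanges
        if tau[i] != tau[i + 1]:
            moves.append((1.0, ('swap', i)))
    moves.append((alpha if tau[0] == 0 else gamma, ('flip', 0)))
    moves.append((delta if tau[-1] == 0 else beta, ('flip', N - 1)))
    moves = [(r, m) for (r, m) in moves if r > 0]
    total = sum(r for r, _ in moves)
    dt = rng.expovariate(total)
    x = rng.uniform(0.0, total)
    for r, move in moves:                                # pick a move proportionally to its rate
        x -= r
        if x <= 0.0:
            break
    tau = list(tau)
    if move[0] == 'swap':
        i = move[1]
        tau[i], tau[i + 1] = tau[i + 1], tau[i]
    else:
        tau[move[1]] = 1 - tau[move[1]]
    return tau, dt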
Fix a positive integer r and some integers 1 ≤ i_1 < ⋯ < i_{r+1} ≤ N. As the formula involves SSEPs with different numbers N of sites, we make it explicit in the notation and denote by $\kappa^N_r(\tau_{i_1}, \dots, \tau_{i_r})$ the joint cumulant of τ_{i_1}, ..., τ_{i_r}. Derrida, Lebowitz and Speer [28, Eq. (A.11)] have proved an induction formula, Eq. (9.1), expressing $\kappa^N_{r+1}(\tau_{i_1}, \dots, \tau_{i_{r+1}})$ in terms of lower-order cumulants. Expectations can be easily computed (see, e.g., [25, Eq. (42)]); this gives Eq. (9.2). Eqs. (9.1) and (9.2) determine the joint cumulants of distinct variables in the family (τ_i)_{1 ≤ i ≤ N}. We will use this to find a weighted dependency graph for this family in the next section.

A weighted dependency graph in SSEP
We start with a lemma bounding repetition-free joint cumulants of the family (τ_i)_{1 ≤ i ≤ N}.
Lemma 9.1. Let r ≥ 1. Then there exists a constant D_r such that, for each N ≥ r and each (i_1, ..., i_r) with 1 ≤ i_1 < ⋯ < i_r ≤ N, we have
$$\big| \kappa^N_r(\tau_{i_1}, \dots, \tau_{i_r}) \big| \le D_r\, N^{-r+1}.$$
Proof. We will in fact prove a stronger statement:
The quantity κ N r (τ i1 , . . . , τ ir ) is a polynomial in i 1 , · · · , i r with coefficients that are rational functions in N . Moreover, its total degree in N, i 1 , · · · , i r is at most −r + 1.
To simplify the discussion below, we call such a function a nice function of degree at most −r + 1. It is clear that, if f (N ; i 1 , · · · , i r ) is a nice function of degree at most d, then max i1,...,ir∈ [N ] f (N ; i 1 , · · · , i r ) = O(N d ).
Therefore proving the above claim proves the lemma.
We prove this statement by induction on r. For r = 1, it follows immediately from the explicit formula (9.2). Take r ≥ 1 and suppose that our statement holds for any r' ≤ r. We consider the quantity $\kappa^N_{r+1}(\tau_{i_1}, \dots, \tau_{i_r}, \tau_{i_{r+1}})$ and its expression given in Eq. (9.1). Fix a set partition π in P([r]).
• By induction hypothesis, for each block B of π, κ N |B| (τ it ; t ∈ B) is a nice function of degree at most −|B| + 1.
• Applying the operator ∆ turns it into a nice function of degree at most −|B|.
• Multiplying these nice functions for different blocks B of π gives a nice function of degree at most − B∈π |B| = −r.
The sum of these nice functions (over set partitions π in P([r])) is also a nice function of degree at most −r. We then multiply by $\mathbb{E}(\tau_{i_{r+1}}) - \rho_b$ which, as can be seen from Eq. (9.2), is a nice function of degree 0, and we still have a nice function of degree at most −r. Therefore $\kappa^N_{r+1}(\tau_{i_1}, \dots, \tau_{i_r}, \tau_{i_{r+1}})$ is a nice function of degree at most −r, which ends the proof of the lemma.
We are ready to present a weighted dependency graph associated with the SSEP.

Proposition 9.2. Let L be the complete graph on [N] with weight 1/N on each edge, and let Ψ_N be the function identically equal to 1 on multisets of elements of [N]. Then L is a (Ψ_N, C) weighted dependency graph for the family {τ_i, i ∈ [N]}, for some sequence C = (C_r)_{r≥1} that does not depend on N.
Proof. We note the three following facts: (1) Ψ_N is trivially super-multiplicative, (2) the τ_i are Bernoulli variables and (3) L has no edge of weight 1. From Theorem 5.5 (which uses Theorem 5.2), it is enough to prove bounds on cumulants of sets of distinct variables (instead of cumulants of all multisets of variables). Namely, we should prove that, for any r ≥ 1 and any distinct i_1, ..., i_r in [N], one has
$$\big| \kappa(\tau_{i_1}, \dots, \tau_{i_r}) \big| \le D_r\, N^{-r+1}$$
for a constant D_r that does not depend on N. But this is exactly Theorem 9.1.

Remark 9.3.
In [28], Derrida, Lebowitz and Speer have proved that, for any x_1, ..., x_r in [0, 1], the quantity $N^{r-1}\, \kappa^N_r\big(\tau_{\lfloor N x_1 \rfloor}, \dots, \tau_{\lfloor N x_r \rfloor}\big)$ has a limit when N tends to infinity. This of course implies that the joint cumulant is $O(N^{-r+1})$. However, the constant in the O symbol could a priori depend on x_1, ..., x_r, while we need a bound which is uniform in i_1, ..., i_r. This explains why we need Theorem 9.1 and cannot use directly the result of Derrida, Lebowitz and Speer. Nevertheless, the key identity in the proof is the induction formula (9.1), due to these authors.

A functional central limit theorem for the number of particles
Let N ≥ 1 and t in [0, 1]. We consider a random state τ in {0, 1}^N, distributed according to the steady state of the SSEP on N sites. If Nt is an integer, we define X_N(t) as the number of particles in the first Nt cells of τ. Formally, this means $X_N(t) = \sum_{i=1}^{Nt} \tau_i$. We then extend X_N to a continuous function on [0, 1], by requiring that it is affine on each segment [i/N, (i + 1)/N].
This function describes the spatial repartition of the particles in τ. Informally, it is the integral of the density of particles, often considered in the physics literature; see, e.g., [25, Section 3].
Since there are explicit formulas for the expectations and covariances of the τ_i [28, Eqs. (2.3) and (2.4)], the expectations and covariances of $(X_N(t))_{t \in [0,1]}$ are easy to evaluate asymptotically; the limiting covariance is given by Eq. (9.4). In the last formula, x ∧ y := min(x, y) and x ∨ y := max(x, y). We denote by σ(u, v) the right-hand side of Eq. (9.4).
Theorem 9.4. We use the notation above. There exists a zero-mean continuous Gaussian process Z on [0, 1] with covariance function given by Cov(Z(t), Z(u)) = σ(u, t) and, in distribution in C[0, 1], we have, when N tends to infinity,
$$\frac{X_N - \mathbb{E}[X_N]}{\sqrt{N}} \longrightarrow Z.$$
Proof. As usual, we start by proving the convergence of the finite-dimensional laws. To do that, we prove the convergence of joint cumulants. Expectations clearly converge as both sides are centered. Covariances also converge, by definition of σ(u, t). Let us consider now higher order cumulants. We recall that the family {τ_i, i ∈ [N]} admits a weighted dependency graph L; see Theorem 9.2. Call R_N and Q_N the associated parameters, as in Section 4.3. From Theorem 4.9, since Ψ_N is the constant function equal to 1, R_N is simply the number of vertices of L, which is N. Moreover, T ≤ ∆, where ∆ − 1 is the maximal weighted degree in the graph, which is here smaller than 1 (i.e. ∆ < 2). From Theorem 4.16, we get that, for any r ≥ 3 and t_1, ..., t_r in [0, 1] (such that N t_1, ..., N t_r are integers), the joint cumulant of the recentered and rescaled variables is $O(N^{1 - r/2})$. In particular, all joint cumulants of order 3 or more tend to 0, which ends the proof of the convergence of the finite-dimensional laws towards Gaussian vectors.
The proof of tightness is virtually identical to that in the proof of Theorem 8.5.
Remark 9.5. Thanks to the stability of weighted dependency graphs under products (the function Ψ here, identically equal to 1, is super-multiplicative), it is possible to obtain
functional central limits for more complicated quantities that involve products of τ i .
For example, if we are interested in the number and repartition of particles that can jump to their right, we should define $X_N(t) = \sum_{i=1}^{Nt} \tau_i (1 - \tau_{i+1})$. Its joint cumulants are easily bounded, using the weighted dependency graph L_2 for the family of products τ_i(1 − τ_{i+1}). It then suffices to compute the asymptotics of the covariances of $(X_N(t))_{t \in [0,1]}$, which should be an elementary but cumbersome computation starting from the explicit formulas that exist for (truncated) correlation functions [28].
Remark 9.6. In recent years, combinatorial models have been given to describe the steady state τ of the SSEP (and more generally of the asymmetric simple exclusion process); see [22] and references therein. In the particular case where α = β = 1 and γ = δ = 0, this relates particles in τ to exceedances in permutations; see [33, Section 5.2] for details. In this sense, the example at the end of Section 8.2 can be seen as a particular case of Theorem 9.4.

Markov chains
We consider here an aperiodic irreducible Markov chain (M k ) k≥0 on a finite state space S. We denote by P the transition matrix, namely P (s, t) is the probability that M k+1 = t if M k = s (for any k ≥ 0). Let π 0 be the initial distribution, that is the law of M 0 . We also denote π the stationary distribution (seen as a row vector), characterized by π P = π.
For i ≥ 0 and s in S, let $Y^s_i$ denote the indicator function of the event M_i = s. Joint moments of these variables have a simple matrix expression: for integers i_1 ≤ ⋯ ≤ i_r and states s_1, ..., s_r,
$$\mathbb{E}[Y^{s_1}_{i_1} \cdots Y^{s_r}_{i_r}] = \pi_0\, P^{i_1} E_{s_1,s_1} P^{i_2 - i_1} E_{s_2,s_2} \cdots E_{s_{r-1},s_{r-1}} P^{i_r - i_{r-1}} E_{s_r,s_r}\, \mathbf{1}, \qquad (10.1)$$
where E_{s,s} is the matrix with entry 1 in position (s, s) and 0 elsewhere, and 1 is the all-ones column vector. From now on, we shall suppose that the initial distribution π_0 is equal to the stationary distribution π. We will prove in Section 10.2 that there is a natural weighted dependency graph structure on the $(Y^s_i)_{i \ge 1; s \in S}$. The weight of the edge joining $Y^s_i$ and $Y^t_j$ (for i < j) is $\lambda_2^{\,j-i}$, where λ_2 ∈ [0, 1) is the second biggest modulus of an eigenvalue of the transition matrix P. This encodes the fact that far apart elements of the Markov chain are almost independent. In Section 10.3, this weighted dependency graph structure is used to prove a central limit theorem for the number of occurrences of a given subword u in $w_n = (M_0, \dots, M_n)$, as announced in the introduction.
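Formula (10.1) is straightforward to evaluate numerically; the following sketch (names are ours; P is an |S| × |S| numpy array, pi0 a row vector, states and times 0-indexed) computes it and can be checked against a direct simulation of the chain.

import numpy as np

def joint_moment(P, pi0, states, times):
    """E[Y^{s_1}_{i_1} ... Y^{s_r}_{i_r}] = P(M_{i_1} = s_1, ..., M_{i_r} = s_r),
    computed via the matrix product formula (10.1); `times` must be non-decreasing."""
    times = list(times)
    v = np.asarray(pi0) @ np.linalg.matrix_power(P, times[0])
    prev = times[0]
    for s, t in zip(states, times[1:] + [None]):
        E = np.zeros_like(P)
        E[s, s] = 1.0                                  # the elementary matrix E_{s,s}
        v = v @ E
        if t is not None:
            v = v @ np.linalg.matrix_power(P, t - prev)
            prev = t
    return float(v.sum())                              # multiplication by the all-ones vector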

Bounds for boolean and classical cumulants
The goal of this section is to bound the joint cumulants of the variables (Y s i ) i≥1;s∈S .
Such bounds on cumulants can be found in the monograph of Saulis and Statulevičius [61, Chapter 4]; nevertheless, to keep this section self-contained, we present a proof here for the simple case of finite-state Markov chains.
Instead of working directly with classical (joint) cumulants, we first give bounds for boolean cumulants. Corresponding bounds for classical cumulants will then follow easily, thanks to a formula linking these different types of cumulants recently established by Arizmendi, Hasebe, Lehner and Vargas in [2] (see also [61,Lemma 1.1]; in loc. cit., boolean cumulants are called centered moments).
We recall that (M_k)_{k≥0} is an aperiodic irreducible Markov chain with transition matrix P, such that M_0 is distributed according to the stationary distribution π of the chain. Recall also that $Y^s_i$ is the indicator function of the event M_i = s. Finally, λ_2 is the biggest modulus of an eigenvalue of P, except 1.

Lemma 10.1. Let r > 0. With the above notation, there exists a constant C_{P,r} depending on P and r with the following property. For any integers i_1 < i_2 < ⋯ < i_r and states s_1, ..., s_r, we have
$$\big| B_r\big( Y^{s_1}_{i_1}, \dots, Y^{s_r}_{i_r} \big) \big| \le C_{P,r}\, \lambda_2^{\, i_r - i_1}.$$
Proof. Fix integers i_1 < i_2 < ⋯ < i_r and states s_1, ..., s_r in S. To make notation lighter, we write E(j) = E_{s_j, s_j}, ℓ(j) = i_{j+1} − i_j and $Z_j = Y^{s_j}_{i_j}$. As in the summation index of (10.2), we consider l ≥ 0 and 1 ≤ d_1 < ⋯ < d_l ≤ r − 1. Since the initial distribution π_0 is the stationary distribution π, one has $\pi_0 P^i = \pi$ and formula (10.1) for joint moments simplifies a little. Multiplying such expressions and grouping factors, the boolean cumulant B_r(Z_1, ..., Z_r) now writes as
$$B_r(Z_1, \dots, Z_r) = \pi\, E(1)\, \big( P^{\ell(1)} - \mathbf{1}\pi \big)\, E(2) \cdots E(r-1)\, \big( P^{\ell(r-1)} - \mathbf{1}\pi \big)\, E(r)\, \mathbf{1}.$$
By the Perron–Frobenius theorem, the matrix P has a unique eigenvalue of modulus 1 and 1π is the projector on the corresponding eigenvector; see [52, p. 674]. Therefore, for any ℓ, the matrix $(P^{\ell} - \mathbf{1}\pi)$ has operator norm $\lambda_2^{\ell}$. The result follows immediately.
We now recall the expression of classical cumulants in terms of boolean cumulants given in [2]. Let us first introduce some terminology. A set partition ρ of [r] is called reducible if there exists ℓ in {1, ..., r − 1} such that $\rho \le \big\{ \{1, \dots, \ell\}, \{\ell + 1, \dots, r\} \big\}$; otherwise, it is called irreducible. There exists a family of coefficients $(d_\rho)$, indexed by irreducible set partitions ρ of [r], with the following property. For any random variables Z_1, ..., Z_r with finite moments defined on the same probability space, one has
$$\kappa_r(Z_1, \dots, Z_r) = \sum_{\rho \text{ irreducible}} d_\rho \prod_{C \in \rho} B_{|C|}\big( Z_i;\, i \in C \big).$$
Arizmendi, Hasebe, Lehner and Vargas relate d_ρ to a specialization of the Tutte polynomial of a specific graph associated with ρ, but we do not need this description of d_ρ here. For our purpose, the crucial aspect in this boolean-to-classical cumulant formula is that the sum ranges only over irreducible set partitions. We can now establish our bound on classical cumulants.

Lemma 10.3. As above, let (M_k)_{k≥0} be an aperiodic irreducible Markov chain with transition matrix P, such that M_0 is distributed according to the stationary distribution π of the chain. Let r > 0. Then there exists a constant D_{P,r}, depending on the transition matrix P and on r, with the following property. For any distinct integers i_1 < i_2 < ⋯ < i_r and states s_1, ..., s_r, we have
$$\big| \kappa\big( Y^{s_1}_{i_1}, \dots, Y^{s_r}_{i_r} \big) \big| \le D_{P,r}\, \lambda_2^{\, i_r - i_1}.$$
Proof. For any subset C of [r], we know by Theorem 10.1 that $\big| B_{|C|}\big( Y^{s_t}_{i_t};\, t \in C \big) \big| \le C_{P,|C|}\, \lambda_2^{\, \max_{t \in C} i_t - \min_{t \in C} i_t}$. If ρ is an irreducible set partition, one can easily check that $\sum_{C \in \rho} \big( \max_{t \in C} i_t - \min_{t \in C} i_t \big) \ge i_r - i_1$, which proves the lemma.

A weighted dependency graph for Markov chains
We denote by $\mathbb{N}_{\ge 0}$ the set of nonnegative integers.

Proposition 10.4. As above, let (M_k)_{k≥0} be an aperiodic irreducible Markov chain on a finite state space S, such that M_0 is distributed according to the stationary distribution π of the chain. Recall that $Y^s_i$ is the indicator function of the event M_i = s. We consider the complete graph L on $A := \mathbb{N}_{\ge 0} \times S$ with weight $\lambda_2^{\, j - i}$ on the edge {(i, s), (j, t)} (for any nonnegative integers i < j and states s, t in S). Finally, let Ψ be the function on multisets of elements of A that is identically equal to 1.
ThenL is a (Ψ, C) weighted dependency graph for the family {Y s i ; (i, s) ∈ A} for some sequence C = (C r ) r≥1 .
Proof. Consider a multiset B = {(i_1, s_1), ..., (i_r, s_r)} of elements of A and the induced graph L[B]. Assume, without loss of generality, that i_1 < ⋯ < i_r. Then it is easy to observe that the maximum weight of a spanning tree in L[B] is $\lambda_2^{\, i_r - i_1}$, attained by the path connecting consecutive indices. Therefore, it is enough to prove that, for any fixed r > 0, there exists a constant D_r with the following property: for any distinct integers i_1 < ⋯ < i_r and any states s_1, ..., s_r, we have
$$\big| \kappa\big( Y^{s_1}_{i_1}, \dots, Y^{s_r}_{i_r} \big) \big| \le D_r\, \lambda_2^{\, i_r - i_1}.$$
The existence of such a constant is given by Theorem 10.3.

Subword counts in strings generated by a Markov source
We consider the following pattern matching problem. Let u_1, ..., u_d be finite words on a finite alphabet S, of respective lengths ℓ_1, ..., ℓ_d. An occurrence of L = (u_1, ..., u_d) in w is a factorization w = w_0 u_1 w_1 ⋯ u_d w_d, where the w_i's are (possibly empty) words on the alphabet S. This corresponds to an occurrence of the word u = u_1 ⋯ u_d as a subword, where letters from the same u_i are required to be consecutive.
As before, let (M k ) k≥0 be an aperiodic irreducible Markov chain on S, such that M 0 is distributed according to the stationary distribution π of the chain. We are interested in the number X N of occurrences of L in the random word W N = (M 0 , · · · , M N ).
The position of such an occurrence is a d-tuple (i_1, ..., i_d), where each i_j is the index of the first letter of u_j in w (in particular, we always have $i_{j+1} \ge i_j + \ell_j$). Denote by I the set of possible positions of occurrences, that is, the set of d-tuples of indices satisfying these inequalities. We also define $I_N$ as the same set with the additional condition $i_d + \ell_d - 1 \le N$. For I ∈ I, we denote by Y_I the indicator function of the event "W has an occurrence of L in position I". Using the above variables $Y^s_i$, we can write Y_I as a product of indicators, one for each letter of u_1, ..., u_d, and $X_N = \sum_{I \in I_N} Y_I$. An estimate for the variance of X_N is given by Bourdon and Vallée:
$$\mathrm{Var}(X_N) = \sigma^2(L)\, N^{2d-1} + o\big( N^{2d-1} \big),$$
where σ^2(L) is an explicit constant depending on both the pattern L and the transition matrix P of the Markov chain. Our main result in this section is that the fluctuations of order $N^{d - 1/2}$ of X_N are Gaussian (possibly degenerate if σ(L) = 0).

Theorem 10.5. With the above notation, we have the convergence in distribution
$$\frac{X_N - \mathbb{E}[X_N]}{N^{d - 1/2}} \longrightarrow \mathcal{N}\big( 0, \sigma^2(L) \big).$$
Proof. From Proposition 10.4 and Theorem 5.11, it is clear that the family {Y_I, I ∈ I} admits a weighted dependency graph.
The corresponding function Ψ is simply the constant function equal to 1. Consider the restriction of this weighted dependency graph to I_N. Using the notation of Section 4.3, we have $R_N = |I_N| = O(N^d)$. To find an upper bound for $T_{\ell,N}$, let us fix I_1, ..., I_ℓ and set $I = \bigcup_{j=1}^{\ell} I_j$. Then, for J in I_N, the weight $W(\{J\}, \{I_1, \dots, I_\ell\})$ can be bounded by a sum of terms of the form $\lambda_2^{|i - j|}$ with i ∈ I and j ∈ J. The summand does not depend on the remaining entries of J, so that the last summation symbol can be replaced with the number of positions J in I_N containing j; this number is smaller than $N^{d-1}$. Moreover, for a fixed i, the sum $\sum_j \lambda_2^{|i - j|}$ is bounded by a constant, so that $T_{\ell,N} = O(N^{d-1})$. Therefore, cumulants of X_N of order at least 3, renormalized by $N^{d - 1/2}$, tend to 0. On the other hand, its expectation and variance tend to 0 and σ^2(L) respectively. This concludes the proof using the method of moments. This upper bound alone implies the concentration result advertised by these authors. Note, however, that their result is proved for more general sources and pattern problems.
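To make the subword-count statistic concrete, here is a small dynamic-programming counter (function name and conventions are ours); for instance, count_occurrences("abracadabra", ["ab", "a"]) returns 5.

def count_occurrences(word, blocks):
    """Number of occurrences of L = (u_1, ..., u_d) in `word`: tuples (i_1, ..., i_d) of
    starting positions with word[i_j : i_j + len(u_j)] == u_j and i_{j+1} >= i_j + len(u_j)."""
    n = len(word)
    # prefix[p] = number of ways to place the blocks treated so far, all ending at position <= p
    prefix = [1] * (n + 1)
    for u in blocks:
        L = len(u)
        ending = [0] * (n + 1)
        for start in range(n - L + 1):
            if word[start:start + L] == u:
                ending[start + L] += prefix[start]     # previous blocks must end at <= start
        prefix = ending
        for p in range(1, n + 1):                      # cumulative sums: "ending at <= p"
            prefix[p] += prefix[p - 1]
    return prefix[n]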

A Proof of Theorem 5.10
We start by a lemma.
Lemma A.1. For any nonnegative integers a_1, ..., a_{ℓ−1}, the following rational function in t has degree at most −ℓ + 1.

Proof. This corresponds to [33, Lemma 2.4], but we copy the proof for completeness. Define R_ev (resp. R_odd) as in [33, Lemma 2.4]; here ∇ denotes the symmetric difference operator. Thus the summand $(-1)^m\, a_{j_1} \cdots a_{j_m}\, t^{2\ell - 2 - m}$ appears as many times in R_ev as in R_odd. Finally, all terms corresponding to values of m smaller than ℓ − 1 cancel in the difference R_odd − R_ev, and R_odd − R_ev has degree at most 2ℓ − 2 − ℓ + 1. Dividing by R_ev, which has degree 2ℓ − 2, this ends the proof.
We now prove Theorem 5.10, using the notation defined there.
Proof of Theorem 5.10. We proceed by induction, first on ℓ, and then on a_ℓ. For ℓ = 1, there is nothing to prove. Consider ℓ > 1 and assume that the statement holds for all ℓ' < ℓ. In particular, for any $\Delta \subsetneq [\ell]$, the subfamily $\big( u^{(n)}_\delta(a_i;\, i \in \Delta) \big)_{\delta \subseteq \Delta,\, n \ge n_0}$ has the ε_n SC/QF property and $P_\Delta\big( u^{(n)}(a_1, \dots, a_\ell) \big) - 1 = O\big( X_n^{-|\Delta| + 1} \big)$.
B.1 The variance of the number of crossings

To compute Var(Cr_n), we expand it as a double sum of covariances over quadruples in A_n and split the sum depending on which summation indices are equal. For a given set of equalities (e.g. i_1 = j_2 and l_1 = l_2, but all other indices distinct), the covariance is always the same and the corresponding number of terms is a polynomial in n.
Besides, from Theorem 4.10, we know that Var(Cr n ) = O(n 3 ) (recall from the proof of Theorem 6.5 that, in this case, R n = O(n 2 ) and T 1,n = O(n)). Therefore the degree of the above polynomial is at most 9.
A polynomial of degree at most 9 can be determined by polynomial interpolation from its values on the set {0, · · · , 9}. But Var(Cr n ) can be easily computed with the help of a computer algebra software for small values of n. We performed this computation using sage [66]. The code has been embedded in the pdf file for interested readers of the electronic version. We obtain the following result. We refer to [36,Theorem 3] for another proof of this result, which also explains the polynomiality in n, but relies on the "remarkable exact formula" for the generating series of crossings.
One can write Var(X^H_n) as a sum indexed by graphs K, in which the term associated with K involves N_K, the number of pairs (H_1, H_2) of copies of H with intersection isomorphic to K. Note that the summation index does not depend on n. Furthermore, all summands are nonnegative, thus the order of magnitude of the sum is simply the maximum of the orders of magnitude of the summands. It is easy to see that $N_K \asymp n^{2 v_H - v_K}$; this completes the proof of the lemma.

C Moment inequalities and tightness of piecewise-affine random functions
The goal of this last appendix section is to establish the following: for piecewise-affine random functions, tightness can be inferred from moment inequalities at the points of the mesh. We start with a trivial lemma.
Lemma C.1. For any a > 1, there exists a constant C_a such that, for all x, y, z ≥ 0, one has $(x + y + z)^a \le C_a (x^a + y^a + z^a)$.

C.1 One-dimensional case
The following lemma can be found in unpublished lecture notes of Marckert.
Lemma C.2. Consider a sequence (X_n) of random elements in C[0, 1]. Assume that, for each n, almost surely X_n is affine on each segment [j/n, (j + 1)/n] (for 0 ≤ j ≤ n − 1) and that there exist positive constants a, b and λ with a ≥ 1 + b such that
$$\mathbb{E}\, |X_n(s) - X_n(t)|^a \le \lambda\, |s - t|^{1+b}, \qquad (C.1)$$
as soon as ns and nt are integers (n ≥ 1 and s, t ∈ [0, 1]).
Then (C.1) holds as well for any s and t in [0, 1], with the same exponents a and b but a different constant λ' instead of λ.
As a consequence, if moreover X n (0) is tight, then the sequence X n is also tight.
Proof. Let n ≥ 1 and s and t in [0, 1] with t < s. We distinguish two cases.
• If t and s lie in the same segment [j/n, (j + 1)/n], then X_n is affine between them, so that $|X_n(s) - X_n(t)| = n(s - t)\, |X_n((j+1)/n) - X_n(j/n)|$. Applying (C.1) to the mesh points j/n and (j + 1)/n gives
$$\mathbb{E}\, |X_n(s) - X_n(t)|^a \le \big( n(s - t) \big)^a\, \lambda\, n^{-(1+b)} = \lambda\, (s - t)^{1+b}\, \big( n(s - t) \big)^{a - 1 - b} \le \lambda\, (s - t)^{1+b}.$$
The last inequality comes from the fact that n(s − t) ≤ 1 and a − 1 − b ≥ 0.
• Otherwise, set $u = \lceil nt \rceil / n$ and $v = \lfloor ns \rfloor / n$, so that t ≤ u ≤ v ≤ s. Combining Lemma C.1 with the first case (applied to the pairs (t, u) and (v, s)) and with (C.1) (applied to the mesh points u and v), we obtain (C.1) for s and t, with the constant $\lambda' = 3 C_a \lambda$.

C.2 Two-dimensional case
We now state and prove a two-dimensional analogue of the previous lemma: if the random functions X_n are almost surely affine on each square of the mesh and satisfy a moment inequality of the form (C.3) whenever ns_1, ns_2, nt_1 and nt_2 are integers, then (C.3) holds for all points of [0, 1]^2, with a larger constant.
As a consequence, if moreover X_n(0, 0) is tight, then the sequence X_n is also tight.
• If the segment [s, t] crosses at most two lines of the grid, call u and v the intersection points, so that |s − t| = |s − u| + |u − v| + |v − t|. Since s and u (respectively, u and v and v and t) lie in the same square, we can apply the first case to bound E |X n (s) − X n (u)| a (respectively, E |X n (u) − X n (v)| a and E |X n (v) − X n (t)| a ).
Then the same computation as in (C.2) shows that (C.3) holds in this case.