Large deviations and exact asymptotics for constrained exponential random graphs

We present a technique for approximating generic normalization constants subject to constraints. The method is then applied to derive the exact asymptotics for the conditional normalization constant of constrained exponential random graphs.


Introduction
In a recent work [2], Chatterjee and Dembo presented a general technique for computing large deviations of nonlinear functions of independent Bernoulli random variables. In detail, let f be a function from [0, 1]^n to R. They considered a generic normalization constant of the form

F = \log \sum_{x \in \{0,1\}^n} e^{f(x)},    (1.1)

and gave a sufficient condition under which the mean-field approximation

F \approx \sup_{x \in [0,1]^n} (f(x) - I(x))

is valid, where I(x) = \sum_{i=1}^n I(x_i) and

I(x) = x \log x + (1 - x) \log(1 - x).    (1.3)

The sufficient condition they came up with consists of two parts. They first assumed that f is a twice continuously differentiable function on [0, 1]^n and introduced some shorthand notation. Let \|·\| denote the supremum norm. For each i and j, let f_i = \partial f / \partial x_i and f_{ij} = \partial^2 f / \partial x_i \partial x_j, and define a = \|f\|, b_i = \|f_i\|, and c_{ij} = \|f_{ij}\|. In addition to this minor smoothness condition on the function f, they further assumed that the gradient vector \nabla f(x) = (\partial f/\partial x_1, \ldots, \partial f/\partial x_n) satisfies a low complexity gradient condition: for any ε > 0, there is a finite subset of R^n, denoted by D(ε), such that for all x \in [0, 1]^n there exists d = (d_1, \ldots, d_n) \in D(ε) with

\sum_{i=1}^n (f_i(x) - d_i)^2 \le n ε^2.    (1.5)

Theorem 1.1 (Theorem 1.5 in [2]). Let F, a, b_i, c_{ij}, and D(ε) be defined as above. Let I be defined as in (1.3). Then for any ε > 0, F satisfies the upper bound (1.6), in which the error term added to \sup_{x \in [0,1]^n} (f(x) - I(x)) is given explicitly in (1.6)–(1.8) of [2] and is controlled by ε, \log |D(ε)|, and the quantities a, b_i, and c_{ij}; in particular, the final display (1.8) involves a sum of the diagonal terms c_{ii} and an additive \log 2.
Moreover, F satisfies a matching lower bound (1.9), again with error terms controlled by the same quantities. Chatterjee and Dembo [2] then applied this general result in several different settings, and in particular to derive the exact asymptotics for the normalization constant of exponential random graphs. Let s be a positive integer. We recall the definition of an s-parameter family of exponential random graphs. Let H_1, \ldots, H_s be fixed finite simple graphs ("simple" means undirected, with no loops or multiple edges). By convention, we take H_1 to be a single edge. Let ζ_1, \ldots, ζ_s be s real parameters and let N be a positive integer. Consider the set G_N of all simple graphs G_N on N vertices. Let hom(H_i, G_N) denote the number of homomorphisms (edge-preserving vertex maps) from the vertex set V(H_i) into the vertex set V(G_N), and let t(H_i, G_N) denote the homomorphism density of H_i in G_N,

t(H_i, G_N) = hom(H_i, G_N) / N^{|V(H_i)|}.    (1.10)

By an s-parameter family of exponential random graphs we mean a family of probability measures P^ζ_N on G_N defined by, for G_N \in G_N,

P^ζ_N(G_N) = \exp( N^2 ( ζ_1 t(H_1, G_N) + \cdots + ζ_s t(H_s, G_N) - ψ^ζ_N ) ),    (1.11)

where ψ^ζ_N is the normalization constant,

ψ^ζ_N = \frac{1}{N^2} \log \sum_{G_N \in G_N} \exp( N^2 ( ζ_1 t(H_1, G_N) + \cdots + ζ_s t(H_s, G_N) ) ).    (1.12)

These exponential models are widely used to characterize the structure and behavior of real-world networks, as they are able to predict the global structure of the networked system based on a set of tractable local features. For practitioners, one of the key objectives in studying this model is to evaluate the normalization constant ψ^ζ_N of the probability measure P^ζ_N, since averages of various quantities of interest may be obtained by differentiating ψ^ζ_N with respect to appropriate parameters. Computation of ψ^ζ_N is also important in statistics because it is crucial for carrying out maximum likelihood estimation and Bayesian inference of unknown parameters.
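At toy scale, the approximation underlying (1.1)–(1.3) can be checked by brute force. The sketch below is our own illustration, not from [2]; the test function f and all numerical choices are ours. It enumerates {0,1}^n for small n, computes F exactly, and compares it with the mean-field value of f − I on the symmetric slice x_i ≡ p:

```python
import itertools
import math

# Toy-scale check of F ≈ sup (f - I): our illustration, with a hand-picked
# smooth test function f (not taken from the papers under discussion).

def I1(p):
    """Single-coordinate I(p) = p log p + (1-p) log(1-p), with 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log(p) + (1 - p) * math.log(1 - p)

n = 8
f = lambda x: 0.4 * sum(x) + sum(x) ** 2 / n     # smooth, depends on x via sum

# Exact normalization constant F = log sum_{x in {0,1}^n} exp(f(x)).
F = math.log(sum(math.exp(f(x)) for x in itertools.product((0, 1), repeat=n)))

# Grid-search f - I over the symmetric slice x_1 = ... = x_n = p (a lower
# bound for the full supremum over [0,1]^n).
mean_field = max(f([p] * n) - n * I1(p) for p in (i / 1000 for i in range(1001)))

# For this f (convex in sum(x)), the mean-field value lower-bounds F: restrict
# the Gibbs variational principle to product measures and apply Jensen.
print(F, mean_field)
```

For this small example the two values agree to within about 0.1, illustrating the flavor of the approximation; Theorem 1.1 quantifies the gap in terms of the smoothness and gradient-complexity data of f.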
Based on a large deviation principle for Erdős–Rényi graphs established in Chatterjee and Varadhan [4], Chatterjee and Diaconis [3] developed an asymptotic approximation for ψ^ζ_N and connected the occurrence of a phase transition in the exponential model with the non-analyticity of ψ^ζ_N. Further investigations quickly followed; see for example [1, 6, 7, 8, 9, 10, 11]. However, since the approximation relies on Szemerédi's regularity lemma, the error bound on ψ^ζ_N is of the order of some negative power of log* N, and this method is also not applicable to sparse exponential random graphs.
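For intuition about the objects in (1.10)–(1.12), ψ^ζ_N can be computed exactly at toy scale. The sketch below is our own illustration (N and ζ are arbitrary choices): it brute-forces a 1-parameter edge model on N = 4 vertices, where the model reduces to Erdős–Rényi and ψ^ζ_N has a closed form against which the enumeration can be checked:

```python
import itertools
import math

# Toy-scale brute force of psi for a 1-parameter model (H_1 = a single edge)
# on N = 4 labelled vertices.  hom(H, G) counts ALL maps V(H) -> V(G) that
# send edges to edges, and t(H, G) = hom(H, G) / N^{|V(H)|}.

def t_hom(H_edges, k, adj, N):
    """Homomorphism density t(H, G) for H on vertex set {0, ..., k-1}."""
    hom = sum(
        1
        for q in itertools.product(range(N), repeat=k)
        if all(adj[q[a]][q[b]] for (a, b) in H_edges)
    )
    return hom / N ** k

N, zeta = 4, 0.3
edge = [(0, 1)]                       # H_1: a single edge, k = 2
pairs = list(itertools.combinations(range(N), 2))

# psi = (1/N^2) log sum_{G_N} exp(N^2 zeta t(H_1, G_N)), summing over all
# 2^(N choose 2) simple graphs on N labelled vertices.
total = 0.0
for bits in itertools.product((0, 1), repeat=len(pairs)):
    adj = [[0] * N for _ in range(N)]
    for (i, j), b in zip(pairs, bits):
        adj[i][j] = adj[j][i] = b
    total += math.exp(N ** 2 * zeta * t_hom(edge, 2, adj, N))
psi = math.log(total) / N ** 2

# With H_1 an edge, t(H_1, G) = 2 * #edges / N^2, so the edges decouple and
# psi = (N choose 2) * log(1 + e^{2 zeta}) / N^2 exactly.
exact = len(pairs) * math.log(1 + math.exp(2 * zeta)) / N ** 2
print(psi, exact)
```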
To improve on the approximation, Chatterjee and Dembo [2] utilized Theorem 1.1. They introduced an equivalent definition of the homomorphism density so that the normalization constant for exponential random graphs (1.12) takes the same form as the generic normalization constant (1.1). This new notion of the homomorphism density is denoted by t(H, x) and is constructed in the following way. Let k be a positive integer and let H be a finite simple graph on the vertex set [k] = {1, \ldots, k}. Let E be the set of edges of H and let m = |E|. Let N be another positive integer and let n = \binom{N}{2}. Index the elements of [0, 1]^n as x = (x_{ij})_{1 \le i < j \le N}, with the understanding that x_{ji} is the same as x_{ij} and that x_{ii} = 0 for all i, and define

t(H, x) = \frac{1}{N^k} \sum_{q: [k] \to [N]} \prod_{(l, l') \in E} x_{q(l) q(l')}.    (1.13)

For any graph G_N, if x_{ij} = 1 means there is an edge between the vertices i and j and x_{ij} = 0 means there is no edge, then t(H, x) = t(H, G_N), where t(H, G_N) is the homomorphism density defined by (1.10). Furthermore, if we let G_x denote the random simple graph whose edges are independent, edge (i, j) being present with probability x_{ij} and absent with probability 1 − x_{ij}, then this newly defined homomorphism density t(H, x) gives the expected value of t(H, G_x). Let T(x) = N^2 t(H, x). Chatterjee and Dembo checked that T(x) satisfies both the smoothness condition and the low complexity gradient condition assumed in Theorem 1.1. In detail, they showed in Lemmas 5.1 and 5.2 of [2] that

\|T\| \le N^2,  \|\partial T / \partial x_{ij}\| \le 2m,    (1.14)

together with a corresponding bound (1.15) on the second partial derivatives, and that for any ε > 0 the low complexity gradient condition holds with a set D(ε) whose cardinality is bounded as in (1.16), where c and C are universal constants. By taking f(x) = ζ_1 T_1(x) + \cdots + ζ_s T_s(x) in Theorem 1.1, they then gave a concrete error bound for the normalization constant ψ^ζ_N, which is seen to be F/N^2 in this alternative interpretation of (1.1). This error bound is significantly better than the negative power of log* N and allows a small degree of sparsity for the parameters ζ_i.
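A minimal sketch of the relaxed homomorphism density (1.13), at an illustrative toy scale (the graph H, the value of N, and p are our choices): for a constant off-diagonal matrix x ≡ p, the only deviation of t(H, x) from p^m comes from non-injective maps q, a defect of order 1/N (quite visible at this tiny N):

```python
import itertools

# Toy-scale sketch of the relaxed homomorphism density (1.13).  x is
# symmetric with zero diagonal; at 0/1 entries t(H, x) recovers t(H, G_N),
# and in general it equals E[t(H, G_x)].

def t_relaxed(H_edges, k, x, N):
    """t(H, x) = N^{-k} sum over q:[k]->[N] of prod over edges x[q(l)][q(l')]."""
    total = 0.0
    for q in itertools.product(range(N), repeat=k):
        prod = 1.0
        for (a, b) in H_edges:
            prod *= x[q[a]][q[b]]
        total += prod
    return total / N ** k

N = 5
triangle = [(0, 1), (1, 2), (0, 2)]        # H = triangle: k = 3, m = 3

# Constant off-diagonal value p: only maps q with q(0), q(1), q(2) pairwise
# distinct contribute (the zero diagonal kills the rest), so
#   t(H, x) = [N(N-1)(N-2)/N^3] * p^3,
# i.e. p^3 up to a defect of order 1/N (large at this tiny N).
p = 0.5
x = [[0.0 if i == j else p for j in range(N)] for i in range(N)]
t = t_relaxed(triangle, 3, x, N)
print(t, p ** 3)
```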
Here c and C are constants that may depend only on H_1, \ldots, H_s.
The unconstrained exponential family of random graphs (1.11) introduced above assumes no prior knowledge of the graph before sampling, but in many situations partial information about the graph is already known beforehand. To tackle this issue, we studied the constrained exponential random graph model in [5]. For clarity, we assume that the edge density of the graph is approximately known to be e, though the argument runs through without much modification for more general constraints. Take t > 0. The conditional normalization constant ψ^{e,ζ}_{N,t} is defined analogously to the normalization constant for the unconstrained exponential random graph model, the difference being that we take into account only graphs G_N whose edge density e(G_N) lies within a t-neighborhood of e:

ψ^{e,ζ}_{N,t} = \frac{1}{N^2} \log \sum_{G_N: |e(G_N) - e| \le t} \exp( N^2 ( ζ_1 t(H_1, G_N) + \cdots + ζ_s t(H_s, G_N) ) ).    (1.18)

Correspondingly, the associated conditional probability measure P^{e,ζ}_{N,t} is given by, for G_N with |e(G_N) − e| \le t,

P^{e,ζ}_{N,t}(G_N) = \exp( N^2 ( ζ_1 t(H_1, G_N) + \cdots + ζ_s t(H_s, G_N) - ψ^{e,ζ}_{N,t} ) ).    (1.19)

Using the large deviation principle established in Chatterjee and Varadhan [4] and Chatterjee and Diaconis [3], we developed an asymptotic approximation for ψ^{e,ζ}_{N,t} in [5]. Nevertheless, this approximation suffers from the same problem: the error bound on ψ^{e,ζ}_{N,t} is of the order of some negative power of log* N and is not applicable in the sparse setting.
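The conditional normalization constant can likewise be brute-forced at toy scale. The sketch below is our own illustration (all parameter values are ours): it restricts the sum defining ψ to graphs with |e(G_N) − e| ≤ t and compares with the unconstrained ψ, which it can never exceed:

```python
import itertools
import math

# Toy-scale sketch of the conditional normalization constant: the same sum
# as for psi, restricted to graphs whose edge density is within t of e.
# All parameter values (N, zeta, e, t) are illustrative choices.

N, zeta, e_target, t = 4, 0.3, 0.5, 0.15
pairs = list(itertools.combinations(range(N), 2))

total = 0.0
for bits in itertools.product((0, 1), repeat=len(pairs)):
    if abs(sum(bits) / len(pairs) - e_target) > t:
        continue                      # edge density outside the t-neighborhood
    t_edge = 2 * sum(bits) / N ** 2   # t(H_1, G_N) for H_1 a single edge
    total += math.exp(N ** 2 * zeta * t_edge)
psi_cond = math.log(total) / N ** 2

# Unconstrained psi for comparison (closed form for the edge model):
# restricting the sum can only decrease the normalization constant.
psi_unc = len(pairs) * math.log(1 + math.exp(2 * zeta)) / N ** 2
print(psi_cond, psi_unc)
```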
As shown in [5], just like the unconstrained normalization constant ψ^ζ_N, the conditional normalization constant ψ^{e,ζ}_{N,t} encodes essential information about the constrained exponential model (1.19) and helps to predict the structure and behavior of a typical random graph drawn from this model. Seeing the power of nonlinear large deviations in deriving a concrete error bound for ψ^ζ_N, we naturally wonder whether it is possible to obtain a comparable estimate for ψ^{e,ζ}_{N,t}. The following sections are dedicated to this goal. Due to the imposed constraint, instead of working with a generic normalization constant of the form (1.1) as in Chatterjee and Dembo [2], we will work with a generic conditional normalization constant in Theorem 2.1, and then apply this result to derive a concrete error bound for the conditional normalization constant of constrained exponential random graphs in Theorems 3.1 and 3.2.

Nonlinear large deviations
Let f and h be two continuously differentiable functions from [0, 1]^n to R. Assume that f and h satisfy both the smoothness condition and the low complexity gradient condition described at the beginning of this paper. Let a, b_i, c_{ij} be the supremum norms associated with f and let α, β_i, γ_{ij} be the corresponding supremum norms associated with h. For any ε > 0, let D_f(ε) and D_h(ε) be finite subsets of R^n associated with the gradient vectors of f and h, respectively. Take t > 0. Consider a generic conditional normalization constant of the form

F_c = \log \sum_{x \in \{0,1\}^n : |h(x)| \le tn} e^{f(x)}.    (2.1)

Theorem 2.1 states that F_c satisfies an upper bound (2.5) analogous to (1.6)–(1.8), where l = a + nK and the error term is controlled by ε, the cardinalities of D_f and D_h, and the supremum norms of f and h. Moreover, F_c satisfies the corresponding lower bound, with constants of the same form.
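A minimal numerical sketch of (2.1), with illustrative f, h, and t of our choosing: for small n the constrained sum can be enumerated directly, and dropping the constraint can only increase the normalization constant:

```python
import itertools
import math

# Minimal numerical sketch of (2.1): F_c sums exp(f(x)) only over x
# satisfying |h(x)| <= t*n.  f, h, t below are illustrative choices.

n, t = 10, 0.1
f = lambda x: 0.5 * sum(x)
h = lambda x: sum(x) - n / 2          # constrains sum(x) to lie near n/2

F_c = math.log(sum(
    math.exp(f(x))
    for x in itertools.product((0, 1), repeat=n)
    if abs(h(x)) <= t * n
))

# Unconstrained F for comparison: dropping the constraint only adds
# positive terms to the sum, so F_c < F.
F = math.log(sum(math.exp(f(x)) for x in itertools.product((0, 1), repeat=n)))
print(F_c, F)
```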
The proof replaces the hard constraint |h(x)| ≤ tn by a smooth penalty: we add to f a function e(x), built from h, that is close to zero when |h(x)| ≤ tn and strongly negative otherwise, so that Theorem 1.1 applies to f + e. We check the smoothness condition for f + e first. Note that e and its first and second partial derivatives are bounded in terms of the supremum norms of h, so the quantities a, b_i, c_{ij} for f + e are controlled by those of f and h. Next we check the low complexity gradient condition for f + e. Let

D(ε) = \{ d_f + θ d_h : d_f \in D_f(ε/3), d_h \in D_h(ε'), and θ = jτ for some integer -ψ'/τ < j < ψ'/τ \}.    (2.24)

Let e_i = \partial e / \partial x_i. Take any x \in [0, 1]^n and choose d_f \in D_f(ε/3) and d_h \in D_h(ε') approximating \nabla f(x) and \nabla h(x), respectively. Choose an integer j between -ψ'/τ and ψ'/τ such that jτ approximates the scalar multiplying \nabla h(x) in \nabla e(x); then d_f + jτ d_h approximates \nabla(f + e)(x) to within the required accuracy. Thus D(ε) is a finite subset of R^n associated with the gradient vector of f + e. The proof is completed by applying Theorem 1.1.
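The penalty idea can be illustrated numerically. The sketch below uses a simple quadratic penalty of our own choosing (the paper's construction of e is different and comes with explicit smoothness and complexity bounds); it only shows the qualitative phenomenon that the softly constrained normalization constant decreases toward F_c as the penalty strength grows:

```python
import itertools
import math

# Qualitative illustration of the penalty idea with a made-up quadratic
# penalty: as the strength grows, log sum exp(f + penalty) decreases to F_c.

n, t = 8, 0.1
f = lambda x: 0.5 * sum(x)
h = lambda x: sum(x) - n / 2

def soft_F(strength):
    # ~0 inside the constraint set {|h| <= t*n}, strongly negative outside
    pen = lambda x: -strength * max(0.0, abs(h(x)) - t * n) ** 2
    return math.log(sum(math.exp(f(x) + pen(x))
                        for x in itertools.product((0, 1), repeat=n)))

F_c = math.log(sum(math.exp(f(x))
                   for x in itertools.product((0, 1), repeat=n)
                   if abs(h(x)) <= t * n))

vals = [soft_F(s) for s in (1.0, 10.0, 100.0)]
print(vals, F_c)    # non-increasing in the strength, and always at least F_c
```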

Application to exponential random graphs
As mentioned earlier, we would like to apply Theorem 2.1 to derive the exact asymptotics for the conditional normalization constant of constrained exponential random graphs. Recall the definition of an s-parameter family of conditional exponential random graphs introduced earlier, where we assume that the "ideal" edge density of the graph is e. Let

f(x) = ζ_1 T_1(x) + \cdots + ζ_s T_s(x)  and  h(x) = T_1(x) - N^2 e,

where T_i(x)/N^2 is the equivalent notion of homomorphism density as defined in (1.13). Let n = \binom{N}{2}. We compare the conditional normalization constant ψ^{e,ζ}_{N,t} (1.18) for constrained exponential random graphs with the generic conditional normalization constant F_c (2.1). Note that the constraint |e(G_N) − e| ≤ t may be translated into |T_1(x) − N^2 e| ≤ N^2 t, and if we further redefine t to be (1 − 1/N)t′/2, then we arrive at the generic constraint |h(x)| ≤ t′n as in (2.1). Thus ψ^{e,ζ}_{N,t} = F_c / N^2. In the following we give a concrete error bound for ψ^{e,ζ}_{N,t} using the estimates in Theorem 2.1.
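The bookkeeping in the constraint translation above can be sanity-checked directly (pure arithmetic; the values of N and t are arbitrary):

```python
# Check of the constraint translation: with n = N(N-1)/2, the graph
# constraint |T_1(x) - N^2 e| <= N^2 t matches the generic |h(x)| <= t'*n
# exactly when t = (1 - 1/N) t'/2, i.e. t' = 2t / (1 - 1/N).

N, t = 20, 0.05
n = N * (N - 1) // 2
t_prime = 2 * t / (1 - 1 / N)
print(t_prime * n, t * N ** 2)   # the two thresholds coincide
```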
Proof. Chatterjee and Dembo [2] checked that T_i(x) satisfies both the smoothness condition and the low complexity gradient condition stated at the beginning of this paper, which readily implies that f and h satisfy the assumptions of Theorem 2.1. Recall that the indexing set for quantities like b_i and γ_{ij}, instead of being {1, \ldots, n}, is now {(i, j) : 1 \le i < j \le N}, and for simplicity we write (ij) instead of (i, j). Let a, b_{(ij)}, c_{(ij)(i′j′)} be the supremum norms associated with f and let α, β_{(ij)}, γ_{(ij)(i′j′)} be the corresponding supremum norms associated with h. For any ε > 0, let D_f(ε) and D_h(ε) be finite subsets of R^n associated with the gradient vectors of f and h, respectively.
Based on the bounds (1.14)–(1.16) for T_i, we derive the corresponding bounds for f and h.
We then estimate the lower and upper error bounds for ψ^{e,ζ}_{N,t} using the bounds on f and h obtained above. We treat the lower bound first, and then the more involved upper bound. Assume that n^{-1/4} \le δ \le 1 and 0 < ε \le 1. Since K \le CB, this implies that

l \le CBN^2,  m_{(ij)} \le CBδ^{-1},  and similarly for n_{(ij)(i′j′)}.    (3.14)

The following estimates are direct consequences of the bounds on l, m_{(ij)}, and n_{(ij)(i′j′)}.
Taking ε = ((B^3 \log N)/(δ^3 N))^{1/5} then gives the stated bound. For n large enough, we may choose δ = c n^{-1/(2κ)} as in (3.11), which yields a further simplification. We can do a more refined analysis of Theorem 3.1 when the ζ_i are non-negative for i \ge 2.
where e(H_i) denotes the number of edges in H_i, and c and C are constants that may depend only on H_1, \ldots, H_s, e, and t.
Remark. If the H_i, i \ge 2, are all stars, then the conclusions of Theorem 3.2 hold for any ζ_1, \ldots, ζ_s.
(3.28)

It was proved in Chatterjee and Diaconis [3] that when the ζ_i are non-negative for i ≥ 2, the above supremum may only be attained at constant functions on [0, 1]. Therefore the supremum may be computed over constant functions. On the other hand, by considering the constant function g′(x, y) ≡ x, i.e. x_{ij} ≡ x for all i ≠ j, we have (3.32). The O(1/N) factor comes from the following consideration. The difference between I(g′) and I(x) is easy to estimate, while the difference between t(H_i, g′) and t(H_i, x) = x^{e(H_i)} is caused by the zero diagonal terms x_{ii}. A crude estimate of (1.13) shows that this difference is bounded by c_i/N, where c_i is a constant that depends only on H_i. Putting everything together, we obtain (3.33). The rest of the proof follows.
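The reduction to constant functions can be explored numerically. The sketch below uses illustrative parameters of our choosing; the scalar objective ∑_i ζ_i u^{e(H_i)} − I(u)/2 is the constant-function form of the variational problem as in [3], and the constrained version further restricts u to |u − e| ≤ t:

```python
import math

# Numerical exploration of the constant-function reduction: maximize
# objective(u) = sum_i zeta_i u^{e(H_i)} - I(u)/2 over u in [0,1], with and
# without the edge-density constraint |u - e| <= t.  Parameters illustrative.

def I1(u):
    if u in (0.0, 1.0):
        return 0.0
    return u * math.log(u) + (1 - u) * math.log(1 - u)

zeta = [0.2, 0.1]       # edge and triangle parameters (illustrative)
num_edges = [1, 3]      # e(H_1) = 1, e(H_2) = 3
e_target, t = 0.5, 0.1

def objective(u):
    return sum(z * u ** m for z, m in zip(zeta, num_edges)) - I1(u) / 2

grid = [i / 10000 for i in range(10001)]
sup_unconstrained = max(objective(u) for u in grid)
sup_constrained = max(objective(u) for u in grid if abs(u - e_target) <= t)
print(sup_unconstrained, sup_constrained)   # constrained sup is never larger
```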