Statistical Inference for Network Samples Using Subgraph Counts

We consider that a network is an observation, and that a collection of observed networks forms a sample. In this setting, we provide methods to test whether all observations in a network sample are drawn from a specified model. We achieve this by deriving, under the null of the graphon model, the joint asymptotic properties of average subgraph counts as the number of observed networks increases but the number of nodes in each network remains finite. In doing so, we do not require that each observed network contains the same number of nodes, or is drawn from the same distribution. Our results yield joint confidence regions for subgraph counts, and therefore methods for testing whether the observations in a network sample are drawn from: a specified distribution, a specified model, or from the same model as another network sample. We present simulation experiments and an illustrative example on a sample of brain networks, where we find that highly creative individuals' brains present significantly more short cycles.


Introduction
We show that subgraph counts are flexible and powerful statistics for inference on collections of networks. Specifically, we use subgraph counts to test the hypotheses that all networks in a sample are generated from a given distribution, from distributions in a given model, or from the same model as that of another sample.
Our results address the inference problem raised by the following experiment [1]: the networks connecting the brain regions of individuals with varied levels of creativity are observed. However, while the observations can be assumed to be independent, they cannot be assumed to be identically distributed, due to the variability of brain structure and the instability of the observation technique; for instance, they need not contain as many nodes and edges. How, while allowing for such variations, can we test for significant differences between individuals with different levels of creativity?
Formally, we consider that a network is an observation, say G_i, and that a collection of observed networks forms a sample, say G = (G_1, ..., G_N). Then, our goal is to infer distributional properties of the G_i-s as N grows. This parallels more classical statistical settings, where an observation is a vector, such as X_i ∈ R^k, and a sample is a matrix, X = (X_1, ..., X_N) ∈ R^{k×N}. However, our setting strongly differs from the one where only one very large network is observed, and for which many methods already exist (see [2][3][4][5][6][7][8][9][10], to cite but a few). Surprisingly, no statistical method exists to compare samples of small networks, and currently only tools to compare two large networks are available [11,12].
Here we provide an analog of a multivariate t-test for network samples: methods to test whether a given network sample G presents averages consistent with either a specific model, or with that of another sample. The averages we use are subgraph counts; e.g., the number of edges or triangles in the sample. The choice of subgraph counts as statistics is motivated by their success in comparing large networks [13,14], but also by results in random graph theory and the study of large graphs. In both fields, subgraph counts have proved to be the most powerful tool available to compare networks [15,16], and are known to have properties similar to moments of random variables [17].
Formally, we are embedding network samples into a space defined by subgraph counts, and performing comparisons in that space. While other network comparison techniques also use embeddings [11,12], using subgraph counts presents three key advantages. First, if the G_i-s are generated by a blockmodel [2], the most popular random network model to date, and for an appropriate family of subgraphs, the embedding is one-to-one; this result is known as the finite forcibility of blockmodels [17]. Second, very few assumptions on each G_i need to be made as N grows to obtain consistency and asymptotic normality of the image of G in the embedding space. This enables us to work under a very flexible null model. Finally, because it relies on physical properties of the G_i-s (the number of edges, triangles, and so on), this embedding remains interpretable.
In the remainder of this article, we first introduce subgraph counts and the graphon model. We then present, successively, the case where all the networks in the sample come from the same graphon model (but are not necessarily of the same size), and the case where each observed network may come from a different graphon. In both cases, we prove asymptotic normality of our estimator, and present representative examples showing the practical use of the result. We conclude with an application to connectomes, and a discussion.

Subgraph Counts in the Graphon Model
We now define our statistics (subgraph counts) and our null model (the graphon model). Subgraph counts are natural statistics to compare networks for two reasons. First, subgraph counts intuitively summarize a network through its fundamental building blocks. This has historically given them purchase on hard fundamental and empirical problems [13,14,18]. Second, subgraph counts present tractable analytical properties, which we describe and leverage below, in a manner paralleling related literatures [4,6,18]. A subgraph count is the number of copies of a given graph in another graph (see Fig. 1). Throughout, we call the graph that is counted the subgraph, denoted F, and call G the larger graph in which the counting takes place. All graphs will be simple (unweighted, with no self-loops or multiple edges). Subgraphs are also termed motifs, pattern graphs or shapes, depending on the field [13,19,20,21].
For clarity, we define subgraph counts formally as follows (we write |F| for the number of nodes in F, and K_G for the complete graph on the nodes of G).

Definition 1 (Count of F in G). For two graphs F and G, we call the count of F in G, and write X_F(G), the number of subgraphs F' of G that are copies of F, where F' is a copy of F if there exists an adjacency-preserving bijection between F and F'. Furthermore, for a tuple F = (F_1, ..., F_k) of subgraphs, we write X_F(G) for the vector (X_{F_1}(G), ..., X_{F_k}(G)).

With this notation, calling G_a, G_b and G_c the graphs in Fig. 1, and taking F = (C_2, C_3, C_4) (the edge, the triangle and the square), we have X_F(G_a) = (6, 2, 1), X_F(G_b) = (6, 2, 0) and X_F(G_c) = (6, 0, 3).

The power of subgraph counts to study networks stems from their inherent linearity. Indeed, products of subgraph counts are but linear combinations of other subgraph counts. Intuitively, a product of two subgraph counts involves counting pairs of copies, and can therefore be recovered by counting the number of copies of all subgraphs that can be induced by a pair of copies. More precisely, in the Appendix we show the following:

Lemma 1 (Linearity of subgraph counts). For any two graphs F and F', there are factors c_H and a set H_{FF'} of subgraphs (the set of subgraphs that can be obtained using one copy of each of F and F' as building blocks) such that, for any graph G,

X_F(G) X_{F'}(G) = Σ_{H ∈ H_{FF'}} c_H X_H(G).

For instance, taking both F and F' to be the edge, a pair of edge copies forms either the same edge, a two-edge path, or two disjoint edges, so that in any graph G,

X_{C_2}(G)^2 = X_{C_2}(G) + 2 X_{P_3}(G) + 2 X_{C_2 ⊔ C_2}(G),

where P_3 is the path on three nodes and C_2 ⊔ C_2 the pair of disjoint edges.

This algebraic property of subgraph counts underpins the proofs of [6,16,18], and the subgraph counting algorithms of [20,21], among many other examples. Crucially, as opposed to cases where the model enforces linearity, such as with assumptions of Normality, it is the nature of the statistics (subgraph counts) and the system (graphs) that makes the problem linear. The linearity of subgraph counts allows us to use as null the very flexible graphon model [17]. This framework subsumes most models used in the statistical literature on networks; e.g., blockmodels [2] and dot-product models [5]. It has the intuitive structure of affixing to each node i a latent feature (here x_i) and of connecting nodes i and j (conditionally independently) with a probability determined by the node features (here f(x_i, x_j)).
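To make Lemma 1 concrete, here is a brute-force sketch (not the counting algorithms of [20,21]) that computes X_F(G) by enumerating the adjacency-preserving injections of F into G and dividing by the number of automorphisms of F; all function names are ours, for illustration only.

```python
from itertools import permutations

def count_copies(f_n, f_edges, g_nodes, g_edges):
    """X_F(G): number of copies of F in G, by brute force.

    A copy is an edge-preserving injection of F's nodes (labeled
    0..f_n-1) into G's nodes; dividing by aut(F) removes relabelings."""
    fe = {frozenset(e) for e in f_edges}
    ge = {frozenset(e) for e in g_edges}
    n_inj = sum(
        all(frozenset((p[u], p[v])) in ge for u, v in fe)
        for p in permutations(g_nodes, f_n)
    )
    n_aut = sum(
        all(frozenset((p[u], p[v])) in fe for u, v in fe)
        for p in permutations(range(f_n))
    )
    return n_inj // n_aut

# The triangle C_3 has 4 copies in the complete graph K_4.
k4 = ([0, 1, 2, 3], [(i, j) for i in range(4) for j in range(i + 1, 4)])
assert count_copies(3, [(0, 1), (1, 2), (0, 2)], *k4) == 4
```

On K_4 one can check the displayed identity directly: 6 edges, 12 two-edge paths and 3 disjoint edge pairs give 6² = 6 + 2·12 + 2·3.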
Definition 2 (Graphon f and random graph G_n(f)). Fix a symmetric integrable map f : [0,1]^2 → [0,1], and call it a graphon. We call G_n(f) the random graph distribution over graphs with n nodes such that: to each node i is randomly and independently assigned a feature x_i ∈ [0,1], with x_i ~ Unif([0,1]); and edges form independently conditionally on {x_i}_{i∈[n]}, each edge ij being present with probability f(x_i, x_j).

To recover a blockmodel with K blocks, it suffices to consider a partition of [0,1] into K sets (i.e., (P_1, ..., P_K) ∈ P_K([0,1])) and set f as constant over each P_u × P_v. The dot-product model is recovered with a graphon f of finite rank; i.e., f(x, y) = Σ_{u≤K} λ_u f_u(x) f_u(y).
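A draw from G_n(f) as in Definition 2 can be sketched in a few lines; the two-block graphon below is a hypothetical example, and all names are illustrative.

```python
import random

def sample_graphon(n, f, rng):
    """One draw from G_n(f): latent x_i ~ Unif[0,1], then edge ij is
    present independently with probability f(x_i, x_j)."""
    x = [rng.random() for _ in range(n)]
    return n, {
        (i, j)
        for i in range(n) for j in range(i + 1, n)
        if rng.random() < f(x[i], x[j])
    }

def f_block(u, v, p_in=0.8, p_out=0.2, split=0.5):
    """A two-block blockmodel graphon: constant on the cells of a partition of [0,1]."""
    return p_in if (u < split) == (v < split) else p_out

n, edges = sample_graphon(30, f_block, random.Random(0))
```

Setting f constant recovers the Erdős-Rényi model, and a finite-rank f would recover the dot-product model in the same way.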
In the graphon framework, subgraph counts have a direct interpretation as moments of f [17]. Specifically, if G ~ G_n(f), then the moments of X_F(G) are moments of f. Therefore, following results similar to the Hausdorff moment problem, subgraph counts are sufficient statistics to distinguish between any two graphons [16,17]. However, there are no guarantees on which subgraphs are needed to distinguish between two graphons. For blockmodels and finite rank models, we know only that a finite number is sufficient (a concept known as finite forcibility; see [17, Chapter 16.7 & Appendix 4] for more details).
Unfortunately, all known results on subgraph counts under the graphon model consider the setting where one very large graph is observed. Here we present the tools to address the problem where a sample of small graphs is observed.

The simple case: Samples from one graphon
We now present a central limit theorem as well as practical methods to build confidence regions for the subgraph counts observed in a network sample G = (G_1, ..., G_N). In this section, we assume that there is a graphon f such that each G_i is drawn independently from G_{n_i}(f) (where n_i is the number of nodes of G_i).

Fix F ∈ F and G ∈ G. In this setting, X_F(G) is a random variable, and the first parameter to consider is its mean. To compute this mean, let F_1, ..., F_m be all the copies of F in K_G (the complete graph over the nodes of G), so that, using the linearity of the expectation, we have E X_F(G) = Σ_{j≤m} E 1{F_j ⊂ G}. Then, direct computations show that E 1{F_j ⊂ G} does not depend on j (see Proposition A.1), and that E X_F(G) = X_F(K_G) μ_F(f). Observe that μ_F(f) is a moment of the graphon f, as discussed above. This suggests the estimator μ̂_F(G), the average over the sample of the normalized counts X_F(G_i)/X_F(K_{G_i}). Similar computations for higher moments, aided by Lemma 1, enable us to use the Lindeberg-Feller central limit theorem along with the Cramér-Wold device to obtain the following:

Theorem 1 (Statistical properties of subgraph counts). Fix a tuple of graphs F, a graphon f and a sequence n = (n_i)_{i∈N} such that 2 max_{F∈F} |F| ≤ min_i n_i. Then μ̂_F(G) is a √N-consistent and asymptotically normal estimator of μ_F(f); i.e., E μ̂_F(G) = μ_F(f) and there exists Σ_F(n, f) such that, asymptotically in N,

√N (μ̂_F(G) − μ_F(f)) → Normal(0, Σ_F(n, f)),

where the entries of Σ_F(n, f) are linear combinations of the moments μ_H(f) for H ∈ H_{FF'}, with F ⊔ F' the disjoint union of F and F'.

Crucial to the following is the covariance matrix Σ_F(n, f), obtained by taking, for each pair F, F' ∈ F, the limit in N of the covariance of the corresponding normalized counts; it will enable the computation of confidence regions. Interestingly, its elicitation is more involved than for the study of large graphs, where only a few terms dominate. We refer to the Appendix for the proof as well as a simulation experiment.
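As a sketch of the estimator in Theorem 1 (for the single subgraph F = C_2, the edge, and with illustrative names), the normalized count X_F(G_i)/X_F(K_{G_i}) is the edge density of G_i, and μ̂_F averages it over the sample; under the constant graphon f ≡ p the estimate should concentrate around μ_F(f) = p.

```python
import random
from math import comb

def edge_density(n, edges):
    """X_F(G_i) / X_F(K_{G_i}) for F the edge: observed edges over possible edges."""
    return len(edges) / comb(n, 2)

def mu_hat(sample):
    """The estimator: average normalized count over the sample G = (G_1, ..., G_N)."""
    return sum(edge_density(n, e) for n, e in sample) / len(sample)

# Networks of varying sizes, all drawn from the constant graphon f = 0.5.
rng = random.Random(1)
sample = []
for i in range(200):
    n = 10 + i % 5  # the n_i need not be equal
    edges = {(a, b) for a in range(n) for b in range(a + 1, n) if rng.random() < 0.5}
    sample.append((n, edges))
```

With N = 200 such networks, μ̂_F lands close to 0.5, in line with the √N rate of Theorem 1.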
Theorem 1 enables testing against the null that all G_i are drawn from a given graphon. To make this concrete, we consider an example in Fig. 2. There, we observe a graph sample G = (G_1, ..., G_300), and aim to compare it to two graphon models f_a (in red) and f_b (in blue) using Theorem 1; i.e., we assume that for i ∈ [N], G_i ~ G_{n_i}(f), and consider the null hypothesis H_0 : f = f_a and the alternative H_1 : f = f_b. We draw μ̂_F(G) as a black cross, and the per-network densities μ_F(G_i) as smaller black points. The sizes of the networks in G, the n_i, are non-random but not constant; we achieve this by using the sequence of digits of π.
First, since we have specified f_a and f_b, we can evaluate both μ_F(f_a) and μ_F(f_b) and draw them on the figure (as a red and a blue dot respectively). Then, since n = (n_i)_{i≤N} is observed, we can compute Σ_F(n, f_a) and Σ_F(n, f_b) using Theorem 1, which allows us to draw the confidence ellipses around μ_F(f_a) and μ_F(f_b) (in shaded red and blue respectively). Finally, since we know the limit distribution and covariance under the null, we can use Σ_F(n, f_a) and μ_F(f_a) to compute a p-value using the Mahalanobis distance.
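The p-value step can be sketched as follows: compute the squared Mahalanobis distance of the observed density vector to μ_F(f_a) under Σ_F(n, f_a), and compare it to a χ² quantile. The 2×2 closed-form inverse and the 5% critical value 5.991 (two degrees of freedom) are illustrative conveniences, not the paper's implementation.

```python
def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T sigma^{-1} (x - mu), 2x2 case."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # closed-form inverse of the 2x2 covariance folded into the quadratic form
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

CHI2_2_95 = 5.991  # 95% quantile of the chi-squared distribution with 2 df

def reject_at_5pct(x, mu, sigma):
    return mahalanobis_sq(x, mu, sigma) > CHI2_2_95
```

For |F| > 2 the same computation applies with a general matrix inverse and the χ²_{|F|} quantile.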
In the following we consider the case where instead of testing against the null of a single graphon, we test against the null of a graphon class.

The general case: Flexible sampling design
Here we expand our results to cases where the observed networks may be generated from different graphons. Indeed, in many settings, the sampling mechanism may distort the structure of the underlying graphon; e.g., although the network connecting brain regions can be satisfactorily modeled by a blockmodel [22], the proportion of nodes in each block may differ across experimental settings, so that each observation is drawn from a different blockmodel.
In this practically important and conceptually challenging new setting, the proof techniques developed for Theorem 1 yield the following.
Theorem 2. Fix a tuple of graphs F, a sequence of graphons f = (f_i)_{i∈N} and a sequence of integers n = (n_i)_{i∈N} such that 2 max_{F∈F} |F| ≤ min_i n_i. Then, asymptotically in N, and for some matrix Σ*_F(n, f), we have that

√N (μ̂_F(G) − μ_F(f; N)) → Normal(0, Σ*_F(n, f)), where μ_F(f; N) = N^{-1} Σ_{i≤N} μ_F(f_i).

Therefore, even in this much more flexible setting, we can recover the barycenter of the μ_F(f_i). However, the variance now has a more complex structure, and we refer to the Appendix for details.
Following the intuition of our example of brain networks, and to make the usefulness of Theorem 2 concrete, we introduce the flexible stochastic blockmodel (FSBm).
Definition 3 (FSBm and embedding shape). For a symmetric matrix B ∈ [0,1]^{K×K}, we call D(B) the set of all possible graphons with the same block structure as B; i.e., the graphons constant equal to B_uv over P_u × P_v for some partition (P_1, ..., P_K) of [0,1]. For a tuple F of graphs, we call embedding shape the set μ_F(D(B)) = {μ_F(f) : f ∈ D(B)}. For instance, with B ∈ [0,1]^{2×2} and a pair of subgraphs F, the embedding shape is a curve in [0,1]^2, parametrized by the sizes of the two blocks.

The most direct way of using the FSBm is to test for all the f_i being equal to some blockmodel instance in a class; i.e., assume that all G_i-s are drawn from a graphon f and test for the null H_0 : f ∈ D(B). This is achieved by using a composite hypothesis test, and our results allow us to produce confidence regions and p-values using the same tools as before. We present such an example in Fig. 3. There, we observe G = (G_1, ..., G_200), and consider two FSBm classes generated from B_a (in red) and B_b (in blue). Then, we assume that all networks in the sample are drawn from a graphon f and test for the null H_0 : f ∈ D(B_a) against the alternative H_1 : f ∈ D(B_b). We first represent μ̂_F(G) as a black cross. Using Definition 3, we plot the embedding shapes of B_a and B_b in solid red and blue respectively. The confidence regions (in shaded red and blue) are the union of the confidence ellipses at all points of the two embedding shapes.
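For a two-block FSBm, the embedding shape of Definition 3 can be traced numerically: sweep the block proportion over a grid and evaluate the blockmodel moments, here for the edge and the triangle. This is an illustrative sketch with hypothetical names.

```python
def embedding_shape_2block(B, step=0.01):
    """Trace (mu_edge, mu_triangle) as the block proportions (p, 1 - p) vary over [0, 1]."""
    shape = []
    for k in range(int(round(1 / step)) + 1):
        pi = (k * step, 1 - k * step)
        mu_edge = sum(pi[u] * pi[v] * B[u][v] for u in range(2) for v in range(2))
        mu_tri = sum(
            pi[u] * pi[v] * pi[w] * B[u][v] * B[v][w] * B[w][u]
            for u in range(2) for v in range(2) for w in range(2)
        )
        shape.append((mu_edge, mu_tri))
    return shape
```

At the endpoints of the grid the model degenerates to an Erdős-Rényi graph, so the curve passes through (B_uu, B_uu³) for each block u.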
A more general use of Theorem 2 is to test for all graphs in a sample being drawn from elements of a FSBm class; i.e., assume that the G_i-s are drawn from the f_i-s and test for the null H_0 : ∀i ∈ [N], f_i ∈ D(B) for some B. As before, we face a composite null, and we may compute the confidence region and the p-value by scanning all possible sequences f. This, however, is clearly computationally intractable. Nonetheless, the form of the variance and the structure of the FSBm allow us to propose conservative confidence regions and p-values that can be efficiently computed (we fully describe the method in the Appendix).
We present an example in Fig. 4. There, we observe G = (G_1, ..., G_1000) and consider two FSBm classes generated by B_a and B_b. We first plot μ̂_F(G) as a black cross. Then, using Definition 3, we plot the convex hulls of the embedding shapes of B_a and B_b (in solid red and blue respectively), wherein, by Theorem 2, μ_F(f; N) must lie. Finally, we use a method described in the Appendix to produce the confidence region around each shape (in shaded color).

Application: Are creative brains different?
We now consider a sample of brain networks G = (G_1, ..., G_113) [23]. This sample was produced in two steps: first, magnetic resonance images of each of the 113 subjects' brains were taken; then, the networks connecting each subject's brain regions were estimated from these images [1]. All networks in the sample contain 70 nodes. Furthermore, we have available a covariate C = (c_1, ..., c_113) measuring the subjects' creativity.
To study this network sample and use the covariate C, we introduce a direct extension of our results to compare two network samples (Corollary 1; see the Appendix for its statement and proof): if two independent samples G and G' are drawn from the same graphon f, then the difference of their subgraph density estimates, suitably rescaled, is asymptotically Normal(0, Σ_F(n, n', f)), and Σ_F(n, n', f) may be consistently estimated from the samples.

Figure: With F = {C_2, C_3, C_4}, we estimate μ̂_F(G) and plot it as a black cross. Then, we draw in solid color the convex hulls of the embedding shapes of B_a and B_b. In shaded color we draw the associated confidence regions; approximate (and conservative) p-values can be obtained by determining the confidence level at which the observation ceases to be in the confidence region.
Unfortunately, estimating Σ_F(n, n', f) requires counting subgraphs of order 2 max_{F∈F} |F|, making the procedure computationally intensive. This compels us to work with F ⊆ {C_2, C_3, C_4}. Furthermore, although our estimator of Σ_F(n, n', f) is entrywise Normal, it is not Wishart as in the classical setting. Therefore, we have no guarantee that the estimate is positive definite, and cannot use Hotelling's T-squared distribution to compute p-values. If the estimate is positive definite, we recommend ignoring the variations in Σ_F(n, n', f) and using the χ²_{|F|} distribution. If the estimate fails to be positive definite, we recommend using only the marginals.
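The positive-definiteness check described above can be implemented with a small Cholesky attempt, a standard trick sketched here in plain Python with illustrative names; when the factorization fails, one falls back to the marginals.

```python
import math

def cholesky(m):
    """Cholesky factor of a symmetric matrix; raises ValueError if not positive definite."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = m[i][i] - s
                if d <= 0:
                    raise ValueError("not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def is_positive_definite(m):
    try:
        cholesky(m)
        return True
    except ValueError:
        return False
```

When the check fails, each coordinate of the density vector can still be tested marginally with its own estimated variance.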
Before analyzing G using our results, we perform the following check: we subsample uniformly at random and without replacement from G, yielding G_1 and G_2 such that G_1 ∪ G_2 = G, and use Corollary 1 to test for G_1 and G_2 being drawn from the same graphon f. Unless G presents characteristics that cannot be explained by our results, G_1 and G_2 should be indistinguishable, and we expect to see p-values that are uniformly distributed in [0,1].
We perform this experiment 100 times, and obtain a sample of p-values for which we fail to reject the null of a uniform distribution using the Kolmogorov-Smirnov test (D = 0.09, p-value = 0.3). For this test we use F = {C_2} because of the small sample size (|G_1| + |G_2| = 113) and the very high level of correlation between counts; otherwise, the estimated covariance matrix often failed to be positive definite.
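The uniformity check can be sketched with the one-sample Kolmogorov-Smirnov statistic against Unif[0,1], computed directly; the critical-value lookup is omitted here.

```python
def ks_uniform(pvals):
    """KS statistic D = sup_x |F_N(x) - x| between the empirical CDF of pvals
    and the Unif[0,1] CDF, taking the sup at the jump points."""
    xs = sorted(pvals)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
```

The D = 0.09 reported above was computed on the 100 p-values produced by the repeated splits.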
We now use C to split G into two samples: a first subsample G^q_1 containing the less creative individuals, and a second subsample G^q_2 containing the more creative. More precisely, for a quantile q, and denoting Q_C the empirical quantile function of C, G^q_1 collects the networks of the subjects with c_i ≤ Q_C(q), and G^q_2 those with c_i > Q_C(1 − q).

Quantile (q)   C_2     C_3     C_4
0.5            0.126   0.110   0.115
0.4            0.077   0.051   0.050
0.3            0.062   0.042   0.040
0.2            0.014   0.011   0.012
0.1            0.046   0.047   0.061

Table 1: Testing for differences between G^q_1 and G^q_2. For each F ∈ {C_2, C_3, C_4} and q ∈ {0.1, 0.2, ..., 0.5} we produce the p-value for the null H_0 : μ_F(G^q_1) = μ_F(G^q_2). The p-values increase with q, except for q = 0.1, in which case |G^q_1| and |G^q_2| are too small for the test to be significant.

Interestingly, for q = 0.5 and q = 0.4, we fail to reject the null that the networks in G^q_1 and G^q_2 come from the same graphon (see Table 1 for the p-values). However, for q = 0.3 we can reject the null of the same graphon at the 5% level using C_3 or C_4, but not C_2. Thus, we observe that individuals with a very high level of creativity present significantly more triangles and squares than those with a very low level of creativity.

We now aim to understand whether the added triangles and squares arise from a few edges completing partially present shapes, or from fully new triangles and squares. To do so, we first observe that if G ~ G_n(f), then Ḡ ~ G_n(1 − f), where Ḡ is the complement graph of G. Therefore, we may use our tests on the Ḡ-s, which can be understood as estimating μ_F(1 − f) instead of μ_F(f) to compare network samples.
Then, using the complement samples Ḡ^q_i = {Ḡ : G ∈ G^q_i}, we can test whether there are significantly more fully absent subgraphs in G^q_1 than in G^q_2. There, we find that we cannot reject this null; i.e., we cannot reject the null of the networks in Ḡ^q_1 and Ḡ^q_2 coming from the same graphon for q ≥ 0.3. Therefore, we conclude that the added triangles and squares in the highly creative arise from a few edges completing partially present triangles and squares.
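The complement trick used above is mechanical: flip every potential edge. A sketch with illustrative names:

```python
from math import comb

def complement(n, edges):
    """The complement graph: same n nodes, exactly the missing edges."""
    all_pairs = {(i, j) for i in range(n) for j in range(i + 1, n)}
    return n, all_pairs - {tuple(sorted(e)) for e in edges}

# Edge counts of G and its complement always sum to that of K_n: comb(n, 2).
n, ce = complement(5, {(0, 1), (1, 2), (3, 4)})
assert len(ce) + 3 == comb(5, 2)
```

Counting a subgraph F in the complements then estimates μ_F(1 − f), i.e., the density of fully absent copies of F.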

Discussion
We provide the tools to perform statistical inference on a network sample using subgraph counts. Our two main results provide consistency and asymptotic normality of subgraph counts under very flexible conditions. Using these results, we show that subgraph counts are powerful statistics to test whether network samples come from a specified distribution, a specified model, or from the same model.
The key insight we provide is that statistical inference methods paralleling classical ones for standard samples may be obtained for network samples. From this perspective, our results may be seen as providing an analog of a multivariate t-test for network samples. However, going beyond what our results directly imply, we expect that parallels to ANOVA, model selection, model ranking, and goodness of fit may be obtained for network samples using our proof techniques.
Unfortunately, the practical implementation of Corollary 1, and of any other expansion on our results, presents hard computational problems. Indeed, a fully automatic algorithm to estimate Σ_F(n, f) would need to first automatically build the set H_{FF'} and then compute X_H(G) for each H in that set. Such an algorithm would require a number of operations growing combinatorially in the number of nodes in F and F', and exponentially in the total number of nodes in the sample. Thus, even if each graph in the sample is small, such methods would face hard-to-solve computational bottlenecks. The state-of-the-art subgraph counting methods available today allow the consideration of subgraphs F and F' over at most four nodes [21,24].
It follows that, although we break the statistical inference bottleneck regarding network samples, the computational challenges become only more salient. At this point, whether there exist compromises allowing for both rigorous statistical inference and scalable computation is the fundamental open question.

APPENDIX A: Properties of subgraph counts
In the following, we formalize certain notions used loosely in the main body (especially the notion of copy, the sets H_{FF'}, and the constants c_H), prove all our results, and provide more details on the numerical examples we present.
We start by formally introducing the notion of graph equivalence.
Definition A.1 (Graph equivalence '≡'). Fix two graphs F and F'. We say that F is equivalent to, or is a copy of, F', and write F ≡ F', if there exists a bijective map φ from the vertex set of F to the vertex set of F' such that ij is an edge in F if and only if φ(i)φ(j) is an edge in F'.
We can now provide a more complete, but otherwise equivalent, definition of number of copies.
Definition A.2. Fix two graphs F and G. We denote by X_F(G) the number of not-necessarily-induced subgraphs of G equivalent to F; i.e.,

X_F(G) = Σ_{F' ⊂ G} 1{F' ≡ F},

where F' ⊂ G if both the vertex and edge sets of F' are subsets of the vertex and edge sets of G respectively.
With this notation, we may prove our first lemma, establishing the linearity of subgraph counts. We first define H_{F_1 F_2} and c_H, generalizing definitions given in [18].

Definition A.3 (Overlapping copies). For two graphs F_1 and F_2, we write H_{F_1 F_2} for the set of unlabeled graphs that can be formed as the union of a copy of F_1 and a copy of F_2, and c_H for the number of ways a given H ∈ H_{F_1 F_2} can be built from such copies. Finally, call H*_{F_1 F_2} the set H_{F_1 F_2} with F_1 ⊔ F_2, the vertex-disjoint union of F_1 and F_2, removed.
Lemma A.1 (Copies pairwise interaction). Fix three graphs F_1, F_2 and G. Then,

X_{F_1}(G) X_{F_2}(G) = Σ_{H ∈ H_{F_1 F_2}} c_H X_H(G).

Proof. We start by writing X_{F_1}(G) X_{F_2}(G) = Σ_{F'_1, F'_2 ⊂ G} 1{F'_1 ≡ F_1} 1{F'_2 ≡ F_2}. Now, from Definition A.3, we first note that, by construction of H_{F_1 F_2}, for each pair F'_1, F'_2 in the sum, 1{F'_1 ≡ F_1} 1{F'_2 ≡ F_2} = 1 if and only if there exists H ∈ H_{F_1 F_2} such that F'_1 ∪ F'_2 ≡ H. Therefore, we can reindex the sum over the copies of the elements of H_{F_1 F_2}. We then note that, by definition of c_H, for each copy of H in G there are exactly c_H pairs (F'_1, F'_2) of copies of F_1 and F_2 in G such that F'_1 ∪ F'_2 = H. Therefore, the sum simplifies to Σ_{H ∈ H_{F_1 F_2}} c_H X_H(G), yielding the desired result.
With these tools in hand, we compute the first two moments of X_F(G) when G ~ G_n(f).
Proof. We prove each statement in succession. To begin, call F_1, ..., F_m the copies of F in K_G, so that X_F(G) = Σ_{i≤m} 1{F_i ⊂ G}. Therefore, by linearity of the expectation, we have E X_F(G) = Σ_{i≤m} E 1{F_i ⊂ G}. Let us fix i ∈ [m] and consider E 1{F_i ⊂ G}, which we compute using the law of total probability, conditioning on the latent features: E 1{F_i ⊂ G} = μ_F(f). Therefore, E 1{F_i ⊂ G} does not depend on i, and we obtain E X_F(G) = X_F(K_G) μ_F(f), which is the desired result.

We now turn to the variance, which expands over pairs of copies: Var X_F(G) = Σ_{i,i'≤m} Cov(1{F_i ⊂ G}, 1{F_{i'} ⊂ G}). Now, we observe that if F_i and F_{i'} are vertex disjoint, then 1{F_i ⊂ G} is independent from 1{F_{i'} ⊂ G}, and the corresponding covariance vanishes. Then, by the exact same transformation we used in Lemma A.1, each remaining union F_i ∪ F_{i'} is in H*_{FF} (because the pair is not disjoint), and for each copy of H in G there are c_H such pairs, which yields the desired result.
We now turn to the proofs of Theorem 1 and Theorem 2. As the second generalizes the first, it is sufficient to prove the second.
Proof. We obtain the result by a joint application of the Lindeberg-Feller central limit theorem and the Cramér-Wold device. To do so, we fix a ∈ R^{|F|} and compute the variance of our estimator projected along a.
Computing the variance: First recall the form of the estimator μ̂_F(G). Then, denoting '•' the inner product, let s²_N be the variance of a • μ̂_F(G). Using the independence of the G_i-s and the bilinearity of the covariance, this variance decomposes over the sample. To proceed, recall that for each i ≤ N, G_i ~ G_{n_i}(f_i); then, we may use Proposition A.1 to express each term through the moments μ_H(f_i), H ∈ H_{FF'}. Because all sums are finite, we can reorder the summations, leading to an expression in terms of averages ω_H(n, f; N).

Convergence of ω_H(n, f; N): To proceed, we must show that ω_H(n, f; N) converges to a limit as N diverges. We will achieve this by showing that ω_H(n, f; N) is Cauchy. To do so, recall that X_F(K_G) = C(|G|, |F|) |F|!/aut(F) (see for instance [16]), where aut(F) is the number of automorphisms of F (the number of bijections from the vertex set of F to itself that preserve adjacency). As furthermore f_{N+1} is bounded by 1, the increment |ω_H(n, f; N+1) − ω_H(n, f; N)| is at most C/N for a constant C equal to 1 + c_H aut(H)/(aut(F) aut(F')). Thus, the sequence ω_H(n, f; N) is Cauchy, and we may call ω_H(n, f) its limit. Then, writing Σ_F for the matrix indexed by F built from these limits, we have lim_N N s²_N = aᵀ Σ_F a.

Satisfying the Lindeberg-Feller condition: To invoke the Lindeberg-Feller central limit theorem, we must show that our sequence verifies the so-called Lindeberg-Feller condition. Recall that the sequence under study is Y_i = a • (X_F(G_i)/X_F(K_{G_i}) − μ_F(f_i)), and that the variance of the partial sum is N² s²_N. To verify the condition, we first fix ε > 0. Then, observe that |Y_i| ≤ ||a||_1 for each i, by the triangle inequality. Therefore, as N s_N → ∞ as N grows, we may fix an N_ε such that for all N > N_ε we have ||a||_1 ≤ ε N s_N. In this setting, the truncated sum in the Lindeberg-Feller condition is equal to zero for all N > N_ε, and the condition is verified. Therefore, we have that (1/(N s_N)) Σ_i Y_i → Normal(0, 1).
To conclude, and reverting to the notation of the statement of the theorem, we have that for any a ∈ R^{|F|}, √N a • (μ̂_F(G) − μ_F(f; N)) → Normal(0, aᵀ Σ*_F(n, f) a), which, by the Cramér-Wold device, is sufficient to obtain the claimed limit in distribution.
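The identity X_F(K_n) = C(n, |F|)·|F|!/aut(F) used in the proof can be checked numerically; a sketch, with aut(F) supplied by hand for these small subgraphs:

```python
from math import comb, factorial

def copies_in_complete(n, f_n, n_aut):
    """X_F(K_n): choose the |F| vertices, then count the distinct placements of F on them."""
    return comb(n, f_n) * factorial(f_n) // n_aut

# Edges (aut = 2) and triangles (aut = 6) in K_6:
assert copies_in_complete(6, 2, 2) == 15   # = comb(6, 2)
assert copies_in_complete(6, 3, 6) == 20   # = comb(6, 3)
```

For symmetric subgraphs such as cliques the formula collapses to a binomial coefficient, as the asserts illustrate.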
Before we proceed, we consider a simulation experiment to determine how large the sample size N must be for the asymptotic limit to be a satisfactory approximation of the statistic's distribution. We present our results in Fig. 5. There, we observe that fairly small N, on the order of 100 even with small n_i-s, may be sufficient. Furthermore, results on large graphs [4] suggest that using larger networks would make this convergence even faster.
We now turn to the proof of Corollary 1.
Proof. The result follows almost immediately from Theorem 1 and Slutsky's theorem.

First, since |G| and |G'| tend to infinity and both n and n' are large enough, Theorem 1 applies to each sample separately. Then, as both samples are independent, any linear combination of the two limits is still multivariate Gaussian. Therefore, we multiply the first limit by √(|G'|/(|G| + |G'|)) and the second by √(|G|/(|G| + |G'|)) (as both ratios are in (0,1), the limits in distribution are unaffected), and take their difference, to obtain the desired limit in distribution.
To obtain the consistent estimator of Σ_F(n, n', f), we first recall its expression in terms of the moments μ_H.

Figure 5: Assessing the quality of the asymptotic approximation. We sample 500 replicates of network samples of sizes N ranging from 100 to 3200, each of which is drawn from a random 2-block blockmodel, with the sizes of the networks fixed by the digits of π plus 8. On each sample, we evaluate the procedure presented in Fig. 2. We first consider how the average root squared error to the mean (on a log scale, against the network sample size N) shrinks as N increases (left plot, trend in red solid line). There, we observe a rate of convergence in line with our theoretical results, with many outliers achieving much better convergence. Then, we consider whether the Kolmogorov-Smirnov test rejects the null that the p-values across samples are uniformly distributed. There we observe that, already for samples of size 100, we fail to reject the null of a uniform distribution.

Then, observe that, as |G| and |G'| grow, each empirical subgraph density converges to the corresponding moment.
Furthermore, both μ_F(f) and μ_F(f') may be estimated in the same way.
Then, by a direct application of Slutsky's theorem, we obtain the consistency of the plug-in estimator.

We now describe the construction of the confidence region and p-value under the null. The first task is to compute the subgraph densities (the μ_F(f_B) and μ_H(f_B)). To do so, we use the formulas presented in [4,28] for subgraph densities in the blockmodel. More specifically, they show that for a blockmodel graphon f with K blocks, such that the probability of being in each block is π_u and the block matrix is B, then

μ_F(f) = Σ_{z ∈ [K]^{|F|}} Π_{v ∈ F} π_{z_v} Π_{uv ∈ F} B_{z_u z_v}.

Practically, this is achieved by nested loops.
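The nested-loops computation of blockmodel subgraph densities can be sketched directly from the displayed formula; the names are illustrative, and itertools.product enumerates the block assignments.

```python
from itertools import product

def blockmodel_density(f_n, f_edges, pi, B):
    """mu_F(f) for a blockmodel: sum over assignments z in [K]^{|F|} of
    prod_v pi[z_v] * prod_{uv in E(F)} B[z_u][z_v]."""
    total = 0.0
    for z in product(range(len(pi)), repeat=f_n):
        w = 1.0
        for v in range(f_n):
            w *= pi[z[v]]
        for u, v in f_edges:
            w *= B[z[u]][z[v]]
        total += w
    return total
```

When B is constant equal to p, this collapses to the Erdős-Rényi value p^{|E(F)|}, which gives a quick sanity check.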
Since we can compute the subgraph densities under the null, we can compute both μ_F(f_B) and Σ_F(f_B). Then, the construction of the confidence ellipse, as well as of the p-value using the Mahalanobis distance, are classical statistical inference methods which we need not describe here.
The blue region is built with the graphon f_B. We now describe how we produced the embedding shape and its confidence region. First, to produce the embedding shape, we compute μ_F(f) for a large number of elements f ∈ D(B). Practically, we parametrize D(B) by a K-dimensional vector π, the entries of which are the sizes of each block, and call the associated graphon f_π. Then, we build a grid S over the K-dimensional simplex of step-size 0.01, and for each π ∈ S we compute μ_F(f_π). To produce the confidence region, we produce the confidence ellipse for each π ∈ S, which we achieve by computing Σ_F(n, f_π) for each π in our grid S.
The blue region is built with f B where f B : (0, 1) We now describe how we produced the surface where µ F (f ) may live as well as its confidence region.First, we know that µ F (f ) may realize any point in the convex hull of the embedding shape.Therefore, we build the embedding shape as for Fig. 3, and then present its convex hull.Building the confidence region is achieved using the following reasoning, assuming that the diagonal of B dominates the off diagonal entries (max i∈[K] B ii ≥ max i =j B ij ): -Observe that for any graphon f and H ∈ H F F , µ H (f ) ≥ µ F (f )µ F (f ).
This can for instance be directly recovered from the formulas in Lemma A.1.Then, for any sequences n and f , we have that (Σ F (n, f )) F F ≥ 0, so that all correlations are positive, and the greatest amplitude of the confidence ellipse will be found on its first quadrant; i.e., the variance of a • µ F (G) will be maximal for some a ∈ R |F | such that a F > 0 for all F ∈ F.
- For any sequences n and f, and for any a in the first quadrant, we have that aᵀ Σ_F(n, f) a ≤ aᵀ Σ†_F(n, f) a.
Therefore, the maximal radius of the ellipse associated with Σ†_F(n, f) on the first quadrant is larger than that induced by Σ_F(n, f). It follows, by the previous item, that the auxiliary sphere of the ellipse associated with Σ†_F(n, f) contains that associated with Σ_F(n, f).
- Finally, under our assumption of diagonal dominance in B, max_{f∈D(B)} µ_H(f) is realized by the graphon such that all nodes are in the same block r with the highest probability of within-block connection; i.e., as parametrized by π, the maximum is realized by π* with π*_r = 1, where r is such that B_rr = max_{i∈[K]} B_ii. In this final setting, we need only compute the µ_H(f_{π*}), where f_{π*} describes an Erdős–Rényi random graph. This makes the computation straightforward, as we then have µ_H(f_{π*}) = B_rr^k, where k is the number of edges in H. Following this argument, our confidence region is the union of the auxiliary spheres of the ellipses associated with Σ†_F(n, f), centered at each point in the convex hull of the embedding shape.
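The final step above can be checked numerically: under diagonal dominance, scanning π over the simplex should find the cycle density maximized at the pure-block graphon, where it equals B_rr^k. A minimal Python sketch (function name and the particular 2-block matrix are our own):

```python
from itertools import product

def cycle_density(pi, B, k):
    """Density of the k-cycle C_k in a blockmodel (pi, B): sum over all
    block assignments of the k cycle nodes, weighting each assignment by
    its probability and by the probability of the k cycle edges."""
    K = len(pi)
    total = 0.0
    for z in product(range(K), repeat=k):
        w = 1.0
        for zi in z:
            w *= pi[zi]
        for i in range(k):
            w *= B[z[i]][z[(i + 1) % k]]
        total += w
    return total

# A diagonally dominant 2-block matrix: B_11 = 0.7 dominates everything.
B = [[0.7, 0.2], [0.2, 0.5]]
# Scan pi over a grid of the 2-simplex; the triangle density is maximized
# when all mass sits on the densest block, where it equals B_11^3.
best = max(cycle_density((p, 1.0 - p), B, 3)
           for p in (i / 100.0 for i in range(101)))
print(best, 0.7 ** 3)
```

The same scan with a non-dominant diagonal need not peak at a vertex of the simplex, which is why the diagonal-dominance assumption is needed.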
Finally, the blue region is built with

Figure 1: Example of subgraph counts. There are 6 copies of the edge in all three graphs. There are 2 copies of the triangle in a) and b), but 0 in c). There are 1, 0 and 3 copies of the square in a), b) and c) respectively.

Figure 2: Testing for a blockmodel using F = {C_2, C_3, C_4}. The sample G is such that: N = 300, n_i is the i-th digit of π plus 40, G_i is drawn from a two-block graphon f (G_i ∼ G_{n_i}(f)). The density estimate μ̂_F(G) is denoted by a black cross. Overlaid are the expected densities (colored dots) and the confidence ellipses (shaded areas) for two alternative graphons f_a and f_b. The p-values obtained with the Mahalanobis distance are respectively 0.6 and 7e−15.

Figure 3: Testing for an FSBm class. The sample G is such that: N = 200, n_i is the i-th digit of π plus 30, G_i is drawn from a graphon f (G_i ∼ G_{n_i}(f)). With F = {C_2, C_3, C_4}, we estimate μ̂_F(G), and plot it as a black cross. Then, we draw in solid color the embedding shapes µ_F(B_a) and µ_F(B_b). In shaded color we draw the associated confidence regions; p-values can be obtained using the Mahalanobis distance associated with the closest point to μ̂_F(G) in µ_F(B_a) and µ_F(B_b).

Figure 4: Testing for a full FSBm class. The sample G is such that: N = 10³, n_i is the i-th digit of π plus 50, G_i is drawn from a graphon f_i (G_i ∼ G_{n_i}(f_i)). With F = {C_2, C_3, C_4}, we estimate μ̂_F(G), and plot it as a black cross. Then, we draw in solid color the convex hulls of the embedding shapes µ_F(B_a) and µ_F(B_b). In shaded color we draw the associated confidence regions; approximate (and conservative) p-values can be obtained by determining the confidence level at which the observation ceases to be in the confidence region.

Proposition A.1. Fix two graphs F and F′ and a random graph G ∼ G(|G|, f) such that |F| + |F′| ≤ |G|. Then, we have that

Figure 5: Assessing the quality of the asymptotic approximation. We sample 500 replicates of network samples of sizes N ranging from 100 to 3200, each of which is drawn from a random 2-block blockmodel, with the sizes of the networks fixed by the digits of π plus 8. On each sample, we evaluate the procedure presented in Fig. 2. We first consider how the average mean squared error to the mean shrinks as N increases (left plot, trend in red solid line). There, we observe a rate of convergence in line with our theoretical results, with many outliers achieving much better convergence. Then, we consider whether the Kolmogorov–Smirnov test rejects the null that the p-values across samples are uniformly distributed. There we observe that, already for samples of size 100, we fail to reject the null of a uniform distribution.
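The uniformity check on the p-values can be sketched with a hand-rolled one-sample Kolmogorov–Smirnov statistic against the Uniform(0, 1) distribution (the 5% asymptotic critical value 1.36/√n is standard; the input here is an illustrative uniform grid, not the paper's actual p-values):

```python
import math

def ks_stat_uniform(pvals):
    """One-sample KS statistic of pvals against the Uniform(0, 1) CDF."""
    xs = sorted(pvals)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Compare the empirical CDF just after and just before each point.
        d = max(d, abs((i + 1) / n - x), abs(x - i / n))
    return d

# A perfectly uniform grid of 500 "p-values": the KS statistic is tiny,
# so the null of uniformity is not rejected at the 5% level.
n = 500
pvals = [(i + 0.5) / n for i in range(n)]
D = ks_stat_uniform(pvals)
critical = 1.36 / math.sqrt(n)  # asymptotic 5% critical value
print(D, critical, D < critical)
```

Replacing the grid with the p-values from a replicate of the experiment reproduces the decision reported in the figure.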