On the Number of Non-zero Elements of Joint Degree Vectors

Joint degree vectors give the number of edges between vertices of degree $i$ and degree $j$ for $1\le i\le j\le n-1$ in an $n$-vertex graph. We find lower and upper bounds for the maximum number of nonzero elements in a joint degree vector as a function of $n$. This provides an upper bound on the number of estimable parameters in the exponential random graph model with bidegree-distribution as its sufficient statistics.


Introduction
Degree sequences and degree distributions have been subjects of study in graph theory and many other fields in the past decades. In particular, in social network analysis, they have been shown to possess a great expressive power in representing and statistically modeling networks; see, e.g., Newman [11] and Handcock and Morris [9]. Generally in this context, models are in exponential family form [2,4], known as exponential random graph models (ERGMs) [8,17]. When the sufficient statistic, i.e. the only information that the ERGM gathers from an observed network, is the degree sequence of a network, the corresponding ERGM is known as the beta-model, properties of which have been extensively studied in the recent literature; see Blitzstein and Diaconis [3], Chatterjee et al. [5], and Rinaldo et al. [13]. Degree distributions have also been used as sufficient statistics; see Sadeghi and Rinaldo [14].
The bidegree distribution generalizes the degree distribution and collects the relative frequencies of the degree combinations that appear at neighbouring vertices. The non-normalized version of the bidegree distribution is called the joint degree vector (JDV, sometimes also called joint degree matrix) [12,1,15], i.e. the elements of the JDV represent the exact counts of edges between pairs of vertices of specified degree. Conditions for a given vector to be the JDV of a graph were provided in Patrinos and Hakimi [12], Stanton and Pinar [15], and Czabarka et al. [6]. An ERGM with bidegree distribution as sufficient statistics has been formalized in Sadeghi and Rinaldo [14].
Bidegree distributions are network statistics that belong to the more general class of joint degree distributions that count degree combinations of connected sets of vertices of given size. In the computer science literature, the family of graphs with a given joint degree distribution is called dK-graphs, where d indicates the number of vertices of the concerned subgraphs [10]. The class of dK-graphs was originally formulated as a means to capture increasingly refined properties of networks in a hierarchical manner based on higher order interactions among vertex degrees (see, e.g., Dimitropoulos et al. [7]).
One important statistical problem when working with ERGMs (or other exponential families) is the question of whether the maximum likelihood estimate (MLE) exists for a given set of observations (in network theory, the observations most often consist of just a single observed network). When the MLE does not exist, one or more of the model parameters cannot be estimated. As is well known, whether the MLE exists can be determined from a facet description of the so-called model polytope [2]. These facets correspond to linear inequalities that hold among the different components of the sufficient statistics. When such a description is known, it is also easy to understand which parameters can be estimated [16].
Even though a complete description of the model polytope is hard to compute in general, it is often possible to obtain a subset of the valid inequalities. Such partial information about the model polytope gives partial information about MLE existence and parameter behaviour [16]. For example, the bidegree counts (and frequencies) are always non-negative. Sadeghi and Rinaldo [14] exploited these facts to show that the MLE never exists for a single observed network in the case of bidegree distribution as sufficient statistics. More importantly, parameters corresponding to zeros on the bidegree vector of the observed network are not estimable.
These results motivate us to find the maximum possible number of non-zero elements of the bidegree vector of a graph. This maximum number also tells us about the maximum number of estimable parameters. In this paper, we prove that, asymptotically for large $n$, the maximum number of non-zero elements lies between $0.5\binom{n}{2}$ and $\frac{13}{24}\binom{n}{2} \approx 0.5416\binom{n}{2}$. Thus, roughly half of the components are zero, and so, at most about half of the parameters are estimable from a single observation.
In the next section, we provide basic graph theoretical as well as statistical definitions and preliminary results. In Section 3, we provide a lower bound for the maximum possible number of non-zero elements of a JDV by constructing a family of graphs that reaches this bound. In Section 4, we exploit conditions from Czabarka et al. [6] and use two different approaches to obtain upper bounds for this desired value. The first upper bound is presented in Theorem 1 and the second bound in Theorem 2. As shall be seen, the numerical values for the two bounds are very close.

Joint degree vectors
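In this paper we only consider simple graphs without isolated vertices. Let $G = (V, E)$ be such an $n$-vertex graph and, for $1 \le i \le n-1$, let $V_i$ be the set of vertices of degree $i$. The joint degree vector (JDV) of $G$ is the vector $s(G) = (j_{11}(G), j_{12}(G), \dots, j_{n-1,n-1}(G))$ of length $\binom{n}{2}$ with components defined by $j_{ik}(G) = |\{xy \in E(G) : x \in V_i,\ y \in V_k\}|$ for all $1 \le i \le k \le n-1$. For some vector $m$, if there exists a graph $G$ with $s(G) = m$, then $m$ is called a graphical JDV. Note that the degree sequence of a graph is determined by its JDV in that
$$|V_i| = \frac{1}{i}\Bigl(\sum_{k=1}^{i} j_{ki}(G) + \sum_{k=i}^{n-1} j_{ik}(G)\Bigr), \qquad 1 \le i \le n-1,$$
where $j_{ii}(G)$ appears in both sums because an edge inside $V_i$ has both endpoints in $V_i$.
The following characterization for a vector $m$ with integer entries to be a graphical JDV is proved by Patrinos and Hakimi [12], Stanton and Pinar [15], and Czabarka et al. [6]. As it provides simple necessary and sufficient conditions for a vector to be realized as a graphical JDV, we call the result an Erdős-Gallai type theorem.
Proposition 1. A vector $m = (m_{11}, m_{12}, \dots, m_{n-1,n-1})$ with non-negative integer entries is a graphical JDV if and only if the following conditions hold:
(i) $n_i := \frac{1}{i}\bigl(\sum_{k=1}^{i} m_{ki} + \sum_{k=i}^{n-1} m_{ik}\bigr)$ is an integer for all $1 \le i \le n-1$;
(ii) $m_{ii} \le \binom{n_i}{2}$ for all $1 \le i \le n-1$;
(iii) $m_{ik} \le n_i n_k$ for all $1 \le i < k \le n-1$.
Moreover, $n_i$ gives the number of vertices of degree $i$ in the graph $G$.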
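As an illustration of these definitions, the following minimal sketch computes the JDV of a small graph and then recovers the degree class sizes $n_i$ from the JDV alone. The function names, the edge-list representation and the 0-based vertex labels are assumptions made for this example and do not come from the cited papers.

```python
from collections import Counter

def joint_degree_vector(n, edges):
    """Return (jdv, degree) for a simple graph on vertices 0..n-1.

    jdv maps (i, k) with i <= k to the number of edges joining a vertex
    of degree i to a vertex of degree k; zero entries are omitted.
    """
    degree = [0] * n
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    jdv = Counter()
    for x, y in edges:
        i, k = sorted((degree[x], degree[y]))
        jdv[(i, k)] += 1
    return jdv, degree

def degree_class_sizes(n, jdv):
    """Recover n_i = |V_i| from the JDV alone."""
    endpoints = Counter()
    for (i, k), count in jdv.items():
        endpoints[i] += count
        endpoints[k] += count  # for i == k this adds j_ii a second time
    # A vertex of degree i accounts for exactly i edge endpoints.
    return {i: endpoints[i] // i for i in range(1, n) if endpoints[i]}

# Path on 4 vertices (hypothetical example): degrees 1, 2, 2, 1.
path_edges = [(0, 1), (1, 2), (2, 3)]
jdv, degree = joint_degree_vector(4, path_edges)
sizes = degree_class_sizes(4, jdv)
```

For the path, the two end edges join a degree-1 vertex to a degree-2 vertex and the middle edge joins two degree-2 vertices, so the only non-zero components are $j_{12}$ and $j_{22}$.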

Exponential random graph models
An exponential random graph model (ERGM) is a family of random graphs, parametrized by finitely many parameters $\theta_i$, $i \in I$. All random graphs have the same (finite) set of vertices, denoted by $V$. Under this model, the probability of observing a network $G$ with vertex set $V$ can be written as
$$P_\theta(G) = \exp\Bigl(\sum_{i \in I} \theta_i t_i(G) - \psi(\theta)\Bigr), \qquad (1)$$
where $t_i(G)$ are canonical sufficient statistics, which capture some important feature of $G$, and $\psi(\theta)$ is the normalizing constant, which ensures that probabilities add to 1 when summing over all possible networks. The model is in exponential family form. Hence, the log-likelihood function $l(\theta) = \log P_\theta(G_1, \dots, G_m)$, for generic observed networks $G_1, \dots, G_m$, is concave and, therefore, has a unique maximum if it exists. Existence of this maximum can be described geometrically: Suppose that the networks $G_1, \dots, G_m$ were observed. The average observed sufficient statistic is $\bar t = \frac{1}{m}\sum_{k=1}^{m} t(G_k)$. We also define the model polytope to be the convex hull of all the points in a $d$-dimensional space that correspond to the sufficient statistics of all graphs with $n$ vertices. We then have the following result [2,4]: For an ERGM, the MLE exists if and only if the average observed sufficient statistic $\bar t$ lies in the (relative) interior of the model polytope.
In network analysis, there is usually only one network G observed, and therefore, the average observed sufficient statistic is simply t(G).
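To make the role of the normalizing constant concrete, here is a small sketch that enumerates all graphs on four vertices. It assumes the standard exponential family form $P_\theta(G) = \exp(\theta\, t(G) - \psi(\theta))$ and, purely for illustration, uses a single statistic $t(G)$ equal to the number of edges rather than the JDV; the function names are hypothetical.

```python
import math
from itertools import combinations

def all_graphs(n):
    """Yield every simple graph on vertices 0..n-1 as a list of edges."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        yield [pairs[b] for b in range(len(pairs)) if (mask >> b) & 1]

def log_normalizer(n, theta, statistic):
    """psi(theta) = log of the sum of exp(theta * t(G)) over all graphs G."""
    return math.log(sum(math.exp(theta * statistic(g)) for g in all_graphs(n)))

def probability(graph, n, theta, statistic):
    """P_theta(G) = exp(theta * t(G) - psi(theta))."""
    return math.exp(theta * statistic(graph) - log_normalizer(n, theta, statistic))

edge_count = len  # t(G) = number of edges (illustrative choice)
theta = 0.3
psi = log_normalizer(4, theta, edge_count)
total = sum(probability(g, 4, theta, edge_count) for g in all_graphs(4))
```

With the edge count as the only statistic, the six possible edges are independent, so $\psi(\theta) = 6\log(1 + e^{\theta})$, which gives a simple check of the enumeration. Brute-force enumeration is only feasible for very small vertex sets.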
In the so-called 2K-model, the sufficient statistic $t(G)$ in (1) is the JDV $s(G)$. As shown in [14], if $s_i(G) = 0$, then $\theta_i$ is not estimable. It is also easy to observe that for every graph, there are always some elements of the bidegree vector that are zero. In the next sections, we investigate how many elements of the bidegree vector are always zero.

Lower bound construction
Let $H_n$ denote an $n$-vertex graph with vertex set $V(H_n) = \{v_1, v_2, \dots, v_n\}$ and edge set $E(H_n) = \{v_i v_j : i + j > n \text{ and } i \ne j\}$. This graph, which is known as the half graph, has degree sequence $n-1, n-2, \dots, \lfloor n/2 \rfloor, \lfloor n/2 \rfloor, \dots, 2, 1$. Since a graph on $n$ vertices cannot contain both vertices with degrees $0$ and $n-1$, the half graph attains the maximum number of distinct degrees.
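The construction is easy to check by machine. The sketch below builds the half graph directly from the rule "$v_i v_j$ is an edge when $i + j > n$ and $i \ne j$" and lists its degrees; the function names are assumptions of this example.

```python
def half_graph_edges(n):
    """Edges v_i v_j of the half graph, stored as pairs (i, j) with i < j."""
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if i + j > n]

def degree_sequence(n, edges):
    """Degrees of vertices 1..n in non-increasing order."""
    degree = {v: 0 for v in range(1, n + 1)}
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    return sorted(degree.values(), reverse=True)

seq7 = degree_sequence(7, half_graph_edges(7))
seq8 = degree_sequence(8, half_graph_edges(8))
```

In both cases exactly one degree value, $\lfloor n/2 \rfloor$, is repeated, so the graph has $n - 1$ distinct degrees.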
For any graph $G$, let $A(G) := \{(i, k) : 1 \le i \le k \le n-1,\ j_{ik}(G) > 0\}$ be the set of non-zero components in the JDV of $G$. Clearly, $|A(H_n)| = (1 - o(1))\,\frac{1}{2}\binom{n}{2}$, so about half the elements of the JDV of the half graph are non-zero. The half graphs are not optimal, and there are constructions which achieve a higher number of non-zero elements in the JDV. Consider the graph $H_n$ with $n \ge 7$ odd. If one connects the degree 1 vertex to one of the vertices with degree $(n-1)/2$, the JDV element $j_{1,n-1}$ becomes $0$, but the elements $j_{2,(n+1)/2}$ and $j_{(n+1)/2,(n+1)/2}$ become nonzero, so the new graph has one more nonzero element in its JDV. We found even better such constructions, but all of these only improve $|A(H_n)|$ by a term that is linear in $n$.
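The one-edge improvement described above can be verified on small cases. The following sketch counts the non-zero JDV components of the half graph and of the modified graph for odd $n$; the helper names are hypothetical, and the added edge joins $v_1$ to $v_{(n-1)/2}$, which is one of the two vertices of degree $(n-1)/2$.

```python
def half_graph_edges(n):
    """Edges v_i v_j of the half graph: i < j and i + j > n."""
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if i + j > n]

def nonzero_entries(n, edges):
    """Set of pairs (i, k), i <= k, such that some edge joins degrees i and k."""
    degree = {v: 0 for v in range(1, n + 1)}
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    return {tuple(sorted((degree[x], degree[y]))) for x, y in edges}

def improved_half_graph_edges(n):
    """Half graph on an odd number n >= 7 of vertices plus the edge v_1 v_{(n-1)/2}."""
    return half_graph_edges(n) + [(1, (n - 1) // 2)]

base7 = nonzero_entries(7, half_graph_edges(7))
better7 = nonzero_entries(7, improved_half_graph_edges(7))
base9 = nonzero_entries(9, half_graph_edges(9))
better9 = nonzero_entries(9, improved_half_graph_edges(9))
```

For $n = 7$ the component $(1, 6)$ disappears while $(2, 4)$ and $(4, 4)$ appear, so the count rises from 9 to 10.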

Two upper bounds
In this section, we derive two upper bounds that are numerically very close but are obtained by entirely different methods. Although we tried, we were unable to combine the two proof techniques, and we think that it is instructive to show both of them.

Continuous optimization
The following inequality is a simple consequence of Proposition 1 and is due to Sadeghi and Rinaldo [14]:
Corollary 1. For any graph $G$ with $n$ vertices,
$$\sum_{(i,k) \in A(G)} \frac{i+k}{ik} \le n - n_0(G),$$
where $n_0(G)$ is the number of isolated vertices in $G$.
To see this, by Proposition 1(i) we have
$$n - n_0(G) = \sum_{i=1}^{n-1} n_i = \sum_{1 \le i \le k \le n-1} j_{ik}(G)\Bigl(\frac{1}{i} + \frac{1}{k}\Bigr) \ge \sum_{(i,k) \in A(G)} \frac{i+k}{ik},$$
since $j_{ik}(G) \ge 1$ for every $(i,k) \in A(G)$.
Next, we show that we can assume that $n_0(G) = 0$ without loss of generality. Consider a graph $G$ with $n_0(G) > 0$. If $n_0(G) = 1$, then let $v \in V(G)$ be a largest degree vertex in $G$ and $x$ be the isolated vertex, and let $G'$ be the graph obtained by adding the edge $xv$ to the graph $G$. If $n_0(G) > 1$, then let $G'$ be the graph obtained by adding the edges between the isolated vertices of $G$. In both cases $G'$ is a graph on the same number of vertices as $G$ that has at least as many nonzero entries in its JDV as $G$ does. Thus, there are graphs without isolated vertices that have the maximum number of nonzero entries in their JDV.
The original problem of finding the maximum possible number of non-zero elements of a JDV for a fixed number of vertices can be formulated as the following optimization problem:
• Maximize $|A(G)|$ among all graphs $G$ with $n$ vertices.
Using the corollary, we relax this optimization problem and study the following problem, which we will refer to as the discrete relaxation (as ultimately we will solve its continuous version):
• Maximize the cardinality $|A|$ among all subsets $A \subseteq P_n := \{(i, j) \in \mathbb{N}^2 : 1 \le i \le j \le n-1\}$ that satisfy $\sum_{(i,j) \in A} \frac{i+j}{ij} \le n$.
By the above corollary, for any $n$, the cardinality of a subset that solves the discrete relaxation is an upper bound for the original optimization problem. The discrete relaxation can be solved on a computer as follows: First, compute all values $(k_1 + k_2)/(k_1 k_2)$ on $P_n$. Second, order the values. Third, start adding them up, smallest first, as long as the sum does not exceed $n$. Finally, count the number of elements that have been added.
Let $\alpha_n$ be the cardinality of a solution $A$ of the discrete relaxation divided by $\binom{n}{2}$. The values of $\alpha_n$ are plotted in Figure 1. As a function of $n$, the optimum $\alpha_n$ decreases roughly (though not strictly) and reaches values below 0.56 for large $n$.
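The four computational steps just described can be sketched as follows. The sketch assumes that $P_n$ consists of the pairs $1 \le i \le j \le n-1$, that the values are added smallest first until the sum would exceed $n$, and that $\alpha_n$ is the resulting count divided by $\binom{n}{2}$; the function names are hypothetical, and exact rational arithmetic is used to avoid rounding issues at the threshold.

```python
from fractions import Fraction
from math import comb

def discrete_relaxation(n):
    """Largest number of pairs (i, j), 1 <= i <= j <= n-1, whose values
    (i + j) / (i * j) sum to at most n."""
    # Steps 1 and 2: compute all values on P_n and order them.
    values = sorted(Fraction(i + j, i * j)
                    for i in range(1, n)
                    for j in range(i, n))
    # Steps 3 and 4: add the smallest values while the sum stays <= n, and count.
    total, count = Fraction(0), 0
    for value in values:
        if total + value > n:
            break
        total += value
        count += 1
    return count

def alpha(n):
    """Optimum of the discrete relaxation as a fraction of binom(n, 2)."""
    return discrete_relaxation(n) / comb(n, 2)
```

For example, for $n = 4$ the sorted values are $\frac23, \frac56, 1, \frac43, \frac32, 2$; the first four sum to $\frac{23}{6} \le 4$ and adding the fifth would exceed 4, so the optimum is 4 out of $\binom{4}{2} = 6$ pairs.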
The limit for $n \to \infty$ can be computed by approximating the discrete relaxation by the following optimization problem, which we call the continuous relaxation: Let $\alpha'_n$ be the maximum of the continuous relaxation.
Lemma 1. $\alpha_n \le \frac{n-1}{n}\,\alpha'_n + \frac{1}{n}$.
Proof. To each $(i, j) \in P_n$ associate the two squares $A_{i,j}$: Here, the first inequality follows from the fact that the maximum of $\frac{x+y}{xy} = \frac{1}{x} + \frac{1}{y}$ over $A_{i,j}$ is at $(x, y) = (i, j)$. The second inequality follows by not double-counting the set $A_d := \bigcup_{(i,i) \in A} A_{i,i}$ corresponding to the diagonal elements of $A$. The last equality follows since $\frac{x+y}{xy} = \frac{1}{x} + \frac{1}{y}$ and since $A'$ is symmetric about $y = x$. Therefore, if $A$ is feasible for the discrete relaxation, then $A'$ is a feasible solution for the continuous relaxation. Now, and so
Corollary 2. $\limsup_{n \to \infty} \alpha_n \le \limsup_{n \to \infty} \alpha'_n$.
It is not difficult to see that, actually, $\lim_{n \to \infty} \alpha_n = \lim_{n \to \infty} \alpha'_n$. Figure 1 shows that the upper bound from Lemma 1 is not very tight for finite $n$.
Next, we solve the continuous relaxation. The idea is the following: As the set $A'$ it is advantageous to choose a sublevel set of the function $\frac{x+y}{xy}$. For $c > 0$ let $A_c := \{(x, y) \in [1, n]^2 : \frac{1}{x} + \frac{1}{y} \le c\}$, and let $x_1(c) := \frac{1}{c - 1/n}$ and $y_c(x) := \frac{1}{c - 1/x}$.
Proof. If $x < x_1(c)$ and $1 \le y \le n$, then $\frac{1}{x} + \frac{1}{y} > c - \frac{1}{n} + \frac{1}{n} = c$. If $x_1(c) \le x \le n$ and $1 \le y < y_c(x)$, then $\frac{1}{x} + \frac{1}{y} > \frac{1}{x} + c - \frac{1}{x} = c$.
Proof. $y_c(x)$ decreases monotonically with $x$. Therefore, $y_c(x) \ge y_c(n) = x_1(c)$ for all $x \in [1, n]$.
Proof. Assume that $c$ is such that $x_1(c) \ge 1$. Then and so Hence, $A_c$ is feasible if and only if
$$(nc - 2)\log(nc - 1) \le nc. \qquad (3)$$
Now suppose that $n > e$. If $c$ satisfies (3), then Thus, the above calculation is valid and shows that $A_c$ is feasible. On the other hand, if $n > e$ and if $c$ violates (3), then $A_c$ is not feasible.
To find the solution of the continuous relaxation, we need to find the value of $c$ that solves (3) with equality. Writing $\beta = nc$, consider the equation
$$\log(\beta - 1) = \frac{\beta}{\beta - 2}.$$
Both the left and the right hand side change sign at $\beta = 2$. For $\beta > 2$, both sides are positive, and for $\beta < 2$ they are negative. By Lemma 2, we are looking for a solution larger than 2. For $\beta > 2$, the right hand side is decreasing, while the left hand side is increasing. It follows that there is a unique solution $\beta_0 > 2$. Numerically, $\beta_0 \approx 5.68050$. Thus, $A_c$ is feasible if and only if $c \le \beta_0/n$, and in order to maximize $|A_c|$, we have to choose $c = \beta_0/n$.
Lemma 5. $x_1(\beta_0/n) > 1$ for $n$ large enough.
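The numerical value of $\beta_0$ can be reproduced with a few lines of bisection. The sketch below assumes that $\beta_0$ is the root larger than 2 of $(\beta - 2)\log(\beta - 1) = \beta$, which is the feasibility condition $(nc - 2)\log(nc - 1) \le nc$ taken with equality and $\beta = nc$; the bracketing interval $[2.5, 10]$ and the function names are choices made for this example.

```python
import math

def feasibility_gap(beta):
    """(beta - 2) * log(beta - 1) - beta; zero exactly at beta_0."""
    return (beta - 2.0) * math.log(beta - 1.0) - beta

def solve_beta0(lo=2.5, hi=10.0, tol=1e-12):
    """Bisection on [lo, hi], where the gap is negative at lo and positive at hi."""
    assert feasibility_gap(lo) < 0.0 < feasibility_gap(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasibility_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta0 = solve_beta0()
```

Because the solution larger than 2 is unique, any bracket inside $(2, \infty)$ on which the gap changes sign leads to the same value.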
It remains to compute the maximum value of the continuous relaxation and to put everything together.
Theorem 1. For any graph $G$ with $n$ vertices, where $A(G)$ is the set of non-zero elements in the JDV of $G$.

Second Bound
Let $G = (V, E)$ be an $n$-vertex graph and let $A(G)$ be the set of non-zero elements in the JDV of $G$, as defined in Section 4.1. Denote by $n_i = |V_i|$ the number of vertices with degree $i$. We call $i$ a single if $n_i = 1$ and a multiple if $n_i \ge 2$, noting that some $i$ are neither single nor multiple, as they just do not occur as degrees. As before, for $1 \le i \le k \le n-1$, let $j_{ik}$ be the number of edges between the $i$th and $k$th degree classes, and let $\chi_{ik} = 1$ if $j_{ik} > 0$ and $\chi_{ik} = 0$ otherwise. It is easy to see that $|A(G)| = \sum_{i=1}^{n-1} \sum_{k=i}^{n-1} \chi_{ik}$. Now we set $D_i = \sum_{k=1}^{i} \chi_{ki} + \sum_{k=i+1}^{n-1} \chi_{ik}$ and $B(G) = \sum_{i=1}^{n-1} D_i$. Note that for $k \ne i$, the sum $B(G)$ counts $\chi_{ki} = \chi_{ik}$ twice (once in $D_i$ and once in $D_k$), but $\chi_{ii}$ is counted only once, so we get $|A(G)| \le \frac{B(G) + n - 1}{2}$ and therefore
We use this to prove the following theorem:
Proof. Observe that $D_i \le m \le mi$ and $D_i \le i n_i$, and hence since the minimum of two elements is at most their average. Note that if $i$ is single we have $D_i \le \min(m, i)$.
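The bookkeeping with $D_i$ and $B(G)$ can be checked on an example. The sketch below evaluates these quantities for the half graph on 7 vertices and tests the bounds $|A(G)| \le (B(G) + n - 1)/2$, $D_i \le m$ and $D_i \le i\,n_i$. Here $m$ is taken to be the number of distinct degrees that occur in $G$, which is how it is used later in this section; this reading, like the function names, is an assumption of the example.

```python
from collections import Counter

def half_graph_edges(n):
    """Edges v_i v_j of the half graph: i < j and i + j > n."""
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if i + j > n]

def degree_class_profile(edges):
    """Return (A, n_i, D_i) for a graph given by its edge list (no isolated vertices)."""
    degree = Counter()
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    nonzero = {tuple(sorted((degree[x], degree[y]))) for x, y in edges}
    class_size = Counter(degree.values())  # n_i for every occurring degree i
    # D_i = number of non-zero components involving degree i; (i, i) counts once.
    D = {i: sum(1 for pair in nonzero if i in pair) for i in class_size}
    return nonzero, class_size, D

n = 7
nonzero, class_size, D = degree_class_profile(half_graph_edges(n))
B = sum(D.values())
diagonal = sum(1 for i, k in nonzero if i == k)
m = len(class_size)  # number of distinct degrees (assumed meaning of m)
```

The exact relation is $2|A(G)| = B(G) + |\{i : \chi_{ii} = 1\}|$, and bounding the number of non-zero diagonal components by $n - 1$ gives the inequality used in the text.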
Employing (4) and (5) we get that where the last inequality follows from Cauchy-Schwarz.
We wish to upper bound the term from (6) over all graphs $G$. From our lower bound construction we know that $|A(G)| \ge (1 - o(1))\,\frac{1}{2}\binom{n}{2}$. So we may assume that $m > n/\sqrt{2}$, else we would have $|A(G)| \le \binom{m}{2} \le \binom{n}{2}/2$ and our estimation of $|A(G)|$ would be complete.
Proof. We wish to upper bound $\sum_{i : n_i > 0} \min(m, i)$ over all graphs. So assume the $m$ highest possible degrees occur in our graph: $n-1, n-2, \dots, n-m+1, n-m$.
Our assumption $m > n/\sqrt{2}$ implies $m > n - m + 1$, so the value of $m$ has to appear in the list of degrees above. There are $n - 1 - m$ terms strictly larger than $m$ in this list and each contributes min Now if the $m$ highest degrees do not occur in our graph, then some degree less than $n - m + 1$ must occur which clearly gives something smaller than the term in (7).
Recall from the beginning of the section that a degree $i$ is single if $n_i = 1$, that is, there is only one vertex of degree $i$. Let $s$ be the number of degrees $i$ that are singles. Observe that $s \le m$ and $s + 2(m - s) \le n$, implying that $s \le m \le \frac{n+s}{2}$. Using $s + \sum_{i :\, i \text{ multiple}} n_i = n$ and substituting