Connectedness matters: construction and exact random sampling of connected networks

We describe a new method for the random sampling of connected networks with a specified degree sequence. We consider both the case of simple graphs and that of loopless multigraphs. The constraints of fixed degrees and of connectedness are two of the most commonly needed ones when constructing null models for the practical analysis of physical or biological networks. Yet handling these constraints, let alone combining them, is non-trivial. Our method builds on a recently introduced novel sampling approach that constructs graphs with given degrees independently (unlike edge-switching Markov chain Monte Carlo methods) and efficiently (unlike the configuration model), and extends it to incorporate the constraint of connectedness. Additionally, we present a simple and elegant algorithm for directly constructing a single connected realization of a degree sequence, either as a simple graph or a multigraph. Finally, we demonstrate our sampling method on a realistic scale-free example, as well as on degree sequences of connected real-world networks, and show that enforcing connectedness can significantly alter the properties of sampled networks.


Introduction
From the active scaffolding of actomyosin in the cell's cortex to the underlying gene expression machinery that regulates it, from the neighbourhood interactions of grains in a sand pile to those of the engineered struts and cables in a suspension bridge, and from the flow of virtual traffic on the internet to, critically in the time of COVID-19, the web of contacts that allow the spread of viral contagion, network structures underlie the vast majority of sufficiently complex real-world systems. Unsurprisingly, then, a great deal of focus has been placed on the furtherance of our understanding of how these network structures affect and ultimately determine the physical, biological, and social phenomena that play out over them. Indeed, the explosive growth of the fields of network science and complexity science in the last two decades is a direct consequence of this focus.
As is to be expected in such a young field, however, there remain fundamental challenges. One such challenge is the surprising difficulty of translating the simple concept of the null hypothesis into a network setting. Done directly, such a translation would read: 'there is no relationship between the network structure or properties and the observed or measured phenomena of interest'. But of course one cannot simply compare the case of phenomena potentially arising from some specific network structure with a case of no network at all, forcing one to conclude that the correct operational statement of the null hypothesis in the complex network milieu must be: 'there would be no difference in the observed phenomena or measured output if the specific underlying network were to be replaced by a generic network'. And herein lies the rub. What is a generic network? Surely one can demand that the generic network, or ensemble of generic networks, satisfy some small set of constraints in order to ensure relevance to the biology, physics, or social dynamics under consideration. For example, in epidemiological viral spreading models it would be of no use to consider a heavily disconnected network with many small individual components to be among the generic networks. In fact, it is the unfortunate state of affairs that this simple issue is so tricky that many network and complexity science results are reported and accepted without reference to a test of the null hypothesis! But before we can fruitfully return to the question of membership among the relevant generic networks, we must first briefly discuss the problem of sampling from constrained ensembles of networks.
Indeed, the so-called random graph models are among the most powerful tools of network science. Essentially, a random graph model is simply a probability distribution defined over a set G of graphs, also referred to as a graph ensemble. Often, such models are defined through an explicit stochastic graph construction process: the Watts-Strogatz model [1] and the preferential attachment model [2,3] are some well-known examples. Usually, such constructive models are introduced and studied because the graphs they produce have some interesting or relevant property: the Watts-Strogatz model can produce graphs with the 'small-world' property, which is famously present in social networks. The preferential attachment model can produce 'scale-free' graphs, i.e. graphs with a power-law degree distribution, a much-studied property which many real-world networks possess [4,5]. However, not all scale-free networks can be produced by the preferential attachment mechanism, and one cannot make general statements about all scale-free networks based only on those generated by a preferential attachment model. Therefore, for some purposes, it is useful to define random graph models not through a construction process, but by directly imposing a property of interest. The simplest way to define such a model (i.e. distribution) is to constrain its support G to include only those graphs that possess a given property, and assign the same probability to all elements of G. The graph ensemble obtained this way best represents the property of interest. A related approach constrains the averages of some numerical graph properties and defines the distribution over G to be the one with maximal entropy, which leads to exponential random graph models [6][7][8].
Returning to the challenge of rendering the null hypothesis in a network setting, constraint-based random graph models are particularly useful as null models. Null models are used to determine if an interesting observed feature of some empirical network can be explained by another simpler feature. The simpler feature is used as a constraint to define a random graph model, which is then compared to the empirical data. Another application is dealing with incomplete empirical data. Sometimes, it is not possible to fully map the connections of a real-world network, either due to practical limitations or, in the case of networks of people, due to privacy concerns [9]. In such cases, the known data can be incorporated as a constraint into a random graph model, and individual networks sampled from the model can be used as proxies for the (unknown) real network. Both applications require being able to computationally generate samples from the model. In the case of constraint-based models this means restricting the set of graphs G to only those that satisfy the constraint, then performing uniform sampling. This is usually a difficult problem, as there are no general sampling methods that work with arbitrary constraints. Each constraint requires developing a sampling algorithm specific to it, and combining multiple constraints is a significant additional challenge.
In this paper we consider the problem of sampling connected graphs with a given degree sequence. Constraining the degrees has countless practical applications: it is a frequently used null model, for example when finding network motifs [10], detecting a so-called 'rich-club structure' [11] or analysing the assortative structure of networks [12]. Degree-constrained random graphs are also useful as proxies when only the degrees of an empirical network are known, such as in the case of the famous web of sexual connections dataset [9]. The constraint of connectedness is a frequent additional requirement: many real-world networks, such as vasculature, brain networks, or molecules (networks of atoms), are always connected. Commonly used network measures like closeness centrality are only meaningful for connected graphs. Processes such as epidemic spreading must be modelled on connected networks. In this work we present a novel method to handle these two constraints, degrees and connectedness, simultaneously.
The article is organized as follows: section 2 introduces the mathematical background used in later parts of the article, and reviews existing sampling methods for graphs with constrained degrees. Section 3 presents a new and simple algorithm to construct a single connected graph with given degrees. Section 4 presents a recently introduced family of importance sampling methods for graphs with constrained degrees, and shows how to incorporate the additional constraint of connectedness. Finally, section 5 demonstrates the practical applicability of the method on both synthetic and real-world examples.

Mathematical foundations
In this section we introduce the concepts, terminology and notations used in the rest of the work. We say that a graph is simple if it has no multi-edges or self-loops, i.e. if any two vertices have at most one connection between them, and no vertex is connected to itself. The degree d of a vertex is the number of connections it has. The degree sequence of a graph on n vertices is simply the collection of its vertex degrees. If the degree sequence of a graph G is d = (d_1, d_2, ..., d_n), we say that the graph G realizes the degree sequence d.
Since each edge in a graph connects to a vertex at both of its endpoints, the sum of the degrees in a graph is twice the number of its edges, an even number. This statement is commonly known as the handshaking lemma. But not every even-sum sequence of integers is realizable as a simple graph. For example, d = (3, 2, 1) can only be realized by either a graph that includes a self-loop, or one that includes a multi-edge.

Definition 1 (graphicality).
A degree sequence is said to be graphical if there is a simple graph that realizes it.
The well-known Erdős-Gallai theorem provides a direct way to check if a degree sequence is graphical.

Theorem 1 (Erdős and Gallai [13]). Let d_1 ≥ d_2 ≥ ⋯ ≥ d_n be a degree sequence. There is a simple graph that realizes this degree sequence if and only if Σ_{i=1}^n d_i is even and

Σ_{i=1}^k d_i ≤ k(k − 1) + Σ_{i=k+1}^n min(d_i, k)    (1)

for every 1 ≤ k ≤ n.
Tripathi and Vijay have shown that it is sufficient to check these inequalities for those k where d_k > d_{k+1} and for k = n [14]. Using this stricter version of the theorem, it is possible to perform the checks in linear computational time. Király [15] and Cloteaux [16] describe two such linear-time algorithms for testing graphicality.
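To make the test concrete, the Erdős-Gallai inequalities can be implemented directly. The following is a minimal O(n²) Python sketch of our own (the linear-time algorithms of Király and Cloteaux cited above are more involved):

```python
def is_graphical(degrees):
    """Erdős-Gallai test: can `degrees` be realized as a simple graph?

    Checks that the degree sum is even and that the k-th partial-sum
    inequality holds for every k (naive O(n^2) implementation).
    """
    d = sorted(degrees, reverse=True)
    n = len(d)
    if any(x < 0 or x > n - 1 for x in d):
        return False  # a simple graph on n vertices has max degree n - 1
    if sum(d) % 2 != 0:
        return False  # handshaking lemma: the degree sum must be even
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True
```

For instance, `is_graphical([3, 2, 1])` returns `False`, matching the example above, while `is_graphical([2, 2, 2])` returns `True` (the triangle).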

Definition 2 (multigraphicality).
A degree sequence is said to be multigraphical if there is a graph, potentially containing multi-edges, but no self-loops, that realizes it. We refer to such a graph as a loopless multigraph.
Theorem 2 (multigraphicality). A degree sequence d = (d_1, d_2, ..., d_n) is multigraphical if and only if Σ_{i=1}^n d_i is even and

(1/2) Σ_{i=1}^n d_i ≥ d_max,    (2)

where d_max denotes the largest degree in d.
The proof of theorem 2 is given in appendix A.1. Not every graphical or multigraphical degree sequence has a connected realization. For example, the degree sequence (1, 1, 1, 1) is only realized by the non-connected graph consisting of two disjoint edges.

Definition 3 (potential connectedness). A degree sequence is said to be potentially connected if it has a realization that is connected.
The concept of potential connectedness also applies to degree sequences which only have non-simple realizations. However, it can be shown that all potentially connected sequences that are graphical have connected realizations that are also simple.
In this paper we consider so-called labelled graphs, i.e. we consider the vertices to be distinguishable. Thus, the degree sequence (d_1, d_2, d_3, d_4) = (1, 2, 2, 1) is taken to have two isomorphic but distinct realizations, namely the paths 1-2-3-4 and 1-3-2-4.

Approaches to sampling graphs with a given degree sequence
There are two widely used approaches to uniformly sampling simple labelled graphs with a prescribed degree sequence: (1) 'stub-matching' algorithms such as the configuration model and (2) Markov chain Monte Carlo sampling based on degree-preserving edge switches. We briefly review both families of methods, and consider how the additional constraint of connectedness can be incorporated into them.

The configuration model, also called the pairing model, is probably the simplest and most widely known approach to generating random graphs with a given degree sequence. The sampling algorithm proceeds as follows: let us consider each vertex with as many unconnected stubs as its degree, as shown in figure 1. Then repeatedly pick two not-yet-connected stubs uniformly at random and connect them, until there are no unconnected stubs left. This algorithm may clearly produce graphs that are not simple (i.e. they have multi-edges or self-loops). Such graphs are simply rejected, and the generation procedure is restarted.

The configuration model's algorithm produces each simple realization of the degree sequence with the same probability (although the same is not true for non-simple ones) [17]. Therefore, by rejecting the non-simple outcomes, the simple realizations can be sampled uniformly. It is important to note that if the outcome is non-simple, the generation procedure must be restarted from the beginning. It is not sufficient to merely reject any stub pairing that creates a non-simple edge and choose another one instead. Doing so would no longer produce each realization with the same probability, as is shown in section 4.
The configuration model works well for sparse graphs that have small degrees. However, as the graph gets denser, the probability that the algorithm generates a non-simple graph, which must be rejected, increases quickly. For dense graphs, the rejection rate becomes too high for this sampling method to be computationally feasible. The same is true for degree sequences of sparse graphs that have a few very high-degree vertices, such as scale-free and other heavy-tailed degree distributions, which are commonly observed in real-world networks [4,5]. Therefore, the configuration model is only practical in some limited situations.
The constraint of connectedness can be incorporated trivially into the configuration model: simply reject any non-connected outcomes along with the non-simple ones. However, usually, most realizations of a sparse degree sequence are not connected, increasing the rejection rate further. This makes the connected version of the configuration model unfeasible for sparse graphs as well.
Edge-switching Markov chain Monte Carlo (MCMC) methods work by first building a single realization of the degree sequence, then repeatedly modifying the graph using random, degree sequence preserving edge switches like those shown in figure 2. It can be shown that even though not all pairs of edges can be switched without creating a non-simple graph, all simple realizations of a degree sequence can be reached with permissible edge switches. Consequently, a Markov chain constructed using edge switches is irreducible. It can be shown that if the edges to be switched are chosen uniformly at random, and the switch is simply not performed when it would create a multi-edge, then the stationary distribution of the Markov chain will be uniform. Details are given in appendix B. Sampling can be performed as usual with MCMC, by recording states from the chain at certain intervals.
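The core move of such a sampler can be sketched as follows. This is a minimal illustrative implementation of our own; a switch that would create a self-loop or multi-edge is simply skipped (the chain stays in place), which, as noted above, keeps the stationary distribution uniform:

```python
import random

def edge_switch_mcmc(edges, steps, rng=random):
    """Degree-preserving edge-switch Markov chain on a simple graph.

    Repeatedly picks two edges (a,b) and (c,d) uniformly at random and
    proposes the rewiring (a,d), (c,b); proposals creating a self-loop
    or a multi-edge are rejected, leaving the graph unchanged."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(steps):
        i = rng.randrange(len(edges))
        j = rng.randrange(len(edges))
        if i == j:
            continue
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # two possible rewirings; pick one
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2:      # would create a self-loop
            continue
        if new1 in present or new2 in present:  # would create a multi-edge
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

By construction, every state of the chain is a simple graph with the same degree sequence as the input.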
Incorporating the connectedness constraint into such a sampler is more involved than in the case of the configuration model. The Markov chain is still irreducible if edge switches that would disconnect the graph are forbidden [18]. However, testing whether an edge switch disconnects the graph takes computational time proportional to the size of the graph. Performing this test after every edge switch would make the method impractically slow. While there are published algorithms that make use of information from previous connectedness tests to achieve an average polylogarithmic complexity when a series of incremental changes are made to the graph [19][20][21], these algorithms are complicated and their implementation is involved. It is unclear if they would perform sufficiently well in practice. We are not aware of any MCMC-based graph sampler implementation that makes use of them. More practical approaches perform multiple edge switches between connectedness checks [22,23]. Frequent connectedness checks would result in bad computational performance, while an insufficient number of checks makes it more likely that the graph becomes disconnected, and therefore the last few edge switches must be reverted. These methods use heuristics to find an optimal number of switches to perform between connectedness checks, and maximize performance.
An alternative approach uses only local edge switches performed between pairs of edges that are connected by a third edge. These restricted switches, called edge flips, preserve connectedness. However, the flip Markov chain is irreducible only in certain special cases, such as for regular graphs or when all realizations of the degree sequence have a diameter of at least 4 [24][25][26]. Furthermore, it requires performing a larger number of moves to randomize the graph than the other MCMC approaches.
The disadvantage of MCMC-based methods is that the mixing time of these Markov chains is not known in general [27,28]. Therefore, one must use heuristics to determine how many switches to perform between recording samples to ensure that the successive samples will be sufficiently statistically independent. In this sense, these algorithms are not exact.

Building a single connected realization of a degree sequence
In this section we present a new simple and elegant algorithm to build a connected realization of a degree sequence, if one exists. Constructing such a graph is the first step of any edge-switching MCMC sampling algorithm. We will show two versions of the construction process: to build either a simple graph, or a loopless multigraph.
Let us first consider constructing an arbitrary, not-necessarily-connected simple realization of a degree sequence. The Erdős-Gallai theorem provides a fast way to check whether a degree sequence is graphical, but not to construct a corresponding graph. To build a realization of the degree sequence, we can use the well-known Havel-Hakimi theorem.

Theorem 3 (Havel and Hakimi [29, 30]). The degree sequence d_1 ≥ d_2 ≥ ⋯ ≥ d_n is graphical if and only if the reduced degree sequence (d_2 − 1, d_3 − 1, ..., d_{d_1+1} − 1, d_{d_1+2}, ..., d_n), obtained by connecting the first vertex to the d_1 vertices of highest remaining degree, is graphical.
This theorem can be understood as an algorithm to construct a simple graph: as with the configuration model, we consider the vertices of the graph with as many stubs as their degrees (figure 1). In each step of the algorithm, we select an arbitrary vertex (the 'hub'), and connect all of its stubs to the other vertices that have the most unconnected stubs left (highest remaining degree). The hub is then dropped from the degree sequence, along with any other vertices that have no remaining stubs. This step is repeated until no more degrees remain, or until no stubs can be connected without forming a non-simple graph. The theorem states that a degree sequence is graphical if and only if after performing a single step of the algorithm on it, the remaining degree sequence formed by the yet-unconnected stubs is also graphical. Thus, the algorithm will succeed in connecting up all the stubs if and only if the original degree sequence was graphical to begin with. This provides a way to both check the graphicality of a degree sequence and to build one of its realizations at the same time.
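The step described above translates directly into code. The following is a compact Python sketch of our own of the Havel-Hakimi construction (here using the largest-degree vertex as the hub; any choice of hub is valid):

```python
def havel_hakimi_realize(degrees):
    """Havel-Hakimi construction: return the edge list of a simple
    graph realizing `degrees`, or None if the sequence is not
    graphical. Each step connects a hub to the vertices with the
    highest remaining degrees."""
    rem = [[d, v] for v, d in enumerate(degrees)]  # [remaining degree, id]
    edges = []
    while any(d > 0 for d, _ in rem):
        rem.sort(reverse=True)
        hub_d, hub = rem[0]              # largest-degree hub
        rem[0][0] = 0                    # all hub stubs get connected now
        targets = rem[1:hub_d + 1]       # highest remaining degrees
        if len(targets) < hub_d or targets[-1][0] == 0:
            return None                  # not graphical
        for t in targets:
            t[0] -= 1
            edges.append((hub, t[1]))
    return edges
```

Per theorem 3, the construction succeeds exactly when the input sequence is graphical, so the same routine doubles as a graphicality test.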
The Havel-Hakimi algorithm can construct a realization of a degree sequence, but how can we construct a connected realization? Previously, this has been done by first constructing an arbitrary, not necessarily connected realization, then using appropriately chosen edge switches (figure 2) to connect together the components of the graph [23]. This method is complicated and cumbersome to implement. Here we propose a simple and elegant alternative.
Note that the Havel-Hakimi theorem does not specify which vertex to choose as the hub in each step: any of them will do. Let us refer to choosing the vertex with the smallest remaining degree as an 'HH*-step'.

Theorem 4 (connected Havel-Hakimi).
Given a graphical degree sequence, the smallest-first Havel-Hakimi algorithm (i.e. consisting of HH*-steps) will produce a connected graph if and only if the starting degree sequence was potentially connected.
Proof. The key to the proof is to show that if the starting degree sequence is potentially connected, then every HH*-step reduces the number of vertices having non-zero remaining degree precisely by one, except in the very last step, when two vertices with remaining degree 1 each are connected to each other to complete the graph. Reversing the order of the steps would then correspond to building a graph by adding one vertex at a time and connecting it to some existing vertices. This clearly results in a connected graph.
Let us think about what kind of degree sequence we must apply an HH*-step to in order to reduce the number of vertices by more than one. The hub vertex is always removed. Additional vertices will only be removed if they only had one remaining stub (i.e. they had degree 1), which was then connected up to the hub vertex. Since we always choose a smallest-degree vertex as the hub, and connect it to the other vertices with the highest degrees, this situation is only possible when both the smallest and the largest degree are 1. For example, the degree sequence (1, 1, 1, 1) is transformed to (1, 1) after one HH*-step, i.e. it decreases in size by 2. Except for (1, 1), such degree sequences consisting solely of 1s are not potentially connected. Thus, we have established that an HH*-step removes precisely one vertex from any potentially connected degree sequence of length greater than two.
In the following, we will show that one HH*-step transforms any potentially connected degree sequence into another potentially connected one. Therefore, vertices are removed one at a time throughout the HH* construction procedure, up to the very last step when two degree-1 vertices are connected to each other. This will complete the proof of the theorem.
Note that with an arbitrary graph construction process, it is not necessary to maintain the potential connectedness of intermediate degree sequences in order to arrive at a connected graph. Maintaining potential connectedness at intermediate stages is a sufficient, but not a necessary condition for obtaining a connected graph. To show that the remaining degree sequences stay potentially connected throughout the HH* construction process, we invoke the following lemma:

Lemma 5. A graphical or multigraphical degree sequence d = (d_1, d_2, ..., d_n) consisting of positive degrees is potentially connected if and only if (1/2) Σ_{i=1}^n d_i ≥ n − 1, i.e. if it has at least as many edges as a spanning tree.

The proof is given in appendix A.2.
Will the inequality required for potential connectedness in lemma 5 stay valid after modifying the degree sequence with an HH*-step? The right-hand side will decrease by 1, from n − 1 to n − 2. If the selected hub vertex had degree 1, then the left-hand side also decreases by 1, thus the inequality is maintained.
If the hub vertex had degree Δ ≥ 2, then the sum of degrees is at least nΔ, i.e. Σ_{i=1}^n d_i ≥ nΔ. After one HH*-step, the sum of degrees decreases by 2Δ, thus we only need to show that nΔ/2 − Δ ≥ n − 2, which is obviously true for Δ ≥ 2. The inequality is maintained again.
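The smallest-first variant is as easy to implement as the classic algorithm. The following Python sketch of our own performs HH*-steps, and its output can be checked for connectedness directly:

```python
def connected_havel_hakimi(degrees):
    """Smallest-first ('HH*') Havel-Hakimi: always pick the vertex with
    the smallest non-zero remaining degree as the hub and connect it to
    the vertices of highest remaining degree. For a graphical,
    potentially connected sequence the result is a connected simple
    graph; returns None if the sequence is not graphical."""
    rem = [[d, v] for v, d in enumerate(degrees)]  # [remaining degree, id]
    edges = []
    while True:
        rem.sort(reverse=True)
        nonzero = [p for p in rem if p[0] > 0]
        if not nonzero:
            return edges
        hub_entry = nonzero[-1]          # smallest non-zero degree
        hub_d, hub = hub_entry
        hub_entry[0] = 0
        targets = [p for p in rem if p[0] > 0 and p[1] != hub][:hub_d]
        if len(targets) < hub_d:
            return None                  # not graphical
        for t in targets:
            t[0] -= 1
            edges.append((hub, t[1]))

def is_connected(n, edges):
    """Connectivity test by breadth-first search from vertex 0."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {0}, [0]
    while queue:
        v = queue.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n
```

For instance, the potentially connected sequence (3, 2, 2, 2, 2, 2, 1) discussed later in this section yields a connected realization, in line with theorem 4.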
Let us now consider the case of loopless multigraphs, which may be constructed with a procedure analogous to the Havel-Hakimi algorithm.

Theorem 6 (multigraph construction). The degree sequence d = (d_1, d_2, ..., d_n), sorted so that d_1 ≥ d_2 ≥ ⋯ ≥ d_n, is multigraphical if and only if, after connecting vertex 1 to vertex 2 with a single edge, the remaining degree sequence d′ = (d_1 − 1, d_2 − 1, d_3, ..., d_n) is multigraphical.
In simpler terms, in order to construct a loopless multigraph, we may simply select an arbitrary vertex and connect it to a largest-degree one among the other vertices. Repeating this step results in a loopless multigraph if and only if the starting degree sequence was multigraphical. Unlike in the case of the Havel-Hakimi theorem, connections are made one edge at a time.
Proof. Clearly, if d′ is multigraphical, then so is d. Thus we need only show that the multigraphicality condition of theorem 2, (1/2) Σ_i d_i ≥ d_max, is maintained after adding a connection between a maximal degree vertex and another vertex. Adding one connection decreases the left-hand side of the inequality by 1. For the right-hand side, there are two cases: (1) if only one vertex had maximal degree, or if precisely two vertices had maximal degree and they were connected to each other, then the right-hand side (i.e. d_max) also decreases by 1, and the inequality is maintained. (2) If there is more than one maximal degree vertex and the connection was made between a maximal degree and a non-maximal-degree vertex, then d_max does not decrease. However, in this case, the sum of degrees in d includes d_max twice, and at least one more positive term due to the non-maximal-degree vertex. Therefore Σ_i d_i > 2 d_max, and since the sum of degrees is even, Σ_i d_i ≥ 2(d_max + 1), so decreasing the left-hand side by 1 will not violate the inequality.
We can also formulate the analogue of theorem 4 for the loopless multigraph case:

Theorem 7 (connected loopless multigraph construction). Let d be a multigraphical degree sequence, and let us repeatedly select the vertex with the largest remaining degree and the vertex with the smallest non-zero remaining degree, and connect them with a single edge. This construction procedure results in a connected graph if and only if d was potentially connected.
Proof. The proof is completely analogous to that of theorem 4, and proceeds in three steps: (1) we will show that after applying a single step of the construction process, the remaining degree sequence stays potentially connected. (2) Therefore, when applying a single step of the construction process to a potentially connected degree sequence, the number of non-zero remaining degrees decreases by no more than one, except in the very last step. (3) Consequently, reversing the order of steps constructs a connected graph.
To show that a single construction step keeps the degree sequence potentially connected, we must prove that the condition of lemma 5, (1/2) Σ_i d_i ≥ n − 1, is maintained. If the smallest non-zero degree is 1, the corresponding vertex is removed by the step, so both sides of the inequality decrease by 1. If the smallest non-zero degree is at least 2, then every remaining degree is at least 2, hence (1/2) Σ_i d_i ≥ n, and decreasing the left-hand side by 1 cannot violate the inequality.

There is a simple intuition behind the statements of theorems 4 and 7. If we were to always choose the highest degree vertex as the hub, and connect it to other highest-degree vertices, it would quickly use up the available stubs. There would be insufficient stubs left towards the end of the procedure to connect all components together. Indeed, choosing highest-degree vertices as the hub tends to create graphs with multiple dense components (see figure 3). Conversely, choosing smallest-degree vertices as the hub and connecting them to highest-degree vertices leaves free stubs available. The same intuition raises the question: does the largest-first variant of the algorithm always build a non-connected realization, if one exists? The answer turns out to be no. A counterexample is d = (3, 2, 2, 2, 2, 2, 1), which can be split into two graphical degree sequences (3, 2, 2, 1) and (2, 2, 2), therefore it has a non-connected realization. Yet the largest-first Havel-Hakimi algorithm can only construct a connected one, as it must connect the vertex of degree 3 to three degree-2 vertices. To the best of our knowledge, finding the computational complexity of deciding whether a degree sequence has a non-connected realization as a simple graph, i.e. whether it is forcibly connected, is still an open problem. We are not aware of any polynomial-time solutions. An exponential-time algorithm was given by Wang [31].
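The construction of theorem 7 is even simpler to implement than the simple-graph case, since edges are added one at a time. A minimal Python sketch of our own:

```python
def connected_multigraph_realize(degrees):
    """Theorem 7 construction: repeatedly connect the vertex with the
    largest remaining degree to the vertex with the smallest non-zero
    remaining degree, one edge at a time. For a multigraphical,
    potentially connected sequence this yields a connected loopless
    multigraph (edge list may contain repeated pairs, i.e. multi-edges);
    returns None if the procedure gets stuck."""
    rem = {v: d for v, d in enumerate(degrees) if d > 0}
    edges = []
    while rem:
        if len(rem) == 1:
            return None  # stuck: the last edge would be a self-loop
        hi = max(rem, key=rem.get)                       # largest degree
        lo = min((v for v in rem if v != hi), key=rem.get)  # smallest
        edges.append((hi, lo))
        for v in (hi, lo):
            rem[v] -= 1
            if rem[v] == 0:
                del rem[v]
    return edges

def is_connected(n, edges):
    """Connectivity test by breadth-first search from vertex 0."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {0}, [0]
    while queue:
        v = queue.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n
```

For the sequence (3, 2, 1) from section 2, which is multigraphical but not graphical, this produces a connected loopless multigraph with a double edge.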
We have contributed an implementation of the construction algorithms for connected simple graphs and connected loopless multigraphs to the igraph C library [32] as igraph_realize_degree_sequence(), and made it conveniently accessible through igraph's Mathematica interface, IGraph/M [33], as the IGRealizeDegreeSequence function. In python-igraph it will be available as the hh method of Graph.Degree_Sequence.

An exact biased sampling method
Recently, a new family of stub-matching sampling methods was proposed [34][35][36][37][38], which construct each sample directly and independently (unlike edge-switching MCMC methods) and work efficiently in polynomial time (unlike the configuration model). These algorithms do not sample uniformly, but they can compute the exact probability of obtaining a sample at the same time as generating that sample. This makes it possible to 'unbias' the samples and estimate any property that characterizes the entire set of realizations of a degree sequence, such as the averages of various graph metrics, similarly to how one might do with uniform sampling. Let S = {G_1, G_2, ..., G_K} be the set of generated samples, and let c(G) denote some numerical property of the graph G, such as its diameter, assortativity, clustering coefficient, etc. If the sampling is uniform, we can estimate the average of c over all realizations as

⟨c⟩ ≈ (1/K) Σ_{i=1}^K c(G_i).    (3)

If the sampling is biased, i.e. some graphs are generated with a higher probability p(G) than others, then we can re-weight them with 1/p(G) to estimate c as

⟨c⟩ ≈ [Σ_{i=1}^K c(G_i)/p(G_i)] / [Σ_{i=1}^K 1/p(G_i)].    (4)

The same formula can be used if we do not have normalized probability values, but merely sampling weights w(G) ∝ p(G) which are proportional to the probabilities. This is the same principle as the one used in importance sampling.
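The re-weighting estimator above is a one-liner in practice. A minimal Python sketch of our own (the function name is illustrative):

```python
def weighted_estimate(samples, weights, prop):
    """Self-normalized importance-sampling estimate of the ensemble
    average of `prop`, given samples generated with probabilities (or
    unnormalized weights) proportional to `weights`: each sample is
    re-weighted by the inverse of its weight."""
    inv = [1.0 / w for w in weights]
    return sum(prop(g) * iw for g, iw in zip(samples, inv)) / sum(inv)

# toy check: a two-element 'ensemble' {A, B} with values 1 and 3 has
# uniform average 2; sampling A three times as often as B is corrected
# for exactly by the 1/p re-weighting
vals = {'A': 1, 'B': 3}
est = weighted_estimate(['A', 'A', 'A', 'B'],
                        [0.75, 0.75, 0.75, 0.25],
                        lambda g: vals[g])
# est == 2.0
```

Because the estimator is self-normalized, the weights need only be proportional to the true sampling probabilities, as noted above.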
To illustrate how this class of sampling methods works, let us consider the configuration model again, which pairs the stubs randomly. Along the same lines, we can exhaustively generate all realizations of a degree sequence by connecting up the stubs in all possible ways. This procedure can be thought of as a tree of decisions, like the one shown in figure 4: if there are k = Σ_i d_i stubs in total, there will be k − 1 choices for connecting up the first stub. This is represented by the k − 1 branches of the tree starting from its root. In the next step (corresponding to the next level of the tree), there will be k − 3 choices, then k − 5, and so on. The leaves of the decision tree represent the fully constructed graphs.
The configuration model's algorithm can be thought of as traversing the decision tree randomly, starting at its root, choosing branches uniformly at random at each branching point, and finally arriving at a leaf. This decision tree is symmetric: all tree nodes i steps away from the root (i.e. at level i of the tree) have the same number of branches, k − (2i + 1). Therefore, each leaf is reached with the same probability p = 1/(k − 1) × 1/(k − 3) × ⋯ = 1/(k − 1)!!, where n!! = n(n − 2)(n − 4) ⋯ denotes the double factorial. While each labelled graph appears as more than one tree leaf, all simple realizations appear the same number of times, with multiplicity Π_i (d_i!). This explains why the configuration model samples uniformly if non-simple outcomes are rejected. If we admit loopless multigraph outcomes as well, then the number of leaves that a graph appears as decreases by a factor of Π_{i<j} (a_ij!), where a_ij denotes the number of edges between vertices i and j [17].
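These counting claims can be verified by brute force on a tiny example. The following Python script of our own enumerates all stub pairings for the degree sequence (2, 1, 2, 1), confirms that there are (k − 1)!! of them, and that every simple labelled realization appears exactly Π_i (d_i!) times:

```python
from math import factorial

def matchings(stubs):
    """Enumerate all perfect matchings of a list of stubs, where each
    stub is labelled by the vertex it belongs to."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for i in range(len(rest)):
        for m in matchings(rest[:i] + rest[i + 1:]):
            yield [(first, rest[i])] + m

degrees = (2, 1, 2, 1)
stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
k = len(stubs)

# count how many stub pairings (tree leaves) map to each labelled graph
counts = {}
for m in matchings(stubs):
    graph = tuple(sorted(tuple(sorted(e)) for e in m))
    counts[graph] = counts.get(graph, 0) + 1

double_factorial = 1
for x in range(k - 1, 0, -2):
    double_factorial *= x
assert sum(counts.values()) == double_factorial  # (k - 1)!! leaves in total

# keep only simple outcomes: no self-loops, no repeated edges
simple = {g: c for g, c in counts.items()
          if all(a != b for a, b in g) and len(set(g)) == len(g)}
multiplicity = 1
for d in degrees:
    multiplicity *= factorial(d)
assert set(simple.values()) == {multiplicity}
```

Here k = 6, so there are 5!! = 15 pairings; the two simple labelled realizations (the paths 2-1-3-4 and 2-3-1-4) each appear 2!·1!·2!·1! = 4 times.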
The part of the decision tree that leads to simple graphs is highlighted in red in figure 4. The core idea behind this new class of sampling methods is to traverse only this feasible subtree. The feasible subtree is not, in general, symmetric, therefore its leaves will not be sampled uniformly. However, the inverse sampling weight of a leaf can be computed by multiplying the number of feasible branches at each branching point on the path going from the tree root to the leaf. If not all graphs appear as the same number of leaves (as is the case with multigraphs), then the sampling weights used in equation (4) must be divided by the appropriate multiplicity. Through this approach, it is straightforward to take any algorithm that systematically generates all realizations of a degree sequence, and convert it into a random sampling algorithm. Instead of traversing all branches in its decision tree, simply pick a random branch to follow at every step. In order for such an algorithm to be efficient and practical, the following requirements must be met: (1) the multiplicity of each graph, i.e. the number of leaves that correspond to it, must be computable; and (2) it must be possible to count the feasible branches at each branching point, and to select one of them efficiently. We note that depending on the exhaustive generation algorithm that the sampling is based on, it may be the case that some leaves of the decision tree that correspond to the same graph will have different sampling weights. However, equation (4) is still valid for estimating population averages.
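To make the weighting scheme concrete, here is a toy Python sketch of our own. It is purely illustrative and exponentially slow: branch feasibility is tested by brute-force completion rather than by the linear-time test of [35]. It connects stubs in vertex order, accumulates the product of feasible branch counts as the inverse sampling probability, and divides by the leaf multiplicity of the generated graph:

```python
import random
from math import factorial

def completable(rem, banned):
    """Brute force (tiny graphs only): can the remaining degrees `rem`
    be completed into a simple graph avoiding the vertex pairs in
    `banned`?"""
    v = next((i for i, d in enumerate(rem) if d > 0), None)
    if v is None:
        return True
    for u in range(len(rem)):
        pair = (min(u, v), max(u, v))
        if u != v and rem[u] > 0 and pair not in banned:
            rem2 = list(rem)
            rem2[v] -= 1
            rem2[u] -= 1
            if completable(rem2, banned | {pair}):
                return True
    return False

def sample_simple(degrees, rng):
    """Walk the feasible subtree once; return (graph, weight), where
    weight = 1/(sampling probability x leaf multiplicity). Assumes the
    input sequence is graphical."""
    rem = list(degrees)
    edges = set()
    inv_p = 1     # product of feasible branch counts = 1/p(leaf)
    mult = 1      # number of leaves corresponding to the final graph
    active = None
    while True:
        v = next((i for i, d in enumerate(rem) if d > 0), None)
        if v is None:
            return tuple(sorted(edges)), inv_p / mult
        if v != active:
            active = v
            mult *= factorial(rem[v])  # orderings of this hub's connections
        feasible = []
        for u in range(len(rem)):
            pair = (min(u, v), max(u, v))
            if u != v and rem[u] > 0 and pair not in edges:
                rem2 = list(rem)
                rem2[v] -= 1
                rem2[u] -= 1
                if completable(rem2, edges | {pair}):
                    feasible.append(u)
        inv_p *= len(feasible)
        u = rng.choice(feasible)
        edges.add((min(u, v), max(u, v)))
        rem[v] -= 1
        rem[u] -= 1

# estimate the fraction of realizations of (1, 2, 2, 1) containing edge (0, 1)
rng = random.Random(42)
pairs = [sample_simple([1, 2, 2, 1], rng) for _ in range(1000)]
est = sum(w for g, w in pairs if (0, 1) in g) / sum(w for g, w in pairs)
# the two labelled realizations are the paths 0-1-2-3 and 0-2-1-3, so the
# exact uniform average is 1/2; `est` should come out close to 0.5
```

Here the multiplicity bookkeeping accounts for the fact that the edges initiated by one hub can be added in any order, so the same graph corresponds to several leaves of this particular decision tree.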
A natural generalization of this method is to choose decision branches non-uniformly. This gives an opportunity to reduce the bias of the sampling. If each branch were chosen with probability proportional to the number of leaves it contains, then the sampling would be uniform. While computing the exact number of leaves is a difficult combinatorial problem that may not be efficiently solvable, the sampling can be improved through heuristic choices of the branch probabilities. This idea is explored in more detail in [39]. For all the numerical examples discussed in section 5, we weighted the branches of the decision tree using a simple heuristic that is described in appendix D.

Figure 5. As the wiring algorithm proceeds, sets of vertices that have already formed a connected component are grouped into 'supernodes'. Whether the remaining stubs can be wired up so as to make the entire graph connected can be decided by applying the potential connectedness theorem to the degree sequence of the supernodes (lemma 9).
Here, we choose to work with the decision tree of the exhaustive generation algorithm described above and illustrated in figure 4: take the stubs one-by-one, in order, processing all stubs of a vertex before moving on to the next, and consider all possible ways each stub can be connected. For each branch, we must perform two checks: one of graphicality (or multigraphicality) and one of potential connectedness. In the following, we show that both of these checks can be done in constant computational time on average. Therefore, in summary, the computational time required to generate one sample is O(nm), where n is the number of vertices and m is the number of edges of the generated graph.
The constraint of graphicality. When examining the feasibility of a branch, we must first determine whether it leads to any simple graphs. This check is similar to the usual graphicality test, with an important difference: suppose that some stubs of vertex i (the 'hub vertex') have already been connected to vertices X = {j 1 , . . . , j k }, but i still has some free stubs. In order to obtain a simple graph, a second connection to the vertices in the set X is not allowed. This restriction is referred to as a star constraint on i, as the connections from i to the elements of X form a star graph. To check graphicality under this constraint, we use theorem 8, a generalization of the Havel-Hakimi theorem that is recovered in the special case of X = Ø, i.e. no exclusion. The graphicality of d can be tested using the Erdős-Gallai theorem, making the entire test possible in O(n) computational time. In principle, theorem 8 could be used to test each branch of the decision tree separately, but this would not be efficient. A more sophisticated method is presented in [35], where it is shown that there exists a threshold degree d_th that separates feasible branches from non-feasible ones: connecting to a vertex with degree d ≥ d_th preserves graphicality, while connecting to one with d < d_th does not. d_th may be determined in O(n) time, thus testing the graphicality of individual branches becomes constant time on average. For a detailed description of this testing procedure, we refer the reader to [35].
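For reference, the plain Erdős-Gallai test (without the star constraint, and without the threshold-degree speed-up of [35]) can be sketched as follows; this is the textbook quadratic-time version, not the paper's O(n) routine:

```python
def is_graphical(degrees):
    """Erdős-Gallai test: `degrees` is realizable as a simple graph iff
    the sum is even and, for every k, the k largest degrees can be
    absorbed by mutual connections plus connections to the rest."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2 != 0 or any(x < 0 for x in d):
        return False
    for k in range(1, len(d) + 1):
        if sum(d[:k]) > k * (k - 1) + sum(min(x, k) for x in d[k:]):
            return False
    return True

assert is_graphical([3, 3, 3, 3])      # K4
assert not is_graphical([3, 3, 1, 1])  # even sum, but not realizable
```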
The constraint of multigraphicality. If we wish to sample loopless multigraphs instead of simple graphs, theorem 2 can be used directly. This requires computing the degree sum, as well as the maximum degree. Instead of recomputing these quantities at each step, their values can be updated incrementally in amortized constant time after the addition of each new edge.
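A minimal sketch of this incremental bookkeeping follows. It assumes theorem 2 is the standard loopless-multigraph condition (even degree sum, and the largest degree no greater than the sum of the others), and uses a lazy max-heap as an illustrative stand-in: this gives O(log n) updates rather than the amortized O(1) the paper achieves.

```python
import heapq

def is_multigraphical(degrees):
    """Loopless-multigraph realizability (assumed statement of theorem 2):
    even degree sum, and the maximum degree does not exceed the sum of
    all the other degrees."""
    s = sum(degrees)
    return s % 2 == 0 and (not degrees or 2 * max(degrees) <= s)

class RemainingDegrees:
    """Track remaining (unconnected) stub counts; `use(v)` consumes one
    stub of vertex v. The degree sum and current maximum are maintained
    incrementally. Hypothetical helper, not the paper's implementation."""
    def __init__(self, degrees):
        self.deg = list(degrees)
        self.total = sum(degrees)
        self.heap = [(-d, v) for v, d in enumerate(degrees)]
        heapq.heapify(self.heap)

    def use(self, v):
        self.deg[v] -= 1
        self.total -= 1
        heapq.heappush(self.heap, (-self.deg[v], v))

    def maximum(self):
        # discard stale heap entries whose stored degree is out of date
        while self.heap and -self.heap[0][0] != self.deg[self.heap[0][1]]:
            heapq.heappop(self.heap)
        return -self.heap[0][0] if self.heap else 0

    def multigraphical(self):
        return self.total % 2 == 0 and 2 * self.maximum() <= self.total

rd = RemainingDegrees([2, 2, 2])
rd.use(0)  # connect one stub of vertex 0 ...
rd.use(1)  # ... to one stub of vertex 1
assert rd.multigraphical()  # remaining degrees [1, 1, 2] are realizable
```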
The constraint of connectedness. In order to incorporate the constraint of connectedness, we must find a way to detect decision branches which do not lead to any connected graphs. In other words, we must detect when adding a specific connection would make it impossible to build a connected graph. We do this by tracking the groups of vertices (components) which have so far been connected (figure 5). These components can also be thought of as the nodes of a multigraph, which we term the 'supergraph'. We refer to the components as 'supernodes'. Then the construction process can be completed to a connected graph if and only if the supergraph is potentially connected.
The potential connectedness of the supergraph may be checked using lemma 5. Note that the supergraph does not need to be a simple graph, and indeed it is clear that when the supernode degrees are sufficiently large, it cannot be simple. It is not obvious that in such a situation it can be ensured that the graph of vertices is simple (or loopless) and the supergraph is connected at the same time. The following lemma asserts that this is indeed possible; its proof is given in appendix A.2.
Note that there are two ways in which the potential connectedness of the supergraph can be broken: (1) there may no longer be a sufficient number of edges left to make the graph connected, i.e. m < N − 1, or (2) one of the supernodes (components) may become 'closed', i.e. its degree may become zero before the graph is fully constructed. To check whether adding a connection would give rise to either of these two conditions, we must consider several cases: if there is only one supernode, then the graph is already connected, therefore all connections are allowed. Otherwise, if m = N − 1, then only connections between different supernodes are allowed. Two supernodes with degree 1 each may not connect to each other, and a supernode with degree 2 may not connect to itself, except as the very last step that completes the graph. To check for these cases, we must determine if the two vertices to be connected are within the same supernode. This can be done in constant amortized time, as described in appendix C.
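The same-supernode test can be implemented with a standard union-find (disjoint-set) structure with path compression; a minimal sketch (our illustration, as appendix C's exact bookkeeping may differ):

```python
class DisjointSet:
    """Union-find with path compression, tracking which supernode
    (partially built component) each vertex currently belongs to."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, v):
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:  # path compression
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, a, b):
        """Merge the supernodes of a and b (called when an edge joins
        two components)."""
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        """Are a and b within the same supernode?"""
        return self.find(a) == self.find(b)

ds = DisjointSet(4)
ds.union(0, 1)       # an edge merges the components of 0 and 1
assert ds.same(0, 1)
assert not ds.same(0, 2)
```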

Numerical results
To demonstrate the practical applicability of our proposed sampling method for connected graphs, we performed numerical experiments on degree sequences sampled from a power-law distribution. Networks with similarly heavy-tailed degree distributions commonly occur in the real world [4,5]. The exponent of the power-law distribution was adjusted so as to obtain a degree sequence which, while potentially connected, has overwhelmingly many non-connected realizations. Sampling its connected realizations is therefore not feasible at all with the configuration model: in practice it never generates any connected samples. Thus, we compare results with MCMC samplers.

Table 1. The first four statistical moments of the assortativity distributions shown in figure 6(c), as estimated with MCMC and with the biased stub-matching sampler. Standard errors were obtained with bootstrapping and are indicated in parentheses.

Figure 6(a) shows one typical simple connected realization of such a degree sequence. This degree sequence was used to generate the results shown in the subsequent panels of the same figure. We chose assortativity, a measure of degree correlations [12], as the graph property to study. Figure 6(b) illustrates how the value of this measure develops while running an edge-switching MCMC sampler for simple connected graphs. Two trajectories are shown: one starting from a high-assortativity graph and one from a low-assortativity graph. In this experiment, at least 1500-2000 edge switches were needed before the two trajectories converged, an indicator of reaching statistical independence. Based on this, in the following numerical experiments 2500 steps were performed between taking samples from the Markov chain. In general, the number of steps required to guarantee a given level of independence cannot be determined exactly: this is precisely the problem that the biased stub-matching sampler introduced in this work is meant to overcome.
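For concreteness, one step of the edge-switching MCMC baseline can be sketched as follows. This is our simplified illustration, not the implementation used in the experiments: the connectivity check here is a full BFS per proposed swap, whereas production samplers use faster tests.

```python
import random
from collections import deque

def is_connected(n, edges):
    """BFS connectivity check on an undirected edge list."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n

def edge_switch_step(n, edges, rng=random):
    """One degree-preserving double edge swap (a,b),(c,d) -> (a,c),(b,d),
    rejected if it would create a self-loop, a parallel edge, or
    disconnect the graph; rejection leaves the state unchanged."""
    i, j = rng.sample(range(len(edges)), 2)
    (a, b), (c, d) = edges[i], edges[j]
    new1, new2 = (a, c), (b, d)
    if frozenset(new1) == frozenset(new2):
        return edges  # reject: the two new edges would be parallel
    present = {frozenset(e) for e in edges}
    for u, v in (new1, new2):
        if u == v or frozenset((u, v)) in present:
            return edges  # reject: self-loop or existing edge
    trial = list(edges)
    trial[i], trial[j] = new1, new2
    return trial if is_connected(n, trial) else edges
```

Repeating this step and recording the graph only every few thousand iterations yields the (approximately independent) MCMC samples described above.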
Figure 6(c) compares the distribution of assortativity estimated using the MCMC sampler (blue curves) with the one obtained using the biased stub-matching sampler (yellow/brown curves), and demonstrates that both methods produce the same result. This validates our implementation of the method. The histogram of a biased sample is formed not by counting the number of data points in each bin, but by adding up their inverse sampling weights. The result shown in figure 6(c) comes from three separate experiments: in the first, only connected realizations were sampled. In the second, connectedness was not constrained. In the third, connectedness was also not constrained, but assortativity was measured only on the largest connected component (the 'giant component') of the graph. We included the third case because retaining only the giant component is often used as an ad hoc substitute for incorporating the constraint of connectedness into random graph models [23]. The assortativity distributions are markedly different for all three cases, demonstrating the importance of taking connectedness into account when the problem at hand demands it. We note that with some degree distributions, simply taking the giant component of non-connected samples produces results similar to enforcing connectedness. However, as figure 6(c) demonstrates, with other degree sequences there can be a significant difference.
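Building such a bias-corrected histogram is straightforward with NumPy's `weights` argument; a sketch, with synthetic values and weights standing in for the measured assortativities:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-ins: measured property values and their sampling weights
values = rng.normal(size=1000)
weights = np.exp(rng.normal(size=1000))  # roughly log-normal weights

# weighted estimate of the population mean, in the spirit of equation (4)
mean = np.sum(weights * values) / np.sum(weights)

# bias-corrected histogram: each sample contributes its weight, not a count
hist, edges = np.histogram(values, bins=30, weights=weights, density=True)
assert np.isclose(np.sum(hist * np.diff(edges)), 1.0)  # normalized density
```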

Connected realizations
Estimates of four statistical moments of the distributions (their mean, standard deviation, skewness and kurtosis) are reported in table 1 along with their standard errors. We note that the number of samples required for an accurate estimate of statistical quantities is larger when using biased sampling than with uniform sampling. This is analogous to how the effective sample size of the correlated output of an MCMC sampler is smaller than the number of generated data points. Therefore, when generating the histograms in figure 6(c), we took 10 000 samples from the Markov chain (at intervals of 2500 steps) and 100 000 samples from the biased sampler. In the case of the biased sampler described here, the distribution of sample weights is typically bell-shaped on a logarithmic scale, as shown in figure 6(d). This is expected, since sample weights are the inverse products of the number of feasible branches encountered at each level while traversing the decision tree. If the number of branches were random, the distribution of weights would be approximately log-normal according to the central limit theorem.
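The reduced statistical efficiency of weighted samples can be quantified with the standard Kish effective sample size; this formula is our aside, not one used in the paper:

```python
def effective_sample_size(weights):
    """Kish effective sample size for weighted samples:
    ESS = (sum w)^2 / sum(w^2). It equals n for uniform weights and
    shrinks as the weight distribution broadens, mirroring how biased
    sampling needs more draws for the same accuracy."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

assert effective_sample_size([1.0] * 100) == 100.0  # uniform: ESS = n
```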
Finally, as an example application of the method, we investigate the properties of two connected real-world networks by comparing them to a null model with degree and connectedness constraints (figure 7). Both of these networks are sufficiently sparse that most realizations of their degree sequences are disconnected. Therefore, the connectedness constraint cannot be handled with simple rejection. The first network is the equivalenced representation of the Western US power network, from the Harwell-Boeing sparse matrix collection (443 vertices, 590 edges) [40]. As a power grid, it is naturally connected. We investigate its global efficiency, defined as the average of the inverse of pairwise shortest path lengths between its vertices [41]. Figure 7(a) shows that the efficiency of this network is significantly lower than that of typical realizations of its degree sequence. This hints at the existence of another dominant constraint, which we surmise to be the spatially embedded nature of the network. Typical connected realizations have higher efficiency than non-connected ones. As the second example, we investigate the degree assortativity in the largest connected component of the protein-protein interaction network of the yeast Saccharomyces cerevisiae (964 vertices, 1487 edges) [42,43]. Random networks with the same degrees become more disassortative (their assortativity becomes more negative) when forced to be connected, but still not as disassortative as the empirical network. This shows that high disassortativity is a special property of this network.
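Global efficiency, as defined in [41], can be computed by a breadth-first search from every vertex; a sketch (disconnected pairs contribute zero):

```python
from collections import deque

def global_efficiency(n, edge_list):
    """Average inverse shortest-path length over all ordered vertex
    pairs; pairs with no connecting path contribute zero."""
    adj = [[] for _ in range(n)]
    for u, v in edge_list:
        adj[u].append(v)
        adj[v].append(u)
    total = 0.0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in dist if d > 0)
    return total / (n * (n - 1))

# a complete graph has efficiency 1, the maximum possible
assert global_efficiency(3, [(0, 1), (1, 2), (0, 2)]) == 1.0
```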

Discussion
In this paper we considered the problem of constructing a single realization of a connected graph with a given degree sequence, as well as random sampling from the set of all connected realizations. We addressed both the case of simple graphs, as well as loopless multigraphs. The main contribution of this work is incorporating the constraint of connectedness.
Building a not-necessarily-connected realization of a degree sequence as a simple graph can be accomplished using the well-known Havel-Hakimi construction. Until now, the usual method to construct a connected realization was to first build an arbitrary realization, then rewire its edges to make it connected. This method is complicated and cumbersome to implement. With theorem 4, we show that a specific variant of the Havel-Hakimi construction is guaranteed to produce a connected realization, if one exists. Furthermore, in theorem 7 we generalize the construction to the case of loopless multigraphs. This provides a simple and elegant algorithm for building connected graphs with given degrees. We contributed an implementation of these algorithms to the open-source igraph library [32] and its Mathematica interface, IGraph/M [33].
We have also extended a new family of biased-sampling stub matching methods so that they incorporate the constraint of connectedness without a performance penalty, allowing for fast, efficient rendering of null models and random sampling. Indeed, our approach is significantly faster than the configuration model, which is simply infeasible to use in some regimes of degree sequences. Our algorithm generates each sample in computational time O(nm), where n is the number of vertices and m is the number of edges. Unlike edge-switching MCMC methods, whose mixing times are not currently known, our method suffers no uncertainty or ambiguity in the independence of the samples. In this sense it is exact. This is of particular importance, again, for the rendering of reliable null models that faithfully represent generic networks of a certain type. An implementation of our sampling method is made freely available at https://github.com/szhorvat/ConnectedGraphSampler. Finally, we have demonstrated these methods both on generated scale-free degree sequences and on degree sequences of real-world networks. The connected realizations of all of these are markedly different from the non-connected ones, illustrating the relevance of the connectedness constraint. This is consistent with earlier approximate results obtained with heuristic samplers whose bias was not controlled [44]. In all these examples, the use of the configuration model would have been simply infeasible.
We reiterate that these approaches are crucially important due to the pressing need for efficient, appropriate null models across the network and complexity sciences. While the general problem of multi-constraint null model construction and random sampling in random graph models remains open, connectedness is such a ubiquitous feature of real networks and graphs of potential interest that we hope our simple and powerful approach to building connected null models and performing random sampling will find wide applicability. Ultimately, reaching a state in which validation of new findings against numerical control experiments is the standard must be a critical goal for the field as a whole, and further progress in multi-constraint sampling is the only way forward.

Appendix C. Constant-time component checks

In the union-find forest used to track components, paths to the root are compressed whenever adding a connection. This reduces the depth of each tree in the forest to one. The subsequent joining of trees can thus never create a tree depth greater than two, i.e. no component check will take more than two operations.

Appendix D. A heuristic for weighting the branches of the decision tree
We employ two simple heuristics to reduce the bias of the sampling distribution: (1) re-ordering the degree sequence and (2) choosing branches of the decision tree non-uniformly.
Note that the structure of the decision tree depends on the order in which vertices are connected up, i.e. the ordering of the degree sequence. We observed empirically that when using the connectedness constraint, an increasing ordering of degrees produces a narrower weight distribution, i.e. results in 'more uniform' sampling. This is consistent with the intuition described in section 3: connecting small-degree vertices to larger-degree ones favours creating a connected graph.
In the most basic version of the sampling algorithm, each feasible branch of the decision tree is chosen with the same probability, i.e. allowed stubs are picked uniformly. This is equivalent to picking vertices with probability proportional to their degrees, d. We introduce a simple one-parameter heuristic to choose decision branches non-uniformly: pick vertices with probability proportional to d^α, or equivalently, pick stubs with probability proportional to d^(α−1). The parameter α effectively tunes the affinity of connecting to high- versus low-degree vertices. α = 1 corresponds to uniform stub choice. This choice of weighting the branches of the decision tree is purely heuristic, and is motivated both by its simplicity and by the observation that both graphicality and connectedness are affected by a preference for larger or smaller degrees (see sections 3 and 4). For a more detailed exploration of branch weighting, see [39].
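The d^α vertex-selection heuristic can be sketched as follows (`pick_vertex` is a hypothetical helper name, not from the paper's implementation):

```python
import random

def pick_vertex(remaining_degrees, alpha, rng=random):
    """Choose a vertex with probability proportional to d**alpha among
    vertices that still have free stubs. alpha = 1 reproduces the
    uniform-stub (degree-proportional) choice; larger alpha favours
    high-degree vertices, smaller alpha favours low-degree ones."""
    candidates = [v for v, d in enumerate(remaining_degrees) if d > 0]
    probs = [remaining_degrees[v] ** alpha for v in candidates]
    return rng.choices(candidates, weights=probs, k=1)[0]

# with a single vertex still holding free stubs, it must be chosen
assert pick_vertex([0, 5, 0], alpha=1.2) == 1
```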
The parameter α must be adjusted to reduce the bias of the sampler as much as possible. We do this based on the observation that the bias manifests itself in two important ways. First, the distribution of the sampling weights (figure 6(d)) has a large variance. If sampling were uniform, its variance would be zero. Therefore, α could be chosen so as to minimize the variance of the sampling weight distribution. Second, when measuring a certain graph property such as assortativity, the biased sampler may produce, with vanishingly low probability, property values that should in fact be common. Figure 8(a) shows the biased distributions of assortativity values obtained with various choices of α (blue, yellow, red) and compares them to the values obtained with a non-biased MCMC sampler (grey). Notice that the biased distribution obtained with α = 1 (blue) overlaps with the non-biased one only partially, and, for the sample size used here, includes almost no values lower than −0.30. Therefore, the bias cannot be effectively corrected without increasing the sample size significantly. However, the range of values frequently produced by the biased sampler may be adjusted through α: increasing α shifts assortativity values to a lower range (figure 8(a), red and yellow). In the spirit of importance sampling, we choose α to sample 'important' values with high probability, i.e. to maximize the overlap of the biased distribution with the non-biased one, and thus minimize the amount of bias correction that is necessary. How can this be achieved without knowing the non-biased distribution a priori? Notice that bias correction will cause a shift in the range of values only if there is a correlation between the values and the sampling weights. Figure 8(b) shows their joint distributions: the correlation is negative for α = 1.5, positive for α = 1.0, and mostly vanishes for α = 1.2.
When the distributions are unimodal, as is typically the case, the lowest correlation can be achieved by minimizing the mean logarithmic sampling weight, i.e. finding the minimum of the black dashed curve in figure 8(b). Notice that this may be done without reference to any particular graph measure, such as assortativity. In the examples considered here, we observed that minimizing the mean of the logarithmic sampling weights also reduced their variance. In summary, minimizing either the variance or the mean of the logarithmic sampling weight distribution is a practical way to improve the performance of the sampling method.
For all examples presented here, we used the Kiefer-Wolfowitz stochastic approximation algorithm to find the optimal α. The α values used for figure 6 were 1.200 when sampling from the connected realizations of the degree sequence and 1.107 when sampling from all realizations. The degree sequence was ordered increasingly in both cases.
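A minimal one-dimensional sketch of Kiefer-Wolfowitz stochastic approximation follows. The paper does not specify its gain sequences; the decaying gains below are conventional illustrative choices, and the noiseless quadratic loss merely stands in for the (noisy, batch-estimated) mean logarithmic sampling weight as a function of α.

```python
def kiefer_wolfowitz(loss, x0, a=0.5, c=0.5, n_iter=200):
    """Kiefer-Wolfowitz stochastic approximation: minimize a (possibly
    noisy) loss by finite-difference gradient steps with decaying gain
    a_k = a/k and perturbation size c_k = c/k**(1/3)."""
    x = x0
    for k in range(1, n_iter + 1):
        ck = c / k ** (1 / 3)
        grad = (loss(x + ck) - loss(x - ck)) / (2 * ck)
        x -= (a / k) * grad
    return x

# sanity check on a noiseless quadratic: converges to the minimizer
alpha_opt = kiefer_wolfowitz(lambda x: (x - 1.2) ** 2, x0=2.0)
assert abs(alpha_opt - 1.2) < 1e-6
```

In the actual application, each evaluation of `loss` would draw a batch of samples at the given α and return the estimated mean of their logarithmic weights.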