Connectivity of random hypergraphs with a given hyperedge size distribution

This article discusses random hypergraphs with varying hyperedge sizes, admitting large hyperedges with size tending to infinity, and heavy-tailed limiting hyperedge size distributions. The main result describes a threshold for the random hypergraph to be connected with high probability, and shows that the average hyperedge size suffices to characterise connectivity under mild regularity assumptions. In particular, the connectivity threshold is in most cases insensitive to the shape and higher moments of the hyperedge size distribution. Similar results are also provided for related random intersection graph models.


Introduction
The probability of connectivity is one of the most classical problems in the study of random graphs. In their seminal article, Erdős and Rényi [12] proved that a random graph $G_{nm}$ with $n$ nodes and $m = m_n$ edges is with high probability (whp) connected when $2\frac{m}{n} - \log n \to \infty$, and whp disconnected when $2\frac{m}{n} - \log n \to -\infty$. The random graph $G_{nm}$ is a special instance of a random hypergraph $H_{nmd}$ with $n$ nodes and $m$ hyperedges of size $d$, with $d = 2$. An extension of the above result to random hypergraphs with $d = O(1)$, noted in [10] and formally proved by Poole [23], states that $H_{nmd}$ is whp connected when $d\frac{m}{n} - \log n \to \infty$ and whp disconnected when $d\frac{m}{n} - \log n \to -\infty$. Instead of full connectivity, most earlier research on random hypergraphs has focused on sparse regimes critical for the emergence of a giant connected component (Behrisch, Coja-Oghlan, and Kang [2,3]; Karoński and Łuczak [19]; Schmidt-Pruzan and Shamir [24]). On the other hand, fundamental statistical questions related to identifying parameters and detecting communities in networks with multiple layers and higher-order interactions often require a quantitative understanding of the full connectivity threshold [1,7,9,17]. This motivates the study of random hypergraphs in which the hyperedge sizes are not universally bounded and may be highly variable. Our main research objective is to study the effect of inhomogeneous hyperedge sizes on critical thresholds of connectivity in large hypergraphs.

Notations
Hypergraphs. A hypergraph is a pair $h = (V_h, E_h)$ where $V_h$ is a finite set of objects called nodes and $E_h$ is a collection of nonempty subsets of $V_h$ called hyperedges. A hypergraph $h$ is connected if for each partition of $V_h$ into nonempty sets $V_1$ and $V_2$ there exists a hyperedge $e \in E_h$ such that $V_1 \cap e \ne \emptyset$ and $V_2 \cap e \ne \emptyset$. A node of a hypergraph is isolated if it is not contained in any hyperedge of size at least two. The empirical hyperedge size distribution of a hypergraph $h$ is the probability mass function $f(x) = |E_h|^{-1} \sum_{e \in E_h} \delta_{|e|}(x)$, where $\delta_a(x) = 1$ if $x = a$ and $\delta_a(x) = 0$ otherwise. The 2-section of a hypergraph $h$ is the graph $h' = (V'_h, E'_h)$ where $V'_h = V_h$ and $E'_h$ is the set of unordered node pairs contained in at least one hyperedge of $h$.

Probability. The expectation of a random variable defined on a probability space equipped with probability measure $P = P_n$ is denoted by $E = E_n$. For clarity, the scale parameter $n = 1, 2, \dots$ is mostly omitted from the notations. An event $A = A_n$ is said to occur with high probability (whp) if $P_n(A_n) \to 1$ as $n \to \infty$.
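The definitions above translate directly into small set-based routines. The following is a minimal sketch, assuming hyperedges are represented as Python sets of node labels; the function names `is_connected`, `isolated_nodes` and `two_section` are illustrative and do not come from the article. Connectivity is checked with a union-find structure, which is equivalent to the partition-based definition: a partition with no crossing hyperedge exists exactly when merging the nodes of each hyperedge leaves more than one class.

```python
from itertools import combinations


def is_connected(nodes, hyperedges):
    """Return True if no partition of `nodes` into two nonempty parts
    is left uncrossed by every hyperedge (union-find over hyperedges)."""
    parent = {v: v for v in nodes}

    def find(v):
        # Walk to the root of v's class, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for e in hyperedges:
        members = list(e)
        if not members:
            continue
        root = find(members[0])
        for v in members[1:]:
            parent[find(v)] = root
    return len({find(v) for v in nodes}) <= 1


def isolated_nodes(nodes, hyperedges):
    """Nodes not contained in any hyperedge of size at least two."""
    covered = set()
    for e in hyperedges:
        if len(e) >= 2:
            covered |= set(e)
    return set(nodes) - covered


def two_section(hyperedges):
    """Unordered node pairs contained in at least one hyperedge."""
    return {frozenset(p) for e in hyperedges for p in combinations(sorted(e), 2)}
```

A hypergraph is connected exactly when its 2-section is a connected graph, so `is_connected` can equally be applied to the pairs returned by `two_section`.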

Outline
The rest of the article is organised as follows. Section 2 presents the main results and Section 3 discusses their relevance and relation to earlier research. The proofs are in Sections 4-6, with Section 4 analysing isolated nodes, Section 5 connectivity, and Section 6 summarising details.

Main results
The main results are given separately for three different but closely related network models. Section 2.1 discusses random hypergraphs with given hyperedge sizes, and Section 2.2 random intersection graphs obtained as a 2-section of a random hypergraph. Section 2.3 describes a random hypergraph model from which instances of both earlier models can be sampled.

Random hypergraphs with given hyperedge sizes
Fix integers $n, m \ge 1$ and a probability distribution $f$ on $\{1, \dots, n\}$. Denote by $\mathcal{H}_{nmf}$ the collection of hypergraphs on node set $\{1, \dots, n\}$ having $m$ hyperedges and empirical hyperedge size distribution $f$. The set $\mathcal{H}_{nmf}$ is nonempty when for all $x = 1, \dots, n$, $mf(x)$ is an integer satisfying $mf(x) \le \binom{n}{x}$. In this case we let $H_{nmf}$ be a random hypergraph sampled uniformly at random from $\mathcal{H}_{nmf}$. Then $H_{nmf}$ contains $mf(1)$ hyperedges of size 1, $mf(2)$ hyperedges of size 2, and so on. Because hyperedges of size one do not affect connectivity, relevant moments of the hyperedge size distribution are denoted by (2.1)

Let $H_{nmd}$ be a random hypergraph sampled uniformly at random from the set of hypergraphs on node set $\{1, \dots, n\}$ having $m$ hyperedges of size $d$. Such a hypergraph is a special instance of $H_{nmf}$ where $f$ is the Dirac point mass at $d$, and therefore Theorem 2.1 characterises the connectivity of $H_{nmd}$ when $m \ll \binom{n}{d}^{1/2}$

Random intersection graphs
Fix integers $n, m \ge 1$ and a probability distribution $f$ on $\{0, \dots, n\}$. Let $G_{nmf}$ be a random graph on node set $\{1, \dots, n\}$ generated by first sampling mutually independent random sets $V_1, \dots, V_m$, and then declaring each unordered node pair $\{i, j\}$ adjacent if there exists at least one set $V_k$ containing both $i$ and $j$. This model is known as the passive random intersection graph [13].

Theorem 2.3. For any $m = m_n$ and $f = f_n$ such that $m \ge 1$ and $(f)_2 \lesssim (f)_0$,

Let $G_{nmd}$ be a special instance of the above random intersection graph model in which $f = \delta_d$ is the Dirac point mass at $d$.

Shotgun random hypergraphs
The results in Sections 2.1-2.2 will be proved by analysing a more general random hypergraph model defined as follows. Fix a probability distribution $F = f^{(1)} \times \cdots \times f^{(m)}$ on $\{0, \dots, n\}^m$, and consider a random hypergraph $H^*_{nmF}$ with node set $\{1, \dots, n\}$ and hyperedge set $\{V_1, \dots, V_m\}$, where the sets $V_1, \dots, V_m \subset \{1, \dots, n\}$ are mutually independent and such that the probability mass function of nmF corresponds to a generalisation of the passive random intersection graph in [13,14], and a special case of a Bernoulli superposition graph model in [4,5,6,15,18,22]. Denote

Theorem 2.5. For any $m = m_n$ and $F = F_n$ such that $(F)_2 \lesssim (F)_0$,

The expected number of isolated nodes in $H^*_{nmd}$ equals $e^\lambda$ with $\lambda = \log n + m \log(1 - d/n)$ (see Proposition 4.1). Therefore, Theorem 2.6 implies that the threshold for connectivity is the same as the threshold for the existence of isolated nodes in $H^*_{nmd}$.
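The identity between the expected number of isolated nodes and $e^\lambda$ can be checked by brute force on very small instances. The sketch below assumes that each of the $m$ hyperedges of $H^*_{nmd}$ is an independent uniformly random $d$-subset of $\{1, \dots, n\}$ with $2 \le d < n$, so that a node is isolated exactly when no hyperedge contains it; the helper names `lam` and `expected_isolated_exact` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations, product
from math import exp, isclose, log


def lam(n, m, d):
    """lambda = log n + m log(1 - d/n), assuming 2 <= d < n."""
    return log(n) + m * log(1 - d / n)


def expected_isolated_exact(n, m, d):
    """Exact expected number of isolated nodes, obtained by enumerating
    all m-tuples of d-subsets of {1, ..., n} (feasible for tiny n only)."""
    subsets = list(combinations(range(1, n + 1), d))
    total = 0
    for layers in product(subsets, repeat=m):
        covered = set().union(*layers)
        total += n - len(covered)  # nodes hit by no hyperedge
    return Fraction(total, len(subsets) ** m)
```

For instance, with $n = 5$, $m = 3$, $d = 2$ the enumeration gives $27/25 = 5(1 - 2/5)^3$, which agrees with `exp(lam(5, 3, 2))` up to floating-point error.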

Role of large hyperedges
For uniform random hypergraphs, Poole [23] has shown that if d = O(1), then This characterisation follows as a special instance of Theorem 2.
To see why this is true, we note that (i) µ → −∞ together with λ ≤ µ implies that λ → −∞.We also note that (ii) µ → ∞ implies that m n d log n, and together with d To verify the limits in (3.1), denote m , and we conclude that λ → −∞.We also note that Similarly, Godehardt, Jaworski, and Rybarczyk [14] showed that for a random intersection graph G nmf with f = f n having support universally bounded by max{x : 3 extends this characterisation to random intersection graphs where the support of f is not universally bounded.Theorem 2.4 provides a sharper characterisation for degenerate distributions which is relevant in cases with d ≫ n log n .

Role of variable hyperedge sizes
Example 3.3 below shows that in some extremal cases, the expected number of isolated nodes in $H^*_{nmF}$ may converge to infinity, which also implies that $\log n - \frac{m}{n}(F)_1 \to +\infty$, while the model still produces graph samples which are whp connected. Therefore, some regularity conditions, such as the condition $(F)_2 \lesssim (F)_0$, must be imposed for the characterisation in Theorem 2.5 to be valid.
and therefore In this case any particular node is isolated with probability 1) . Hence the expected number of isolated nodes equals (1) . We conclude that $H^*_{nmF}$ is whp connected, even though the expected number of isolated nodes is at least $n^{1/6 - o(1)}$. The same conclusions are valid for the random intersection graph $G_{nmf}$ obtained as the 2-section of $H^*_{nmF}$.

Conclusion
This article's main findings indicate that under mild regularity assumptions, the average hyperedge size suffices to characterise connectivity, and other characteristics of the hyperedge size distribution, such as higher moments or heavy tails, are not relevant to connectivity. This finding is in line with similar studies related to the giant component in random graphs (see Deijfen, Rosengren, and Trapman [11]; Leskelä and Ngo [20]). The analysis in the present article was restricted to hyperedge size distributions with a universally bounded second moment. Whether or not this regularity condition may be relaxed remains a problem for further research.

Isolated nodes
In this section we will analyse the number of isolated nodes in the shotgun random hypergraph

Proposition 4.1 shows that the expected number of isolated nodes tends to zero when $\lambda \to -\infty$ and to infinity when $\lambda \to +\infty$. The following result shows that this characterisation may be simplified under a mild regularity condition.

where , and $X_k$ is an $f^{(k)}$-distributed random integer.

Analysis of isolation probabilities
Denote by $P_1$ the probability that any particular node is isolated in $H^*_{nmF}$, and by $P_2$ the probability that any particular two nodes are both isolated in $H^*_{nmF}$. We will next derive formulas and bounds for $P_1$ and $P_2$ in terms of functions Similarly, the probability that any distinct nodes $i$ and $j$ are both isolated in $H^*_{nmF}$ equals $P_2 = \prod_{k=1}^m E\phi_2(X_k)$ where $\phi_2(x)$ is the conditional probability that $i$ and $j$ both are isolated in

Lemma 4.5. The pair isolation probability ratio $P_2/P_1^2$ is bounded by where $X_k$ is distributed according to $f^{(k)}$ and $\phi_1$ is defined by (4.2).
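The product formulas for $P_1$ and $P_2$ are easy to evaluate exactly for small parameters. The sketch below rests on an assumption about the functions $\phi_1$ and $\phi_2$, whose displayed definitions (4.2) and (4.3) are not reproduced here: it takes $\phi_1(x)$ (resp. $\phi_2(x)$) to be the probability that one fixed node (resp. two fixed nodes) avoids a uniformly random $x$-subset of $\{1, \dots, n\}$ when $x \ge 2$, and to be 1 when $x < 2$, since hyperedges of size below two do not affect isolation. Each $f^{(k)}$ is passed as a dictionary mapping sizes to probabilities; all names are illustrative.

```python
from fractions import Fraction
from math import comb


def phi1(x, n):
    # Assumed form: a fixed node avoids a uniform x-subset w.p. 1 - x/n.
    return Fraction(1) if x < 2 else Fraction(n - x, n)


def phi2(x, n):
    # Assumed form: two fixed nodes both avoid a uniform x-subset.
    return Fraction(1) if x < 2 else Fraction(comb(n - 2, x), comb(n, x))


def isolation_probabilities(n, pmfs):
    """Return (P1, P2) as products over layers of E[phi1(X_k)], E[phi2(X_k)].

    `pmfs` is a list of dicts {size: probability}, one per hyperedge; n >= 2.
    """
    p1, p2 = Fraction(1), Fraction(1)
    for f in pmfs:
        p1 *= sum(Fraction(p) * phi1(x, n) for x, p in f.items())
        p2 *= sum(Fraction(p) * phi2(x, n) for x, p in f.items())
    return p1, p2
```

With four hyperedges of fixed size 3 on $n = 10$ nodes this gives $P_1 = (7/10)^4$ and $P_2 = (7/15)^4$, so $P_2 \le P_1^2$. With a single hyperedge whose size is 0 or 4 with equal probability, $P_1 = 4/5$ and $P_2 = 2/3 > P_1^2$, which illustrates how variable sizes can push the ratio $P_2/P_1^2$ above one.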

Proof of Proposition 4.2. The inequality log
By noting that the right side of the above inequality equals $-\frac{m}{n}(F)_1$ with (F ) The assumption $(F)_1 \lesssim m^{-1} n \log n$ now implies that $a \lesssim (n \log n)^{1/2}$. In particular, $\max_k x_k/n \to 0$. We may then apply the bound $\log(1 - t) \ge -\frac{t}{1-t}$, $0 \le t < 1$, to notice that By writing $\mu = \log n - \sum_{k=1}^m x_k/n$, we find that

Proof of Proposition 4.3. The expected number of isolated nodes equals $EI = nP_1$ where $P_1$ is the probability that any particular node is isolated in G. Lemma 4.4 implies that ) with $\phi_1$ defined by (4.2). It follows that $EI = e^\lambda$. Markov's inequality $P(I \ge 1) \le EI$ then implies that $P(I \ne 0) \le e^\lambda$, and the claim follows.

Upper bounds on cut probabilities
Fix integers $1 \le r \le n/2$ and $1 \le x \le n$. Denote by $q_r(x)$ the probability that a set of size $x$ sampled uniformly at random from $[n]$ is either fully contained in $[r]$ or fully contained in $[n] \setminus [r]$. Furthermore, for any random variable $X$ with values in $\{0, \dots, n\}$, and $Eq_r(X) \le P(X < 2) + e^{-2\frac{r}{n}(1 - \frac{r}{n})} P(X \ge 2)$. (5.4)

Proof. Fix an integer $1 \le r \le n/2$. A simple counting argument shows that By applying a shotgun lemma (Lemma A.2), we see that $\binom{n-r}{x}/\binom{n}{x} \le (1 - x/n)^r$ for all $x \le n - r$, and $\binom{r}{x}/\binom{n-r}{x} \le (1 - \frac{n-2r}{n-r})^x = (\frac{r}{n-r})^x$ for all $x \le n$. As a consequence, x n r for all $2 \le x \le r$. Hence (5.1) is valid for all $2 \le x \le r$, and also for all $r < x \le n - r$. Of course, (5.1) is trivially true for $x > n - r$ as well.

To verify (5.4), we note that (5.2) implies that q r (x

Proof. Denote by $D$ the event that $H^*_{nmd}$ is disconnected, and by $I$ the event that $H^*_{nmd}$ contains isolated nodes. Recall that by Proposition 4.3,

Connectivity for constant layer sizes
(5.6) On the event $D \setminus I$ there exists a node set $R$ of size $d \le r \le n/2$ such that $(R, R^c)$ forms a cut in $H^*_{nmd}$, so that no hyperedge of $H^*_{nmd}$ contains nodes from both $R$ and $R^c$. By the union bound, we find that where $q_r(d)$ denotes the probability that a set of size $d$ sampled uniformly at random from . By (5.1) in Lemma 5.1, we find that and by noting that $(1 - d/n)^r = (n^{-1} e^\lambda)^{r/m}$, we find that for all $d \le r \le n/2$.

Let us now split the index set $J = \{r : d \le r \le n/2\}$ in the sum in (5.7) into $J_1 = \{r \in J : r \le n_0\}$ and $J_2 = \{r \in J : r > n_0\}$ using a scale-dependent threshold value $n_0 = (m^{-1} n^2) \wedge (n/2)$, and define By (5.8) and the inequality $\binom{n}{r} \le (\frac{en}{r})^r$, we find that $e^{(\lambda+5)r}$. (5.10) We may bound the sum $S_2$ by noting that (5.9) and the equality $\sum_{r=0}^n \binom{n}{r} = 2^n$ imply that Because $n - n_0 \ge n/2$, we conclude that We also note that $S_2 = 0$ for $m \le 2n$, because in this case $n_0 = n/2$ and the index set $J_2$ is empty. For $m > 2n$ we see that $n_0 = m^{-1} n^2$, and the above inequality becomes Therefore, $S_2 \le (2/e)^n$ in both cases. By recalling that $P(I) \le e^\lambda$ due to (5.6), noting that $P(D \setminus I) \le S_1 + S_2$ due to (5.7), and combining the latter inequality with (5.10), we conclude that

Connectivity for inhomogeneous layer sizes
Proposition 5.3.
Proof. $H^*_{nmF}$ is disconnected if and only if there exists a node set $A$ of size $1 \le |A| \le n/2$ such that $H^*_{nmF}$ contains no links connecting $A$ to its complement. For any node set with $|A| = r$ nodes, the conditional probability of such an event given the hyperedge sizes equals $\prod_{k=1}^m q_r(X_k)$ where $X_k = |V_k|$ and $q_r(x)$ is the probability that a uniformly random $x$-subset of $[n]$ is either fully contained in $[r]$ or fully contained in $[n] \setminus [r]$. Because $X_1, \dots, X_m$ are mutually independent, the corresponding unconditional probability equals $Q_r = \prod_{k=1}^m Eq_r(X_k)$. The union bound hence implies that We will fix a threshold level $n_0 = c^{-1} m^{-1} n^2$ with $c = (F)_0$, and split the sum on the right as . (5.12) where

(i) Assume that $n_0 \ge 1$, so that $J_1$ is nonempty, and fix $r \in J_1$. The bound (5.3) combined with the inequality log Recall the moments $(F)_r$ defined by (2.2). Let $Y$ be a random integer distributed according to probability measure $\frac{1}{m}\sum_{k=1}^m f_k$. By summing both sides of the above inequality, we find that By applying the bound $\binom{n}{r} \le n^r$, we find that $\binom{n}{r} Q_r \le e^{Br}$, where Therefore, $S_1 \le \sum_{r=1}^\infty e^{Br}$. Because $\mu \to -\infty$ and $(F)_2 \lesssim (F)_0$, we find that $S_1 = o(1)$.

(ii) For the terms of the sum $S_2$, we apply inequality (5.4) and the fact that r By applying the equality $\sum_{r=0}^n \binom{n}{r} = 2^n$, it follows that $S_2 \le 2^n e^{-mct(1-t/2)} = e^{n \log 2 - mct(1-t/2)}$.
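The cut probability $q_r(x)$ and the resulting union bound can be evaluated exactly for small parameters. The following sketch assumes deterministic hyperedge sizes $x_1, \dots, x_m$ (so that $Q_r = \prod_k q_r(x_k)$) and sums $\binom{n}{r} Q_r$ over $1 \le r \le n/2$, in line with the argument above; the function names are illustrative, and `q_brute_force` is included only as a cross-check by enumeration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def q(r, x, n):
    """Probability that a uniform x-subset of [n] lies inside [r]
    or inside the complement of [r], for 1 <= r <= n - 1."""
    if x == 0:
        return Fraction(1)  # the empty set lies on both sides of any cut
    return Fraction(comb(r, x) + comb(n - r, x), comb(n, x))


def q_brute_force(r, x, n):
    """Same probability, computed by enumerating all x-subsets of [n]."""
    left = set(range(1, r + 1))
    subsets = [set(s) for s in combinations(range(1, n + 1), x)]
    hits = sum(1 for s in subsets if s <= left or not (s & left))
    return Fraction(hits, len(subsets))


def disconnection_union_bound(n, sizes):
    """Union bound sum_{r=1}^{n//2} C(n, r) * prod_k q_r(x_k)."""
    bound = Fraction(0)
    for r in range(1, n // 2 + 1):
        term = Fraction(comb(n, r))
        for x in sizes:
            term *= q(r, x, n)
        bound += term
    return bound
```

The bound is only informative when it is below one; for example `disconnection_union_bound(4, [4])` is 0 because a hyperedge covering all nodes crosses every cut.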
Proofs of main results

where , and $X_k$ is an $f_k$-distributed random integer. We will show that m k=1 Var(Z k ) We saw in the proof of Proposition 4.2 that $(F)_2 \lesssim (F)_1$ and $a \lesssim (m(F)_1)^{1/2} \lesssim n^{1/2} \log^{1/2} n$. As a consequence, $\min_k EZ_k = 1 - a/n = 1 - o(1)$, and we conclude that We Let us denote by $D$ the event that the sets $V_1, \dots, V_m$ are all distinct. On this event, $H^* \in \mathcal{H}_{nmf}$. We also see that for any $h \in \mathcal{H}_{nmf}$.

The above formula shows that the probability mass function $h \mapsto P(H^* = h)$ is constant on $\mathcal{H}_{nmf}$, and therefore the conditional probability distribution of $H^*$ given $D$ is the uniform distribution on $\mathcal{H}_{nmf}$. Especially, Furthermore, in this case the special moments $(F)_r$ defined in (2.2) coincide with the moments defined in (2.1) according to $(F)_r = (f)_r$. Because The claims now follow by Theorem 2.6.

A Elementary bounds

A.1 Sampling distinct random sets
Given integers $n, m \ge 1$ and $1 \le x_1, \dots, x_m \le n$, let $V_1, \dots, V_m$ be mutually independent random sets such that for each $k$, the set $V_k$ is sampled uniformly at random from the collection of all subsets of $\{1, \dots, n\}$ of size $x_k$. The following result (compare with [16, ≪ 1, where $m_x$ denotes the number of sets of size $x$.

Footnote 4: $H^* = h$ if and only if for each $x$, the $x$-sized hyperedges of $H^*$ coincide with the $x$-sized hyperedges of $h$, and the latter event has probability $m_x! \binom{n}{x}^{-m_x}$ when $h$ has $m_x = mf(x)$ distinct hyperedges of size $x$.

Lemma A.1. Denote by $m_x$ the number of sets $V_k$ with size $1 \le x \le n$. The probability $p$ that the sets $V_1, \dots, V_m$ are distinct is nonzero if and only if $m_x \le \binom{n}{x}$ for all $x$. In this case $p$ is bounded by

Proof. The event of interest can be written as $D = \cap_{x=1}^n D_x$, where $D_x$ is the event that all sets of size $x$ are distinct. The event $D_x$ corresponds to the classical birthday problem (e.g. [21]) for which we know that $P(D_x) = \prod_{k=1}^{m_x - 1} (1 - k/\binom{n}{x})$. The inequality $1 - t \le e^{-t}$ then implies that Because $D_1, \dots, D_n$ are independent, it follows that $P(D) = \prod_{x=1}^n P(D_x) \le e^{-c}$. Observe next that when $D_x$ fails, then there exists a pair of sets of size $x$ which coincide with each other. Because any particular pair of sets of size $x$ coincides with probability n

A.2 Shotgun lemma
Imagine shooting at a target having $n$ squares, out of which $r$ are painted red, and the rest are white. Let us fire a shotgun with $d \le n - r$ bullets at the target, and assume that bullets hit a uniformly random $d$-subset of the $n$ squares. The probability that none of the bullets hits a red square is $\binom{n-r}{d}/\binom{n}{d}$. Imagine a reversed setting, where we fire $r \le n - d$ bullets, trying to avoid $d$ red squares. The probability of success in that case is $\binom{n-d}{r}/\binom{n}{r}$. The following lemma confirms that both probabilities are the same.

Lemma A.5. $-t(1 - t)^{-1} \le \log(1 - t) \le -t$ for all $t \in (0, 1)$. Furthermore, $-t - \frac{t^2}{1-t} \le \log(1 - t) \le -t - \frac{1}{2} t^2$ for all $t \in (0, 1)$.
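The logarithm bounds of Lemma A.5 can be sanity-checked numerically on a grid. This is only a floating-point spot check, not a proof; the function name `check_log_bounds` is illustrative, and `log1p(-t)` is used instead of `log(1 - t)` to limit rounding error.

```python
from math import log1p


def check_log_bounds(ts):
    """Assert both pairs of bounds of Lemma A.5 for each t in (0, 1)."""
    for t in ts:
        v = log1p(-t)  # log(1 - t)
        assert -t / (1 - t) <= v <= -t
        assert -t - t * t / (1 - t) <= v <= -t - 0.5 * t * t
    return True
```

For example, `check_log_bounds([k / 100 for k in range(1, 100)])` runs through the grid $t = 0.01, \dots, 0.99$.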

Asymptotics.
Standard Bachmann-Landau notations are employed as follows: We write $a_n \ll b_n$ and $a_n = o(b_n)$ when $a_n/|b_n| \to 0$, $a_n \lesssim b_n$ and $a_n = O(b_n)$ when $\limsup a_n/|b_n| < \infty$, and $a_n \sim b_n$ when $a_n/b_n \to 1$.

and $d = O(1)$. In this special case where all hyperedges are of equal size $d$, Theorem 2.2 below provides an alternative characterisation of connectivity which admits large hyperedges not constrained by the assumption that $d = O(1)$.

Theorem 2.2. If $m = m_n$ and $d = d_n$ are such that $m \ll \binom{n}{d}^{1/2}$ and $2 \le d \le n$, then

Theorem 2.4. For any $m = m_n$ and $d = d_n$ such that $m \ge 1$ and $2 \le d \le n$,

Lemma A.2. For any integers $d, r, n \ge 0$ such that $d + r \le n$, the probability $p = \binom{n-r}{d}/\binom{n}{d}$ can be written as $p = \binom{n-d}{r}/\binom{n}{r}$ and is bounded by $p \le (1 - r/n)^d$ and $p \le (1 - d/n)^r$.

Proof. To verify that $\binom{n-r}{d}/\binom{n}{d} = \binom{n-d}{r}/\binom{n}{r}$, it suffices to write
$$\binom{n-r}{d} \Big/ \binom{n}{d} = \frac{(n-r)!}{d!\,(n-r-d)!} \cdot \frac{d!\,(n-d)!}{n!} = \frac{(n-r)!}{(n-r-d)!} \cdot \frac{(n-d)!}{n!},$$
and note that the right side remains the same if $d$ and $r$ are swapped. We also note that
$$\binom{n-r}{d} \Big/ \binom{n}{d} = \frac{(n-r)_d}{(n)_d} = \prod_{c=0}^{d-1} \frac{n-r-c}{n-c} = \prod_{c=0}^{d-1} \Big(1 - \frac{r}{n-c}\Big),$$
from which we find that $p \le (1 - r/n)^d$. The latter inequality follows by repeating the same argument with $d$ and $r$ swapped.
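The symmetry and the two upper bounds in Lemma A.2 can be verified exhaustively for small $n$ with exact rational arithmetic. This is a sketch for illustration, with hypothetical function names; it assumes $n \ge 1$ so that the ratios $r/n$ and $d/n$ are defined.

```python
from fractions import Fraction
from math import comb


def avoid_probability(n, r, d):
    """Probability that a uniform d-subset of n squares misses r red ones."""
    return Fraction(comb(n - r, d), comb(n, d))


def check_shotgun_lemma(n):
    """Check p(n, r, d) = p(n, d, r) and both bounds for all d + r <= n."""
    for d in range(n + 1):
        for r in range(n - d + 1):
            p = avoid_probability(n, r, d)
            assert p == avoid_probability(n, d, r)  # roles of d and r swap
            assert p <= Fraction(n - r, n) ** d     # p <= (1 - r/n)^d
            assert p <= Fraction(n - d, n) ** r     # p <= (1 - d/n)^r
    return True
```

For instance, `avoid_probability(10, 3, 2)` equals $\binom{7}{2}/\binom{10}{2} = 7/15$, which is below both $(7/10)^2$ and $(8/10)^3$.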
Denote by $H^*_{nmd}$ a special instance of $H^*_{nmF}$ with $F$ being equal to the Dirac point mass at $(d, \dots, d) \in \{0, \dots, n\}^m$. For any $m = m_n$ and $d = d_n$ such that $m \ge 1$ and $2 \le d \le n$, ).

nmF defined in Section 2.3. Recall that a node of a hypergraph is called isolated if it is not contained in any hyperedge of size at least two. Section 4.1 summarises our findings in Propositions 4.1-4.3. Section 4.2 presents results for isolation probabilities that are used in Section 4.3 to prove Propositions 4.1-4.3.
Proposition 4.1. The expected number of isolated nodes in $H^*_{nmF}$ equals $\exp(\lambda)$ where

Lemma 4.4. The probability that any particular node is isolated in $H^*_{nmF}$ is given by $P_1 = \prod_{k=1}^m \big(\sum_{x=0}^n \phi_1(x) f^{(k)}(x)\big)$, and the probability that any particular two nodes both are isolated in $H^*_{nmF}$ equals $P_2 = \prod_{k=1}^m \big(\sum_{x=0}^n \phi_2(x) f^{(k)}(x)\big)$, with $\phi_1$ defined by (4.2) and $\phi_2$ defined by (4.3).

Proof. We may write $H^*_{nmF}$ as a union $\cup_{k=1}^m H^*_k$ where $H^*_k$ is a hypergraph on $\{1, \dots, n\}$ having $V_k$ as the only hyperedge. We note that a node $i$ is isolated in $H^*_{nmF}$ if and only if it is isolated in $H^*_k$ for all $k$. Because $H^*_1, \dots, H^*_m$ are mutually independent, we see that P 1 3)) 2, nodes $i$ and $j$ both are isolated in $H^*_k$ if and only if neither of these nodes is contained in $V_k$, and given $|V_k| = x$, this occurs with probability (1 ik and the sets $V_k$ are mutually independent, it follows that $P_1 = EI_1 = \prod_{k=1}^m EI_{1k}$ and $P_2 = E(I_1 I_2) = \prod_{k=1}^m EI_{1k} I_{2k}$. Then . Denote by $I_i$ (resp. $I_{ik}$) the indicator variable of the event that node $i$ is isolated in $H^*_{nmF}$ (resp. layer $H^*_k$). Because I i = m k=1 I To analyse the covariance term in (4.4), denote $X_k = |V_k|$. Then by conditioning on $X_k$, we find that Cov(I 1k , I 2k

Proof of Proposition 4.1. The expected number of isolated nodes equals $nP_1$ where $P_1$ is the probability that any particular node is isolated in $H^*_{nmF}$. The claim follows after noting that by Lemma 4.4, P 1

Proof of Theorem 2.6. Fix integers $m = m_n$ and $d = d_n$ such that $m \ge 1$ and $2 \le d \le n$. Recall that $H^*_{nmd}$ is a special instance of the model $H^*_{nmF}$ for which $\lambda$ defined by (4.1) reduces to $\lambda = \log n + m \log(1 - d/n)$. Proposition 4.3 then implies that $P(H^*_{nmd} \text{ is connected}) \le P(H^*_{nmd} \text{ contains no isolated nodes}) \le e^{-\lambda}$,

Proof of Theorem 2.1. Let $H_{nmf}$ be a random hypergraph sampled uniformly at random from the set $\mathcal{H}_{nmf}$ of hypergraphs on node set $\{1, \dots, n\}$ having $m$ hyperedges and empirical hyperedge size distribution $f$. Instead of directly analysing $H_{nmf}$, we will study a simpler model generated as follows:
1. Create a list $(x_1, \dots, x_m)$ by concatenating $mf(1)$ copies of integer 1, $mf(2)$ copies of integer 2, ..., and $mf(n)$ copies of integer $n$.
2. Sample random sets $V_1, \dots, V_m$ independently and uniformly at random from the collection of subsets of $[n]$ with sizes $x_1, \dots, x_m$, respectively.
3. Define a hypergraph $H^* = ([n], \{V_1, \dots, V_m\})$.
Note that the distribution of $H^*$ is in general not the uniform distribution on $\mathcal{H}_{nmf}$, because the collection $\{V_1, \dots, V_m\}$ may contain fewer than $m$ unique sets in case there are duplicates. Instead, $H^* = H^*_{nmF}$ may be recognised as an instance of the shotgun random hypergraph model defined in Section 2.3 where $F = \delta_{x_1} \times \cdots \times \delta_{x_m}$ with $\delta_x$ denoting the Dirac point mass at $x$.
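The three-step construction of $H^*$ can be sketched as follows. The function name `sample_shotgun_hypergraph` and the `size_counts` argument (a dictionary mapping each size $x$ to the count $mf(x)$) are assumptions made for this illustration, not part of the article.

```python
import random


def sample_shotgun_hypergraph(n, size_counts, rng=None):
    """Sample the sets V_1, ..., V_m of H* and report whether they are distinct.

    size_counts: dict {x: m*f(x)} giving the number of hyperedges of size x.
    Returns (layers, distinct), where layers is a list of frozensets.
    """
    rng = rng or random.Random()
    # Step 1: list of sizes with m*f(x) copies of each size x.
    sizes = [x for x in sorted(size_counts) for _ in range(size_counts[x])]
    # Step 2: independent uniformly random subsets of [n] with those sizes.
    layers = [frozenset(rng.sample(range(1, n + 1), x)) for x in sizes]
    # Step 3: the hyperedge set is {V_1, ..., V_m}; duplicates collapse.
    distinct = len(set(layers)) == len(layers)
    return layers, distinct
```

Repeating the call until `distinct` is true is a simple rejection scheme; by the argument in this section, the conditional distribution of $H^*$ given that all sets are distinct is uniform on $\mathcal{H}_{nmf}$.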
Lemma A.1 shows that $P(D) \to 1$. The claims now follow by Theorem 2.5.

Proof of Theorem 2.2. We will construct a random hypergraph $H^*$ in the same way as in the proof of Theorem 2.1, this time defining the list $(x_1, \dots, x_m)$ simply by concatenating $m$ copies of integer $d$. Then we recognise $H^* = H^*_{nmd}$ as an instance of the shotgun random hypergraph model defined in Section 2.3 where

Proof of Theorem 2.3. Let $H^*_{nmF}$ be a shotgun random hypergraph with $F = f \times \cdots \times f$ being the $m$-fold product measure of $f$. Then the moments defined by (2.1) and (2.2) match according to $(F)_r = (f)_r$. The statements of Theorem 2.3 then follow by applying Theorem 2.5 and noting that the graph $G_{nmf}$ is connected if and only if the hypergraph $H^*_{nmF}$ is connected.

Proof of Theorem 2.4. Let $H^*_{nmd}$ be a shotgun random hypergraph with $F$ being the Dirac point mass at $(d, \dots, d) \in \{0, \dots, n\}^m$. The statements of Theorem 2.4 then follow by applying Theorem 2.6 and noting that the graph $G_{nmd}$ is connected if and only if the hypergraph H *