ASYMPTOTIC ENUMERATION OF BINARY CONTINGENCY TABLES AND COMPARISON WITH INDEPENDENCE HEURISTIC

Abstract. For parameters $n, \delta, B, C$, we obtain a sharp asymptotic formula for the number of $(n + \lfloor n^{\delta} \rfloor)^2$-dimensional binary contingency tables with non-uniform margins taking the values $\lfloor BCn \rfloor$ and $\lfloor Cn \rfloor$. Furthermore, we compare our sharp asymptotics with the classical independence heuristic estimate and prove that the independence heuristic overestimates by a factor of $e^{\Theta(n^{2\delta})}$. Our comparison is based on an analysis of the correlation ratio, and an explicit bound for the constant in $\Theta$ is also obtained.

1. Introduction

1.1. Overview. This paper studies the asymptotic enumeration of binary contingency tables and its connection with the classical independence heuristic introduced by I. J. Good 70 years ago [GC77].
A binary contingency table is a 0-1 matrix with fixed row and column sums. Let $r = (r_1, \dots, r_m)$ and $c = (c_1, \dots, c_n)$ be two positive integer vectors with the same total sum of entries, i.e.,
\[
\sum_{i=1}^{m} r_i = \sum_{j=1}^{n} c_j.
\]
Let
\[
\mathcal{M}(r,c) := \Big\{ X = (X_{ij}) \in \{0,1\}^{m \times n} : \sum_{j=1}^{n} X_{ij} = r_i, \ \sum_{k=1}^{m} X_{kj} = c_j \ \text{for all } 1 \le i \le m, \ 1 \le j \le n \Big\}
\]
be the set of binary contingency tables with row margin $r$ and column margin $c$. Since $X_{ij} \in \{0,1\}$, it is easy to see that $r_i \le n$ and $c_j \le m$ for all $i, j$. Binary contingency tables have close connections with bipartite graphs with fixed degree sequences; see, e.g., [Wor19] for a historical review. They also arise as the structure constants in the ring of symmetric functions and in the representation theory of general linear groups; see [Mac98].
Estimating the cardinality of $\mathcal{M}(r,c)$ is a fundamental problem in analytic combinatorics; see, for instance, [GC77] and [Bar10]. As a starting point, we have the following effortless estimate based on the so-called independence heuristic. Precisely speaking, fix $r = (r_1, \dots, r_m)$ and $c = (c_1, \dots, c_n)$ with $N = r_1 + \dots + r_m = c_1 + \dots + c_n$, and let $\mathcal{M}_N$ be the set of $m \times n$ 0-1 matrices with total sum of entries $N$. Let $X$ be a uniform sample from $\mathcal{M}_N$ and consider the following two events: $\mathcal{R}_r = \{X \text{ has row sums } r\}$ and $\mathcal{C}_c = \{X \text{ has column sums } c\}$.
If $\mathcal{R}_r$ and $\mathcal{C}_c$ were independent, the exact identity $|\mathcal{M}(r,c)| = |\mathcal{M}_N| \, \mathbb{P}(\mathcal{R}_r \cap \mathcal{C}_c)$ would be approximated by
\[
I(r,c) := |\mathcal{M}_N| \, \mathbb{P}(\mathcal{R}_r) \, \mathbb{P}(\mathcal{C}_c) = \binom{mn}{N}^{-1} \prod_{i=1}^{m} \binom{n}{r_i} \prod_{j=1}^{n} \binom{m}{c_j}, \tag{1.2}
\]
the independence heuristic estimate corresponding to margins $r$ and $c$.
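For very small margins, the independence heuristic (1.2) can be compared with the exact count by brute-force enumeration. The following sketch is our own illustration, not part of the paper; the helper names `exact_count` and `independence_heuristic` are ours. For $4 \times 4$ tables with all margins equal to $2$, the exact count is $90$ (OEIS A001499), while the heuristic already overshoots:

```python
from itertools import product
from math import comb

def exact_count(r, c):
    """Enumerate all 0-1 matrices with row sums r and column sums c."""
    m, n = len(r), len(c)
    count = 0
    for bits in product((0, 1), repeat=m * n):
        rows = [bits[i * n:(i + 1) * n] for i in range(m)]
        if all(sum(row) == ri for row, ri in zip(rows, r)) and \
           all(sum(rows[i][j] for i in range(m)) == cj
               for j, cj in enumerate(c)):
            count += 1
    return count

def independence_heuristic(r, c):
    """I(r, c) = binom(mn, N)^{-1} prod_i binom(n, r_i) prod_j binom(m, c_j)."""
    m, n, N = len(r), len(c), sum(r)
    est = 1.0
    for ri in r:
        est *= comb(n, ri)
    for cj in c:
        est *= comb(m, cj)
    return est / comb(m * n, N)

r = c = (2, 2, 2, 2)                  # 4x4 tables, every margin equal to 2
print(exact_count(r, c))              # 90 (OEIS A001499)
print(independence_heuristic(r, c))   # about 130.5: the heuristic overestimates
```

Even at this tiny scale, the heuristic ($\approx 130.5$) exceeds the true count ($90$), in line with the overestimation phenomenon studied below.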
In 1969, O'Neil [O'N69] solved the problem in the case of large sparse binary matrices with $r_i, c_j \le (\log n)^{1/4 - \varepsilon}$ for all $1 \le i, j \le n$. In 2010, A. Barvinok [Bar10] used permanents and the van der Waerden bound for doubly stochastic matrices to obtain an estimate for generic $r$ and $c$. This estimate has recently been improved in [BLP23] using the technique of Lorentzian polynomials. Later on, Barvinok and Hartigan [BH13] used the maximum entropy principle and a local central limit theorem to obtain a more precise asymptotic formula under certain regularity conditions.
Our main focus is to first derive the asymptotics of $|\mathcal{M}(r,c)|$ when both $r$ and $c$ take two different linear values in $n$; see the precise definition in the next subsection. After that, we compare the results with the independence heuristic estimate (1.2) and show that the independence heuristic leads to a large overestimate, by a factor of $e^{\Theta(n^{2\delta})}$. Our derivation of the asymptotics follows closely the spirit of [LP22] and is mainly based on Barvinok's asymptotic formula in [Bar10] and the author's recent work [Wu22] on the limiting distribution of random binary contingency tables. Similar asymptotics for the uniform-margin case can also be obtained with the same techniques and the limiting distribution derived in [Wu23].

1.2. Setup and main results. Fix $0 \le \delta < 1$, $0 < C < \frac{3}{4}$ and $0 < B \le \frac{1}{C}$. Define the margins
\[
\bar r = \bar c := (\underbrace{\lfloor BCn \rfloor, \dots, \lfloor BCn \rfloor}_{\lfloor n^{\delta} \rfloor}, \underbrace{\lfloor Cn \rfloor, \dots, \lfloor Cn \rfloor}_{n})
\]
and let $\mathcal{M}_{n,\delta}(B,C) := \mathcal{M}(\bar r, \bar c)$. Namely, $\mathcal{M}_{n,\delta}(B,C)$ is the set of $(\lfloor n^{\delta} \rfloor + n)^2$-dimensional binary matrices whose first $\lfloor n^{\delta} \rfloor$ rows and columns have sum $\lfloor BCn \rfloor$ and whose remaining $n$ rows and columns have sum $\lfloor Cn \rfloor$.
The first main result of this paper is the sharp asymptotics of $|\mathcal{M}_{n,\delta}(B,C)|$, stated as Theorem 1.1. This theorem will be proved in Section 2. For now, we remark that the proof is based on the maximum entropy principle and that the function $f(x)$ is the Shannon–Boltzmann entropy of a Bernoulli random variable with mean $x$; the coefficients in front of $n^2$, $n^{1+\delta}$ and $n^{2\delta}$ all come from the Bernoulli entropy.
Next, we study the relationship between $|\mathcal{M}_{n,\delta}(B,C)|$ and its corresponding independence heuristic. Recall that for margins $r = (r_1, \dots, r_m)$ and $c = (c_1, \dots, c_n)$, the independence heuristic estimate is the quantity (1.2), where $N = \sum_{i=1}^{m} r_i = \sum_{j=1}^{n} c_j$ is the total sum of entries. We denote by $I_{n,\delta}(B,C)$ the independence heuristic estimate for the margins $\bar r$ and $\bar c$. To study the relation between $I_{n,\delta}(B,C)$ and $|\mathcal{M}_{n,\delta}(B,C)|$, we consider their correlation ratio
\[
\rho_{n,\delta}(B,C) := \frac{I_{n,\delta}(B,C)}{|\mathcal{M}_{n,\delta}(B,C)|}.
\]
Our next main result is on the asymptotic behaviour of $\rho_{n,\delta}(B,C)$.
We show that $n^{-2\delta} \log \rho_{n,\delta}(B,C)$ converges to a limit $\Delta_{B,C}$, and we have an explicit formula for $\Delta_{B,C}$. The behaviour of $\Delta_{B,C}$ tells us that the independence heuristic overestimates the number of tables in $\mathcal{M}_{n,\delta}(B,C)$. This matches Barvinok's arguments on cloned margins; see [Bar10] for details. Probabilistically speaking, the events $\mathcal{R}_{n,\delta}(B,C) = \{\text{the 0-1 matrix has row sums } \bar r\}$ and $\mathcal{C}_{n,\delta}(B,C) = \{\text{the 0-1 matrix has column sums } \bar c\}$ are asymptotically negatively correlated rather than asymptotically independent, and $\Delta_{B,C}$ quantifies how far they are from independence. In fact, when $B = 1$, i.e. when all row sums and column sums are equal, the independence heuristic provides the best estimate. As $B$ moves away from $1$, meaning that the margins become less and less uniform, the independence heuristic overestimates by a factor of $e^{\Theta(n^{2\delta})}$. It is instructive to compare our results with the recent work [LP22] on the non-negative integer case: when the contingency tables are non-negative integer valued, the independence heuristic leads to a large undercount, the opposite of our binary case. The reason behind this phenomenon remains mysterious. It would be interesting to obtain intermediate results for contingency tables whose entries take values in $\{0, 1, \dots, k\}$ for finite $k$.

2. Typical Tables

Let $r = (r_1, \dots, r_m) \in \mathbb{N}^m$ and $c = (c_1, \dots, c_n) \in \mathbb{N}^n$ be two positive integer vectors with the same total sum of entries. The binary transportation polytope $\mathcal{P}(r,c)$ is defined as
\[
\mathcal{P}(r,c) := \Big\{ X = (x_{ij}) \in [0,1]^{m \times n} : \sum_{j=1}^{n} x_{ij} = r_i, \ \sum_{i=1}^{m} x_{ij} = c_j \ \text{for all } i, j \Big\}.
\]
Barvinok introduced the following notion of a typical table in [Bar10].
Definition 2.1 (Typical table). Let $r = (r_1, \dots, r_m)$ and $c = (c_1, \dots, c_n)$ be two positive integer vectors with the same total sum of entries. For each $X = (x_{ij}) \in \mathcal{P}(r,c)$, let
\[
g(X) := \sum_{i=1}^{m} \sum_{j=1}^{n} f(x_{ij}), \qquad f(x) = x \log \frac{1}{x} + (1-x) \log \frac{1}{1-x}.
\]
The typical table $Z = Z(r,c)$ is defined as the unique maximizer of $g$ on $\mathcal{P}(r,c)$.
Remark 2.2. The function $g$ is strictly concave, so it attains a unique maximum in the interior of the binary transportation polytope. Hence the typical table is well-defined.
Remark 2.3. The function $f(x) = x \log \frac{1}{x} + (1-x) \log \frac{1}{1-x}$ is the Shannon–Boltzmann entropy of a Bernoulli random variable with mean $x$. Therefore, $g(X)$ can be viewed as the Bernoulli entropy of $X$, and $Z$ is the maximal-entropy matrix on the polytope $\mathcal{P}(r,c)$.
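Since $g$ is strictly concave and the margin constraints are linear, the first-order conditions give $f'(z_{ij}) = -(\lambda_i + \mu_j)$ for Lagrange multipliers $\lambda_i, \mu_j$, i.e. $z_{ij} = \sigma(\lambda_i + \mu_j)$ with $\sigma$ the logistic function. This suggests a simple alternating scheme for computing typical tables numerically. The following sketch is our own illustration (the names `typical_table` and `_solve`, the bisection range, and the sweep count are ours), not an algorithm from the paper:

```python
from math import exp

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + exp(-t))
    return exp(t) / (1.0 + exp(t))

def _solve(target, fixed, lo=-50.0, hi=50.0, iters=80):
    """Bisect for s such that sum_j sigmoid(s + fixed[j]) = target;
    the left-hand side is strictly increasing in s."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(sigmoid(mid + t) for t in fixed) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def typical_table(r, c, sweeps=100):
    """Approximate the maximum-entropy matrix z_ij = sigmoid(lam_i + mu_j)
    with row sums r and column sums c, via alternating dual updates."""
    m, n = len(r), len(c)
    lam, mu = [0.0] * m, [0.0] * n
    for _ in range(sweeps):
        for i in range(m):
            lam[i] = _solve(r[i], mu)
        for j in range(n):
            mu[j] = _solve(c[j], lam)
    return [[sigmoid(lam[i] + mu[j]) for j in range(n)] for i in range(m)]

# Uniform margins: by symmetry every entry of the typical table equals 2/3.
Z = typical_table((2, 2, 2), (2, 2, 2))
```

For uniform margins the scheme recovers the constant matrix immediately, illustrating Remark 2.4 below; for non-uniform margins the row and column sums converge to the targets after a few sweeps.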
Remark 2.4. By symmetry, two entries of the typical table are equal if they have the same margin conditions, i.e., $z_{ij} = z_{kl}$ whenever $r_i = r_k$ and $c_j = c_l$.

It turns out that the typical table has a close connection to the cardinality of $\mathcal{M}(r,c)$. The following theorem, proved by Barvinok in [Bar10], plays a key role in our proof.

Theorem 2.5 ([Bar10]). Let $Z$ be the typical table for margins $r$ and $c$. Then
\[
\log |\mathcal{M}(r,c)| = g(Z) + O\big((m+n) \log(mn)\big).
\]
Lemma 2.6. Fix $0 \le \delta < 1$, $0 < C < \frac{3}{4}$ and $0 < B \le \frac{1}{C}$. Let $Z = (z_{ij})$ be the typical table for $\mathcal{M}_{n,\delta}(B,C)$. Then there exist constants $\gamma_1(B,C)$ and $\gamma_2(B,C)$ such that the following hold: (1)

Next, we derive the asymptotics of $g(Z)$, the entropy of the typical table. Our proof is almost identical to that of [LP22, Proposition 3.4], except that we plug in different limits of the typical table. Notice that in the binary case there is no sharp phase transition for the value of $B$ with respect to $C$; see [Wu22, Remark 1.4] for more discussion of this aspect. By the symmetry of the typical table (Remark 2.4) and the marginal conditions, the entries of $Z$ are constant on each of the blocks determined by the margin values. The Taylor expansion of $f$ around a point $a \in (0,1)$ has the form
\[
f(a+t) = f(a) + \log\Big(\frac{1-a}{a}\Big)\, t - \frac{t^2}{2a(1-a)} + O(t^3).
\]

Proof of Theorem 1.1. Notice that for the margins $\bar r$ and $\bar c$ we have $m = n + \lfloor n^{\delta} \rfloor$, so by Theorem 2.5,
\[
\log |\mathcal{M}_{n,\delta}(B,C)| = g(Z) + O(n \log n),
\]
and Theorem 1.1 follows from Proposition 2.7. $\square$
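The second-order Taylor expansion of $f$ used above is easy to verify numerically, using $f'(a) = \log\frac{1-a}{a}$ and $f''(a) = -\frac{1}{a(1-a)}$. The sketch below is our own sanity check; the helper names `f` and `taylor2` are ours:

```python
from math import log

def f(x):
    """Shannon-Boltzmann entropy of a Bernoulli(x) random variable."""
    return x * log(1 / x) + (1 - x) * log(1 / (1 - x))

def taylor2(a, t):
    """Second-order expansion of f around a:
    f(a + t) = f(a) + log((1-a)/a) * t - t**2 / (2*a*(1-a)) + O(t**3)."""
    return f(a) + log((1 - a) / a) * t - t * t / (2 * a * (1 - a))

a, t = 0.3, 1e-2
print(abs(f(a + t) - taylor2(a, t)))  # the error is O(t^3)
```

Shrinking $t$ by a factor of $10$ shrinks the error by roughly $10^3$, confirming the cubic remainder.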

3. Independence Heuristic Estimate
Using Stirling's formula, we can deduce the asymptotics of the independence heuristic estimate. Similar asymptotics of the independence heuristic for non-negative integer-valued contingency tables can be found in [LP22, Lemma 4]. Recall that for general margins $r$ and $c$, the independence heuristic estimate takes the form (1.2). By Stirling's formula, we have
\[
\log I(r,c) = N \log N + (mn - N) \log(mn - N) - \sum_{i=1}^{m} r_i \log r_i - \sum_{i=1}^{m} (n - r_i) \log(n - r_i) - \sum_{j=1}^{n} c_j \log c_j - \sum_{j=1}^{n} (m - c_j) \log(m - c_j) + O\big((m+n) \log(mn)\big).
\]
In our setup, with $r = \bar r$ and $c = \bar c$, the dimension is $n + \lfloor n^{\delta} \rfloor$ and the total sum of entries is $N = \lfloor BCn \rfloor \lfloor n^{\delta} \rfloor + \lfloor Cn \rfloor\, n \sim BCn^{1+\delta} + Cn^2$. By Taylor expansion, $\log I_{n,\delta}(B,C)$ has the following expansion:
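The Stirling-based expansion of $\log I(r,c)$ can be checked numerically against the exact value computed via log-gamma functions. The sketch below is our own verification (the names `log_binom`, `log_I_exact`, `log_I_stirling` and the chosen two-valued margins are ours); the discrepancy stays well within the stated $O((m+n)\log(mn))$ error term:

```python
from math import lgamma, log

def log_binom(n, k):
    """Exact log of the binomial coefficient via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def log_I_exact(r, c):
    """log of I(r, c) = binom(mn, N)^{-1} prod binom(n, r_i) prod binom(m, c_j)."""
    m, n, N = len(r), len(c), sum(r)
    return (sum(log_binom(n, ri) for ri in r)
            + sum(log_binom(m, cj) for cj in c)
            - log_binom(m * n, N))

def log_I_stirling(r, c):
    """Leading Stirling terms: N log N + (mn-N) log(mn-N)
    - sum r_i log r_i - sum (n-r_i) log(n-r_i) - (same for columns)."""
    m, n, N = len(r), len(c), sum(r)
    mn = m * n
    return (N * log(N) + (mn - N) * log(mn - N)
            - sum(ri * log(ri) + (n - ri) * log(n - ri) for ri in r)
            - sum(cj * log(cj) + (m - cj) * log(m - cj) for cj in c))

# margins taking two values, as in the setup of the paper
r = c = tuple([40] * 10 + [20] * 90)
print(abs(log_I_exact(r, c) - log_I_stirling(r, c)))  # small vs. (m+n) log(mn)
```

Here $m = n = 100$, so the error bound $(m+n)\log(mn) \approx 1842$ comfortably dominates the observed discrepancy, while $\log I$ itself is in the thousands.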