1 Introduction

Combinatorial optimization problems on graphs are widespread in operations research, with applications in planning and logistics. Their study is strongly related to algorithm theory and computational complexity theory. The most representative example of such discrete variational problems is the travelling salesperson problem (TSP) [45]: given a set of cities and distances between each pair of them, one asks for the shortest route that visits each city exactly once and returns to the origin city (i.e. a tour). Like many related combinatorial problems and despite its straightforward formulation, the TSP belongs to the class of NP-hard problems. In practical terms, computing an exact solution becomes computationally intractable, as known exact algorithms perform exponentially many steps in the number of cities.
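To fix ideas, the definition can be turned into a minimal brute-force solver (the function names below are ours, for illustration only, and not part of the paper); its factorial running time makes the intractability concrete.

```python
import itertools
import math

def tour_length(points, order):
    """Length of the closed tour visiting `points` in the given cyclic order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def brute_force_tsp(points):
    """Exact TSP by enumerating all tours: O((n-1)!) time, feasible only for tiny n."""
    n = len(points)
    # Fixing city 0 as the starting point counts each tour once up to rotation.
    best = min(
        ((0,) + perm for perm in itertools.permutations(range(1, n))),
        key=lambda order: tour_length(points, order),
    )
    return best, tour_length(points, best)

# Four corners of the unit square: the optimal tour is the square boundary, length 4.
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
order, length = brute_force_tsp(pts)
```

Even at this scale the enumeration over \((n-1)!\) tours dominates the running time, which is why the partitioning and approximation schemes discussed below are of interest for random instances.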

In real-world situations, there is quite often the need to solve many similar instances of a given combinatorial optimization problem. In that case, additional structure, including geometry and randomness, can be exploited. The Euclidean formulation of the TSP, i.e., when cities are points in \(\mathbb {R}^d\) and distances are given by the Euclidean distance, is still NP-hard [40], but Karp [33] observed that solutions to random instances, i.e., when cities are sampled independently and uniformly, can be efficiently approximated via a partitioning scheme. His proof relies upon the seminal work by Beardwood, Halton and Hammersley [7], where precise asymptotics for optimal costs of a random instance of the problem were first established: given i.i.d. points \((X_i)_{i=1}^n\) distributed according to a probability density \(\rho \) on \(\mathbb {R}^d\), the length \( \mathcal {C}_{\textsf{TSP}}((X_i)_{i=1}^n)\) of the (random) solution to the TSP cycling through such points satisfies the \(\mathbb {P}\)-a.s. limit

$$\begin{aligned} \lim _{n \rightarrow \infty } n^{\frac{1}{d}-1} \mathcal {C}_{ \textsf{TSP}}((X_i)_{i=1}^n) = \beta _{{\text {BHH}}} \int _{\mathbb {R}^d} \rho ^{1-\frac{1}{d}}, \end{aligned}$$
(1.1)

where \(\beta _{{\text {BHH}}} = \beta _{{\text {BHH}}}(d) \in (0, \infty )\) is a constant depending on the dimension d only. The scaling \(n^{1-1/d}\) is intuitively explained by the fact that the n cities are connected through edges of typical length \(n^{-1/d}\) (as if they were arranged on a regular grid).
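The heuristic can be recorded as a one-line computation: a tour consists of n edges of typical length \(n^{-1/d}\), so

$$\begin{aligned} \mathcal {C}_{\textsf{TSP}} \approx n \cdot n^{-\frac{1}{d}} = n^{1-\frac{1}{d}}, \end{aligned}$$

and, with weights given by p-th powers of the Euclidean distance, each edge contributes \(n^{-p/d}\), which explains the scaling \(n^{1-p/d}\) for the weighted variants discussed below.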

Building upon these ideas, several authors [39, 46, 47, 53] contributed towards establishing a general theory to obtain limit results of BHH-type, i.e., as in (1.1), for a wide class of random Euclidean combinatorial optimization problems. The theory also allows for more general weights than the Euclidean length, including p-th powers of the Euclidean distance, a variant often motivated by modelling needs. If \(0<p<d\), a minimal modification of the techniques yields BHH-type results as in (1.1), with the scaling replaced by \(n^{1-p/d}\), the constant \(\beta _{{\text {BHH}}}\) now depending on p, d and the specific combinatorial optimization problem, and the integrand \(\rho ^{1-1/d}\) replaced by \(\rho ^{1-p/d}\). For \(p \ge d\), the situation becomes subtler and (1.1) is known for the TSP only if \(p=d\), see [52] and [53, Section 4.3].

Despite the wide applicability of this theory, several classical problems, such as those formulated over two random sets of points, are not covered and require different mathematical tools. The Euclidean assignment problem, also called bipartite matching, is certainly the most representative among these: given two sets of n points \((x_i)_{i=1}^n\), \((y_j)_{j=1}^n \subseteq \mathbb {R}^d\), one defines the matching cost functional as

$$\begin{aligned} \textsf{M}^p\left( (x_i)_{i=1}^n, (y_j)_{j=1}^n \right) = \min _{\sigma } \sum _{i=1}^n |x_i - y_{\sigma (i)}|^p, \end{aligned}$$

where the minimum is taken among all the permutations \(\sigma \) over n elements. This is often interpreted in terms of optimal planning for the execution of a set of jobs at the positions \(y_j\) to be assigned to a set of workers at the positions \(x_i\). Although the assignment problem belongs to the P complexity class, i.e., an optimal \(\sigma \) can be found in a polynomial number of steps with respect to n [38], the analysis of random instances still shows some interesting behavior in low dimensions. Indeed, if \((X_i)_{i=1}^n\), \((Y_j)_{j=1}^n\) are i.i.d. and uniformly distributed on the cube \((0,1)^d\), it is known [1, 21, 22, 48] that

$$\begin{aligned} \mathbb {E}\left[ \textsf{M}^1( (X_i)_{i=1}^n, (Y_j)_{j=1}^n ) \right]\sim {\left\{ \begin{array}{ll} \sqrt{n} &{}\text { for } d=1\\ \sqrt{ n \log n} &{} \text { for } d=2\\ n^{1 -\frac{1}{d}} &{} \text { for } d\ge 3. \end{array}\right. } \end{aligned}$$

In particular, for \(d\in \left\{ 1,2 \right\} \) the cost is asymptotically larger than the heuristically motivated \(n^{1-1/d}\). This exceptional scaling is intuitively due to local fluctuations of the empirical distributions of the two families of points.
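As an illustration of the definition (the helper below is ours, not part of the paper), \(\textsf{M}^p\) can be evaluated directly for a handful of points by enumerating permutations; on real instances one would instead use a polynomial-time method such as the Hungarian algorithm [38].

```python
import itertools
import math

def matching_cost(xs, ys, p=1):
    """M^p((x_i), (y_j)): minimum over permutations sigma of sum_i |x_i - y_{sigma(i)}|^p.

    Brute force over all n! permutations, so only usable for small n; the
    problem itself is in P and admits polynomial-time algorithms.
    """
    n = len(xs)
    return min(
        sum(math.dist(xs[i], ys[sigma[i]]) ** p for i in range(n))
        for sigma in itertools.permutations(range(n))
    )

# Two workers and two jobs on the line (d = 1): the order-preserving matching
# is optimal here, with cost |0 - 0.1| + |1 - 0.9| ≈ 0.2.
xs = [(0.0,), (1.0,)]
ys = [(0.1,), (0.9,)]
cost = matching_cost(xs, ys, p=1)
```

The same function with larger p illustrates the p-th power weighted variant appearing throughout the paper.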

Inspired by the combinatorial approach in [12] for the random Euclidean bipartite matching problem in dimension \(d\ge 3\), Barthe and Bordenave [6] first proposed a general theory to establish results of BHH-type (1.1) for a wide class of random Euclidean combinatorial optimization problems over two sets of n points. Let us point out that the equality in (1.1) is actually proven only for uniform measures, while in general only upper and lower bounds (which are conjectured to coincide) are known. In the case of p-th power weighted distances, the theory developed in [6] applies in the range \(0<p<d/2\), which appears quite naturally in their arguments. The threshold \(p=d/2\) is not merely technical, since in fact (1.1) cannot hold without additional hypotheses on the density \(\rho \). For example, because of fluctuations, a necessary condition is connectedness of the support of \(\rho \). Nevertheless, in the case of the Euclidean bipartite matching problem, it was recently proved [25] that if \(\rho \) is the uniform measure on the unit cube with \(d \ge 3\) and \(p \ge 1\), then

$$\begin{aligned} \lim _{n \rightarrow \infty }n^{\frac{p}{d}-1} \mathbb {E}\left[ \textsf{M}^p( (X_i)_{i=1}^n, (Y_j)_{j=1}^n ) \right] = \beta _{\textsf{M}}. \end{aligned}$$
(1.2)

Here \(\beta _{\textsf{M}} \in (0, \infty )\) depends on d and p only. The proof is a combination of classical subadditivity arguments, originating from [7], and tools from the theory of optimal transport. In particular, the defect in subadditivity is estimated using the connection between Wasserstein distances and negative Sobolev norms. In this context, the use of estimates of this type can be traced back to a PDE ansatz recently proposed in statistical physics [14]. Since then, it has been successfully used in the mathematical literature [3, 10, 16, 23, 24, 26, 27, 35, 37], even beyond the case of i.i.d. points [11, 28, 30, 51]. We refer to [8, 9, 15] for further statistical physics literature. In fact, the technique in [25] is quite robust and coarser estimates can be used, avoiding the use of PDEs. Still, the results apply only to the Euclidean bipartite matching problem, thanks to its connection with optimal transport. The main purpose of this paper is to show that, for a quite general class of bipartite combinatorial problems, it is actually possible to rely on the good bounds for the matching problem to obtain the analogue of (1.1) provided \(p<d\). This is inspired by [13], where a similar idea is used for the TSP and the 2-factor problem when \(p=d=2\).

As alluded to, an important open question left from the theory developed in [6] (see also [20]) is the existence of a limit in (1.1) for general densities. The only result in this direction is [4], which established for \(p=d=2\) that the limit of the expected cost (suitably renormalized) exists if \(\Omega \) is a bounded connected open set, with Lipschitz boundary and \(\rho \) is Hölder continuous and uniformly strictly positive and bounded from above on \(\Omega \). This settled a conjecture from [9] and, more importantly for our purposes, combined subadditivity and PDE arguments with a Whitney-type decomposition to take into account the structure of \(\Omega \) and its boundary. While we do not address this question here, some of the ideas from [4] are further developed in this work.

1.1 Main result

Our aim is to establish limit results for the cost of a wide class of Euclidean combinatorial optimization problems over two random point sets, in the range \(d/2 \le p <d\) for any dimension \(d \ge 3\). This overcomes the limitations of [6], showing that in higher dimensions bipartite problems behave much more similarly to non-bipartite ones. Our general theorem can be stated as follows (a precise description of all the assumptions and notation is given in Sect. 2).

Theorem 1.1

Let \(d \ge 3\), \(p\in [1,d)\) and let \(\textsf{P}= (\mathcal {F}_{n,n})_{n \in \mathbb {N}}\) be a combinatorial optimization problem over complete bipartite graphs such that assumptions A1, A2, A3, A4 and A5 hold and write \(\mathcal {C}_{\textsf{P}}^p( (x_i)_{i=1}^n, (y_j)_{j=1}^n)\) for the optimal cost of the problem over the two sets of n points \((x_i)_{i=1}^n\), \((y_j)_{j=1}^n \subseteq \mathbb {R}^d\), with respect to the Euclidean distance raised to the power p. Then, there exists \(\beta _{\textsf{P}}\in (0, \infty )\) depending on p, d and \(\textsf{P}\) only such that the following hold.

Let \(\Omega \subseteq \mathbb {R}^d\) be a bounded open set and assume that it is either convex or has \(C^2\) boundary. Let \(\rho \) be a Hölder continuous probability density on \(\Omega \), uniformly strictly positive and bounded from above. Given i.i.d. random variables \((X_i)_{i=1}^\infty \), \((Y_j)_{j=1}^\infty \) with common law \(\rho \) we have \(\mathbb {P}\)-a.s. that

$$\begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1} \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) \le \beta _{\textsf{P}} \int _{\Omega } \rho ^{1-\frac{p}{d}}. \end{aligned}$$
(1.3)

Moreover, if \(\rho \) is the uniform density and \(\Omega \) is either a cube or has \(C^2\) boundary, then the above is a \(\mathbb {P}\)-a.s. limit and equality holds.

Our assumptions A1, A2, A3, A4 and in particular A5 are slightly stronger than those introduced in [6, Section 5.3], but it is not difficult to show that all the specific examples discussed in [6] satisfy them. In particular, our results apply to the TSP, the minimum weight connected k-factor problem and the k-bounded degree minimum spanning tree. It is thus fair to say that, for compactly supported densities, Theorem 1.1 extends the main results in [6] to the range \(d/2 \le p<d\) for any \(d \ge 3\); for this reason we do not consider the case \(p<1\).

Remark 1.2

Let us point out that (1.3) also holds in expectation (see Proposition 5.1).

Remark 1.3

Arguing as in [6] (see also [4]) and considering a “boundary” variant of \(\textsf{P}\), it should be possible to adapt the proof of Theorem 1.1 to show that there exists \(\beta _{\textsf{P}}^b>0\) such that

$$\begin{aligned} \beta _{\textsf{P}}^b \int _{\Omega } \rho ^{1-\frac{p}{d}}\le \liminf _{n \rightarrow \infty } n^{\frac{p}{d}-1} \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) . \end{aligned}$$

However, since we are currently not able to prove that \(\beta _{\textsf{P}}^b=\beta _{\textsf{P}}\), we decided to leave this aside.

Remark 1.4

In fact our result applies, at least in expectation, to any \(p\)-homogeneous bipartite functional \(\mathcal {C}\) satisfying the subadditivity inequality (5.2) (which is similar to the condition \((\mathcal {S}_p)\) from [6]) and the growth condition (5.3) (somewhat reminiscent of condition \((\mathcal {R}_p)\) from [6]). See Remark 5.2.

Of course, our result applies in particular to the Euclidean assignment problem.

Corollary 1.5

For \(d \ge 3\), \(p\in [1,d)\), let \(\Omega \subseteq \mathbb {R}^d\) be a cube or a bounded connected open set with \(C^2\) boundary and let \(\rho \) be a Hölder continuous probability density on \(\Omega \), uniformly strictly positive and bounded from above. Then, given i.i.d. \((X_i)_{i=1}^\infty \), \((Y_j)_{j=1}^\infty \) with common law \(\rho \), we have \(\mathbb {P}\)-a.s. that

$$\begin{aligned} \limsup _{n \rightarrow \infty }n^{\frac{p}{d}-1} \textsf{M}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) \le \beta _{\textsf{M}} \int _{\Omega } \rho ^{1-\frac{p}{d}}, \end{aligned}$$

with \(\beta _{\textsf{M}}\) as in (1.2). Moreover, if \(\rho \) is the uniform density and \(\Omega \) has \(C^2\) boundary, then the above is a \(\mathbb {P}\)-a.s. limit and equality holds.

Remark 1.6

In the case of the matching problem, combining ideas from this paper and [25], the conclusion of Corollary 1.5 could be extended to every \(p\ge 1\) (at least in expectation).

1.2 Comments on the proof technique

Our proof leverages the techniques developed for the bipartite matching problem, in particular those of [4, 25], to carefully estimate the defects in a geometric subadditivity argument. Comparing the approach in [6], which works if \(p<d/2\), with that in [25], which holds instead for any p, a crucial difference is that the errors due to local oscillations in the two distributions of points are mitigated in the latter by spreading them evenly across all the points. This is possible since the optimal transport relaxation allows for general couplings as well as continuous densities, rather than discrete matchings only.

The overall strategy is thus to find a suitable replacement for such operation in the purely combinatorial setting. The starting point is Proposition 3.7 where we prove a subadditivity inequality. The problem is then to estimate the defect in subadditivity. This is achieved by combining the following three key observations.

The first one is to bound from above the cost of the problem over any two point sets \((x_i)_{i=1}^n\), \((y_j)_{j=1}^n\) by the sum of a term of order \(n^{1-p/d}\) and the bipartite matching cost between the two point sets. This is stated as an assumption (A5), but it can be easily checked on many specific problems (Lemma 3.10): being an upper bound, it usually suffices to combine an optimal matching with the solution to an additional non-bipartite combinatorial optimization problem, such as the TSP, to build a feasible solution. This approach was first successfully used in [13] (see also [4]) for the random bipartite TSP in the case \(p=d=2\), where one can simply argue that the main contribution comes from the logarithmic corrections in the matching cost.

The second key observation is that for point sets mostly made of i.i.d. points (while much less is assumed on the remaining ones), it is still possible to obtain good bounds for the matching cost. We refer to Sect. 6 for the precise statements, but the underlying idea is strongly related to bounds for the optimal transport cost in terms of the negative Sobolev norms—thus relying again on the PDE ansatz originally introduced in the statistical physics literature.

The third observation is that, in order to ensure that a small fraction of i.i.d. uniformly distributed points can indeed be found in the subadditivity defect terms, it is enough to keep them out of the optimization procedure on the smaller scales. As usual with such arguments, the proof of existence of the limit is performed first on the Poisson version of the random problem; to retain a fraction of points we thus perform a thinning procedure.

Besides these main ideas, plenty of technical modifications with respect to the arguments in [6] and [4, 25] are required, e.g. in order to establish improved subadditivity inequalities (Proposition 3.7) and to extend the Whitney-type decomposition argument from [4] to \(p\ne 2\).

1.3 Further questions and conjectures

Our results raise several questions about costs and properties of solutions to Euclidean random combinatorial optimization problems over two point sets. We list here a few which we believe are worth exploring.

  1. Existence of a limit in (1.3) for non-uniform densities is rather easy to conjecture, but so far our techniques do not improve upon [6], hence the problem remains largely open.

  2. Our techniques break down if \(p \ge d\) for many reasons. In particular, the idea of leaving out a small fraction of i.i.d. points to improve the estimate of the defect terms seems to fail. It is however natural to conjecture that Theorem 1.1 should hold also in that range.

  3. In this work we considered only the case of compactly supported densities \(\rho \). It would be interesting to investigate the case where the support is \(\mathbb {R}^d\). To the best of our knowledge, the only results available so far in this direction are [35, 37], where the correct rates are established for the Gaussian density in the case of the matching problem.

  4. The assumptions in [6] are slightly different from ours, although the specific problems considered therein satisfy both. It would be interesting to find examples which satisfy only one of the two sets of assumptions, or possibly to simplify ours even further.

  5. Many problems, such as the bounded degree minimum spanning tree, but also the bipartite matching problem itself, can be naturally formulated for two families of points with different numbers of elements: it could be of interest to investigate limit results in those cases as well.

  6. The cases \(d \in \left\{ 1,2 \right\} \) are necessarily excluded from our analysis, since subadditivity arguments fail already for the random bipartite matching problem. It is however an open question already whether the additional logarithmic correction indeed appears in the asymptotic rates for many other problems. As an example, we mention that for the Euclidean minimum spanning tree over two random point sets (without any uniform bound on the degree) no logarithmic corrections appear [19]; since the maximum degree is unbounded, this problem is not covered by our results.

  7. In the deterministic literature, for the TSP and other NP-hard Euclidean combinatorial optimization problems, polynomial time approximation schemes are known [5] for any (fixed) dimension d, as the number of points grows. Can our approach lead to similar schemes for problems on two families of points, possibly under some mild regularity assumption on their spatial distributions?

1.4 Structure of the paper

In Sect. 2 we first introduce some general notation. We then discuss Whitney-type decompositions and Sobolev spaces, and recall useful known facts on the optimal transport problem, together with some possibly novel ones (Proposition 2.9). We close the section with a variant of the standard subadditivity (Fekete-type) arguments suited for our purposes, together with some simple concentration inequalities. Section 3 is devoted to the combinatorial optimization problems we consider, discussing in particular the main assumptions that we require and some of their useful consequences. In Sect. 4 we establish a variant of our main result in the case of Poisson point processes and in Sect. 5 we use it to deduce Theorem 1.1. These two sections in fact rely upon the novel bounds for the Euclidean assignment problem that we finally establish in Sect. 6.

2 Notation and preliminary results

2.1 General notation

Given \(n \in \mathbb {N}\), we write \([n] = \left\{ 1, \ldots , n \right\} \) and \([n]_1 = \left\{ (1,i) \right\} _{i=1}^n\), \([n]_2 = \left\{ (2,i) \right\} _{i=1}^n\), which allows us to define two disjoint copies of [n]. Given a finite set A, we write |A| for the number of its elements, while, if \(A \subseteq \mathbb {R}^d\) is infinite, |A| denotes its Lebesgue measure.

Given a metric space \((\Omega , \textsf{d})\), \(x \in \Omega \), \(A\subseteq \Omega \), we write \(\textsf{d}(x, A) = \min _{y \in A}\left\{ \textsf{d}(x,y) \right\} \) and \({\text {diam}}(A) = \sup _{x,y \in A} \textsf{d}(x,y)\). We endow every set \(\Omega \subseteq \mathbb {R}^d\) with the Euclidean distance. A partition \(\left\{ \Omega _k \right\} _{k=1}^K\) of a set \(\Omega \) is always intended up to a set of Lebesgue measure zero. A rectangle \(R \subseteq \mathbb {R}^d\) is a subset of the form \(R = \prod _{i=1}^d (x_i,x_i+L_i)\), and is said to be of moderate aspect ratio if \(L_i/L_j\le 2\) for every i, j. If \(L_i= L\) for every i, then \(R = Q\) is a cube of side length L. We write \(Q_L = (0,L)^d\). We write \(I_\Omega \) for the indicator function of a set \(\Omega \).

2.2 Families of points

Given a set \(\Omega \), we consider finite ordered families of points \(\textbf{x}= \left( x_i \right) _{i=1}^n \subseteq \Omega \), with \(n \in \mathbb {N}\), letting \(\textbf{x}= \emptyset \) if \(n=0\). For many purposes the order will not be relevant; however, working with ordered families allows e.g. for repetitions (which will be probabilistically negligible anyway). Given a family \(\textbf{x}\subseteq \mathbb {R}^d\), we write \(\mu ^{\textbf{x}} = \sum _{i=1}^n \delta _{x_i}\) for the associated empirical measure and, for every (Borel) \(\Omega \subseteq \mathbb {R}^d\), we let \(\textbf{x}(\Omega ) = \mu ^{\textbf{x}}(\Omega )\). In the special case \(\Omega = \mathbb {R}^d\), we simply write \(|\textbf{x}| = \textbf{x}(\mathbb {R}^d) = \mu ^{\textbf{x}}(\mathbb {R}^d)\) for the total number of points (counted with multiplicity). We also write \(\textbf{x}_ \Omega \) for its restriction to \(\Omega \), i.e., the family of all points \(x_i \in \Omega \), so that \(\textbf{x}= \textbf{x}_{\Omega }\) if \(\textbf{x}\subseteq \Omega \) (conventionally, we re-index it over \(i=1, \ldots , \textbf{x}(\Omega )\) with the order inherited from that in \(\textbf{x}\)). Given \(\textbf{x}= \left( x_i \right) _{i=1}^n\), \(\textbf{y}= \left( y_j \right) _{j=1}^m\subseteq \mathbb {R}^d\), their union is \(\textbf{x}\cup \textbf{y}= (x_1, \ldots , x_n, y_1, \ldots , y_m)\). Strictly speaking, the union should be called a concatenation, since the operation is not commutative in general.

2.3 Whitney partitions

We recall the following partitioning result [4, Lemma 5.1].

Lemma 2.1

Let \(\Omega \subset \mathbb {R}^d\) be a bounded domain with Lipschitz boundary and let \(\mathcal {Q}= \{Q_i\}_i\) be a Whitney partition of \(\Omega \). Then, for every \(\delta >0\) sufficiently small, letting \(\mathcal {Q}_\delta =\{Q_i \,:\, {\text {diam}}(Q_i) \ge \delta \}\), there exists a finite family \(\mathcal {R}_\delta =\{\Omega _j\}_j\) of disjoint open sets such that:

  (i) \((\Omega _k)_{k=1}^K = \mathcal {Q}_\delta \cup \mathcal {R}_\delta \) is a partition of \(\Omega \),

  (ii) \( |\Omega _k| \sim {\text {diam}}(\Omega _k)^d\) for every \(k=1, \ldots , K\),

  (iii) if \(\Omega _k \in \mathcal {Q}_\delta \), then \({\text {diam}}(\Omega _k) \sim \textsf{d}(x, \Omega ^c)\) for every \(x \in \Omega _k\),

  (iv) if \(\Omega _k \in \mathcal {R}_\delta \), then \({\text {diam}}(\Omega _k)\sim \delta \) and \(\textsf{d}(x, \Omega ^c) \lesssim \delta \), for every \(x \in \Omega _k\).

Here all the implicit constants depend only on the initial partition \(\mathcal {Q}\) (and not on \(\delta \)).

For later use, we collect some useful bounds related to these partitions.

Lemma 2.2

Let \(\Omega \subset \mathbb {R}^d\) be a bounded domain with Lipschitz boundary and let \(\mathcal {Q}= \{Q_i\}_i\) be a Whitney partition of \(\Omega \). Then, for every \(\delta >0\) sufficiently small, letting \((\Omega _k)_{k=1}^K = \mathcal {Q}_\delta \cup \mathcal {R}_\delta \) as in Lemma 2.1, one has that \(|\mathcal {R}_\delta | \lesssim \delta ^{1-d}\) and the following holds:

  (1) For every \(\alpha \in \mathbb {R}\),

    $$\begin{aligned} \sum _{k=1}^K {\text {diam}}(\Omega _k)^{\alpha } \lesssim _\alpha {\left\{ \begin{array}{ll} 1 &{} \text {if}\,\alpha >d-1,\\ |\log \delta |&{}\ \text {if}\,\alpha =d-1,\\ \delta ^{1-d+\alpha } &{} \text {if}\,\alpha <d-1. \end{array}\right. } \end{aligned}$$
    (2.1)

  (2) If \(\alpha <0\), then for every \(k =1,\ldots , K\), and \(x \in \Omega _k\),

    $$\begin{aligned} \sum _{j=1}^K {\text {diam}}(\Omega _j)^{\alpha } \min \left\{ 1, \left( \frac{{\text {diam}}(\Omega _j)}{\textsf{d}(x, \Omega _j)} \right) ^{d-1} \right\} \lesssim \delta ^{\alpha } |\log (\delta )|. \end{aligned}$$
    (2.2)

In all the inequalities the implicit constants depend only upon \(\mathcal {Q}\) and, in (2.1), upon \(\alpha \).

By property (ii), inequality (2.1) also holds for the sum \(\sum _{k=1}^K |\Omega _k|^{\alpha }\), with \(\alpha d\) instead of \(\alpha \).

Proof

Since \(\partial \Omega \) is Lipschitz, it follows from the properties of the partition that, for every \(x \in \Omega \) and \(r \ge s \ge \delta \),

$$\begin{aligned} \left| \left\{ k\,: \, \Omega _k\subseteq B(x, r), {\text {diam}}(\Omega _k) \in [s, 2s) \right\} \right| \lesssim (r/s)^{d-1}, \end{aligned}$$
(2.3)

with the implicit constant depending on \(\mathcal {Q}\) only. It follows that \(|\mathcal {R}_\delta | \lesssim \delta ^{1-d}\) and, for every \(\ell \le |\log _2 \delta |\), the number of cubes \(\Omega _k \in \mathcal {Q}_\delta \) with \({\text {diam}}(\Omega _k) \in [2^{-\ell }, 2^{-\ell +1})\) is estimated by \( 2^{\ell (d-1)}\). Therefore, for \(\alpha \in \mathbb {R}\),

$$\begin{aligned} \begin{aligned} \sum _{k=1}^K {\text {diam}}(\Omega _k)^{\alpha }&\lesssim \sum _{\Omega _k \in \mathcal {Q}_{\delta }} {\text {diam}}(\Omega _k)^{\alpha } + \sum _{\Omega _k \in \mathcal {R}_\delta } {\text {diam}}(\Omega _k)^{\alpha } \\&\lesssim \sum _{\ell \le |\log _2 \delta |} \left| \left\{ \Omega _k \in \mathcal {Q}_{\delta }: {\text {diam}}(Q_k) \in [2^{-\ell }, 2^{-\ell +1}) \right\} \right| 2^{-\ell \alpha } + |\mathcal {R}_\delta | \cdot \delta ^{\alpha }\\&\lesssim \sum _{\ell \le |\log _2 \delta |} 2^{\ell (d-1)}\cdot 2^{-\ell \alpha } + \delta ^{1-d} \cdot \delta ^{\alpha }. \end{aligned} \end{aligned}$$

Since \(\ell \) is also bounded from below in the summation (e.g. by \(-|\log _2{\text {diam}}(\Omega )|\)), we obtain (2.1).
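For the reader's convenience, the final geometric sum (in which \(\ell \) ranges over a finite window bounded below by a constant depending on \(\mathcal {Q}\) only) evaluates to the three regimes of (2.1):

$$\begin{aligned} \sum _{\ell \le |\log _2 \delta |} 2^{\ell (d-1-\alpha )} \lesssim _\alpha {\left\{ \begin{array}{ll} 1 &{} \text {if}\,\alpha >d-1,\\ |\log \delta | &{} \text {if}\,\alpha =d-1,\\ \delta ^{1-d+\alpha } &{} \text {if}\,\alpha <d-1, \end{array}\right. } \end{aligned}$$

while the boundary term \(\delta ^{1-d}\cdot \delta ^{\alpha }\) is always of order at most the right-hand side.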

We next prove (2.2). We claim that it follows from the following inequalities, valid for any \(\gamma \in \mathbb {N}\):

$$\begin{aligned} \sum _{j\,: \, \textsf{d}(x, \Omega _j) \le 2^{-\gamma } {\text {diam}}(\Omega _k)} {\text {diam}}(\Omega _j)^\alpha \lesssim 2^{-\gamma (d-1)} {\text {diam}}(\Omega _k)^{d-1} \delta ^{\alpha +1-d}, \end{aligned}$$
(2.4)

and, for \(\beta <d-1\),

$$\begin{aligned} \sum _{j \,: \, \textsf{d}(x,\Omega _j)>2^{-\gamma } {\text {diam}}(\Omega _k)} \frac{{\text {diam}}(\Omega _j)^\beta }{\textsf{d}(x, \Omega _j)^{d-1}} \lesssim |\gamma + \log \left( {\text {diam}}(\Omega _k) \right) | \delta ^{\beta +1-d}. \end{aligned}$$
(2.5)

Indeed, we can split the summation and use (2.4) and (2.5) to get

$$\begin{aligned} \begin{aligned} \sum _{j}&{\text {diam}}(\Omega _j)^{\alpha } \min \left\{ 1, \left( \frac{{\text {diam}}(\Omega _j)}{\textsf{d}(x, \Omega _j)} \right) ^{d-1} \right\} \\&\lesssim \sum _{j\,: \, \textsf{d}(x, \Omega _j) \le 2^{-\gamma } {\text {diam}}(\Omega _k)} {\text {diam}}(\Omega _j)^\alpha + \sum _{j\,: \, \textsf{d}(x, \Omega _j) > 2^{-\gamma } {\text {diam}}(\Omega _k)} \frac{{\text {diam}}(\Omega _j)^{d-1+\alpha }}{\textsf{d}(x, \Omega _j)^{d-1}}\\&\lesssim 2^{-\gamma (d-1)} {\text {diam}}(\Omega _k)^{d-1} \delta ^{\alpha +1-d} + |\gamma + \log \left( {\text {diam}}(\Omega _k) \right) | \delta ^{\alpha }. \end{aligned}\nonumber \\ \end{aligned}$$
(2.6)

Recalling that \({\text {diam}}(\Omega _k) \gtrsim \delta \) and choosing \(\gamma \) so that \(2^{-\gamma } \le \delta \le 2^{-\gamma +1}\) yields (2.2).

In order to prove (2.4) and (2.5) we first notice that, given \(\Omega _k\), \(\Omega _j\) and \(x \in \Omega _k\), we have that, for some constant \(C = C(\mathcal {Q})\),

$$\begin{aligned} \Omega _j \subseteq B(x, C \max \left\{ \textsf{d}(x, \Omega _j), {\text {diam}}(\Omega _k) \right\} ). \end{aligned}$$
(2.7)

Indeed, if \(\Omega _j \in \mathcal {R}_\delta \), then \({\text {diam}}(\Omega _j)\lesssim \delta \lesssim {\text {diam}}(\Omega _k)\), hence (2.7) holds. If instead \(\Omega _j \in \mathcal {Q}_\delta \), then we can find \(y \in \Omega _j\) with \(|x-y|\le 2 \textsf{d}(x,\Omega _j)\), so that, by the triangle inequality,

$$\begin{aligned} \textsf{d}(y, \Omega ^c) \le |x-y| + \textsf{d}(x, \Omega ^c) \lesssim \max \left\{ \textsf{d}(x, \Omega _j), {\text {diam}}(\Omega _k) \right\} \end{aligned}$$
(2.8)

and by property (iii) in Lemma 2.1 we obtain that \({\text {diam}}(\Omega _j) \lesssim \max \left\{ \textsf{d}(x, \Omega _j), {\text {diam}}(\Omega _k) \right\} \), yielding again the desired inclusion.

We now prove (2.4) and (2.5). Let \(\ell _k \le |\log _2 \delta |\) be such that \({\text {diam}}(\Omega _{k}) \in [2^{-\ell _k}, 2^{-\ell _k+1})\). Combining (2.7) and (2.3), we see that, for every \(\ell \le |\log _2\delta |\), there are at most \(2^{(\ell -\ell _k-\gamma )(d-1)}\) sets \(\Omega _j\) such that \(\textsf{d}(x, \Omega _j) \le 2^{-\gamma }{\text {diam}}(\Omega _k)\) and \({\text {diam}}(\Omega _j) \in [2^{-\ell }, 2^{-\ell +1})\). Therefore,

$$\begin{aligned} \begin{aligned} \sum _{j\,: \, \textsf{d}(x, \Omega _j) \le 2^{-\gamma } {\text {diam}}(\Omega _k)} {\text {diam}}(\Omega _j)^{\alpha }&\lesssim \sum _{\ell \le |\log _2\delta |} 2^{-\ell \alpha } 2^{(\ell -\ell _k-\gamma )(d-1)} \\&\lesssim 2^{-(\gamma +\ell _k)(d-1)} \sum _{\ell \le |\log _2\delta |} 2^{-\ell (\alpha +1-d)} \\ {}&\lesssim 2^{-\gamma (d-1)}{\text {diam}}(\Omega _k)^{d-1} \delta ^{\alpha +1-d}. \end{aligned} \end{aligned}$$
(2.9)

This proves (2.4). To prove (2.5), we split dyadically,

$$\begin{aligned} \begin{aligned} \sum _{j \,: \, \textsf{d}(x,\Omega _j)>2^{-\gamma }{\text {diam}}(\Omega _k)} \frac{{\text {diam}}(\Omega _j)^\beta }{d(x,\Omega _j)^{d-1}}&\lesssim \sum _{\ell \le \ell _k+\gamma } \frac{1}{(2^{-\ell })^{d-1}} \sum _{ j\,: \, d(x,\Omega _j) \in [2^{-\ell }, 2^{-\ell +1})} {\text {diam}}(\Omega _j)^\beta \\&{\mathop {\lesssim }\limits ^{(2.7)}} \sum _{\ell \le \ell _k+\gamma } 2^{\ell (d-1)} \sum _{ \Omega _j \subset B(x,C 2^{-\ell })} {\text {diam}}(\Omega _j)^\beta . \end{aligned} \end{aligned}$$
(2.10)

Let us also notice that, if \(\Omega _j \subseteq B(x, C2^{-\ell })\), then necessarily \(\delta \le {\text {diam}}(\Omega _j) \lesssim 2^{-\ell }\) (since \({\text {diam}}(\Omega _j)^d \sim |\Omega _j|\)). Thus for \(\ell '\) with \(2^{-\ell '} \sim 2^{-\ell }\),

$$\begin{aligned} \begin{aligned} \sum _{ \Omega _j \subset B(x, C 2^{-\ell })}{\text {diam}}(\Omega _j)^\beta&\lesssim \sum _{\ell ' \le u \le |\log _2 \delta |} 2^{-u \beta } \sharp {\left\{ \Omega _j \subseteq B(x, C 2^{-\ell }) \,: {\text {diam}}(\Omega _j) \in [2^{-u}, 2^{-u+1}) \right\} }\\&{\mathop {\lesssim }\limits ^{(2.3)}} \sum _{\ell ' \le u \le |\log _2 \delta |} 2^{-u\beta } \cdot 2^{(u-\ell ) (d-1) }= 2^{-\ell (d-1)} \sum _{\ell ' \le u \le |\log _2 \delta |} 2^{-u(\beta +1-d)}\\&\lesssim 2^{-\ell (d-1)} \delta ^{\beta +1-d}, \end{aligned} \end{aligned}$$
(2.11)

using again that \(\ell '\) is bounded from below by a constant depending on \(\mathcal {Q}\) only. Plugging this bound in (2.10), we conclude that

$$\begin{aligned} \begin{aligned} \sum _{j \,: \, \textsf{d}(x,\Omega _j)>2^{-\gamma } {\text {diam}}(\Omega _k)} \frac{{\text {diam}}(\Omega _j)^\beta }{d(x,\Omega _j)^{d-1}}&\le \sum _{\ell \le \ell _k+\gamma } 2^{\ell (d-1)} \cdot 2^{-\ell (d-1)} \delta ^{\beta +1-d}\\&\lesssim \left( \gamma + |\log \left( {\text {diam}}(\Omega _k) \right) | \right) \delta ^{\beta +1-d}. \end{aligned} \end{aligned}$$
(2.12)

This concludes the proof of (2.5). \(\square \)

2.4 Sobolev norms

Given a bounded domain \(\Omega \subseteq \mathbb {R}^d\) with Lipschitz boundary and \(p \in (1,\infty )\), with Hölder conjugate \(q = p/(p-1)\), we write \(\Vert f \Vert _{L^p(\Omega )}\) for the Lebesgue norm of f, and

$$\begin{aligned} \Vert f\Vert _{W^{-1,p}(\Omega )}=\sup _{\Vert \nabla \phi \Vert _{L^{q}(\Omega )}\le 1} \int _{\Omega } f \phi =\inf _{{\text {div}}\, \xi =f} \Vert \xi \Vert _{L^p(\Omega )} \end{aligned}$$

for the negative Sobolev norm. We notice in particular that if \(\Vert f\Vert _{W^{-1,p}(\Omega )}<\infty \) then \(\int _\Omega f=0\). In this case we may also restrict the supremum to functions \(\phi \) with zero average. When it is clear from the context, we will drop the explicit dependence on \(\Omega \) in the norms.

Let us recall that we can bound the \(W^{-1,p}\) norm by the \(L^p\) norm. We give here a proof based on the embedding \(L^{pd/(p+d)}(\Omega ) \subseteq W^{-1,p}(\Omega )\) (for \(p>d/(d-1)\)), which is an elementary alternative to the PDE arguments used in [25, Lemma 3.4].

Lemma 2.3

Let \(\Omega \) be a bounded domain with Lipschitz boundary and let \(f:\Omega \rightarrow \mathbb {R}\) such that \(\int _\Omega f=0\). Then, for every \(p>d/(d-1)\),

$$\begin{aligned} \Vert f\Vert _{W^{-1,p}(\Omega )}\lesssim |\Omega |^{\frac{1}{d}} \Vert f\Vert _{L^p(\Omega )}. \end{aligned}$$
(2.13)

Moreover, the implicit constant depends on \(\Omega \) only through the corresponding constant for the Sobolev embedding.

Proof

Let q be the Hölder conjugate of p, \(q^*\) the Sobolev conjugate of q and \(p^*=pd/(p+d)\) the Hölder conjugate of \(q^*\). We then have for every \(\phi \) with \(\Vert \nabla \phi \Vert _{L^q(\Omega )}\le 1\),

$$\begin{aligned}{} & {} \int _{\Omega } f \phi \le \left(\int _{\Omega } |f|^{p^*}\right)^\frac{1}{p^*} \left(\int _{\Omega } |\phi |^{q^*}\right)^{\frac{1}{q^*}} \\{} & {} \quad \lesssim \left(\int _{\Omega } |f|^{p^*}\right)^\frac{1}{p^*} \left(\int _{\Omega } |\nabla \phi |^{q}\right)^{\frac{1}{q}}\le \left(\int _{\Omega } |f|^{p^*}\right)^\frac{1}{p^*}. \end{aligned}$$

Using that \(p^*<p\), Hölder's inequality gives \(\left( \int _{\Omega } |f|^{p^*} \right) ^{\frac{1}{p^*}} \le |\Omega |^{\frac{1}{p^*}-\frac{1}{p}} \Vert f \Vert _{L^p(\Omega )} = |\Omega |^{\frac{1}{d}} \Vert f \Vert _{L^p(\Omega )}\), since \(\frac{1}{p^*} - \frac{1}{p} = \frac{1}{d}\), which concludes the proof of (2.13). \(\square \)

As in [4], (2.13) will however not be precise enough when estimating the error in subadditivity in the case of general densities and domains. We will instead rely on gradient bounds for the Green kernel \((G(x,y))_{x,y\in \Omega }\) of the Laplacian with Neumann boundary conditions to obtain sharper estimates. See [2, 3, 23, 35] for related results. Let us however point out that in our case we will not rely on any stochastic cancellation in the form of the Rosenthal inequality [43], but will instead use a purely deterministic estimate. We will assume that

$$\begin{aligned} \left| \nabla _x G(x,y) \right| \lesssim |x-y|^{1-d}, \quad \text {for every}\, x, y \in \Omega , \end{aligned}$$
(2.14)

where the implicit constant depends uniquely on \(\Omega \).

Remark 2.4

This condition is satisfied for instance if \(\Omega \) is \(C^2\) or convex, see e.g. [50]. Notice that since it is a local condition it also holds for \(Q\backslash \Omega \) where Q is a cube and \(\Omega \) a \(C^2\) open set with \(d(\partial Q,\partial \Omega )>0\).

Remark 2.5

Let us point out that, as in [35], instead of (2.14) it would have been enough to have \(L^p\) bounds (for the same p as for the cost \(\mathcal {C}_{\textsf{P}}^p\)) on the Riesz transform for the Neumann Laplacian. From the available results for the Dirichlet Laplacian [31, 44], we expect that for every Lipschitz domain there is \(p>3\) (depending on the domain) for which these bounds hold. In particular, this would allow us to extend the validity of Theorem 1.1 to every Lipschitz domain when \(d=3\). However, since we were not able to find the corresponding results for Neumann boundary conditions in the literature, we kept the stronger hypothesis (2.14).

We then have

Lemma 2.6

Let \(\Omega \subset \mathbb {R}^d\) be a bounded domain with Lipschitz boundary, such that (2.14) holds and let \(\rho \) be a density bounded above and below on \(\Omega \). For \(\delta >0\) sufficiently small, let \((\Omega _k)_{k=1}^K = \mathcal {Q}_\delta \cup \mathcal {R}_\delta \) as in Lemma 2.1. If there exists \(h>0\) such that \(|b_k|\le h^{\frac{1}{2}} |\Omega _k|^{\frac{1}{2}}\) for \(k=1, \ldots , K\), then for every \(p\ge 1\),

$$\begin{aligned} \left\Vert\sum _{k=1}^K \frac{b_k}{\rho (\Omega _k)}( I_{\Omega _k} -\rho (\Omega _k))\rho \right\Vert_{W^{-1,p}(\Omega )}\lesssim \delta ^{1-\frac{d}{2}} |\log (\delta ) |h^{\frac{1}{2}}. \end{aligned}$$
(2.15)

Proof

Set

$$\begin{aligned} B_k = \frac{ b_k}{\rho (\Omega _k)}, \qquad f_k= \left( I_{\Omega _k} - \rho (\Omega _k) \right) \rho . \end{aligned}$$

Let \(\phi _k\) denote the solution to the equation \(\Delta \phi _k = f_k\) with null Neumann boundary conditions on \(\Omega \), and use \(\xi =\sum _{k=1}^K B_k \nabla \phi _k\) as a competitor in the definition of the \(W^{-1,p}\) norm. We get

$$\begin{aligned} \left\| \sum _{k=1}^K B_k f_k \right\| _{W^{-1,p}(\Omega )}^p \le \int _{\Omega } \left| \sum _{k=1}^K B_k {\nabla \phi _k} \right| ^p \lesssim h^{\frac{p}{2}} \int _{\Omega } \left( \sum _{k=1}^K |\Omega _k|^{-\frac{1}{2}} \left| \nabla \phi _k \right| \right) ^p. \end{aligned}$$
(2.16)

To bound the last term, we use the integral representation in terms of the Green’s function,

$$\begin{aligned} \phi _k = \int _{\Omega } G(x,y) f_k(y) d y, \end{aligned}$$

to obtain that, for every \(x\in \Omega \),

$$\begin{aligned} |\nabla \phi _k(x)|\lesssim \min \left\{ {\text {diam}}(\Omega _k), \frac{|\Omega _k|}{\textsf{d}(x,\Omega _k)^{d-1}} \right\} . \end{aligned}$$
(2.17)

Indeed, by (2.14),

$$\begin{aligned}\begin{aligned} |\nabla \phi _k(x)|&\lesssim \int _{\Omega _k} \frac{dy}{|x-y|^{d-1}}+ |\Omega _k| \int _{\Omega } \frac{dy}{|x-y|^{d-1}} \le \int _{\left\{ |y|\le {\text {diam}}(\Omega _k) \right\} } \frac{dy}{|y|^{d-1}} +|\Omega _k|\\&\lesssim {\text {diam}}(\Omega _k). \end{aligned} \end{aligned}$$

Moreover, for \(x\notin \Omega _k\), we get directly from (2.14),

$$\begin{aligned} |\nabla \phi _k(x)|\lesssim \frac{|\Omega _k|}{\textsf{d}(x,\Omega _k)^{d-1}}. \end{aligned}$$

For any \(k=1, \ldots , K\) and \(x \in \Omega _k\), we then estimate

$$\begin{aligned} \begin{aligned} \sum _{j=1}^K |\Omega _j|^{-\frac{1}{2}} \left| \nabla \phi _j(x) \right|&{\mathop {\lesssim }\limits ^{(2.17)}} \sum _{j=1}^K {\text {diam}}(\Omega _j)^{1-d/2} \min \left\{ 1, \left( \frac{{\text {diam}}(\Omega _j)}{\textsf{d}(x, \Omega _j)} \right) ^{d-1} \right\} \\&\lesssim \delta ^{1-d/2}|\log (\delta )| \end{aligned} \end{aligned}$$

having used inequality (2.2) from Lemma 2.2 with \(\alpha = 1-d/2\).

Therefore, we can split the integration

$$\begin{aligned} \begin{aligned} \int _{\Omega } \left( \sum _{j=1}^K |\Omega _j|^{-\frac{1}{2}} \left| \nabla \phi _j \right| \right) ^p&= \sum _{k=1}^K \int _{\Omega _k} \left( \sum _{j=1}^K |\Omega _j|^{-\frac{1}{2}} \left| \nabla \phi _j \right| \right) ^p \lesssim \delta ^{(1-d/2)p} | \log (\delta ) |^p. \end{aligned} \end{aligned}$$

In combination with (2.16) this concludes the proof of (2.15). \(\square \)

2.5 Optimal transport

Given two positive Borel measures \(\mu \), \(\lambda \) on \(\mathbb {R}^d\) with \(\mu (\mathbb {R}^d) = \lambda (\mathbb {R}^d) \in (0, \infty )\) and finite p-th moments, the optimal transport cost of order \(p\ge 1\) between \(\mu \) and \(\lambda \) is defined as the quantity

$$\begin{aligned} \textsf{W}^p(\mu , \lambda ) = \min _{\pi \in \Gamma (\mu ,\lambda )} \int _{\mathbb {R}^d\times \mathbb {R}^d} {|x-y|}^p d \pi (x,y), \end{aligned}$$

where \(\Gamma (\mu , \lambda )\) is the set of couplings between \(\mu \) and \(\lambda \), i.e., finite Borel measures \(\pi \) on the product \(\mathbb {R}^d\times \mathbb {R}^d\) whose marginals are respectively \(\mu \) and \(\lambda \). Notice that if \(\mu (\mathbb {R}^d) = \lambda (\mathbb {R}^d) = 0\) then \(\textsf{W}^p(\mu , \lambda ) = 0\), while if \(\mu (\mathbb {R}^d) \ne \lambda (\mathbb {R}^d)\), we conveniently extend the definition by setting \(\textsf{W}^p(\mu , \lambda ) = \infty \). Let us recall that the triangle inequality for the Wasserstein distance of order p (which is defined as the p-th root of \(\textsf{W}^p(\mu , \lambda )\)) yields

$$\begin{aligned} \textsf{W}^p( \mu , \nu ) \lesssim \textsf{W}^p(\mu , \lambda ) + \textsf{W}^p(\nu , \lambda ). \end{aligned}$$
(2.18)

A straightforward but useful subadditivity inequality is

$$\begin{aligned} \textsf{W}^p\left( \sum _{k} \mu _k, \sum _{k} \nu _k \right) \le \sum _{k} \textsf{W}^p(\mu _k, \nu _k), \end{aligned}$$
(2.19)

valid for any (countable) family of measures \((\mu _k, \nu _k)_{k}\).

To keep notation simple, we write

$$\begin{aligned} \textsf{W}^p_{\Omega } (\mu , \lambda ) = \textsf{W}^p(\mu \llcorner \Omega , \lambda \llcorner \Omega ), \end{aligned}$$

and, if a measure is absolutely continuous with respect to the Lebesgue measure, we only write its density. For example, \(\textsf{W}^p_{\Omega } \left( \mu , \mu (\Omega )/|\Omega | \right) \) denotes the transportation cost between \(\mu \llcorner \Omega \) and the uniform measure on \(\Omega \) with total mass \(\mu (\Omega )\).

For \(q \ge p\), Jensen's inequality gives

$$\begin{aligned} \textsf{W}^p_{\Omega } (\mu , \nu ) \le \mu (\Omega )^{1-\frac{p}{q}} \left( \textsf{W}_{\Omega }^q (\mu , \nu ) \right) ^{\frac{p}{q}}. \end{aligned}$$
(2.20)
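Indeed, (2.20) follows by normalizing an optimal coupling: if \(\pi \) is optimal for \(\textsf{W}_{\Omega }^q (\mu , \nu )\) and \(\mu (\Omega )>0\), then \(\pi \) has total mass \(\mu (\Omega )\), and Jensen's inequality for the probability measure \(\pi /\mu (\Omega )\) (applied with the concave function \(s \mapsto s^{p/q}\)) gives

$$\begin{aligned} \textsf{W}^p_{\Omega } (\mu , \nu ) \le \mu (\Omega ) \int |x-y|^p \, \frac{d \pi }{\mu (\Omega )} \le \mu (\Omega ) \left( \int |x-y|^q \, \frac{d \pi }{\mu (\Omega )} \right) ^{\frac{p}{q}} = \mu (\Omega )^{1-\frac{p}{q}} \left( \textsf{W}_{\Omega }^q (\mu , \nu ) \right) ^{\frac{p}{q}}, \end{aligned}$$

while for \(\mu (\Omega )=0\) both sides vanish.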

Our arguments make substantial use of two crucial properties of the optimal transport cost. The first one [25, Lemma 3.1] is a simple consequence of (2.18) and (2.19).

Lemma 2.7

For every \(p\ge 1\), there exists a constant \(C>0\) depending only on p such that the following holds. Let \(\Omega \subseteq \mathbb {R}^d\) be Borel and \((\Omega _k)_{k\in \mathbb {N}}\) be a countable Borel partition of \(\Omega \). Then, for finite measures \(\mu \), \(\lambda \), and \(\varepsilon \in (0,1)\), we have the inequality

$$\begin{aligned} \textsf{W}^p_{\Omega }\left(\mu , \alpha \lambda \right)\le (1+\varepsilon )\sum _k \textsf{W}^p_{\Omega _k}\left(\mu , \alpha _k \lambda \right)+ \frac{C}{\varepsilon ^{p-1}}\textsf{W}^p_\Omega \left(\sum _k \alpha _k I_{\Omega _k} \lambda , \alpha \lambda \right), \end{aligned}$$
(2.21)

where \(\alpha = \mu (\Omega )/\lambda (\Omega )\) and \(\alpha _k = \mu (\Omega _k)/\lambda (\Omega _k)\).

The second one is [4, Lemma 2.2] which gives an upper bound for the Wasserstein distance in terms of a negative Sobolev norm. It follows from the Benamou-Brenier formulation of the optimal transport problem (see also [41, Corollary 3]).

Lemma 2.8

Assume that \(\Omega \subseteq \mathbb {R}^d\) is a bounded connected open set with Lipschitz boundary. If \(\mu \) and \(\lambda \) are measures on \(\Omega \) with \(\mu (\Omega ) = \lambda (\Omega )\), absolutely continuous with respect to the Lebesgue measure and \(\inf _\Omega \lambda >0\), then, for every \(p\ge 1\),

$$\begin{aligned} \textsf{W}_{\Omega }^p(\mu ,\lambda )\lesssim \frac{1}{(\inf _{\Omega } \lambda )^{p-1}}\left\| \mu - \lambda \right\| _{W^{-1,p}(\Omega )}^p. \end{aligned}$$
(2.22)

As in many recent works on the matching problem, we will use this inequality to improve on the trivial bound

$$\begin{aligned} \textsf{W}^p_{\Omega } (\mu , \lambda ) \le {\text {diam}}(\Omega )^p \mu (\Omega ), \end{aligned}$$
(2.23)

which holds as soon as \(\mu (\Omega ) = \lambda (\Omega )\). Much of our effort in the proofs will ultimately go into dealing with an intermediate situation, where the measures can be decomposed as the sum of a “good” part, i.e., one which is absolutely continuous with smooth density, and a “bad” remainder about which not much can be assumed. We prove here a general inequality which could also be of independent interest.

Proposition 2.9

Let \(\Omega \subseteq \mathbb {R}^d\) be a bounded Lipschitz domain, \(\rho \) be a density bounded above and below on \(\Omega \), \(\mu \) be any finite measure on \(\Omega \) and \(h >0\). Then, for every \(p>d/(d-1)\),

$$\begin{aligned} \textsf{W}^p_\Omega \left( \mu +h\rho , \alpha \rho \right) \lesssim \frac{1}{h^{\frac{p}{d}}} \mu (\Omega )^{1+\frac{p}{d}}, \end{aligned}$$
(2.24)

where \(\alpha = \frac{\mu (\Omega )}{\rho (\Omega )} +h\). Moreover, this inequality is invariant by rescaling of \(\Omega \).

Proof

By scaling we may assume that \(|\Omega |=1\). Notice that, by the subadditivity (2.19) and the trivial bound (2.23),

$$\begin{aligned} \textsf{W}^p(\mu +h\rho , \alpha \rho )\lesssim \mu (\Omega ), \end{aligned}$$

we may assume that \(\mu (\Omega ) \ll h\). Let \(P_t\) be the heat semigroup with null Neumann boundary conditions on \(\Omega \) and set \(\mu _t=P_t\mu \). By the triangle inequality (2.18) and (2.22), we have

$$\begin{aligned} \begin{aligned} \textsf{W}^p(\mu +h\rho , \alpha \rho )&\lesssim \textsf{W}^p(\mu +h\rho , \mu _t+h\rho )+\textsf{W}^p(\mu _t+h\rho , \alpha \rho )\\&\lesssim \textsf{W}^p\left( \mu , \mu _t \right) +\frac{1}{h^{p-1}}\left\| \mu _t- \frac{\mu (\Omega )}{\rho (\Omega )}\rho \right\| ^p_{W^{-1,p}}\\&\lesssim t^\frac{p}{2} \mu (\Omega ) +\frac{1}{h^{p-1}}\left\| \mu _t- \frac{\mu (\Omega )}{\rho (\Omega )}\rho \right\| ^p_{W^{-1,p}}. \end{aligned} \end{aligned}$$

We now estimate the last term. For this let q be the Hölder conjugate exponent of p, i.e., \(q=p/(p-1) \in (1, d)\) and \(q^* = qd/(d-q)\) be the Sobolev conjugate of q. We first use the triangle inequality and the fact that \(\rho \) is bounded from above and below to estimate

$$\begin{aligned} \begin{aligned} \left\| \mu _t- \frac{\mu (\Omega )}{\rho (\Omega )}\rho \right\| _{W^{-1,p}}&\le \left\| \mu _t- \mu (\Omega ) \right\| _{W^{-1,p}} + \mu (\Omega )\left\| 1- \frac{1}{\rho (\Omega )}\rho \right\| _{W^{-1,p}}\\&\lesssim \left\| \mu _t- \mu (\Omega ) \right\| _{W^{-1,p}} + \mu (\Omega ). \end{aligned} \end{aligned}$$

Using that the Sobolev embedding is equivalent to ultracontractivity of the heat semigroup, i.e., if \(\int _\Omega \phi =0\),

$$\begin{aligned} \Vert \phi _t\Vert _{L^\infty (\Omega )}\lesssim t^{-\frac{d}{2q^*}} \Vert \phi \Vert _{L^{q^*}(\Omega )}\lesssim t^{-\frac{d}{2q^*}} \Vert \nabla \phi \Vert _{L^{q}(\Omega )}, \end{aligned}$$

where \(\phi _t = P_t \phi \), we finally estimate for every \(\phi \) with \(\Vert \nabla \phi \Vert _{L^{q}(\Omega )}\le 1\) and \( \int _{\Omega } \phi =0\),

$$\begin{aligned} \int _{\Omega }\phi (\mu _t-\mu (\Omega ))= \int _\Omega \phi _t d\mu \le \mu (\Omega ) \Vert \phi _t\Vert _{L^\infty (\Omega )}\lesssim \mu (\Omega ) t^{-\frac{d}{2q^*}}. \end{aligned}$$

Therefore, by taking the supremum over \(\phi \) we find

$$\begin{aligned} \left\| \mu _t- \mu (\Omega ) \right\| _{W^{-1,p}}\lesssim \mu (\Omega )t^{-\frac{d}{2q^*}}. \end{aligned}$$

Taking the p-th power we find for \(t\le 1\),

$$\begin{aligned} \begin{aligned} \textsf{W}^p(\mu +h\rho , \alpha \rho )&\lesssim \mu (\Omega )\left[t^{\frac{p}{2}} +t^{-\frac{pd}{2q^*}}\left(\frac{\mu (\Omega )}{h}\right)^{p-1}\right]\\&=\mu (\Omega )\left[t^{\frac{p}{2}} +t^{-\frac{p}{2}\left(\frac{d}{q}-1\right)}\left(\frac{\mu (\Omega )}{h}\right)^{p-1}\right]. \end{aligned} \end{aligned}$$

Optimizing in t we find \(t^{\frac{p}{2}}=\left(\frac{\mu (\Omega )}{h}\right)^{\frac{(p-1)q}{d}}\) which satisfies \(t\ll 1\) if \(\mu (\Omega )\ll h\). Since \((p-1)q=p\), this concludes the proof of (2.24). \(\square \)

Remark 2.10

Since by Hölder inequality it will be enough for us to apply Proposition 2.9 for p arbitrarily close to d, the condition \(p>d/(d-1)\) will not be a limitation for us. Let us however mention that, in the critical case \(p=d/(d-1)\) one can argue similarly, relying instead on the Moser-Trudinger inequality [18, Remark 1.4], to obtain (in the case \(\rho =1\) and \(\Omega =Q\) a cube for simplicity)

$$\begin{aligned} \textsf{W}^p_Q\left( \mu +\frac{h}{|Q|}, \frac{ \mu (Q)}{|Q|} + \frac{h}{|Q|} \right) \lesssim |Q|^{1/(d-1)} \mu (Q) \left|\log \left(\frac{\mu (Q)}{h}\right)\right| \left(\frac{\mu (Q)}{h}\right)^{\frac{1}{d-1}}. \end{aligned}$$

If instead \(1\le p<d/(d-1)\), using the same proof as above but with the inclusion \(W^{1,q}(Q) \subseteq L^\infty (Q)\) and letting \(t \rightarrow 0\) gives the estimate

$$\begin{aligned} \textsf{W}^p_Q\left( \mu + \frac{h}{|Q|}, \frac{ \mu (Q)}{|Q|} + \frac{h}{|Q|} \right) \lesssim |Q|^{p/d} \mu (Q)\left(\frac{ \mu (Q)}{h}\right)^{p-1}. \end{aligned}$$

We close this section with the following result, easily adapted from [4, Proposition 2.4], which helps in particular to reduce the transport problem from Hölder continuous to constant densities.

Proposition 2.11

For \(d\ge 1\), \(\alpha \in (0,1)\) and \(\rho _0>0\), there exists \(C = C(\rho _0, d, \alpha )>0\) such that the following holds: for any \(\rho \in C^\alpha ( (0,1)^d)\) with

$$\begin{aligned} \int _{(0,1)^d} \rho = 1 \quad \text {and} \quad \rho _0 \le \rho \le \rho _0^{-1}, \end{aligned}$$

there exists \(T: (0,1)^d \rightarrow (0,1)^d\) such that \(T_{\sharp } \rho = 1\), with

$$\begin{aligned} {\text {Lip}}T, {\text {Lip}}T^{-1} \le 1 + C \left\| \rho -1 \right\| _{C^\alpha }. \end{aligned}$$
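In dimension one the statement can be illustrated explicitly, since the monotone rearrangement is available in closed form: the CDF of \(\rho \) pushes \(\rho \) forward to the Lebesgue measure on (0, 1), with Lipschitz constants \(\sup \rho \) and \(1/\inf \rho \). The following sketch (with a hypothetical cosine density; this is only an illustration of the statement, not of the construction in [4]) checks this numerically.

```python
# Illustrative 1-d sketch (hypothetical density, not the construction of [4]):
# on (0,1), the monotone map T = CDF of rho pushes rho forward to the uniform
# density, with Lip T = sup rho and Lip T^{-1} = 1 / inf rho.

import math

def rho(x):
    # a smooth density on (0,1): integral 1, bounded between 0.8 and 1.2
    return 1.0 + 0.2 * math.cos(2 * math.pi * x)

def T(x, n=2000):
    """CDF of rho at x, via the midpoint rule (so that T pushes rho to Lebesgue)."""
    h = x / n
    return h * sum(rho((k + 0.5) * h) for k in range(n))

assert abs(T(1.0) - 1.0) < 1e-6                          # total mass 1
grid = [k / 100 for k in range(101)]
vals = [T(x) for x in grid]
assert all(v2 > v1 for v1, v2 in zip(vals, vals[1:]))    # T strictly increasing
# difference quotients of T are averages of rho, hence within [inf rho, sup rho]
slopes = [(v2 - v1) / 0.01 for v1, v2 in zip(vals, vals[1:])]
assert max(slopes) <= 1.2 + 1e-3 and min(slopes) >= 0.8 - 1e-3
```

Here the Lipschitz bounds degenerate as \(\inf \rho \rightarrow 0\), consistently with the role of the lower bound \(\rho _0\) in the proposition.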

2.6 A subadditivity lemma

We will need a slight variant of the usual convergence results for subadditive functions, see e.g. [12, 46].

Lemma 2.12

Let \(\alpha , \beta , c>0\), \(f: [1, \infty ) \rightarrow [0, \infty )\) be continuous and such that the following holds: for every \(\eta \in (0,1/2]\), there exists \(C(\eta )>0\) such that, for every \(m \in \mathbb {N}{\setminus }\left\{ 0 \right\} \) and \(L \ge C(\eta )\),

$$\begin{aligned} f(m L) \le f( L (1-\eta )) + c \eta ^{\alpha } + C(\eta )L^{-\beta }. \end{aligned}$$
(2.25)

Then \(\lim _{L\rightarrow \infty } f(L) \in [0, \infty )\) exists.

Proof

We use the following fact: for any open interval \((a,b) \subseteq [0, \infty )\), there exists \(A>0\) such that the union

$$\begin{aligned}\bigcup _{m=1} ^\infty (ma, mb) \supseteq (A, +\infty )\end{aligned}$$

contains a half-line. Indeed, one has \((ma, mb) \cap ((m+1)a, (m+1)b)\ne \emptyset \) if \(mb > (m+1)a\), which holds for every \(m> a/(b-a)\), so that the dilated intervals eventually form an overlapping chain.
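This covering fact admits a quick numerical sanity check. The following sketch (with hypothetical endpoints \(a=3\), \(b=4\)) verifies that every point beyond \(A = m_0 a\), where \(m_0 = \lfloor a/(b-a) \rfloor + 1\) as derived above, lies in some dilated interval.

```python
# Sketch (hypothetical values a = 3, b = 4): checking that the union of the
# intervals (m*a, m*b), m = 1, 2, ..., contains the half-line (A, +oo).

import math

def covered(x, a, b):
    """Return True if x lies in (m*a, m*b) for some integer m >= 1."""
    # x is in (m*a, m*b) iff x/b < m < x/a; test the smallest integer above x/b
    m_lo = math.floor(x / b) + 1
    return m_lo * a < x

a, b = 3.0, 4.0
m0 = math.floor(a / (b - a)) + 1   # consecutive intervals overlap from m0 on
A = m0 * a                         # half-line threshold, as in the proof
assert all(covered(A + 0.01 + 0.1 * k, a, b) for k in range(2000))
assert not covered(8.5, a, b)      # below A there may be gaps, e.g. (8, 9)
```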

First, we show that f is uniformly bounded. Let \(\eta = 1/2\) and use the fact that both f(L) and \(L^{-\beta }\) are continuous for \(L \in [1/2,2]\), hence bounded, so that by (2.25), for every \(m \ge 1\), \(L \in [1,2]\),

$$\begin{aligned} f(m L) \le \sup _{\ell \in [1,2]} \left( f(\ell /2) + c2^{-\alpha } + C(1/2) \ell ^{-\beta } \right) < \infty . \end{aligned}$$

Since \(\bigcup _{m= 1}^\infty [m, 2m] = [1, \infty )\), it follows that f is uniformly bounded on \([1, \infty )\). To show that the limit exists (and is finite), we argue that

$$\begin{aligned} \limsup _{L \rightarrow \infty } f(L) \le \liminf _{L \rightarrow \infty } f(L). \end{aligned}$$

Given \(\varepsilon \ll 1\), let \(\eta = \eta (\varepsilon ) \in (0,1/2]\) be such that \(c\eta ^{\alpha }= \varepsilon \) and let \(L_{\varepsilon }>0\) be such that \(C(\eta ) L_{\varepsilon }^{-\beta }=\varepsilon \), so that, for every \(L \ge L_\varepsilon \),

$$\begin{aligned} C(\eta ) L^{-\beta } \le \varepsilon . \end{aligned}$$

Let then \(L^* > \max \left\{ L_\varepsilon , C(\eta ) \right\} \) be such that

$$\begin{aligned} f(L^*) < \liminf _{L \rightarrow \infty } f(L) + \varepsilon . \end{aligned}$$

By continuity of f, there exist \(a< L^* <b\) with \(a>\max \left\{ L_\varepsilon , C(\eta ) \right\} \) such that the same inequality holds for every \(L \in (a,b)\). For every \(m \ge 1\) and \(L \in (a/(1-\eta ), b/(1-\eta ))\), we have \(L \ge \max \left\{ L_\varepsilon , C(\eta ) \right\} \) and \(L(1-\eta ) \in (a,b)\), hence using (2.25) we obtain

$$\begin{aligned} f(m L) \le f(L (1-\eta )) + c \eta ^{\alpha } + C(\eta ) L^{-\beta } \le \liminf _{L \rightarrow \infty } f(L) + 3 \varepsilon . \end{aligned}$$

Using that \(\bigcup _{m =1}^\infty (ma/(1-\eta ), mb/(1-\eta ))\) contains a half-line \((A,+\infty )\), it follows that

$$\begin{aligned} \limsup _{L\rightarrow \infty } f(L) \le \liminf _{L \rightarrow \infty } f(L) + 3 \varepsilon , \end{aligned}$$

and the conclusion follows letting \(\varepsilon \rightarrow 0\). \(\square \)

2.7 Concentration inequalities

We close this section by recalling some standard concentration inequalities. Let us start with a general definition.

Definition 2.13

We say that a random variable X with \(\mathbb {E}\left[ X \right] =h\) satisfies (algebraic) concentration if for every \(q\ge 1\) there exists \(C(q)\in (0, \infty )\) such that

$$\begin{aligned} \mathbb {E}\left[ |X-h|^q \right] \le C(q) |h|^{\frac{q}{2}}. \end{aligned}$$

We then have

Lemma 2.14

Poisson, binomial and hypergeometric random variables satisfy concentration. More precisely, if:

  1. (i)

    N is a Poisson random variable with parameter \(n \ge 1\) then, for every \(q\ge 1\),

    $$\begin{aligned} \mathbb {E}\left[|N-n|^q\right]\lesssim _q n^{\frac{q}{2}}. \end{aligned}$$
    (2.26)

    Hence, for every \(\gamma \in (0,1)\),

    $$\begin{aligned} \mathbb {P}\left( N < \gamma n \quad \text {or} \quad N > (1+\gamma ) n \right) \lesssim _{q,\gamma } (1-\gamma )^{-2q} n^{-q}. \end{aligned}$$
    (2.27)
  2. (ii)

    B is a binomial random variable with parameters n and \(p\in (0,1)\) (so that \(\mathbb {E}\left[ B \right] = np\)) then, for every \(q \ge 1\),

    $$\begin{aligned} \mathbb {E}\left[|B-np|^q\right]\lesssim _q n^{\frac{q}{2}}. \end{aligned}$$
    (2.28)
  3. (iii)

H is a hypergeometric random variable counting the number of red marbles extracted in z draws without replacement from an urn containing u marbles, r of which are red (so that \(\mathbb {E}\left[ H \right] = z r/u\)), then, for every \(q \ge 1\),

    $$\begin{aligned} \mathbb {E}\left[ \left| H - zr/u \right| ^q \right] \lesssim _q r^{\frac{q}{2}}. \end{aligned}$$
    (2.29)
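For completeness, let us record how (2.27) follows from (2.26): by Markov's inequality with exponent 2q, for any \(c \in (0,1)\),

$$\begin{aligned} \mathbb {P}\left( |N-n| \ge c n \right) \le \frac{\mathbb {E}\left[ |N-n|^{2q} \right] }{(cn)^{2q}} {\mathop {\lesssim }\limits ^{(2.26)}} c^{-2q} n^{-q}, \end{aligned}$$

applied once with \(c = 1-\gamma \) (for the event \(N<\gamma n\), which entails \(|N-n|> (1-\gamma )n\)) and once with \(c=\gamma \) (for the event \(N>(1+\gamma )n\)).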

Proof

We only prove concentration in the hypergeometric case, since it is classical for both Poisson and binomial random variables. We may assume that \(r \ge 1\), otherwise there is nothing to prove since \(H = \mathbb {E}\left[ H \right] = 0\). From [29, Theorem 1], we have, for \(\lambda \ge 2\),

$$\begin{aligned} \mathbb {P}\left( \left| H - \mathbb {E}\left[ H \right] \right| \ge \lambda \right) \le 2 \exp \left( -\alpha \lambda ^2 \right) , \end{aligned}$$

where

$$\begin{aligned} \alpha = \min \left\{ \frac{1}{z+1} + \frac{1}{u-z+1}, \frac{1}{r+1} + \frac{1}{u-r+1} \right\} \ge \frac{u+2 }{(r+1)(u-r+1)} \ge \frac{1}{r+1}. \end{aligned}$$

As usual, writing

$$\begin{aligned} \mathbb {E}\left[ \left| H - \mathbb {E}\left[ H \right] \right| ^q \right] = \int _0^\infty \mathbb {P}\left( \left| H - \mathbb {E}\left[ H \right] \right| \ge \lambda \right) q\lambda ^{q-1} d \lambda , \end{aligned}$$

yields the bound

$$\begin{aligned} \mathbb {E}\left[ \left| H - \mathbb {E}\left[ H \right] \right| ^q \right] \lesssim _q 1+ \alpha ^{-\frac{q}{2}} \lesssim _q 1 + (r+1)^{\frac{q}{2}}, \end{aligned}$$

which is in turn \(\lesssim _q r^{q/2}\), since \(r \ge 1\). \(\square \)
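Since the hypergeometric pmf is explicit, the moment bound (2.29) can also be checked exactly on small instances. The sketch below (with hypothetical parameters \(u=50\), \(r=10\), \(z=20\); only an illustration, not part of the proof) computes \(\mathbb {E}[|H - zr/u|^q]\) from the pmf \(\mathbb {P}(H=k) = \binom{r}{k}\binom{u-r}{z-k}/\binom{u}{z}\).

```python
# Illustrative exact check of (2.29) on a small hypergeometric instance,
# using the pmf P(H = k) = C(r,k) C(u-r,z-k) / C(u,z).

from math import comb

def hypergeom_abs_moment(u, r, z, q):
    """E[|H - zr/u|^q] for H counting red marbles in z draws from u (r red)."""
    mean = z * r / u
    total = comb(u, z)
    return sum(
        comb(r, k) * comb(u - r, z - k) / total * abs(k - mean) ** q
        for k in range(max(0, z - (u - r)), min(r, z) + 1)
    )

u, r, z = 50, 10, 20
# for q = 2 the moment is the variance z (r/u) (1 - r/u) (u-z)/(u-1)
m2 = hypergeom_abs_moment(u, r, z, 2)
assert abs(m2 - z * (r / u) * (1 - r / u) * (u - z) / (u - 1)) < 1e-9
assert m2 <= r   # consistent with (2.29): here r^{q/2} = 10
```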

3 Combinatorial optimization problems over bipartite graphs

3.1 Graphs

Although we are interested in random combinatorial optimization over Euclidean bipartite graphs, it is useful to recall some general terminology. A (finite, undirected) graph \(G = (V, E)\) is defined by a finite set \(V = V_G\) of vertices (or nodes) and a set of edges \(E = E_G\), which is a collection of unordered pairs \(e = \left\{ x,y \right\} \subseteq V\) with \(x \ne y\). A graph \(G'\) is a subgraph of G and we write \(G' \subseteq G\), if \(V_{G'} \subseteq V_G\) and \(E_{G'}\subseteq E_G\). The induced subgraph over a subset of vertices \(V' \subseteq V_G\) is defined as the subgraph \(G'\) with \(V_{G'} = V'\) and all the edges from \(E_G\) connecting vertices in \(V'\). It will be useful to denote by \(\emptyset \) the empty graph, i.e., \(V = \emptyset \), \(E = \emptyset \), which is a subgraph of any graph G.

Given a vertex \(x \in V\), its neighborhood in G is the set

$$\begin{aligned} \mathcal {N}_G(x) = \left\{ y \in V\,: \left\{ x,y \right\} \in E \right\} . \end{aligned}$$

The degree of x in G, \(\deg _G(x)\), is the number of elements of \(\mathcal {N}_G(x)\). Given \(\kappa \in \mathbb {N}\), a graph G is \(\kappa \)-regular if \(\deg _G(x) = \kappa \) for every \(x \in V_G\). We say that a subgraph \(G' \subseteq G\) spans \(V_G\) if \(V_{G'} = V_G\) and \(\mathcal {N}_{G'}(x) \ne \emptyset \) for every \(x \in V_{G'}\). We say that two subgraphs \(G_1\), \(G_2\) of G are disjoint if \(V_{G_1} \cap V_{G_2} =\emptyset \). A graph G is connected if it cannot be decomposed as the union of two disjoint subgraphs \(G = G_1 \cup G_2\), i.e., \(V_G = V_{G_1} \cup V_{G_2}\) with both \(V_{G_1}\), \(V_{G_2} \ne \emptyset \), \(V_{G_1} \cap V_{G_2} = \emptyset \) and \(E_{G} =E_{G_1} \cup E_{G_2}\). Given \(\kappa \in \mathbb {N}\), \(\kappa \ge 1\), we say that a graph G is \(\kappa \)-connected if any subgraph \(G' \subseteq G\) obtained by removing \(\kappa -1\) edges from G is still connected. A cycle is a connected 2-regular graph; a tree is a connected graph which contains no cycles as subgraphs.

Given two graphs \(G_1\), \(G_2\) and an injective function \(\sigma : V_{G_1} \rightarrow V_{G_2}\), we let \(\sigma (E_{G_1}) = \left\{ \left\{ \sigma (x), \sigma (y) \right\} \,: \, \left\{ x,y \right\} \in E_{G_1} \right\} \). If \(\sigma (E_{G_1}) \subseteq E_{G_2}\), then we say that \(G_1\) embeds into \(G_2\) via \(\sigma \). If \(\sigma \) is bijective and \(\sigma (E_{G_1}) = E_{G_2}\), then we say that \(G_1\) is isomorphic to \(G_2\) via \(\sigma \).

A graph G is complete if \(E_G\) consists of all the pairs \(\left\{ x,y \right\} \subseteq V\) with \(x \ne y\). The complete graph over \(V = [n]\) is commonly denoted by \(\mathcal {K}_n\). Any complete graph G with n vertices is isomorphic to \(\mathcal {K}_n\). We say that the graph G is bipartite over a partition \(V = X \cup Y\) (i.e., \(X \cap Y = \emptyset \)), if every \(e\in E\) can be written as \(e= \left\{ x,y \right\} \) with \(x\in X\), \(y \in Y\). A graph is complete bipartite if it is bipartite over a partition \(V = X \cup Y\) and every pair \(\left\{ x,y \right\} \) with \(x\in X\), \(y \in Y\) is an edge. For any \(n, m \in \mathbb {N}\), any two complete bipartite graphs with X having n elements and Y having m elements are isomorphic. To fix a representative, we define \(\mathcal {K}_{n,m}\) as the complete bipartite graph over the vertex set \(V = [n]_1 \cup [m]_2\).

We introduce a weight function on edges, \(w: E \rightarrow [0, \infty )\), writing \(w(e) = w(x,y)\) for \(e = \left\{ x,y \right\} \). The total weight of G is then

$$\begin{aligned} w(G) = \sum _{e \in E} w(e). \end{aligned}$$

A subgraph \(G' \subseteq G\) of a weighted graph is always understood to be endowed with the restriction of w to \(E_{G'}\). Notice that for the empty graph \(\emptyset \subseteq G\) we have \(w(\emptyset ) = 0\).

We are interested in geometric realizations of graphs, where vertices are in correspondence with points in a metric space \((\Omega , \textsf{d})\), and the weight function is a power of the distance between the corresponding points, with a fixed exponent \(p>0\). Since we consider only complete and complete bipartite graphs, we introduce the following notation. Given \(\textbf{x}= (x_i)_{i=1}^n \subseteq \Omega \), we let \(\mathcal {K}(\textbf{x})\) be the complete graph \(\mathcal {K}_n\) endowed with the weight function \(w(i,j) = \textsf{d}(x_i, x_j)^p\). Similarly, given \(\textbf{x}= (x_i)_{i=1}^n\), \(\textbf{y}= (y_j)_{j=1}^m \subseteq \Omega \), we let \(\mathcal {K}(\textbf{x}, \textbf{y})\) denote the complete bipartite graph \(\mathcal {K}_{n,m}\) endowed with the weight function \(w((1,i), (2,j)) = \textsf{d}(x_i, y_j)^p\). Notice that the points in \(\textbf{x}\) and \(\textbf{y}\) need not all be distinct, although in our random setting coincidences occur with probability zero. If all the points are distinct, then we can and will identify the vertex set directly with the set of points \(\textbf{x}\) for \(\mathcal {K}(\textbf{x})\), and with the set of points in \(\textbf{x}\cup \textbf{y}\) for \(\mathcal {K}(\textbf{x}, \textbf{y})\). With this convention, if \(\textbf{x}= \textbf{x}^0 \cup \textbf{x}^1\) and \(\textbf{y}= \textbf{y}^0 \cup \textbf{y}^1\), then both \(\mathcal {K}(\textbf{x}^0, \textbf{y}^0)\) and \(\mathcal {K}(\textbf{x}^1, \textbf{y}^1)\) are naturally seen as subgraphs of \(\mathcal {K}(\textbf{x}, \textbf{y})\).

3.2 Combinatorial problems

A combinatorial optimization problem \(\textsf{P}\) on weighted graphs is informally defined by prescribing, for every graph G, a set \(\mathcal {F}_{G}\) of subgraphs \(G' \subseteq G\), called feasible solutions, and then, after introducing a weight w, minimizing \(w(G')\) over all \(G'\in \mathcal {F}_G\).

Our aim is to study problems on random geometric realizations of complete bipartite graphs \(\mathcal {K}_{n,n}\), thus it is sufficient to define a combinatorial optimization problem over complete bipartite graphs as a collection of feasible solutions \(\textsf{P} = ( \mathcal {F}_{n,n})_{ n \in \mathbb {N}}\), with \(\mathcal {F}_{n,n}\) being the feasible solutions on \(\mathcal {K}_{n,n}\). We will mostly consider problems \(\textsf{P}\) that satisfy the following assumptions:

  1. A1

    (isomorphism) if \(\sigma \) is any isomorphism of \(\mathcal {K}_{n,n}\) into itself and \(G \in \mathcal {F}_{n,n}\), then \(\sigma (G) = (\sigma (V_G), \sigma (E_G)) \in \mathcal {F}_{n,n}\);

  2. A2

    (spanning) for every \(n \in \mathbb {N}\), \(\mathcal {F}_{n,n}\) is not empty and there exists \(\textsf{c}_{{\text {A2}}}>0\) such that, for \(n < \textsf{c}_{{\text {A2}}}\), \(\mathcal {F}_{n,n} = \left\{ \emptyset \right\} \) while for \(n \ge \textsf{c}_{{\text {A2}}}\), every \(G \in \mathcal {F}_{n,n}\) spans \(\mathcal {K}_{n,n}\);

  3. A3

(bounded degree) there exists \(\textsf{c}_{{\text {A3}}}>0\) such that, for every \(n \in \mathbb {N}\) and every feasible solution \(G \in \mathcal {F}_{n,n}\), one has \(\deg _G(x) \le \textsf{c}_{{\text {A3}}}\) for every \(x \in V_G\).

Given \(\textsf{P} = ( \mathcal {F}_{n,n})_{n \in \mathbb {N}}\), we canonically extend it to graphs \(\mathcal {K}_{n,m}\), with \(n \ne m\), defining \(\mathcal {F}_{n,m}\) as the collection of all graphs \(\sigma (G)\) where \(G\in \mathcal {F}_{z,z}\), \(z = \min \left\{ n,m \right\} \) and \(\sigma \) is an isomorphism of \(\mathcal {K}_{n,m}\) into itself.

In the geometric setting, i.e., when \(\mathcal {K}_{n,m}\) is mapped into \(\mathcal {K}(\textbf{x},\textbf{y})\) with \(\textbf{x}= (x_i)_{i=1}^n\), \(\textbf{y}= (y_j)_{j=1}^m\subseteq \Omega \), with \((\Omega , \textsf{d})\) metric space, we introduce the following notation for the cost of a problem \(\textsf{P}\):

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p(\textbf{x}, \textbf{y}) = \min _{G \in \mathcal {F}_{n,m}} \sum _{\left\{ (1,i),(2,j) \right\} \in E_G} \textsf{d}(x_i,y_j)^p. \end{aligned}$$

Recalling the definition of \(\mathcal {F}_{n,m}\) if \(n \ne m\), we also have the identity

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p(\textbf{x}, \textbf{y}) = \min _{\begin{array}{c} \textbf{x}' \subseteq \textbf{x}, \textbf{y}' \subseteq \textbf{y}\\ |\textbf{x}'| = |\textbf{y}'| = \min \left\{ |\textbf{x}|, |\textbf{y}| \right\} \end{array}} \mathcal {C}_{\textsf{P}}^p(\textbf{x}', \textbf{y}'). \end{aligned}$$
(3.1)

Remark 3.1

Assumption A2 ensures that, if \(\min \left\{ |\textbf{x}|, |\textbf{y}| \right\} < \textsf{c}_{{\text {A2}}}\), then \(\mathcal {C}_{\textsf{P}}^p(\textbf{x}, \textbf{y}) = 0\).

Remark 3.2

If \((\Omega ', \textsf{d}')\) is a metric space and \(S: \Omega \rightarrow \Omega '\) is Lipschitz, i.e., for some constant \({\text {Lip}}S\) one has \(\textsf{d}'(S(x), S(y)) \le ({\text {Lip}}S) \textsf{d}(x, y)\) for every x, \(y\in \Omega \), then writing \(S(\textbf{x}) = (S(x_i))_{i=1}^n\), \(S(\textbf{y}) = (S(y_j))_{j=1}^m\), we clearly have the inequality

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p( S(\textbf{x}), S(\textbf{y}) ) \le ({\text {Lip}}S)^p \mathcal {C}_{\textsf{P}}^p( \textbf{x}, \textbf{y}). \end{aligned}$$
(3.2)

Remark 3.3

Similar definitions and assumptions may be given in the non-bipartite case, thus defining combinatorial optimization problems \(\textsf{P} = ( \mathcal {F}_{n})_{n \in \mathbb {N}}\) over complete graphs, as a collection of feasible solutions \(\mathcal {F}_{n}\) over the complete graph \(\mathcal {K}_{n}\).

3.3 Examples

Let us introduce some fundamental examples of these problems.

3.3.1 Assignment problem

The minimum weight bipartite matching problem, also called the assignment problem, is defined by letting \(\mathcal {F}_{n,n}\) be the set of perfect matchings in \(\mathcal {K}_{n,n}\), i.e., spanning subgraphs induced by a collection of edges which have no vertex in common (if \(n=0\) we simply let \(\mathcal {F}_{n,n} = \left\{ \emptyset \right\} \)). Feasible solutions are in correspondence with permutations \(\sigma \) over [n], letting

$$\begin{aligned} E_{\sigma } = \left\{ \left\{ (1,i), (2,\sigma (i)) \right\} \,: \, i \in [n] \right\} . \end{aligned}$$

When \(n \ne m\), say \(n \le m\), the same correspondence holds with the set of injective maps \(\sigma :[n] \rightarrow [m]\). Therefore, given a weight w on \(\mathcal {K}_{n,m}\), the cost of the assignment problem is

$$\begin{aligned} \min _{\sigma } \sum _{i=1}^n w\left( (1,i), (2,\sigma (i)) \right) . \end{aligned}$$

In the geometric case, i.e., on the weighted graph \(\mathcal {K}(\textbf{x}, \textbf{y})\) with \(\textbf{x}= (x_i)_{i=1}^n\), \(\textbf{y}=(y_j)_{j=1}^m \subseteq \Omega \) and \(w\left( (1,i), (2,j) \right) =\textsf{d}(x_i, y_{j} )^p\), this expression becomes

$$\begin{aligned} \textsf{M}^p(\textbf{x}, \textbf{y}) = \min _{\sigma } \sum _{i=1}^n \textsf{d}(x_i, y_{\sigma (i)} )^p. \end{aligned}$$

If \(n>m\), then one simply exchanges the roles of n and m.
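For very small instances, the formula for \(\textsf{M}^p\) can be evaluated by exhaustive enumeration. The following Python sketch is an illustration only, with points on the real line and \(\textsf{d}(x,y) = |x-y|\); it enumerates all injective maps \(\sigma \):

```python
import itertools
import math

def assignment_cost(xs, ys, p=1):
    """Brute-force M^p(x, y): minimum over injective maps sigma of
    sum_i d(x_i, y_sigma(i))^p, with d the distance on the real line here.
    If |x| > |y| the roles are exchanged, as in the text."""
    if len(xs) > len(ys):
        xs, ys = ys, xs
    n = len(xs)
    best = math.inf
    # injective maps [n] -> [m] = ordered selections of n indices among m
    for targets in itertools.permutations(range(len(ys)), n):
        cost = sum(abs(xs[i] - ys[targets[i]]) ** p for i in range(n))
        best = min(best, cost)
    return best
```

For instance, `assignment_cost([0.0, 1.0], [1.0, 0.0])` returns 0.0, since the optimal \(\sigma \) matches each point with its copy.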

Remark 3.4

If \(n=m\), Birkhoff’s theorem ensures equivalence between the bipartite matching problem and the optimal transport between the associated empirical measures \(\mu ^\textbf{x}= \sum _{i=1}^n \delta _{x_i}\), \(\mu ^\textbf{y}= \sum _{j=1}^n \delta _{y_j}\), i.e.,

$$\begin{aligned} \textsf{M}^p(\textbf{x}, \textbf{y}) = \textsf{W}^p( \mu ^\textbf{x}, \mu ^\textbf{y}). \end{aligned}$$
(3.3)

Therefore, using the triangle inequality (2.18), we can bound from above as follows:

$$\begin{aligned} \textsf{M}^p(\textbf{x}, \textbf{y}) \lesssim \textsf{W}^p\left( \mu ^\textbf{x}, n \lambda \right) + \textsf{W}^p\left( \mu ^\textbf{y}, n \lambda \right) , \end{aligned}$$
(3.4)

for every probability measure \(\lambda \) on \(\mathbb {R}^d\).

3.3.2 Travelling salesperson problem

The travelling salesperson problem (TSP) is usually defined on a general graph by prescribing as feasible solutions the cycles visiting each vertex exactly once (also called Hamiltonian cycles). In the complete bipartite case \(\mathcal {K}_{n,n}\), such cycles exist for every \(n \ge 2\), and assumptions A1, A2 and A3 are also clearly satisfied (letting \(\mathcal {F}_{n,n} = \left\{ \emptyset \right\} \) if \(n\in \left\{ 0,1 \right\} \)). Similarly to the assignment problem, feasible solutions are in this case in correspondence with pairs of permutations \(\sigma \), \(\tau \) over [n], letting

$$\begin{aligned} E_{\sigma , \tau } = \left\{ \left\{ (1,\sigma (i)), (2,\tau (i)) \right\} , \left\{ (1,\sigma (i)), (2,\tau (i+1)) \right\} \,: \, i \in \left\{ 1, \ldots , n \right\} \right\} ,\qquad \end{aligned}$$
(3.5)

where we conventionally let \(\tau (n+1) = \tau (1)\) (we will always use summation \({\text {mod}}\, n\) in such cases). In words, \(\sigma \) and \(\tau \) prescribe the order in which the vertices are visited by the cycle. When \(n \ne m\), say \(n \le m\), the same correspondence holds with injective maps \(\sigma \), \(\tau \) from [n] into [m].

Therefore, given a weight w on \(\mathcal {K}_{n,m}\), the cost of the TSP reads

$$\begin{aligned} \min _{\sigma , \tau } \sum _{i=1}^n w\left( (1,\sigma (i)), (2,\tau (i)) \right) + w\left( (1,\sigma (i)), (2,\tau (i+1)) \right) . \end{aligned}$$

In the geometric case, i.e., on the weighted graph \(\mathcal {K}(\textbf{x}, \textbf{y})\) with \(\textbf{x}= (x_i)_{i=1}^n\), \(\textbf{y}=(y_j)_{j=1}^m \subseteq \Omega \), this becomes

$$\begin{aligned} \mathcal {C}_{\textsf{TSP}}^p(\textbf{x}, \textbf{y}) = \min _{\sigma , \tau } \sum _{i=1}^n \textsf{d}(x_{\sigma (i)}, y_{\tau (i)} )^p + \textsf{d}(x_{\sigma (i)}, y_{\tau (i+1)} )^p. \end{aligned}$$

If \(n>m\), then one simply exchanges the roles of n and m.
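The double-permutation formula for \(\mathcal {C}_{\textsf{TSP}}^p\) can be evaluated by brute force on tiny instances. The following Python sketch is an illustration only (points on the real line, \(|\textbf{x}| = |\textbf{y}| = n\)):

```python
import itertools
import math

def bipartite_tsp_cost(xs, ys, p=1):
    """Brute-force C_TSP^p(x, y) on K(x, y) with |x| = |y| = n >= 2:
    minimum over permutations sigma, tau of
    sum_i d(x_sigma(i), y_tau(i))^p + d(x_sigma(i), y_tau(i+1))^p,
    with the convention tau(n+1) = tau(1)."""
    n = len(xs)
    assert n == len(ys) and n >= 2
    best = math.inf
    for sigma in itertools.permutations(range(n)):
        for tau in itertools.permutations(range(n)):
            cost = sum(
                abs(xs[sigma[i]] - ys[tau[i]]) ** p
                + abs(xs[sigma[i]] - ys[tau[(i + 1) % n]]) ** p
                for i in range(n)
            )
            best = min(best, cost)
    return best
```

For \(n=2\) any Hamiltonian cycle in \(\mathcal {K}_{2,2}\) uses all four edges, so `bipartite_tsp_cost([0.0, 1.0], [0.0, 1.0])` returns 2.0 regardless of the permutations.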

In the non-bipartite version of the TSP, i.e., on \(\mathcal {K}_n\), feasible solutions are in correspondence with permutations \(\sigma \) over [n], letting

$$\begin{aligned} E_{\sigma } = \left\{ \left\{ \sigma (i), \sigma (i+1) \right\} \,: \, i \in [n] \right\} . \end{aligned}$$

In the geometric case \(\textbf{x}= (x_i)_{i=1}^n \subseteq \Omega \), it becomes

$$\begin{aligned} \mathcal {C}_{\textsf{TSP}}^p(\textbf{x}) = \min _{\sigma } \sum _{i=1}^n \textsf{d}(x_{\sigma (i)}, x_{\sigma (i+1)} )^p. \end{aligned}$$
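As in the bipartite case, for tiny instances this cost can be computed by enumerating cyclic orders. A Python sketch, again with points on the real line for illustration only:

```python
import itertools
import math

def tsp_cost(xs, p=1):
    """Brute-force C_TSP^p(x) on K_n: minimum over permutations sigma of
    sum_i d(x_sigma(i), x_sigma(i+1))^p, with sigma(n+1) = sigma(1)."""
    n = len(xs)
    best = math.inf
    for sigma in itertools.permutations(range(n)):
        cost = sum(abs(xs[sigma[i]] - xs[sigma[(i + 1) % n]]) ** p for i in range(n))
        best = min(best, cost)
    return best
```

On three points there is a single cycle up to relabeling, so e.g. `tsp_cost([0.0, 1.0, 2.0])` returns 4.0.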

3.3.3 Connected \(\kappa \)-factor problem

The TSP can be generalized in many directions. For example, since a cycle is a connected graph such that every vertex has degree 2, i.e., it is 2-regular, we may instead define as feasible solutions the \(\kappa \)-regular spanning connected subgraphs, for a fixed \(\kappa \in \mathbb {N}\), \(\kappa \ge 2\). This defines a non-empty set of feasible solutions \(\mathcal {F}_{n,n}\) over \(\mathcal {K}_{n,n}\) if \(n \ge \kappa \) (otherwise we let \(\mathcal {F}_{n,n} = \left\{ \emptyset \right\} \)) and assumptions A1, A2 and A3 are easily seen to be satisfied. We refer to this problem as the (minimum weight) connected \(\kappa \)-factor problem. A simpler variant is to require that feasible solutions are \(\kappa \)-regular but not necessarily connected: this is simply known as the (minimum weight) \(\kappa \)-factor problem. Let us notice that, for \(\kappa =1\), the latter reduces to the assignment problem.

Back to the connected \(\kappa \)-factor problem, a simple fact worth noticing, which we will use below, is that any connected \(\kappa \)-regular bipartite graph G is 2-connected, i.e., it remains connected even after removing a single edge. Write \(V_G = X \cup Y\), with \(X \cap Y = \emptyset \), and assume by contradiction that there are \(x\in X\), \(y \in Y\) with \(\left\{ x,y \right\} \in E_G\) such that the subgraph \(G' \subseteq G\) with edge set \(E_{G'} = E_G {\setminus } \left\{ \left\{ x,y \right\} \right\} \) is not connected: then there are two disjoint subgraphs \(G_1'\), \(G_2'\) with \(x \in V_{G_1'}\), \(y \in V_{G_2'}\) and \(G' = G_1' \cup G_2'\). All the vertices in \(G_1'\) have degree \(\kappa \), except for x, whose degree is \(\kappa -1\). However, letting \(n_X = |V_{G_1'} \cap X|\), \(n_Y = |V_{G_1'} \cap Y|\), since the graph \(G_1'\) is bipartite we can count its edges as the sum of the degrees of the vertices in \(V_{G_1'} \cap X\), or equivalently of those in \(V_{G_1'} \cap Y\). This leads to the identity \(\kappa n_X - 1 = \kappa n_Y\), hence \(\kappa (n_X - n_Y ) = 1\), a contradiction since \(\kappa \ge 2\).

3.3.4 \(\kappa \)-bounded degree minimum spanning tree

The minimum weight spanning tree (MST) problem is defined by letting feasible solutions be all spanning subgraphs that are trees, i.e., connected and acyclic, whose existence on any given connected graph is guaranteed by standard algorithms. This problem, however, may not have uniformly bounded degree, so assumption A3 may fail. Therefore, we restrict the set of feasible solutions to spanning trees over \(\mathcal {K}_{n,n}\) such that each vertex degree is less than or equal to some fixed \(\kappa \ge 2\) (letting \(\mathcal {F}_{0,0}=\left\{ \emptyset \right\} \)). This problem, known as the \(\kappa \)-bounded degree minimum spanning tree (\(\kappa \)-MST), satisfies assumptions A1, A2 and A3: notice in particular that removing any edge from a Hamiltonian cycle, i.e., a feasible solution for the TSP, gives a spanning tree with degree bounded by 2.

We remark here that the \(\kappa \)-MST problem may also be defined directly over graphs \(\mathcal {K}_{n,m}\), with \(n \ne m\), with a non-trivial set of feasible solutions (provided that \(|n-m|\) is not too large). However, also in this case we follow our general convention, so that if \(n \ne m\), the set \(\mathcal {F}_{n,m}\) does not contain spanning trees of \(\mathcal {K}_{n,m}\) but only spanning trees over subgraphs isomorphic to \(\mathcal {K}_{z,z}\) with \(z = \min \left\{ n,m \right\} \).

A simple fact that we will use below is that any \(G \in \mathcal {F}_{n,n}\) contains at least one leaf (i.e., a vertex with degree 1) in \([n]_1\) and one in \([n]_2\). This is because, more generally, any spanning tree over \(\mathcal {K}_{n,n}\) contains at least one leaf in \([n]_1\) and one in \([n]_2\). Indeed, assume by contradiction that there are no leaves in \([n]_1\). Since the tree is spanning and connected, every vertex has degree at least 1, so all the vertices in \([n]_1\) must have degree at least 2. Since no edges connect pairs of vertices in \([n]_1\), summing the degrees over \([n]_1\) counts each edge exactly once, so the tree contains at least 2n edges, contradicting the well-known fact that any tree (not necessarily bipartite) over 2n vertices has exactly \(2n-1\) edges.

In order to perform our analysis, we introduce two further assumptions that we discuss in the following subsections.

3.4 Local merging

Our analysis relies on a key subadditivity inequality, that ultimately follows by a stability assumption with respect to local merging operations, besides assumptions A1 and A3. Let us give the following general definition.

Definition 3.5

(gluing) Given a graph G and two disjoint subgraphs \(G_1\), \(G_2 \subseteq G\), we say that \(G' \subseteq G\) is obtained by gluing at \(x_1 \in V_{G_1}\), \(x_2 \in V_{G_2}\) if \(V_{G'} = V_{G_1}\cup V_{G_2}\),

$$\begin{aligned} (E_{G_1} \cup E_{G_2}) \setminus E_{G'} \subseteq \mathcal {N}_{G_1}(x_1) \cup \mathcal {N}_{G_2}(x_2) \end{aligned}$$

and

$$\begin{aligned} E_{G'} \setminus (E_{G_1} \cup E_{G_2}) \subseteq \left\{ \left\{ x_1, y \right\} : y \in \mathcal {N}_{G_2}(x_2) \right\} \cup \left\{ \left\{ x_2, y \right\} : y \in \mathcal {N}_{G_1}(x_1) \right\} . \end{aligned}$$

In words, gluing at \(x_1\), \(x_2\) means that the two subgraphs are joined by (possibly) removing and adding edges connecting \(x_2\) to vertices from the neighborhood of \(x_1\) in \(G_1\), and similarly \(x_1\) to vertices from the neighborhood of \(x_2\) in \(G_2\). In particular, we have that \(\mathcal {N}_{G'}(x) = \mathcal {N}_{G_1}(x)\) for every \(x\in V_{G_1} {\setminus } \left( \mathcal {N}_{G_1}(x_1)\cup \left\{ x_1 \right\} \right) \), and similarly \(\mathcal {N}_{G'}(x) = \mathcal {N}_{G_2}(x)\) for every \(x\in V_{G_2} {\setminus } \left( \mathcal {N}_{G_2}(x_2)\cup \left\{ x_2 \right\} \right) \).

Back to combinatorial optimization problems over bipartite graphs, our assumption is, loosely speaking, that any two (non-empty) feasible solutions \(G \in \mathcal {F}_{n,n}\), \(G' \in \mathcal {F}_{n',n'}\) can be glued together, yielding a feasible solution \(G'' \in \mathcal {F}_{n+n', n+n'}\). In fact, we also allow adding up to \(\textsf{c}\) edges, but only connecting vertices of G, where \(\textsf{c} \in \mathbb {N}\) is a constant (depending only on the problem \(\textsf{P}\)). Before giving a precise formulation of the assumption, we notice that G and \(G'\) are in general not disjoint: what we mean is that \(G'\) must be suitably “translated”. Precisely, given \(n \in \mathbb {N}\), we introduce the map

$$\begin{aligned} \tau = \tau _n: V_{G'} \rightarrow [n+n']_1 \cup [n+n']_2 \end{aligned}$$

defined as

$$\begin{aligned} \tau \left( (1,i) \right) = (1, n+i), \quad \tau \left( (2,j) \right) = (2,n+j), \end{aligned}$$

so that G, \(\tau ( G') \subseteq \mathcal {K}_{n+n', n+n'}\) are disjoint.

We consider therefore combinatorial optimization problems \(\textsf{P}\) over bipartite graphs which satisfy the following assumption:

  1. A4

    (local merging) there exists \(\textsf{c}_{{\text {A4}}}\ge 0\) such that, for every n, \(n' \in \mathbb {N}\), and \(G\in \mathcal {F}_{n,n}\), \(G'\in \mathcal {F}_{n',n'}\) with both \(G\ne \emptyset \) and \(G' \ne \emptyset \), one can find \(G'' \in \mathcal {F}_{n+n', n+n'}\) obtained by gluing G and \(\tau (G')\) at the vertices (1, 1), \((1,n+1)\) and possibly adding up to \(\textsf{c}_{{\text {A4}}}\) edges from those of \(\mathcal {K}_{n,n}\).

The reason why we also allow up to \(\textsf{c}_{{\text {A4}}}\) additional edges is to include problems where connectedness may be destroyed by gluing, such as the \(\kappa \)-MST. This should be compared with the merging assumption [6, (A4)], where instead a bounded number of edges from the whole \(\mathcal {K}_{n+n', n+n'}\) is allowed to be added to the union \(G \cup \tau (G')\) (with our notation). Notice however that, in our case, since the extra edges are from \(\mathcal {K}_{n,n}\), it remains true that

$$\begin{aligned} \mathcal {N}_{G''}(x) = \mathcal {N}_{\tau (G')}(x),\hbox { for every }x\in V_{\tau (G')} \setminus \left( \mathcal {N}_{\tau (G')}((1,n+1))\cup \left\{ (1, n+1) \right\} \right) ,\nonumber \\ \end{aligned}$$
(3.6)

which is a key condition that we use below.

All the problems described in the previous section satisfy A4.

Lemma 3.6

The TSP, the connected \(\kappa \)-factor problem (as well as the non connected one) and the \(\kappa \)-MST over complete bipartite graphs satisfy assumption A4.

Proof

Let \(G \in \mathcal {F}_{n,n}\), \(G'\in \mathcal {F}_{n', n'}\) be both non-empty. Then (e.g. by assumption A2) \(\deg _{G}(1,1) \ge 1\) but also \(\deg _{\tau (G')}(1, n+1) \ge 1\). The basic idea is to pick \(y \in \mathcal {N}_{G}(1,1)\), \(y' \in \mathcal {N}_{\tau (G')}(1,n+1)\), remove the edges \(\left\{ (1,1), y \right\} \), \(\left\{ (1,n+1), y' \right\} \) and add instead \(\left\{ (1,1), y' \right\} \), \(\left\{ (1,n+1), y \right\} \). This operation does not change the vertex degrees, in particular at (1, 1) and \((1,n+1)\).

For the TSP and more generally the connected \(\kappa \)-factor problem, the resulting graph \(G''\) is connected, because after removing a single edge, both graphs G and \(\tau (G')\) are still connected, and adding the new edges has the effect of connecting the two graphs (hence in this case \(\textsf{c}_{{\text {A4}}}= 0\)).

For the \(\kappa \)-bounded degree MST, we use the fact that the tree \(G \in \mathcal {F}_{n,n}\) must have at least one leaf in \([n]_1\) and one in \([n]_2\). Therefore, we obtain a connected tree (with degree bounded by \(\kappa \)) if we also add one edge connecting two such leaves (hence in this case \(\textsf{c}_{{\text {A4}}}= 1\)). \(\square \)
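The degree-preserving edge swap at the heart of this proof can be made concrete. The following Python sketch (graphs encoded as sets of frozenset edges with disjoint vertex labels; the helper name is ours, not from the text) performs the swap at two prescribed vertices:

```python
def glue(edges1, edges2, x1, x2):
    """Basic gluing move from the proof of Lemma 3.6: pick a neighbor y of x1
    in G1 and a neighbor y' of x2 in G2, remove {x1, y} and {x2, y'},
    then add {x1, y'} and {x2, y}. All vertex degrees are preserved."""
    y = next(v for e in edges1 if x1 in e for v in e if v != x1)
    yp = next(v for e in edges2 if x2 in e for v in e if v != x2)
    glued = (edges1 | edges2) - {frozenset({x1, y}), frozenset({x2, yp})}
    return glued | {frozenset({x1, yp}), frozenset({x2, y})}
```

Gluing two disjoint 4-cycles this way produces a single 8-cycle, consistent with the connectedness claim in the TSP case.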

3.5 Subadditivity inequality

Using all the assumptions introduced so far, in particular A4, we establish a fundamental subadditivity inequality.

Proposition 3.7

(Approximate subadditivity) Let \(\textsf{P}\) be a combinatorial optimization problem over bipartite graphs satisfying assumptions A1, A2, A3 and A4.

For a metric space \((\Omega ,\textsf{d})\) and a finite partition \(\Omega = \cup _{k=1}^K \Omega _k\), \(K \in \mathbb {N}\),

  1. (i)

    let \(\textbf{x}^0\), \(\textbf{y}^0 \subseteq \Omega \) be such that \(\min \left\{ |\textbf{x}^0|, |\textbf{y}^0| \right\} \ge \max \left\{ \textsf{c}_{{\text {A2}}}, K \right\} \),

  2. (ii)

    for every \(k =1, \ldots , K\), let \(\textbf{x}^k\), \(\textbf{y}^k \subseteq \Omega _k\) with \(|\textbf{x}^k| = |\textbf{y}^k| =n_k\), with either \(n_k \ge \textsf{c}_{{\text {A2}}}\) or \(n_k = 0\) (i.e., both families are empty),

  3. (iii)

    let \(\textbf{z}=(z_k)_{k=1}^K\) with \(z_k \in \Omega _k\), for every \(k=1,\ldots , K\).

Then, the following inequality holds:

$$\begin{aligned}{} & {} \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}^0 \cup \bigcup _{k=1}^K \textbf{x}^k, \textbf{y}^0 \cup \bigcup _{k=1}^K \textbf{y}^k \right) - \sum _{k=1}^K \mathcal {C}_{\textsf{P}}^p(\textbf{x}^k, \textbf{y}^k) \nonumber \\{} & {} \quad \lesssim \mathcal {C}_{\textsf{P}}^p(\textbf{x}^0, \textbf{y}^0)+ \textsf{M}^p(\textbf{z}, \textbf{x}^0)+ \sum _{k=1}^K {\text {diam}}(\Omega _k)^p. \end{aligned}$$
(3.7)

The implicit constant depends only upon p, \(\textsf{c}_{{\text {A2}}}\), \(\textsf{c}_{{\text {A3}}}\) and \(\textsf{c}_{{\text {A4}}}\) (in particular not on K).

Remark 3.8

The role played by the points \(\textbf{z}\) is quite marginal: indeed, if \(\mu ^{\textbf{x}^0}(\Omega _k) >0\) for every k, then by choosing \(z_k \in \textbf{x}^0 \cap \Omega _k\), the term \(\textsf{M}^p(\textbf{z}, \textbf{x}^0)\) vanishes.

Proof

Recalling (3.1), up to replacing \(\textbf{x}^0\), \(\textbf{y}^0\) with subsets \(\textbf{x}'\), \(\textbf{y}'\) with \(|\textbf{x}'| = |\textbf{y}'| = \min \left\{ |\textbf{x}^0|, |\textbf{y}^0| \right\} \), we may also assume that \(|\textbf{x}^0| = |\textbf{y}^0|\). For every \(k=1, \ldots , K\) let \(G_k \subseteq \mathcal {K}(\textbf{x}^k, \textbf{y}^k)\) be a minimizer for \(\textsf{P}\). If \(n_k=0\), then \(G_k =\emptyset \). Otherwise, \(n_k \ge \textsf{c}_{{\text {A2}}}\), so by assumption A2 the minimizer is non-empty, and using Markov's inequality we can choose \(x^k \in \textbf{x}^k\) such that

$$\begin{aligned} \sum _{y \in \mathcal {N}_{G_k} (x^k)} \textsf{d}(x^k, y)^p \le \frac{ 4\mathcal {C}_{\textsf{P}}^p(\textbf{x}^k, \textbf{y}^k)}{|\textbf{x}^k|}\lesssim {\text {diam}}(\Omega _k)^p. \end{aligned}$$

For the last estimate we used that \(\deg _{x^k}(G_k) \le \textsf{c}_{{\text {A3}}}\).

Similarly, let \(G_0\subseteq \mathcal {K}(\textbf{x}^0, \textbf{y}^0)\) be a (also non-empty) minimizer for \(\mathcal {C}_{\textsf{P}}^p(\textbf{x}^0, \textbf{y}^0)\) and let \(\sigma :\{1, \ldots , K\}\rightarrow \{ 1, \ldots , |\textbf{x}^0|\}\) be an optimal matching between \(\textbf{z}\) and \(\textbf{x}^0\).

We iteratively use assumptions A1 and A4 to define feasible solutions

$$\begin{aligned} \tilde{G}_k \subseteq \mathcal {K}\left( \textbf{x}^0 \cup \bigcup _{i=1}^{k} \textbf{x}^i, \textbf{y}^0 \cup \bigcup _{i=1}^{k} \textbf{y}^i \right) . \end{aligned}$$

We begin by letting \(\tilde{G}_0 = G_0\). For \(k =1, \ldots , K\), having already defined \(\tilde{G}_{k-1}\), if \(n_k = 0\), then we simply let \(\tilde{G}_k = \tilde{G}_{k-1}\). Otherwise, we obtain a feasible solution \(\tilde{G}_k\) by gluing \(G_{k}\) with \(\tilde{G}_{k-1}\) at the vertices \(x^k\), \(x_{\sigma (k)}^0\) and adding up to \(\textsf{c}_{{\text {A4}}}\) edges from \(\mathcal {K}(\textbf{x}^k, \textbf{y}^k)\). The fact that we can glue at any such pair of vertices is due to assumption A1: up to isomorphisms we can assume that \(x^k\) corresponds to the abstract graph vertex (1, 1) and \(x_{\sigma (k)}^0\) to \((1, n_k+1)\).

This construction gives the following inequality between the graph weights, if \(n_k \ne 0\):

$$\begin{aligned} \begin{aligned}&w( \tilde{G}_k ) - w( \tilde{G}_{k-1}) - w(G_k) \le \textsf{c}_{{\text {A4}}}{\text {diam}}(\Omega _k)^p \\&\quad + \sum _{y \in \mathcal {N}_{G_k}(x^k) } \textsf{d}(x_{\sigma (k)}^0, y)^p + \sum _{y \in \mathcal {N}_{\tilde{G}_{k-1}}(x_{\sigma (k)}^0) } \textsf{d}(x^k, y)^p, \end{aligned} \end{aligned}$$
(3.8)

while if \(n_k = 0\), we simply have \(w(\tilde{G}_k) = w(\tilde{G}_{k-1})\). We bound from above the last two terms in (3.8) as follows: first,

$$\begin{aligned} \begin{aligned} \sum _{y \in \mathcal {N}_{G_k}(x^k) } \textsf{d}(x_{\sigma (k)}^0, y)^p&\lesssim \sum _{y \in \mathcal {N}_{G_k}(x^k) } \textsf{d}( x_{\sigma (k)}^0, z_k)^p + \textsf{d}( z_k, x^k)^p + \textsf{d}(x^k, y)^p\\&\lesssim \textsf{d}( x_{\sigma (k)}^0, z_k)^p + \textsf{d}( z_k, x^k)^p +\sum _{y \in \mathcal {N}_{G_k}(x^k) } \textsf{d}(x^k, y)^p \\&\lesssim \textsf{d}( x_{\sigma (k)}^0, z_k)^p + {\text {diam}}(\Omega _k)^p, \end{aligned} \end{aligned}$$

where we used that \(\deg _{x^k}(G_k) \le \textsf{c}_{{\text {A3}}}\). To bound the last term, we notice that at each step of the construction we are locally merging at different points of \(\textbf{x}^0\): since no two such points are adjacent, because the graph is bipartite, using (3.6) by induction yields

$$\begin{aligned} \mathcal {N}_{\tilde{G}_{k-1}}(x_{\sigma (k)}^0)=\mathcal {N}_{G_0}(x_{\sigma (k)}^0), \end{aligned}$$

which in particular contains at most \(\textsf{c}_{{\text {A3}}}\) elements, since \(G_0\) is feasible. Therefore,

$$\begin{aligned} \begin{aligned} \sum _{y \in \mathcal {N}_{\tilde{G}_{k-1}}(x_{\sigma (k)}^0)} \textsf{d}(x^k, y)^p&= \sum _{y \in \mathcal {N}_{G_0}(x_{\sigma (k)}^0)} \textsf{d}(x^k, y)^p \\&\lesssim \sum _{y \in \mathcal {N}_{G_0}(x_{\sigma (k)}^0)} \textsf{d}(x^k, z_k)^p +\textsf{d}(z_k, x_{\sigma (k)}^0)^p + \textsf{d}(x_{\sigma (k)}^0, y)^p\\&\lesssim {\text {diam}}(\Omega _k)^p + \textsf{d}(z_k, x_{\sigma (k)}^0)^p + \sum _{y \in \mathcal {N}_{G_0}(x^0_{\sigma (k)})}\textsf{d}(x_{\sigma (k)}^0, y)^p. \end{aligned} \end{aligned}$$

Summing (3.8) upon \(k=1,\ldots , K\), we obtain (3.7) because

$$\begin{aligned} \sum _{k=1}^K \textsf{d}(z_k, x_{\sigma (k)}^0)^p = \textsf{M}^p(\textbf{z}, \textbf{x}^0) \end{aligned}$$

and, since the points \(x^0_{\sigma (k)}\) are all distinct,

$$\begin{aligned} \sum _{k=1}^K \sum _{y \in \mathcal {N}_{G_0}(x^0_{\sigma (k)})}\textsf{d}(x_{\sigma (k)}^0, y)^p \le \sum _{x\in \textbf{x}^0} \sum _{y \in \mathcal {N}_{G_0}(x)}\textsf{d}(x, y)^p = \mathcal {C}_{\textsf{P}}^p(\textbf{x}^0, \textbf{y}^0). \end{aligned}$$

\(\square \)

3.6 Growth/regularity

The last assumption that we introduce for a combinatorial optimization problem \(\textsf{P}\) over bipartite graphs is a general upper bound for the cost when specialized to a geometric graph in the Euclidean cube \((0,1)^d\):

  1. A5

    (growth/regularity) There exists \(\textsf{c}_{{\text {A5}}}\ge 0\) such that, for every \(\textbf{x}, \textbf{y}\subseteq (0,1)^d\), we have

    $$\begin{aligned} \mathcal {C}_{\textsf{P}}^p(\textbf{x}, \textbf{y}) \le \textsf{c}_{{\text {A5}}}\left( \min \left\{ |\textbf{x}|^{1-\frac{p}{d}}, |\textbf{y}|^{1-\frac{p}{d}} \right\} + \textsf{M}^p(\textbf{x},\textbf{y}) \right) .\end{aligned}$$
    (3.9)

Remark 3.9

Notice that if \(\Omega \subset (0,1)^d\) then (3.9) applies in particular for \(\textbf{x},\textbf{y}\subseteq \Omega \). By scaling we obtain that for every bounded set \(\Omega \) and every \(\textbf{x},\textbf{y}\subseteq \Omega \),

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p(\textbf{x}, \textbf{y}) \le \textsf{c}_{{\text {A5}}}\left( {\text {diam}}(\Omega )^p\min \left\{ |\textbf{x}|^{1-\frac{p}{d}}, |\textbf{y}|^{1-\frac{p}{d}} \right\} + \textsf{M}^p(\textbf{x},\textbf{y}) \right) . \end{aligned}$$

Using (3.1), we obtain at once that in order to establish that a given problem \(\textsf{P}\) satisfies (3.9) it is enough to consider the case where \(\textbf{x}\), \(\textbf{y}\subseteq (0,1)^d\) have the same number of elements.

Notice that this assumption appears slightly different from the previous ones, as it explicitly refers to the cost of Euclidean realizations of the graph, instead of feasible solutions, and relies as well on the assignment problem. In fact, the constant \(\textsf{c}_{{\text {A5}}}\) depends upon the problem \(\textsf{P}\) but also on the dimension d and the exponent p; these, however, will be fixed in our derivations, so we do not state the dependence explicitly.

It is well known that quite general arguments, such as the space-filling curve heuristics [46, Chapter 2], lead to an upper bound in terms of \(n^{1-p/d}\) for non-bipartite combinatorial optimization problems over n points in a cube, under very mild assumptions, including those introduced above. Simple examples (e.g. let \(\textbf{x}\) consist of points close to a given vertex of the cube and \(\textbf{y}\) instead be all close to another vertex) show that similar bounds cannot hold for their bipartite counterparts, which explains the second term in the right-hand side of (3.9).

To establish it in our examples we follow the strategy from [13], where limit results for the random Euclidean bipartite TSP for \(p=d=2\) were first obtained.

Lemma 3.10

The TSP, the connected \(\kappa \)-factor problem (as well as the non-connected one) and the \(\kappa \)-MST problems over complete bipartite graphs satisfy assumption A5 (with a constant \(\textsf{c}_{{\text {A5}}}\) depending on \(\kappa \), p, d only).

Proof

Let us first observe that the cost of the \(\kappa \)-MST problem is always bounded from above by the cost of the minimum weight connected \(\kappa \)-factor problem, since given any connected \(\kappa \)-factor, one can extract from it a spanning tree, whose degree at every vertex is then bounded by \(\kappa \). Therefore it is sufficient to check that assumption A5 holds with \(\textsf{P}\) being the connected \(\kappa \)-factor problem, for any \(\kappa \ge 2\) (the case \(\kappa =2\) being the TSP).

For \((\Omega , \textsf{d})\) a general metric space and \(\textbf{x}, \textbf{y}\subseteq \Omega \) we establish first the bound

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p(\textbf{x}, \textbf{y}) \lesssim \mathcal {C}_{\textsf{TSP}}^p(\textbf{x}) + \textsf{M}^p(\textbf{x}, \textbf{y}). \end{aligned}$$
(3.10)

Combining this with the fact that when \((\Omega ,\textsf{d})\) is the unit cube \((0,1)^d\) with the Euclidean distance, \(\mathcal {C}_{\textsf{TSP}}^p(\textbf{x}) \lesssim |\textbf{x}|^ {1-p/d}\) (a well-known fact, proved e.g. via space-filling curves), this concludes the proof of (3.9).

Assume without loss of generality that \(|\textbf{x}| = |\textbf{y}| = n \ge \kappa \) and let \(\rho \) be a permutation over [n] that induces an optimal assignment between \(\textbf{x}\) and \(\textbf{y}\). Consider then an optimizer for the TSP over \(\mathcal {K}(\textbf{x})\), which we also identify with a permutation \(\sigma \) over [n]. We then define the feasible solution \(G \in \mathcal {F}_{n,n}\) for the connected \(\kappa \)-factor problem whose edge set is

$$\begin{aligned} E_{G} = \left\{ \left\{ (1, \sigma (i)), (2, \rho (\sigma (i+\ell ))) \right\} \,: \, i \in [n], \ell \in \left\{ 0,1,\ldots , \kappa -1 \right\} \right\} , \end{aligned}$$

which generalizes \(E_{\sigma , \tau }\) from (3.5) with \(\tau = \rho \circ \sigma \) in the \(\kappa =2\) case, and as in (3.5) we use the summation \({\text {mod}} n\), i.e., \(i+ \ell = i + \ell - n\) if \(i+\ell >n\). Clearly, any vertex has degree \(\kappa \) and the graph is connected, since \(E_{G} \supset E_{\sigma , \tau }\).
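The edge set \(E_G\) above can be generated and checked programmatically. The following Python sketch (0-based indices, permutations given as tuples; an illustration, not part of the proof) confirms \(\kappa \)-regularity on small examples:

```python
def kappa_factor_edges(sigma, rho, kappa):
    """E_G = { {(1, sigma(i)), (2, rho(sigma((i + l) mod n)))} :
               i in [n], l in 0..kappa-1 },
    the feasible solution built in the proof of Lemma 3.10 (0-based indices).
    For kappa <= n and injective sigma, rho, every vertex has degree kappa."""
    n = len(sigma)
    return {
        frozenset({(1, sigma[i]), (2, rho[sigma[(i + l) % n]])})
        for i in range(n)
        for l in range(kappa)
    }
```

For instance, with \(n=5\) and \(\kappa =3\) the set has \(n\kappa = 15\) edges and every one of the 10 vertices has degree 3, matching the degree count in the text.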

It follows that

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p(\textbf{x}, \textbf{y}) \le \sum _{i=1}^n \sum _{\ell = 0}^{\kappa -1} \textsf{d}(x_{\sigma (i)}, y_{\rho ( \sigma (i+\ell ))})^p. \end{aligned}$$

Using the triangle inequality for every i and \(\ell \), we bound from above

$$\begin{aligned} \textsf{d}(x_{\sigma (i)}, y_{\rho ( \sigma (i+\ell ))})^p \lesssim \sum _{j = 0}^{\ell -1} \textsf{d}(x_{\sigma (i+j)}, x_{\sigma (i+j+1)})^p +\textsf{d}( x_{\sigma (i+\ell )}, y_{\rho ( \sigma (i+\ell ))})^p. \end{aligned}$$

Summation upon i (keeping \(\ell \) fixed) gives

$$\begin{aligned} \sum _{j = 0}^{\ell -1} \sum _{i=1}^n \textsf{d}(x_{\sigma (i+j)}, x_{\sigma (i+j+1)})^p + \sum _{i=1}^n \textsf{d}( x_{\sigma (i+\ell )}, y_{\rho ( \sigma (i+\ell ))})^p \lesssim \ell \, \mathcal {C}_{\textsf{TSP}}^p(\textbf{x}) + \textsf{M}^p(\textbf{x}, \textbf{y}), \end{aligned}$$

hence, after summing upon \(\ell = 0, \ldots , \kappa -1\), we obtain (3.10). \(\square \)

4 Convergence results for Poisson point processes

4.1 Point processes

The aim of this section is to prove the analogue of Theorem 1.1 for Poisson point processes (instead of i.i.d. points).

We define a point process on \(\mathbb {R}^d\) as a random finite family of points \(\mathcal {N}= (X_i)_{i=1}^N \subseteq \mathbb {R}^d\), i.e. an N-tuple of random variables with values in \(\mathbb {R}^d\), where the total number of points N is also random and a.s. finite (if \(N=0\), then \(\mathcal {N}= \emptyset \)). We extend the notation for families of points to point processes (naturally defined for each realization of the random variables): for a process \(\mathcal {N}= \left( X_i \right) _{i=1}^N\), write \(\mu ^\mathcal {N}:= \sum _{i=1}^N \delta _{X_i}\) and, given a Borel set \(\Omega \subseteq \mathbb {R}^d\), let \(\mathcal {N}(\Omega ) = \mu ^{\mathcal {N}}(\Omega )\) be the (random) number of variables belonging to \(\Omega \), while \(\mathcal {N}_{\Omega }\) denotes the restriction of the process to \(\Omega \), i.e., the collection of the variables such that \(X_i \in \Omega \) (naturally re-indexed over \(i=1, \ldots , \mathcal {N}(\Omega )\), with the order inherited from the original process). Given two point processes \(\mathcal {N}= (X_i)_{i=1}^N\), \(\mathcal {M}= (Y_j)_{j=1}^M\), their union is \(\mathcal {N}\cup \mathcal {M}= (X_1, \ldots , X_N, Y_1, \ldots , Y_M)\).

Given a finite Borel measure \(\lambda \) on \(\mathbb {R}^d\), a Poisson point process \(\mathcal {N}^\lambda \) with intensity \(\lambda \) can be constructed from a sequence of i.i.d. variables \((X_i)_{i=1}^\infty \) with common law \(\lambda /\lambda (\mathbb {R}^d)\) by introducing a further independent Poisson variable \(N^{\lambda }\) with mean \(\lambda (\mathbb {R}^d)\) and considering only the first \(N^{\lambda }\) variables, i.e.,

$$\begin{aligned} \mathcal {N}^\lambda := (X_i)_{i=1}^{N^{\lambda }}. \end{aligned}$$

A key property of a Poisson point process (with intensity \(\lambda \)) is that, given any countable Borel partition \(\mathbb {R}^d = \cup _{k} \Omega _k\), the variables \((\mathcal {N}^\lambda (\Omega _k))_k\) are independent Poisson variables, each with mean \(\lambda (\Omega _k)\), and, conditionally upon their values, the points in each \(\Omega _k\) are i.i.d. variables with common probability law \(\lambda \llcorner \Omega _k / \lambda (\Omega _k)\). This property can be summarized by stating that the restrictions \((\mathcal {N}^\lambda _{\Omega _k})_{k}\) are independent Poisson point processes, with each \(\mathcal {N}^\lambda _{\Omega _k}\) having intensity given by the restriction \(\lambda \llcorner \Omega _k\).
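The construction of \(\mathcal {N}^\lambda \) from an i.i.d. sequence and an independent Poisson variable translates directly into a sampler. A minimal Python sketch for \(\lambda \) a multiple of the Lebesgue measure on \((0,1)^d\) (the Poisson variate is drawn with Knuth's inversion method, adequate for small means; the function names are ours):

```python
import math
import random

def sample_poisson(mean, rng):
    """Poisson(mean) variate via Knuth's inversion method: count uniform
    draws until their running product drops below exp(-mean)."""
    threshold = math.exp(-mean)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

def poisson_process_unit_cube(mean_n, d, rng):
    """N^lambda with lambda = mean_n * Lebesgue on (0,1)^d: first draw
    N ~ Poisson(mean_n), then N i.i.d. uniform points, as in the text."""
    n = sample_poisson(mean_n, rng)
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]
```

The empirical mean of the number of points over many samples concentrates around `mean_n`, as the Poisson distribution of \(N^\lambda \) dictates.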

We will use the well-known thinning operation, which apparently dates back to Rényi [42], to split a Poisson point process \(\mathcal {N}^\lambda \) with intensity \(\lambda \) into two independent Poisson point processes, each containing approximately a given fraction of the points: for \(\eta \in [0,1]\), the \(\eta \)-thinning of a Poisson point process \(\mathcal {N}^{\lambda } = (X_i)_{i=1}^{N^\lambda }\) defines the two processes

$$\begin{aligned} \mathcal {N}^{(1-\eta )\lambda } = (X_i)_{i=1}^{N^{(1-\eta )\lambda }} \quad \text {and} \quad \mathcal {N}^{\eta \lambda } = (X_{N^{(1-\eta )\lambda }+i})_{i=1}^{N^\lambda -N^{(1-\eta )\lambda }}, \end{aligned}$$

where \(N^{(1-\eta )\lambda } = \sum _{i=1}^{N^{\lambda }} Z_i\) is defined using a further sequence of i.i.d. Bernoulli random variables \((Z_i)_{i=1}^\infty \) with \(\mathbb {P}(Z_i=1)=1-\eta \) (independent of the variables \((X_i)_i\) and of \(N^{\lambda }\)). Clearly, \(\mathcal {N}^{\lambda } = \mathcal {N}^{(1-\eta )\lambda }\cup \mathcal {N}^{\eta \lambda }\), and it is straightforward to prove that the two processes are independent Poisson point processes with intensities \((1-\eta )\lambda \) and \(\eta \lambda \), respectively.
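The thinning operation can be sketched in code using the Bernoulli marks \(Z_i\) directly, i.e., deciding independently for each point which process it joins; by exchangeability of the points this is distributionally equivalent to the index-based splitting above (Python, illustration only):

```python
import random

def eta_thinning(points, eta, rng):
    """eta-thinning of a point family: each point joins the first process
    with probability 1 - eta, independently, and the second otherwise.
    Applied to a Poisson process of intensity lambda, the two families are
    independent Poisson processes of intensities (1-eta)*lambda, eta*lambda."""
    first, second = [], []
    for x in points:
        (second if rng.random() < eta else first).append(x)
    return first, second
```

Note that the union of the two families recovers the original process, matching \(\mathcal {N}^{\lambda } = \mathcal {N}^{(1-\eta )\lambda }\cup \mathcal {N}^{\eta \lambda }\).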

4.2 Statement

We are in a position to state the main result to be proved in this section.

Theorem 4.1

Let \(d \ge 3\), \(p\in [1,d)\) and let \(\textsf{P}= (\mathcal {F}_{n,n})_{n \in \mathbb {N}}\) be a combinatorial optimization problem over complete bipartite graphs such that assumptions A1, A2, A3, A4 and A5 hold. Then, there exists \(\beta _{\textsf{P}} \in (0, \infty )\) (depending on p and d) such that the following holds.

Let \(\Omega \subseteq \mathbb {R}^d\) be a bounded domain with Lipschitz boundary and such that (2.14) holds. Let \(\rho \) be a Hölder continuous probability density on \(\Omega \), uniformly strictly positive and bounded from above. For every \(n\in (0, \infty )\), let \(\mathcal {N}^{n\rho }\), \(\mathcal {M}^{n\rho }\) be independent Poisson point processes with intensity \(n \rho \) on \(\Omega \). Then,

$$\begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n \rho } \right) \right] \le \beta _{\textsf{P}} \int _{\Omega } \rho ^{1-\frac{p}{d}}. \end{aligned}$$
(4.1)

Moreover, if \(\rho \) is the uniform density and \(\Omega \) is a cube or its boundary is \(C^2\), then the limit exists and equals the right-hand side.

After having introduced some general notation and proved some basic facts, we split the proof into four main cases. We deal first with the case of a uniform density on a cube and establish existence of the limit via subadditivity. Then, we consider Hölder densities on a cube and move next to general domains. Finally, we establish existence of the limit for uniform densities on domains with \(C^2\) boundary.

4.3 General facts

Although each case has its distinctive features, the underlying strategy is common and relies on Proposition 3.7 in combination with a preliminary application of the thinning operation. To avoid repetition and to fix a general notation, we describe the construction and prove a first lemma containing the fundamental ideas upon which we elaborate in the following sections.

Let \(\mathcal {N}\), \(\mathcal {M}\) be two independent Poisson point processes on \(\Omega \) with common intensity given by a finite measure \(\lambda \). In our applications, \(\lambda \) is either the Lebesgue measure or \(\lambda = n \rho \), but for simplicity we leave it unspecified here. We apply the \(\eta \)-thinning to write \(\mathcal {N}= \mathcal {N}^{1-\eta } \cup \mathcal {N}^{\eta }\), obtaining independent Poisson point processes with respective intensities \((1-\eta )\lambda \), \(\eta \lambda \), and similarly \(\mathcal {M}= \mathcal {M}^{1-\eta } \cup \mathcal {M}^{\eta }\). Given a finite Borel partition \(\Omega = \bigcup _{k=1}^K \Omega _k\), for each \(k=1, \ldots , K\), we pick a minimizer \(G_k\subseteq \mathcal {K}\left( \mathcal {N}^{1-\eta }_{\Omega _k}, \mathcal {M}^{1-\eta }_{\Omega _k} \right) \) for the problem

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p( \mathcal {N}^{1-\eta }_{\Omega _k}, \mathcal {M}^{1-\eta }_{\Omega _k}). \end{aligned}$$

Writing

$$\begin{aligned} Z_k = \min \left\{ |\mathcal {N}^{1-\eta }_{\Omega _k}|, |\mathcal {M}^{1-\eta }_{\Omega _k}| \right\} , \end{aligned}$$

we notice that \(G_k = \emptyset \) if and only if \(Z_k < \textsf{c}_{{\text {A2}}}\) (by Remark 5.5, for \(p>1\) the minimizer \(G_k\) is a.s. unique; for \(p=1\) we consider a measurable selection).

We define point processes \(\mathcal {U}\), \(\mathcal {V}\) on \(\Omega \) by setting \(\mathcal {U}_{\Omega _k} \subseteq \mathcal {N}^{1-\eta }_{\Omega _k}\), \(\mathcal {V}_{\Omega _k} \subseteq \mathcal {M}^{1-\eta }_{\Omega _k}\), given by all the points, respectively in \(\mathcal {N}^{1-\eta }_{\Omega _k}\) and \(\mathcal {M}^{1-\eta }_{\Omega _k}\), which do not belong to the set of vertices of \(G_k\). In particular, if \(G_k = \emptyset \), then \(\mathcal {U}_{\Omega _k} = \mathcal {N}^{1-\eta }_{\Omega _k}\), \(\mathcal {V}_{\Omega _k} = \mathcal {M}^{1-\eta }_{\Omega _k}\). Notice that by construction the K pairs of processes \(\left( (\mathcal {U}_{\Omega _k}, \mathcal {V}_{\Omega _k}) \right) _{k=1}^K\) are independent, but for any k the two processes \(\mathcal {U}_{\Omega _k}\), \(\mathcal {V}_{\Omega _k}\) are not in general independent. For later use, we prove:

Lemma 4.2

For every \(k=1, \ldots , K\) such that

$$\begin{aligned} \lambda (\Omega _k) > 4 \textsf{c}_{{\text {A2}}}, \end{aligned}$$
(4.2)

we have, for every \(q \ge 1\),

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}_{\Omega _k}|^q +|\mathcal {V}_{\Omega _k}|^q \right] \lesssim _{q} \lambda (\Omega _k)^{\frac{q}{2}}.\end{aligned}$$
(4.3)

Proof

In the event

$$\begin{aligned} A_k = \left\{ Z_k \ge (1-\eta )\lambda (\Omega _k)/2 \right\} , \end{aligned}$$

since \(\eta \in (0,1/2)\) and (4.2) holds, we have \(Z_k \ge \lambda (\Omega _k)/4 > \textsf{c}_{{\text {A2}}}\); hence, by assumption A2, every feasible solution (in particular the optimal solution \(G_k\)) spans a subgraph of \(\mathcal {K}_{ \mathcal {N}^{1-\eta }(\Omega _k), \mathcal {M}^{1-\eta }(\Omega _k)}\) isomorphic to \(\mathcal {K}_{Z_k, Z_k}\), so that

$$\begin{aligned} |\mathcal {U}_{\Omega _k}| \le |\mathcal {N}^{1-\eta }_{\Omega _k}| - Z_k \le \left| |\mathcal {M}^{1-\eta }_{\Omega _k}| - |\mathcal {N}^{1-\eta }_{\Omega _k}| \right| . \end{aligned}$$

Using (2.26), we have

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}_{\Omega _k}|^q I_{A_k} \right] \le \mathbb {E}\left[ \left| |\mathcal {M}^{1-\eta }_{\Omega _k}| - |\mathcal {N}^{1-\eta }_{\Omega _k}| \right| ^q \right] \lesssim _q \lambda (\Omega _k)^{\frac{q}{2}}. \end{aligned}$$

By the union bound and (2.27) with \(n = (1-\eta ) \lambda (\Omega _k)\), \(\gamma =1/2\), we have

$$\begin{aligned} \begin{aligned} \mathbb {P}( A^c_k)&\le \mathbb {P}(|\mathcal {N}^{1-\eta }_{\Omega _k}|< (1-\eta ) \lambda (\Omega _k)/2) + \mathbb {P}(|\mathcal {M}^{1-\eta }_{\Omega _k}| < (1-\eta ) \lambda (\Omega _k)/2) \\&\lesssim _q \lambda (\Omega _k)^{-q}. \end{aligned}\end{aligned}$$

Therefore,

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ |\mathcal {U}_{\Omega _k}|^q I_{A^c_k} \right]&\le \mathbb {E}\left[ |\mathcal {N}^{1-\eta }_{\Omega _k}|^q I_{A^c_k} \right] \le \mathbb {E}\left[ |\mathcal {N}^{1-\eta }_{\Omega _k}|^{2q} \right] ^{\frac{1}{2}} \mathbb {P}(A^c_k)^{\frac{1}{2}} \\&\lesssim _q \lambda (\Omega _k)^{\frac{q}{2}}. \end{aligned} \end{aligned}$$

Arguing similarly for \(|\mathcal {V}_{\Omega _k}|\), we obtain (4.3). \(\square \)
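The exponent q/2 in (4.3) reflects the central-limit-order fluctuations of the difference of two independent Poisson counts. A short numerical sanity check (purely illustrative, with our own variable names):

```python
import numpy as np

rng = np.random.default_rng(1)

# For independent N, M ~ Poisson(lam), Var(N - M) = 2 * lam, so
# E|N - M| grows like lam**0.5 -- the source of the exponent q/2 in (4.3).
for lam in (100.0, 400.0, 1600.0):
    n = rng.poisson(lam, size=20000)
    m = rng.poisson(lam, size=20000)
    mean_gap = np.abs(n - m).mean()
    # by the CLT, E|N - M| ~ sqrt(2 * lam) * sqrt(2 / pi), so the
    # ratio below should stabilise near 2 / sqrt(pi) ~ 1.13
    print(lam, mean_gap / lam ** 0.5)
```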

For \(k = 1, \ldots , K\), we define

$$\begin{aligned} \textbf{x}^k = \mathcal {N}^{1-\eta }_{\Omega _k} \setminus \mathcal {U}_{\Omega _k}, \quad \textbf{y}^k = \mathcal {M}^{1-\eta }_{\Omega _k} \setminus \mathcal {V}_{\Omega _k}, \end{aligned}$$

so that by construction \(|\textbf{x}^k| = |\textbf{y}^k| =n_k\), with

$$\begin{aligned} n_k = {\left\{ \begin{array}{ll}Z_k &{} \text {if}\,Z_k \ge \textsf{c}_{{\text {A2}}},\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Moreover, since the optimizer \(G_k\) is a feasible solution in \(\mathcal {K}(\textbf{x}^k, \textbf{y}^k)\), we have

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^{1-\eta }_{\Omega _k}, \mathcal {M}^{1-\eta }_{\Omega _k}) = \mathcal {C}_{\textsf{P}}^p( \textbf{x}^k, \textbf{y}^k). \end{aligned}$$

We then let \(\textbf{x}^0 = \mathcal {N}^\eta \cup \mathcal {U}\), \(\textbf{y}^0 = \mathcal {M}^\eta \cup \mathcal {V}\). In the event

$$\begin{aligned} \left\{ \min \left\{ |\mathcal {N}^\eta |, |\mathcal {M}^\eta | \right\} \ge \min \left\{ K, \textsf{c}_{{\text {A2}}} \right\} \right\} , \end{aligned}$$
(4.4)

Proposition 3.7 applies for any choice of points \(\textbf{z}= (z_k)_{k=1}^K\) with \(z_k \in \Omega _k\), yielding the inequality

$$\begin{aligned}{} & {} \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}, \mathcal {M} \right) - \sum _{k=1}^K \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^{1-\eta }_{\Omega _k}, \mathcal {M}^{1-\eta }_{\Omega _k}) \lesssim \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^\eta \cup \mathcal {U}, \mathcal {M}^\eta \cup \mathcal {V})\nonumber \\ {}{} & {} + \textsf{M}^p(\mathcal {N}^\eta \cup \mathcal {U}, \textbf{z}) + \sum _{k=1}^K {\text {diam}}(\Omega _k)^p. \end{aligned}$$
(4.5)

By Remark 3.8, if also

$$\begin{aligned} \left\{ \min _{k=1,\ldots , K} \min \left\{ |\mathcal {N}^\eta _{\Omega _k}|, |\mathcal {M}^{\eta }_{\Omega _k}| \right\} \ge 1 \right\} \end{aligned}$$
(4.6)

then the term \(\textsf{M}^p(\mathcal {N}^\eta \cup \mathcal {U}, \textbf{z})\) can be removed in (4.5).

Once (4.5) is established, the next step is to take expectation and carefully estimate the “error terms” in the right-hand side. To convey the main ideas, we start with the simplest case when K is kept fixed as we let \(n \rightarrow \infty \) in the intensity of the process \(\lambda = n \rho \).

Lemma 4.3

With the notation and assumptions of Theorem 4.1, fix \(K \in \mathbb {N}\) and consider a Borel partition \(\Omega = \bigcup _{k=1}^K \Omega _k\). Then,

$$\begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n \rho } \right) \right] \le \sum _{k=1}^K \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }_{\Omega _k}, \mathcal {M}^{n \rho }_{\Omega _k} \right) \right] .\nonumber \\ \end{aligned}$$
(4.7)

Proof

We can assume that each \(\Omega _k\) is not negligible. Then, condition (4.2) with \(\lambda = n \rho \) holds if n is sufficiently large. Letting

$$\begin{aligned} A = \bigcap _{k=1}^K \left\{ \min \left\{ |\mathcal {N}^{n\eta \rho }_{\Omega _k}|, |\mathcal {M}^{n\eta \rho }_{\Omega _k}| \right\} \ge \textsf{c}_{{\text {A2}}} \right\} =\bigcap _{k=1}^K A_k, \end{aligned}$$
(4.8)

we have that both (4.4) and (4.6) hold on A.

By the union bound in combination with (2.27), we estimate, for every \(q \ge 1\),

$$\begin{aligned} \mathbb {P}(A^c) \le \sum _{k=1}^K \mathbb {P}(A^c_k) \lesssim _{q,\eta , K} n^{-q}. \end{aligned}$$
(4.9)

Combined with the trivial inequality \(\mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n\rho } \right) \lesssim |\mathcal {N}^{n\rho }|\) we obtain that

$$\begin{aligned} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n\rho } \right) I_{A^c} \right] \le \mathbb {E}\left[ |\mathcal {N}^{n\rho }|^2 \right] ^{\frac{1}{2}} \mathbb {P}(A^c)^{\frac{1}{2}} \lesssim _{q,K} n^{1-\frac{q}{2}}, \end{aligned}$$

which is infinitesimal if \(q>2\) (even without dividing by \(n^{1-\frac{p}{d}}\)). Therefore,

$$\begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n \rho } \right) I_A \right] = \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n \rho } \right) \right] \end{aligned}$$

and we only need to prove the following inequality, for fixed \(\eta \),

$$\begin{aligned}{} & {} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n \rho } \right) I_A \right] \\ {}{} & {} - \sum _{k=1}^K \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }_{\Omega _k}, \mathcal {M}^{n \rho }_{\Omega _k} \right) \right] \lesssim _{K} \eta ^{1-\frac{p}{d}}, \end{aligned}$$

and finally let \(\eta \rightarrow 0\) to obtain the conclusion. To this end, we multiply (4.5) by \(I_A\) and take expectation, obtaining the inequality

$$\begin{aligned}{} & {} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n\rho } \right) I_{A} \right] - \sum _{k=1}^K \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \right] \nonumber \\{} & {} \quad \lesssim \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{\eta n\rho }\cup \mathcal {U}, \mathcal {M}^{\eta n\rho }\cup \mathcal {V} \right) \right] +K. \end{aligned}$$
(4.10)

Since, for each \(k=1, \ldots , K\),

$$\begin{aligned} \begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}&\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \right] \\&= (1-\eta )^{1-\frac{p}{d}} \limsup _{n \rightarrow \infty } \left( (1-\eta )n \right) ^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \right] \\&\le \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }_{\Omega _k}, \mathcal {M}^{n\rho }_{\Omega _k} \right) \right] , \end{aligned}\end{aligned}$$

we need to focus only on the terms on the right-hand side of (4.10). Since the last term is constant in n, we are left with the proof of

$$\begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{\eta n\rho }\cup \mathcal {U}, \mathcal {M}^{\eta n\rho }\cup \mathcal {V} \right) \right] \lesssim \eta ^{1-\frac{p}{d}}. \end{aligned}$$
(4.11)

We first notice that, by (4.3) and the Hölder inequality, we have for every \(q\ge 1\),

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}|^q +|\mathcal {V}|^q \right] \lesssim _{q} K^{\frac{q}{2}}n^{\frac{q}{2}}. \end{aligned}$$
(4.12)

We now use assumption A5 so that

$$\begin{aligned}{} & {} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{\eta n\rho }\cup \mathcal {U}, \mathcal {M}^{\eta n\rho }\cup \mathcal {V} \right) \right] \lesssim \mathbb {E}\left[ | \mathcal {N}^{\eta n\rho }\cup \mathcal {U}|^{1-\frac{p}{d}} \right] \nonumber \\ {}{} & {} + \mathbb {E}\left[ \textsf{M}^p\left( \mathcal {N}^{\eta n\rho }\cup \mathcal {U}, \mathcal {M}^{\eta n\rho }\cup \mathcal {V} \right) \right] . \end{aligned}$$
(4.13)

To estimate the first term on the right-hand side, we use the Hölder inequality and (4.12) with \(q=1\),

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ | \mathcal {N}^{\eta n\rho }\cup \mathcal {U}|^{1-\frac{p}{d}} \right]&\lesssim \mathbb {E}\left[ |\mathcal {N}^{\eta n \rho }| \right] ^{1-\frac{p}{d}} + \mathbb {E}\left[ |\mathcal {U}| \right] ^{1-\frac{p}{d}}\\&\lesssim n^{1-\frac{p}{d}}\left( \eta ^{1-\frac{p}{d}} + C_K n^{-\frac{1}{2}(1-\frac{p}{d})} \right) .\end{aligned}\end{aligned}$$
(4.14)

For the second term, thanks to (4.12) we may use Proposition 6.3 with \(H=n^{1/2}\) and \(h=\min \left\{ \mathbb {E}\left[ |\mathcal {N}^{\eta n\rho }| \right] , \mathbb {E}\left[ |\mathcal {M}^{\eta n\rho }| \right] \right\} \sim n\eta \) so that for some \(\alpha <2\) and \(\beta >0\)

$$\begin{aligned} \mathbb {E}\left[ \textsf{M}^p\left( \mathcal {N}^{\eta n\rho }\cup \mathcal {U}, \mathcal {M}^{\eta n\rho }\cup \mathcal {V} \right) \right] \lesssim n^{1-\frac{p}{d}}\left( \eta ^{1-\frac{p}{d}} + C_{K,\eta } n^{-\frac{\beta }{2} (2-\alpha )} \right) . \end{aligned}$$

Plugging this and (4.14) in (4.13) concludes the proof of (4.11). \(\square \)

Remark 4.4

We notice that the proof above yields also the inequality

$$\begin{aligned} \begin{aligned} \liminf _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n \rho } \right) \right]&\le \liminf _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }_{\Omega _1}, \mathcal {M}^{n \rho }_{\Omega _1} \right) \right] \\&\quad + \sum _{k=2}^K \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }_{\Omega _k}, \mathcal {M}^{n \rho }_{\Omega _k} \right) \right] .\end{aligned}\nonumber \\ \end{aligned}$$
(4.15)

This follows by repeating the argument only along a subsequence \(n_\ell \rightarrow \infty \) such that

$$\begin{aligned} \lim _{\ell \rightarrow \infty } n_{\ell }^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n_\ell \rho }_{\Omega _1}, \mathcal {M}^{n_\ell \rho }_{\Omega _1} \right) \right] = \liminf _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }_{\Omega _1}, \mathcal {M}^{n \rho }_{\Omega _1} \right) \right] .\end{aligned}$$

4.4 Uniform density on a cube

In this section we consider the case of the uniform measure on a cube. Up to rescaling (see (4.18)), the statement for a cube of, say, unit side length is equivalent to the following: given two independent Poisson point processes \(\mathcal {N}_{Q_L}\) and \(\mathcal {M}_{Q_L}\) with unit intensity on \(Q_L\), prove that

$$\begin{aligned} f(L) =\frac{1}{|Q_L|} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}_{Q_{L}}, \mathcal {M}_{Q_L} \right) \right] \end{aligned}$$

has a limit as \(L\rightarrow \infty \).
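For intuition, f(L) can be estimated by Monte Carlo once \(\mathcal {C}_{\textsf{P}}^p\) is instantiated; the sketch below takes, for concreteness, optimal bipartite matching with cost \(|x-y|^p\), a natural example fitting the bipartite framework considered here (this is an illustration under that assumption, not the paper's method, and all names are ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)

def matching_cost(X, Y, p=1.0):
    """Minimum of sum |x_i - y_sigma(i)|**p over matchings of the
    smaller sample into the larger one (rectangular assignment)."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) ** p
    rows, cols = linear_sum_assignment(D)
    return D[rows, cols].sum()

def f_estimate(L, p=1.0, d=3, trials=20, rng=rng):
    """Monte Carlo estimate of f(L) = E[C_P^p(N_{Q_L}, M_{Q_L})] / |Q_L|
    for independent unit-intensity Poisson samples on Q_L = (0, L)^d."""
    vol = L ** d
    costs = [
        matching_cost(L * rng.random((rng.poisson(vol), d)),
                      L * rng.random((rng.poisson(vol), d)), p)
        for _ in range(trials)
    ]
    return float(np.mean(costs)) / vol

print(f_estimate(3.0))  # crude estimate; stabilises for larger L
```

Proposition 4.5 below asserts precisely that such normalized costs converge to a constant \(\beta _{\textsf{P}}\) as \(L \rightarrow \infty \).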

Proposition 4.5

Let \(d \ge 3\), \(p\in [1,d)\) and let \(\textsf{P}= (\mathcal {F}_{n,n})_{n \in \mathbb {N}}\) be a combinatorial optimization problem over complete bipartite graphs such that assumptions A1, A2, A3, A4 and A5 hold. Then, there exists \(\beta _{\textsf{P}} \in (0, \infty )\) (depending on p and d) such that

$$\begin{aligned} \lim _{L \rightarrow \infty } f(L) = \beta _{\textsf{P}}. \end{aligned}$$
(4.16)

Proof

We split the proof into several steps. In the first two steps we establish basic properties of f, before moving to the main argument. This follows the strategy of the previous section and ultimately relies upon an application of Lemma 2.12.

Step 1. Continuity and upper bound. Writing \(z = \min \left\{ n,m \right\} \), we first notice that by Assumption A5 and (6.2) of Proposition 6.1,

$$\begin{aligned} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^m \right) \right] \lesssim z^{1-\frac{p}{d}} + \mathbb {E}\left[ \textsf{M}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^m \right) \right] \lesssim z^{1-\frac{p}{d}}, \end{aligned}$$

where \((X_i)_{i=1}^n\), \((Y_j)_{j=1}^m\) are i.i.d. points on \(Q_1\). This proves on the one hand that f is bounded from above as

$$\begin{aligned} f(L) \lesssim L^{p-d} \mathbb {E}\left[ \min \left\{ |\mathcal {N}_{Q_L}|, |\mathcal {M}_{Q_L}| \right\} ^{1-\frac{p}{d}} \right] \lesssim L^{p-d} \mathbb {E}\left[ |\mathcal {N}_{Q_L}| \right] ^{1-\frac{p}{d}} \lesssim 1.\qquad \end{aligned}$$
(4.17)

On the other hand, combined with dominated convergence, it also gives continuity of f thanks to the representation formula

$$\begin{aligned} \begin{aligned} f(L)&= \sum _{n, m=0}^\infty \frac{1}{L^d} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p(\mathcal {N}_{Q_{L}}, \mathcal {M}_{Q_L}) \Big | |\mathcal {N}_{Q_L}|=n, |\mathcal {M}_{Q_L}|=m \right] e^{-2L^d} \frac{ (L^d)^{n+m}}{n! m!}\\&= L^{p-d} \sum _{n, m=0}^\infty \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^m \right) \right] e^{-2L^d} \frac{ (L^d)^{n+m}}{n! m!}. \end{aligned} \end{aligned}$$

We also notice that by a simple scaling argument, if \(\mathcal {N}^{\lambda }\), \(\mathcal {M}^{\lambda }\) are independent Poisson processes of intensity \(\lambda >0\) on \(Q_L\) then

$$\begin{aligned} \frac{1}{|Q_L|} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}_{Q_{L}}^{\lambda }, \mathcal {M}_{Q_L}^{\lambda } \right) \right] = \frac{\lambda ^{1-\frac{p}{d}} }{|Q_{\lambda ^{\frac{1}{d}} L}|} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}_{Q_{\lambda ^{\frac{1}{d}} L}}^1, \mathcal {M}_{Q_{\lambda ^{\frac{1}{d}} L}}^1 \right) \right] = \lambda ^{1-\frac{p}{d}} f(\lambda ^{\frac{1}{d}} L).\nonumber \\ \end{aligned}$$
(4.18)
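For completeness, the scaling argument behind (4.18) can be spelled out as follows (assuming, as throughout, that the cost is built from p-th powers of Euclidean distances and hence rescales accordingly): the dilation \(x \mapsto \lambda ^{\frac{1}{d}} x\) maps a Poisson point process of intensity \(\lambda \) on \(Q_L\) into one of unit intensity on \(Q_{\lambda ^{1/d} L}\), so that

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}_{Q_{L}}^{\lambda }, \mathcal {M}_{Q_L}^{\lambda } \right) {\mathop {=}\limits ^{\text {law}}} \lambda ^{-\frac{p}{d}}\, \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}_{Q_{\lambda ^{\frac{1}{d}} L}}^1, \mathcal {M}_{Q_{\lambda ^{\frac{1}{d}} L}}^1 \right) , \end{aligned}$$

and dividing by \(|Q_L| = \lambda ^{-1} |Q_{\lambda ^{\frac{1}{d}} L}|\) yields (4.18).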

Combined with (4.17), it yields that for any cube Q and \(\lambda >0\),

$$\begin{aligned} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}_{Q}^{\lambda }, \mathcal {M}_{Q}^{\lambda } \right) \right] \lesssim |Q| \lambda ^{1-\frac{p}{d}}. \end{aligned}$$
(4.19)

Step 2. Lower bound. The spanning assumption A2 yields that, if e.g. \(\textsf{c}_{{\text {A2}}}\le n \le m\), then

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^m \right) \ge \sum _{i=1}^n \min _{j=1, \ldots , m} |X_i - Y_j|^p. \end{aligned}$$

The following classical lower bound, e.g. proved in [46, Chapter 2],

$$\begin{aligned} \mathbb {E}\left[ \min _{j=1, \ldots , m} |X_i - Y_j|^p \right] \gtrsim m^{-\frac{p}{d}} \end{aligned}$$

entails that

$$\begin{aligned} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^m \right) \right] \gtrsim n\, m^{-\frac{p}{d}}. \end{aligned}$$

Writing \(Z = \min \left\{ |\mathcal {N}_{Q_L}|, |\mathcal {M}_{Q_L}| \right\} \), we deduce that

$$\begin{aligned} f(L) \gtrsim L^{p-d} \mathbb {E}\left[ \max \left\{ |\mathcal {N}_{Q_L}|,|\mathcal {M}_{Q_L}| \right\} ^{-\frac{p}{d}} Z I_{\left\{ Z\ge \textsf{c}_{{\text {A2}}} \right\} } \right] . \end{aligned}$$

Let

$$\begin{aligned} A = \left\{ |Q_L|/2 \le Z \le \max \left\{ |\mathcal {N}_{Q_L}|, |\mathcal {M}_{Q_L}| \right\} \le 3|Q_L|/2 \right\} . \end{aligned}$$

By (2.27) with \(\gamma =1/2\), we have \(\mathbb {P}(A) \rightarrow 1\) as \(L \rightarrow \infty \). Therefore, if L is large enough,

$$\begin{aligned} f(L) \gtrsim L^{p-d} \mathbb {E}\left[ \max \left\{ |\mathcal {N}_{Q_L}|,|\mathcal {M}_{Q_L}| \right\} ^{-\frac{p}{d}} Z I_{A} \right] \gtrsim L^{p-d} \mathbb {E}\left[ L^{d-p} I_{A} \right] \gtrsim 1. \end{aligned}$$

In the remaining steps we prove the following claim. There exists \(\beta =\beta (p,d)>0\) such that for every \(\eta \in (0,1/2)\), there exists \(C(\eta )>0\) such that, for every \(m \in \mathbb {N}\), \(m \ge 1\) and \(L \ge C(\eta )\),

$$\begin{aligned} f(mL)-f( (1-\eta )L)\lesssim \eta ^{1-\frac{p}{d}} + C(\eta ) L^{-\beta }. \end{aligned}$$
(4.20)

This would conclude the proof of (4.16) by Lemma 2.12.

Step 3. Partitioning and exclusion of the event in which few points are sampled.

Using the notation from Sect. 4.3, we partition \(\Omega = Q_{mL}\) into \(K=m^d\) cubes \(Q_i = Q_L + L z_i \subseteq Q_{mL}\) with \(z_i\in \mathbb {Z}^d\), and consider two independent Poisson point processes \(\mathcal {N}\), \(\mathcal {M}\) of unit intensity on \(Q_{mL}\).

We first reduce to the event

$$\begin{aligned} A = \left\{ \min \left\{ |\mathcal {N}^{\eta }_{Q_{mL}}|, |\mathcal {M}^{\eta }_{Q_{mL}}| \right\} \ge \eta |Q_{mL}|/2 \right\} , \end{aligned}$$

on which (4.4) holds, provided L is sufficiently large (depending on \(\eta \) only, not on m). We first argue that \(A^c\) has small probability. Indeed, using a union bound we find that for every \(q \ge 1\),

$$\begin{aligned} \mathbb {P}(A^c) \le \mathbb {P}\left( |\mathcal {N}^{\eta }_{Q_{mL}}|< \eta |Q_{mL}|/2 \right) +\mathbb {P}\left( |\mathcal {M}^{\eta }_{Q_{mL}}| < \eta |Q_{mL}|/2 \right) {\mathop {\lesssim _{\eta ,q}}\limits ^{(2.27)}} |Q_{mL}|^{-q}. \end{aligned}$$

If \(A^c\) holds, we use the trivial bound that follows from Assumption A3:

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p( \mathcal {N}_{Q_{mL}}, \mathcal {M}_{Q_{mL}}) \lesssim |\mathcal {N}_{Q_{mL}}| |Q_{mL}|^{\frac{p}{d}}, \end{aligned}$$

so that, for any given \(\beta >0\) and provided we choose q sufficiently large,

$$\begin{aligned} \begin{aligned} \frac{1}{|Q_{mL}|} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p( \mathcal {N}_{Q_{mL}}, \mathcal {M}_{Q_{mL}}) I_{A^c} \right]&\lesssim |Q_{mL}|^{\frac{p}{d}-1} \mathbb {E}\left[ |\mathcal {N}_{Q_{mL}}|^2 \right] ^{\frac{1}{2}} \mathbb {P}(A^c)^{\frac{1}{2}} \\&\lesssim _{\eta ,q} |Q_{mL}|^{\frac{p}{d}-1} |Q_{mL}| \cdot |Q_{mL}|^{-q} \lesssim _\eta L^{-\beta }. \end{aligned} \end{aligned}$$
(4.21)

If A holds, letting \(\textbf{z}\) be the set of centres of the \(m^d\) cubes, inequality (4.5) reads

$$\begin{aligned} \begin{aligned} \mathcal {C}_{\textsf{P}}^p( \mathcal {N}_{Q_{mL}}, \mathcal {M}_{Q_{mL}}) - \sum _{i=1}^{m^d} \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^{1-\eta }_{Q_i}, \mathcal {M}^{1-\eta }_{Q_i})&\lesssim \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^\eta \cup \mathcal {U}, \mathcal {M}^\eta \cup \mathcal {V}) \\&\quad + \textsf{M}^p(\mathcal {N}^\eta \cup \mathcal {U}, \textbf{z}) + m^d L^p. \end{aligned} \end{aligned}$$
(4.22)

Notice that by the properties of the Poisson point process, the law of \(\mathcal {C}_{\textsf{P}}^p(\mathcal {N}^{1-\eta }_{Q_i}, \mathcal {M}^{1-\eta }_{Q_i})\) equals that of \(\mathcal {C}_{\textsf{P}}^p(\mathcal {N}^{1-\eta }_{Q_L}, \mathcal {M}^{1-\eta }_{Q_L})\). In particular,

$$\begin{aligned} \begin{aligned} \frac{1}{|Q_L|} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^{1-\eta }_{Q_i}, \mathcal {M}^{1-\eta }_{Q_i})I_A \right]&\le \frac{1}{|Q_L|} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^{1-\eta }_{Q_i}, \mathcal {M}^{1-\eta }_{Q_i}) \right] \\&= \frac{1}{|Q_L|} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^{1-\eta }_{Q_L}, \mathcal {M}^{1-\eta }_{Q_L}) \right] \\&{\mathop {=}\limits ^{(4.18)}} (1-\eta )^{1-\frac{p}{d}}f((1-\eta )^{\frac{1}{d}}L) \le f((1-\eta )^{\frac{1}{d}}L).\end{aligned} \end{aligned}$$

We thus obtain from (4.22),

$$\begin{aligned}{} & {} \frac{1}{|Q_{mL}|} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p( \mathcal {N}_{Q_{mL}}, \mathcal {M}_{Q_{mL}}) I_A \right] - f((1-\eta )^{\frac{1}{d}}L) \lesssim \frac{1}{|Q_{mL}|}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^\eta \cup \mathcal {U}, \mathcal {M}^\eta \cup \mathcal {V}) \right] \\{} & {} \quad + \frac{1}{|Q_{mL}|} \mathbb {E}\left[ \textsf{M}^p(\mathcal {N}^\eta \cup \mathcal {U}, \textbf{z}) I_A \right] + L^{p-d}. \end{aligned}$$

In the final two steps we prove that

$$\begin{aligned} \frac{1}{|Q_{mL}|} \mathbb {E}\left[ \textsf{M}^p(\mathcal {N}^\eta \cup \mathcal {U}, \textbf{z}) I_A \right] \lesssim L^{p-d} \end{aligned}$$
(4.23)

and

$$\begin{aligned} \frac{1}{|Q_{mL}|}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^\eta \cup \mathcal {U}, \mathcal {M}^\eta \cup \mathcal {V}) \right] \lesssim \eta ^{1-\frac{p}{d}} + C(\eta ) L^{-\beta }. \end{aligned}$$
(4.24)

In combination with (4.21) this would conclude the proof of (4.20).

Step 4. Proof of (4.23). On A, we have \(|\mathcal {N}^{\eta }_{Q_{mL}}| \ge \eta |Q_{mL}|/2 \ge m^d\) for L large enough; thus, (randomly) choosing \(m^d\) points from \(\mathcal {N}^{\eta }\), we find after relabelling a family \((X_i)_{i=1}^{m^d}\) of i.i.d. points uniformly distributed on \(Q_{mL}\). Recalling that \(\textbf{z}\) denotes the set of centres of the \(m^d\) cubes \(Q_i\), we can bound

$$\begin{aligned} \mathbb {E}\left[ \textsf{M}^p(\mathcal {N}^\eta \cup \mathcal {U}, \textbf{z}) I_A \right] \le \mathbb {E}\left[ \textsf{M}^p((X_i)_{i=1}^{m^d}, \textbf{z}) \right] . \end{aligned}$$

We then use (3.4) with \(n=m^d\) and \(\lambda \) the uniform density on the cube \(Q_{mL}\), so that

$$\begin{aligned}{} & {} \mathbb {E}\left[ \textsf{M}^p((X_i)_{i=1}^{m^d}, \textbf{z}) \right] \lesssim \mathbb {E}\left[ \textsf{W}^p_{Q_{mL}}\left( \sum _{i=1}^{m^d} \delta _{X_i}, \frac{m^d}{|Q_{mL}|} \right) \right] \\ {}{} & {} + \mathbb {E}\left[ \textsf{W}^p_{Q_{mL}}\left( \mu ^\textbf{z}, \frac{m^d}{|Q_{mL}|} \right) \right] \lesssim m^d L^p, \end{aligned}$$

having used (6.1) to bound the first term (the second term is trivially estimated by transporting the mass on each cube \(Q_i\) to its center). This proves (4.23).

Step 5. Proof of (4.24). We use Assumption A5 (on \(Q_{mL}\) instead of \(Q_1\), see Remark 3.9), so that

$$\begin{aligned}{} & {} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p(\mathcal {N}^\eta \cup \mathcal {U}, \mathcal {M}^\eta \cup \mathcal {V}) \right] \lesssim (mL)^p\mathbb {E}\left[ (|\mathcal {N}^\eta _{Q_{mL}}|+|\mathcal {U}|)^{1-\frac{p}{d}} \right] \\ {}{} & {} + \mathbb {E}\left[ \textsf{M}^p(\mathcal {N}^\eta \cup \mathcal {U}, \mathcal {M}^\eta \cup \mathcal {V}) \right] . \end{aligned}$$

We further bound the first contribution using Hölder inequality

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ (|\mathcal {N}^\eta _{Q_{mL}}|+|\mathcal {U}|)^{1-\frac{p}{d}} \right]&\le \mathbb {E}\left[ |\mathcal {N}^{\eta }_{Q_{mL}}| \right] ^{1-\frac{p}{d}}+ \mathbb {E}\left[ |\mathcal {U}| \right] ^{1-\frac{p}{d}} \\&\lesssim \eta ^{1-\frac{p}{d}} (mL)^{d-p} + \mathbb {E}\left[ |\mathcal {U}| \right] ^{1-\frac{p}{d}}. \end{aligned} \end{aligned}$$
(4.25)

To proceed further, let us recall that in Sect. 4.3 we argued that \(\left( \left( \mathcal {U}_{Q_i}, \mathcal {V}_{Q_i} \right) \right) _{i=1}^{m^d}\) are independent (and also independent from \(\mathcal {N}^{\eta }\), \(\mathcal {M}^{\eta }\)). Moreover, since the law of each \(\left( \mathcal {N}^{1-\eta }_{Q_i}, \mathcal {M}^{1-\eta }_{Q_i} \right) \) coincides with that of \(\left( \mathcal {N}^{1-\eta }_{Q_L}, \mathcal {M}^{1-\eta }_{Q_L} \right) \) (up to a translation by \(-Lz_i\), since \(Q_i = Q_L+Lz_i\)) it follows that the same property holds for the processes \(\left( \mathcal {U}_{Q_i}, \mathcal {V}_{Q_i} \right) \): their law coincides with that of \(\left( \mathcal {U}_{Q_L}, \mathcal {V}_{Q_L} \right) \) (also up to translating by \(-Lz_i\)).

Using (4.3) with \(q=1\), we obtain

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}| \right] = m^d \mathbb {E}\left[ |\mathcal {U}_{Q_L}| \right] \lesssim m^d L^{\frac{d}{2}}, \end{aligned}$$

thus (4.25) yields

$$\begin{aligned} \frac{(mL)^p}{|Q_{mL}|}\mathbb {E}\left[ (|\mathcal {N}^\eta _{Q_{mL}}|+|\mathcal {U}|)^{1-\frac{p}{d}} \right] \lesssim \eta ^{1-\frac{p}{d}} + L^{\frac{p-d}{2}}. \end{aligned}$$

Combining this with Theorem 6.5 concludes the proof of (4.24). \(\square \)

4.5 Hölder density on a cube

In this section, we still assume that \(\Omega = Q\) is a cube, but consider the case of a general Hölder continuous density \(\rho \), uniformly bounded from above and below. Up to rescaling and translation, it is sufficient to consider the case \(\Omega = (0,1)^d\).

The proof of (4.1) in this case is obtained by combining the case of constant density treated above together with Lemma 4.3 and the following claim: there exists a constant \(C = C(\rho )>0\) such that, for \(r<C\) and for every cube \(Q \subseteq (0,1)^d\) with side length r the following inequality holds:

$$\begin{aligned} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n \rho }_{Q}, \mathcal {M}^{n \rho }_{Q} \right) \right] \le (1+C^{-1} r^\alpha ) \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n \rho (Q)/r^d}, \mathcal {M}^{n \rho (Q)/r^d} \right) \right] , \end{aligned}$$
(4.26)

where \(\mathcal {N}^{n \rho (Q)/r^d}\), \(\mathcal {M}^{n \rho (Q)/r^d}\) are two independent Poisson point processes with constant intensity \(n \rho (Q)/r^d\) on the cube \((0,r)^d\), and \(\alpha \) denotes the Hölder exponent of \(\rho \).

Indeed, assume that the claim holds and let us prove (4.1). Given any \(r< C(p,\rho )\) of the form \(r = 1/K^{1/d}\), we consider a partition of \((0,1)^d = \bigcup _{k=1}^K Q_k\) into K disjoint sub-cubes of side length r, so that

$$\begin{aligned} \begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}&\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n \rho }, \mathcal {M}^{n \rho } \right) \right] \\&{\mathop {\le }\limits ^{(4.7)}} \sum _{k=1}^{K} \limsup _{n \rightarrow \infty }n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n \rho }_{Q_k}, \mathcal {M}^{n \rho }_{Q_k} \right) \right] \\&{\mathop {\le }\limits ^{(4.26)}} (1+C^{-1} r^\alpha )\sum _{k=1}^{K} \limsup _{n \rightarrow \infty }n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n \rho (Q_k)/r^d}, \mathcal {M}^{n \rho (Q_k)/r^d} \right) \right] \\&= \beta _{\textsf{P}} (1+C^{-1} r^\alpha ) \sum _{k=1}^{K} \rho (Q_k)^{1-\frac{p}{d}} r^p, \end{aligned} \end{aligned}$$

where the last line follows from (4.1) in the case of a cube and constant intensity. Letting \(K \rightarrow \infty \), we have \(r\rightarrow 0\) and the easily verified convergence

$$\begin{aligned} \lim _{K \rightarrow \infty } \sum _{k=1}^{K} \rho (Q_k)^{1-\frac{p}{d}} r^p = \lim _{K \rightarrow \infty }\int _{(0,1)^d} \sum _{k=1}^K I_{Q_k} \left( \frac{ \rho (Q_k)}{r^d} \right) ^{-\frac{p}{d}} \rho = \int _{(0,1)^d} \rho ^{1-\frac{p}{d}}. \end{aligned}$$

This would conclude the proof of (4.1) also in this case.
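The convergence of the Riemann-type sums above is also easy to check numerically; here is a small illustrative sketch in dimension \(d=2\) with \(p=1\) (so \(1-p/d = 1/2\)), for the sample density \(\rho (x,y) = (1+x)/1.5\) on the unit square (the density and all names are ours):

```python
import numpy as np

# Density rho(x, y) = (1 + x) / 1.5 on (0,1)^2 (normalised); with p = 1,
# d = 2 the summands are rho(Q_k)**(1 - p/d) * r**p = mass(Q_k)**0.5 * r.
def riemann_sum(M):
    r = 1.0 / M
    a = np.arange(M) * r
    # exact mass of each cell in the column starting at x = a: the
    # integral of (1 + x) / 1.5 over (a, a + r), times r in y
    col_mass = ((1.0 + a) * r + r ** 2 / 2.0) / 1.5 * r
    mass = np.repeat(col_mass, M)   # all M * M cells
    return float((np.sqrt(mass) * r).sum())

# limit: the integral of rho**(1/2) over the unit square
exact = (2.0 / 3.0) * (2.0 ** 1.5 - 1.0) / 1.5 ** 0.5
for M in (4, 16, 64):
    print(M, riemann_sum(M), exact)  # the sums approach the integral
```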

We now prove (4.26) for which we closely follow [4, Lemma 2.5]. Up to translating, we may assume that \(Q = (0,r)^d\). We write \(\rho _0 = \min _{(0,1)^d} \rho \) and define \(\rho ^r(x) = \rho (rx)r^d/\rho (Q)\) for \(x \in (0,1)^d\), so that \(\int _{(0,1)^d} \rho ^r = 1\), and for every x, \(y\in (0,1)^d\),

$$\begin{aligned} \rho ^r(x) - \rho ^r(y) \le \frac{ \left\| \rho \right\| _{C^\alpha }}{\rho _0} r^\alpha |x-y|^\alpha , \end{aligned}$$

thus \(\left\| \rho ^r - 1 \right\| _{C^\alpha }\lesssim r^\alpha \) if r is sufficiently small. We define \(S: Q \rightarrow Q\) as \(S(x) = r T^{-1}(x/r)\), where T is the map provided by Proposition 2.11. It holds that \({\text {Lip}}S = {\text {Lip}}T^{-1}\) and \(S_\sharp (1/r^d) = \rho /\rho (Q)\). Therefore, \(S\left( \mathcal {N}^{n\rho (Q)/r^d} \right) = (S(X_i))_{i=1}^{N^{n\rho (Q)/r^d}(Q)}\) is a Poisson point process on Q with intensity \(n \rho \), i.e., it has the same law as \(\mathcal {N}^{n \rho }_Q\), and similarly \(S\left( \mathcal {M}^{n\rho (Q)/r^d} \right) \) has the same law as \(\mathcal {M}^{n\rho }_Q\). Hence,

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n \rho }_{Q}, \mathcal {M}^{n \rho }_{Q} \right) \right]&= \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( S(\mathcal {N}^{n \rho (Q)/r^d}), S(\mathcal {M}^{n \rho (Q)/r^d}) \right) \right] \\&{\mathop {\le }\limits ^{(3.2)}} ({\text {Lip}}S)^p \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n \rho (Q)/r^d}, \mathcal {M}^{n \rho (Q)/r^d} \right) \right] . \end{aligned} \end{aligned}$$

This proves the claim since \(({\text {Lip}}S)^p = ({\text {Lip}}T^{-1})^p \le 1+C r^\alpha \) if r is sufficiently small.

Remark 4.6

Let us notice that the fact that \(\Omega \) is a cube is not used in the proof of (4.26), which therefore holds true for every bounded domain \(\Omega \) and Hölder continuous density \(\rho \) uniformly bounded from above and below. In particular, combining (4.26) with (4.19) we obtain that there exists \(C=C(\rho )>0\) such that, for every cube \(Q \subseteq \Omega \) with side length \(r<C\),

$$\begin{aligned} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n \rho }_{Q}, \mathcal {M}^{n \rho }_{Q} \right) \right] \lesssim |Q| n^{1-\frac{p}{d}}, \end{aligned}$$
(4.27)

where the implicit constant depends on p, d and \(\rho \) only.

4.6 General density on a domain

We prove (4.1) for a domain \(\Omega \) and a Hölder continuous density \(\rho \). The main difficulty is that, since we rely on the result established in the previous section, we need to partition \(\Omega \) into cubes. This is accomplished by relying on the Whitney-type decomposition provided by Lemma 2.1. We begin by fixing a Whitney decomposition \(\mathcal {Q}=(Q_i)_i\) such that every cube \(Q_i\) has side length \(r< C\), where \(C=C(\rho )\) is as in Remark 4.6. Then, by Lemma 2.1, for every sufficiently small \(\delta >0\) we have a finite Borel partition \(\Omega = \bigcup _{k=1}^K \Omega _k\), whose elements are collected into the two disjoint sets \(\mathcal {Q}_{\delta }\), \(\mathcal {R}_{\delta }\).

We fix \(\eta \in (0,1/2)\) and use the construction from Sect. 4.3. We set \(\delta =n^{-\gamma }\) for \(\gamma >0\) to be fixed below. The first constraint is that (4.2) holds with \(\lambda =n\rho \) so that we need \(n\delta ^d\gg 1\), i.e.

$$\begin{aligned} \gamma d<1. \end{aligned}$$
(4.28)

We first reduce to the case when there are many points in each \(\Omega _k\). Defining the event A as in (4.8) and arguing as in (4.9) gives here, for every \(q>0\), the inequality

$$\begin{aligned} \begin{aligned} \mathbb {P}(A^c)&\lesssim _{q,\eta } \sum _{k=1}^K (n |\Omega _k|)^{-q} \lesssim _{q,\eta } n^{-q} \delta ^{1-d-dq} = n^{-q(1-d\gamma )+(d-1)\gamma }, \end{aligned} \end{aligned}$$

where we used (2.1) with \(\alpha = -qd\) in the second inequality. Under the assumption (4.28) this is infinitesimal provided q is chosen sufficiently large. Arguing exactly as before we can thus reduce ourselves to the case where A holds. In that case, both (4.4) and (4.6) hold and thus by (4.5)

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n\rho } \right) I_{A} \right]&- \sum _{k=1}^K \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \right] \\&\qquad \lesssim \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{\eta n\rho }\cup \mathcal {U}, \mathcal {M}^{\eta n\rho }\cup \mathcal {V} \right) \right] + \sum _{k=1}^K {\text {diam}}(\Omega _k)^p. \end{aligned} \end{aligned}$$
(4.29)

We start by considering the left-hand side of (4.29). For \(\Omega _k\in \mathcal {R}_\delta \) we use the simple bound \( \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \lesssim {\text {diam}}(\Omega _k)^p |\mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}|\), to estimate

$$\begin{aligned} \begin{aligned} n^{\frac{p}{d}-1}\sum _{\Omega _k \in \mathcal {R}_\delta } \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \right]&\lesssim n^{\frac{p}{d}-1} \delta ^p \sum _{\Omega _k \in \mathcal {R}_\delta } \mathbb {E}\left[ |\mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}| \right] \\&\lesssim n^{\frac{p}{d}-1} \delta ^p \cdot \delta ^{1-d} \cdot n \delta ^d = n^{-\gamma + \frac{p}{d}(1-d\gamma )}. \end{aligned} \end{aligned}$$

This tends to zero provided \(\gamma d >p/(p+1)\) which is in particular true if (recall that \(p<d\))

$$\begin{aligned} \gamma d > d/(d+1). \end{aligned}$$
(4.30)

Notice that this condition is compatible with (4.28). Under condition (4.30) we thus have

$$\begin{aligned}{} & {} \limsup _{n\rightarrow \infty } n^{\frac{p}{d}-1} \sum _{k=1}^K \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \right] \\{} & {} \quad =\limsup _{n\rightarrow \infty } \sum _{\Omega _k\in \mathcal {Q}_\delta } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \right] . \end{aligned}$$

Since every \(\Omega _k\in \mathcal {Q}_\delta \) is a cube, we may combine (4.1) in \(\Omega _k\) together with the precise limit procedure, justified by the domination given in (4.27) (this is why each cube \(Q_i\) in the Whitney partition has side length \(r<C\)), to obtain

$$\begin{aligned} \limsup _{n\rightarrow \infty } n^{\frac{p}{d}-1} \sum _{k=1}^K \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }_{\Omega _k}, \mathcal {M}^{(1-\eta )n\rho }_{\Omega _k} \right) \right] \le (1-\eta )^{1-\frac{p}{d}} \int _\Omega \rho ^{1-\frac{p}{d}}. \end{aligned}$$

We now turn to the right-hand side of (4.29). The last term is easily estimated using directly (2.1) with \(\alpha =p\). In particular, if \(p<d-1\) we notice that

$$\begin{aligned} n^{\frac{p}{d}-1} \sum _{k=1}^K {\text {diam}}(\Omega _k)^p\lesssim n^{\frac{p}{d}-1} \delta ^{1-(d-p)}=(n\delta ^d)^{-(1-\frac{p}{d})} \delta \end{aligned}$$

which goes to zero if (4.28) holds.

We finally estimate the first term in the right-hand side of (4.29). We argue as in (4.13) and (4.14) which we combine with Proposition 6.4 to obtain that for every \(\varepsilon >0\),

$$\begin{aligned}{} & {} n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{\eta n\rho }\cup \mathcal {U}, \mathcal {M}^{\eta n\rho }\cup \mathcal {V} \right) \right] \lesssim \mathbb {E}\left[ |\mathcal {U}|/n \right] ^{1-\frac{p}{d}}+\eta ^{1-\frac{p}{d}}\nonumber \\{} & {} \quad +C(\eta ,\varepsilon ,\gamma ) n^{\varepsilon }\left( \left( \max \left\{ n^{\frac{p}{d}}\delta ^{p+1},n^{\frac{2}{d}} \delta ^3 \right\} \right) ^{\alpha } +\left( n\delta ^d \right) ^{-\beta } \right) . \end{aligned}$$
(4.31)

Using (2.1) with \(\alpha = d/2<d-1\) we have

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}|/n \right] ^{1-\frac{p}{d}}\lesssim \left( \sum _{k=1}^K (|\Omega _k|/n)^{\frac{1}{2}} \right) ^{1-\frac{p}{d}} \lesssim \left( n^{-\frac{1}{2}} \delta ^{1-\frac{d}{2}} \right) ^{1-\frac{p}{d}} = \left( \delta (n\delta ^d)^{-\frac{1}{2}} \right) ^{1-\frac{p}{d}}. \end{aligned}$$

Under condition (4.28) this term goes to zero. Regarding the term inside brackets in (4.31) we notice that if \(q=\max \left\{ p,2 \right\} \), then under condition (4.28),

$$\begin{aligned} \max \left\{ n^{\frac{p}{d}}\delta ^{p+1},n^{\frac{2}{d}} \delta ^3 \right\} =n^{-\gamma + \frac{q}{d}(1-d\gamma )}. \end{aligned}$$

In particular, as above this term goes to zero under condition (4.30).

We can thus choose first \(\gamma \) satisfying both (4.28) and (4.30) and then \(\varepsilon =\varepsilon (\alpha ,\beta ,\gamma ,p)>0\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } n^{\varepsilon }\left( \left( \max \left\{ n^{\frac{p}{d}}\delta ^{p+1},n^{\frac{2}{d}} \delta ^3 \right\} \right) ^{\alpha } +\left( n\delta ^d \right) ^{-\beta } \right) =0. \end{aligned}$$

With this choice we find

$$\begin{aligned} \limsup _{n\rightarrow \infty }n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{\eta n\rho }\cup \mathcal {U}, \mathcal {M}^{\eta n\rho }\cup \mathcal {V} \right) \right] \lesssim \eta ^{1-\frac{p}{d}}, \end{aligned}$$

from which we conclude the proof of (4.1) after sending \(\eta \rightarrow 0\).

4.7 Uniform density on a domain

In this last case, we assume that \(\Omega \) is a bounded domain with \(C^2\) boundary and that \(\rho = I_{\Omega }/|\Omega |\) is uniform. After a simple rescaling, it is more convenient to argue with Poisson point processes \(\mathcal {N}^n_{\Omega }\), \(\mathcal {M}^{n}_\Omega \) with constant intensity n (on \(\Omega \)), so that the claim reduces to

$$\begin{aligned} \lim _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n}_{\Omega }, \mathcal {M}^{n}_{\Omega } \right) \right] = \beta _{\textsf{P}} |\Omega |. \end{aligned}$$

Since the boundary of \(\Omega \) is \(C^2\), we can apply the result from the previous section and obtain the upper bound

$$\begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n}_{\Omega }, \mathcal {M}^{n}_{\Omega } \right) \right] \le \beta _{\textsf{P}} |\Omega |. \end{aligned}$$

To prove the corresponding lower bound, we follow closely the argument of [6, Theorem 24]: we fix a cube Q sufficiently large so that \(\Omega \subseteq Q\) and introduce a Poisson point process \(\mathcal {N}^n_{Q}\) with intensity n on Q. For \(k=2,\ldots , K\), let \(\Omega _k\) be the connected components of \(Q\backslash \Omega \), so that \(Q\backslash \Omega =\cup _{k=2}^K\Omega _k\). Notice that, for every k, either \(\partial \Omega _k\) is \(C^2\) or it is the union of \(\partial Q\) and a \(C^2\) surface. In particular, each \(\Omega _k\) satisfies (2.14). Using (4.15) with the decomposition \(Q = \Omega \cup \bigcup _{k=2}^K \Omega _k\), we obtain

$$\begin{aligned} \begin{aligned} \beta _{\textsf{P}} |Q|=\liminf _{n\rightarrow \infty }n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n}_{Q}, \mathcal {M}^{n}_Q \right) \right]&\le \liminf _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n}_{\Omega }, \mathcal {M}^{n}_{\Omega } \right) \right] \\&\quad + \sum _{k=2}^K\limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n}_{ \Omega _k}, \mathcal {M}^{n}_{ \Omega _k} \right) \right] . \end{aligned} \end{aligned}$$

Now for every k, using (4.1) we have

$$\begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n}_{ \Omega _k}, \mathcal {M}^{n}_{ \Omega _k} \right) \right] \le \beta _{\textsf{P}}|\Omega _k|. \end{aligned}$$

Therefore,

$$\begin{aligned} \liminf _{n \rightarrow \infty } n^{\frac{p}{d}-1}\mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n}_{\Omega }, \mathcal {M}^{n}_{\Omega } \right) \right] \ge \beta _{\textsf{P}} |Q| - \beta _\textsf{P}\sum _{k=2}^K |\Omega _k| = \beta _{\textsf{P}} |\Omega |, \end{aligned}$$

which is the desired conclusion.

5 Proof of main result

From Theorem 4.1, we deduce our main result, Theorem 1.1. We follow a relatively standard strategy, using de-Poissonization and concentration of measure arguments, with the necessary adjustments to deal with our setting. First, we argue that Theorem 4.1 yields similar convergence in the case of a deterministic number of independent points. It is worth mentioning that, from the stochastic geometry point of view, this case can also be seen as a random point process, often called a binomial point process, see e.g. [17, section 2.2], [32, chapter 3] or [34, example 2.3].

Proposition 5.1

Let \(d \ge 3\), \(p\in [1,d)\) and let \(\textsf{P}= (\mathcal {F}_{n,n})_{n \in \mathbb {N}}\) be a combinatorial optimization problem over complete bipartite graphs such that assumptions A1, A2, A3, A4 and A5 hold. Then, with \(\beta _{\textsf{P}}\in (0, \infty )\) given by Theorem 4.1 the following hold.

Let \(\Omega \subseteq \mathbb {R}^d\) be a bounded domain with Lipschitz boundary and such that (2.14) holds and let \(\rho \) be a Hölder continuous probability density on \(\Omega \), uniformly strictly positive and bounded from above.

Given i.i.d. random variables \((X_i)_{i=1}^\infty \), \((Y_j)_{j=1}^\infty \) with common law \(\rho \), we have

$$\begin{aligned} \limsup _{n \rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) \right] \le \beta _{\textsf{P}} \int _{\Omega } \rho ^{1-\frac{p}{d}}. \end{aligned}$$
(5.1)

Moreover, if \(\rho \) is the uniform density and \(\Omega \) is either a cube or has \(C^2\) boundary, the limit exists and is equal to the right-hand side.
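For small n, the quantity \(\mathcal {C}_{\textsf{P}}^p\) appearing in (5.1) can be computed exactly in the prototypical case of the assignment problem, where feasible graphs are perfect matchings. The following Python sketch (an illustration under that assumption, not the paper's method) brute-forces the optimal matching cost over permutations and forms the normalized Monte Carlo average \(n^{\frac{p}{d}-1}\mathbb {E}[\mathcal {C}^p]\) for i.i.d. uniform points on \((0,1)^3\):

```python
# Brute-force computation of the bipartite assignment functional
# C^p(x, y) = min over permutations sigma of sum_i |x_i - y_{sigma(i)}|^p,
# feasible only for small n, plus a Monte Carlo estimate of n^{p/d-1} E[C^p].
import itertools
import math
import random

def cost_p(x, y, p):
    """Optimal assignment cost between equal-size point lists x, y."""
    n = len(x)
    best = float("inf")
    for sigma in itertools.permutations(range(n)):
        c = sum(math.dist(x[i], y[sigma[i]]) ** p for i in range(n))
        best = min(best, c)
    return best

# i.i.d. uniform points on (0,1)^3, d = 3, p = 1, n = 6 (illustrative choices)
d, p, n, trials = 3, 1, 6, 50
rng = random.Random(0)
acc = 0.0
for _ in range(trials):
    x = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    y = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    acc += cost_p(x, y, p)
est = n ** (p / d - 1) * acc / trials  # rough finite-n proxy for the limit in (5.1)
print(est)
```

For n of realistic size one would replace the permutation search with the Hungarian algorithm; the brute force here is only meant to make the normalization in (5.1) concrete.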

Remark 5.2

The only properties we used to establish Proposition 5.1 are the subadditivity property (3.7), the growth condition (3.9), as well as the \(p\)-homogeneity of the problem. In particular, it holds for every bipartite \(p\)-homogeneous functional \(\mathcal {C}\) satisfying:

  • For every \(\Omega \subset \mathbb {R}^d\) and every partition \(\Omega = \cup _{k=1}^K \Omega _k\), \(K \in \mathbb {N}\), the following holds: if \(\textbf{x}^0\), \(\textbf{y}^0 \subseteq \Omega \) are such that \(\min \left\{ |\textbf{x}^0|, |\textbf{y}^0| \right\} \ge \max \left\{ \textsf{c}_{{\text {A2}}}, K \right\} \), if for every \(k =1, \ldots , K\) the families \(\textbf{x}^k\), \(\textbf{y}^k \subseteq \Omega _k\) are such that \(|\textbf{x}^k| = |\textbf{y}^k| =n_k\), with either \(n_k \ge \textsf{c}_{{\text {A2}}}\) or \(n_k = 0\), and if \(\textbf{z}=(z_k)_{k=1}^K\) with \(z_k \in \Omega _k\) for every \(k=1,\ldots , K\), then

    $$\begin{aligned}{} & {} \mathcal {C}\left( \textbf{x}^0 \cup \bigcup _{k=1}^K \textbf{x}^k, \textbf{y}^0 \cup \bigcup _{k=1}^K \textbf{y}^k \right) - \sum _{k=1}^K \mathcal {C}(\textbf{x}^k, \textbf{y}^k)\nonumber \\{} & {} \quad \lesssim \mathcal {C}(\textbf{x}^0, \textbf{y}^0) + \textsf{M}^p(\textbf{z}, \textbf{x}^0)+ \sum _{k=1}^K {\text {diam}}(\Omega _k)^p. \end{aligned}$$
    (5.2)
  • There exists \(\textsf{c}_{{\text {A5}}}\ge 0\) such that, for every \(\textbf{x}, \textbf{y}\subseteq (0,1)^d\), we have

    $$\begin{aligned} \mathcal {C}(\textbf{x}, \textbf{y}) \le \textsf{c}_{{\text {A5}}}\left( \min \left\{ |\textbf{x}|^{1-\frac{p}{d}}, |\textbf{y}|^{1-\frac{p}{d}} \right\} + \textsf{M}^p(\textbf{x},\textbf{y}) \right) .\end{aligned}$$
    (5.3)

Proof

The proof is similar to the proof of Lemma 4.3. We set \(\textbf{x}=(X_i)_{i=1}^{n}\) and \(\textbf{y}=(Y_j)_{j=1}^{n}\). Let \(\eta \in (0,1/2)\) and consider two independent copies \(\mathcal {N}^{(1-\eta )n\rho }\) and \(\mathcal {M}^{(1-\eta )n\rho }\) of Poisson point processes with intensity \((1-\eta )n\rho \) on \(\Omega \). We claim that

$$\begin{aligned} \limsup _{n\rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) \right] - \limsup _{n\rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{(1-\eta )n\rho }, \mathcal {M}^{(1-\eta )n \rho } \right) \right] \lesssim \eta ^{1-\frac{p}{d}}.\nonumber \\ \end{aligned}$$
(5.4)

By Theorem 4.1, this would conclude the proof of (5.1) since \(\eta \) is arbitrary. We introduce the random variables \(N=\max \left\{ n-|\mathcal {N}^{(1-\eta )n\rho }|,0 \right\} \) and \(M=\max \left\{ n-|\mathcal {M}^{(1-\eta )n\rho }|,0 \right\} \) and notice that, by the concentration properties of Poisson random variables, N and M also have the concentration property. Moreover, the event

$$\begin{aligned} A=\left\{ |N-\eta n|\le \eta n/2 \right\} \cap \left\{ |M- \eta n|\le \eta n/2 \right\} \end{aligned}$$

occurs with overwhelmingly large probability, and thus, arguing exactly as in the proof of Lemma 4.3, we have

$$\begin{aligned} \limsup _{n\rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) \right] =\limsup _{n\rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p( \textbf{x}, \textbf{y}) I_A \right] . \end{aligned}$$

We let \(\mathcal {N}= (X_i)_{i=n-N+1}^n\) and \(\mathcal {M}=(Y_j)_{j=n-M+1}^n\) so that in A, \(\textbf{x}=\mathcal {N}^{(1-\eta )n\rho }\cup \mathcal {N}\), \(\textbf{y}=\mathcal {M}^{(1-\eta )n\rho }\cup \mathcal {M}\) and \(\min \left\{ |\mathcal {N}|,|\mathcal {M}| \right\} \gtrsim \eta n\).
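The concentration of N around \(\eta n\) invoked here can be illustrated numerically. The stdlib-only sketch below (with hypothetical parameters \(\eta = 0.1\), \(n = 2000\)) samples \(|\mathcal {N}^{(1-\eta )n\rho }|\) as a Poisson variable of mean \((1-\eta )n\) and estimates the frequency of the event A:

```python
# De-Poissonization illustration: with eta = 0.1 and n = 2000, the count
# |N^{(1-eta)n rho}| is Poisson with mean (1-eta)n, so
# N = max{n - |N^{(1-eta)n rho}|, 0} concentrates around eta*n and the event
# A = {|N - eta*n| <= eta*n/2} is typical.
import math
import random

rng = random.Random(1)

def poisson1():
    # Knuth's algorithm for a single Poisson(1) sample
    L, k, prod = math.exp(-1.0), 0, rng.random()
    while prod > L:
        k += 1
        prod *= rng.random()
    return k

def poisson(lam_int):
    # Poisson(lambda) as a sum of lambda independent Poisson(1) variables
    return sum(poisson1() for _ in range(lam_int))

eta, n, trials = 0.1, 2000, 200
hits = 0
for _ in range(trials):
    N = max(n - poisson(int((1 - eta) * n)), 0)
    if abs(N - eta * n) <= eta * n / 2:
        hits += 1
freq = hits / trials
print(freq)  # empirical probability of the event A
```

As n grows, the window \(\eta n/2\) is many standard deviations \(\sqrt{(1-\eta )n}\) wide, so the frequency tends to 1, consistent with the overwhelming probability of A.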

In A we let \(\textbf{x}^1\subset \mathcal {N}^{(1-\eta )n\rho }\) and \(\textbf{y}^1\subset \mathcal {M}^{(1-\eta )n\rho }\) be such that \(|\textbf{x}^1|=|\textbf{y}^1|\) and

$$\begin{aligned} \mathcal {C}_\textsf{P}( \mathcal {N}^{(1-\eta )n\rho }, \mathcal {M}^{(1-\eta )n\rho } ) = \mathcal {C}_\textsf{P}(\textbf{x}^1, \textbf{y}^1 ). \end{aligned}$$

We then set \(\mathcal {U}=\mathcal {N}^{(1-\eta )n\rho }\backslash \textbf{x}^1\), \(\mathcal {V}=\mathcal {M}^{(1-\eta )n\rho }\backslash \textbf{y}^1\), \(\textbf{x}^0=\mathcal {U}\cup \mathcal {N}\) and \(\textbf{y}^0=\mathcal {V}\cup \mathcal {M}\). Using Lemma 2.12 on \(\Omega \) with \(K=1\), i.e. a trivial partition, we find that in A,

$$\begin{aligned}{} & {} \mathcal {C}_\textsf{P}(\textbf{x},\textbf{y})-\mathcal {C}_\textsf{P}( \mathcal {N}^{(1-\eta )n\rho }, \mathcal {M}^{(1-\eta )n\rho } )\lesssim \mathcal {C}_\textsf{P}(\textbf{x}^0,\textbf{y}^0) +1\\{} & {} \quad {\mathop {\lesssim }\limits ^{(3.9)}} \min \left\{ |\textbf{x}^0|^{1-\frac{p}{d}}, |\textbf{y}^0|^{1-\frac{p}{d}} \right\} + \textsf{M}^p(\textbf{x}^0, \textbf{y}^0) + 1. \end{aligned}$$

Multiplying by \(I_A\), taking expectation and arguing exactly as in (4.11) (using in particular Proposition 6.3) we conclude the proof of (5.4).

With a similar argument one can prove that

$$\begin{aligned} \liminf _{n\rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \mathcal {N}^{n\rho }, \mathcal {M}^{n \rho } \right) \right] \le \liminf _{n\rightarrow \infty } n^{\frac{p}{d}-1} \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) \right] , \end{aligned}$$

which concludes the proof of Proposition 5.1. \(\square \)

To conclude the proof of Theorem 1.1, we prove a concentration bound, which improves (5.1) to complete convergence. The argument requires minimal assumptions on the combinatorial optimization problem and relies essentially on the validity of a Poincaré inequality.

Proposition 5.3

Let \(d \ge 3\), \(p\in [1,d)\) and let \(\textsf{P}= (\mathcal {F}_{n,n})_{n \in \mathbb {N}}\) be a combinatorial optimization problem over complete bipartite graphs such that assumptions A3 and A5 hold. Let \(\Omega \subseteq \mathbb {R}^d\) be a bounded domain with Lipschitz boundary and let \(\rho \) be a probability density on \(\Omega \), uniformly strictly positive and bounded from above. Let \((X_i)_{i=1}^\infty \), \((Y_j)_{j=1}^\infty \) be i.i.d. random variables with common law \(\rho \).

For every \(q \ge 2\) and \(\varepsilon >0\),

$$\begin{aligned} \mathbb {P}\left( n^{\frac{p}{d}-1}\left| \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) - \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) \right] \right| > \varepsilon \right) \lesssim _q \frac{1}{ \varepsilon ^q n^{\frac{\alpha q}{2}}}, \end{aligned}$$
(5.5)

with

$$\begin{aligned} \alpha = {\left\{ \begin{array}{ll} 1-2/d &{} \text {if}\,p \in [1,2),\\ 1-p/d &{} \text {if}\,p \ge 2. \end{array}\right. } \end{aligned}$$

In particular, complete (hence \(\mathbb {P}\)-a.s.) convergence holds:

$$\begin{aligned} \lim _{n \rightarrow \infty } n^{\frac{p}{d}-1}\left| \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) - \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) \right] \right| = 0. \end{aligned}$$

Remark 5.4

(Poincaré inequality) We first recall that for every Lipschitz function \(F: \Omega ^{2n} \rightarrow \mathbb {R}\) we have the following \(L^q\)-Poincaré inequality,

$$\begin{aligned}{} & {} \mathbb {E}\left[ \left| F\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) - \mathbb {E}\left[ F\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) \right] \right| ^q \right] \nonumber \\ {}{} & {} \lesssim _q \mathbb {E}\left[ \left| \nabla F \left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) \right| ^q \right] . \end{aligned}$$
(5.6)

Here \(\left| \nabla F \right| \) denotes the usual Euclidean norm of the gradient. We stress the fact that the implicit constant in (5.6) does not depend upon n.

Inequality (5.6) is a consequence of well-known facts: first, the assumptions on \(\Omega \) yield the \(L^2\)-Poincaré inequality with respect to the uniform measure,

$$\begin{aligned} \int _{\Omega } \left| u - \frac{1}{|\Omega |}\int _\Omega u \right| ^2 \lesssim \int _\Omega |\nabla u|^2. \end{aligned}$$

Using that the constant \(c = \int _{\Omega } u \rho \) minimizes \(\int _{\Omega } \left| u - c \right| ^2 \rho \) and that \(\rho \) is bounded from above and below, we obtain the weighted version

$$\begin{aligned} \int _{\Omega } \left| u - \int _\Omega u \rho \right| ^2 \rho \le \int _{\Omega } \left| u - \frac{1}{|\Omega |}\int _\Omega u \right| ^2\rho \le C \int _\Omega |\nabla u|^2 \rho , \end{aligned}$$

for some \(C = C(\rho , \Omega ) \in (0, \infty )\). Then, a standard tensorization argument [36, Corollary 5.7] entails that the inequality holds also on the product space \(\Omega ^{2 n}\), endowed with the product measure \(\rho ^{\otimes 2n}\), with the same constant C. This yields (5.6) with \(q=2\).

The general case \(q\ge 2\) follows finally from the chain rule. Preliminarily, we notice that if \(\mu \) is a probability measure on \(\mathbb {R}^D\), then the validity of the inequality

$$\begin{aligned} \int \left| u - \int u d \mu \right| ^q d \mu \lesssim \int \left| \nabla u \right| ^qd \mu \end{aligned}$$
(5.7)

for every Lipschitz function \(u: \mathbb {R}^D \rightarrow \mathbb {R}\) is equivalent to

$$\begin{aligned} \int \left| u - m_u \right| ^q d \mu \lesssim \int \left| \nabla u \right| ^qd \mu , \end{aligned}$$
(5.8)

where \(m_u\) denotes a median of (the law of) u, i.e. any \(m \in \mathbb {R}\) such that \(\mu ( u \le m) \ge 1/2\) and \(\mu ( u \ge m) \ge 1/2\). Indeed,

$$\begin{aligned} \left| m_u - \int u d \mu \right| \le \left| \int (m_u - u) d \mu \right| \le \int \left| m_u - u \right| d \mu . \end{aligned}$$

Since \(m_u\) can be characterized as a minimizer for \(c \mapsto \int \left| u - c \right| d\mu \), we also have

$$\begin{aligned} \int \left| m_u - u \right| d \mu \le \int \left| u - \int u d \mu \right| d \mu . \end{aligned}$$

Using Jensen’s inequality, we obtain

$$\begin{aligned} \left| m_u - \int u d \mu \right| ^q \le \min \left\{ \int \left| m_u - u \right| ^q d \mu , \int \left| u - \int u d \mu \right| ^q d \mu \right\} . \end{aligned}$$

Then, assuming that (5.7) or (5.8) holds, using the triangle inequality and the bound above, we obtain the validity of the other inequality.
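For empirical measures, the mean–median comparison used above can be checked directly: the sample median minimizes \(c \mapsto \frac{1}{n}\sum _i |u_i - c|\), so both inequalities hold for any data set. A short stdlib-only sketch:

```python
# Empirical check of the chain |m_u - mean(u)| <= mean|u - m_u| <= mean|u - mean(u)|,
# which holds for the empirical measure of any sample because the median
# minimizes c -> mean(|u - c|).
import random
import statistics

rng = random.Random(7)
u = [rng.gauss(0.0, 1.0) ** 3 for _ in range(1001)]  # a skewed sample, odd size

mean = statistics.fmean(u)
med = statistics.median(u)  # for odd size, an actual sample point (a median)
mad_med = statistics.fmean(abs(v - med) for v in u)    # mean |u - m_u|
mad_mean = statistics.fmean(abs(v - mean) for v in u)  # mean |u - mean(u)|

print(abs(med - mean), mad_med, mad_mean)
```

Both inequalities are deterministic facts about the sample, so the check succeeds for any input data, mirroring the equivalence of (5.7) and (5.8).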

To conclude, we assume that (5.8) holds for \(q=2\) and argue that it also holds for any \(q \ge 2\). Up to adding a suitable constant, we can assume that u is Lipschitz with \(m_u = 0\). We then consider the Lipschitz function \(v = \left| u \right| ^{q/2} {\text {sign}}(u)\) (recall that in our case the support of \(\mu = \rho ^{\otimes 2n}\) is bounded, hence we can assume that u is also bounded), so that \(m_v = 0\), and apply the \(q=2\) case of (5.8):

$$\begin{aligned} \begin{aligned} \int \left| u \right| ^q d \mu&= \int \left| v \right| ^2 d \mu \lesssim \int |\nabla v|^2 d \mu \lesssim \int \left| u \right| ^{q-2} |\nabla u|^2 d \mu \\&\lesssim \left( \int \left| u \right| ^q d \mu \right) ^{1-2/q} \left( \int \left| \nabla u \right| ^q d \mu \right) ^{2/q}. \end{aligned} \end{aligned}$$

Dividing both sides by \(\left( \int \left| u \right| ^q d \mu \right) ^{1-2/q}\) yields the desired conclusion.

Proof of Proposition 5.3

The second statement follows by choosing q sufficiently large in (5.5), so that the right-hand side of (5.5) is summable. We thus focus on the proof of (5.5). Given a feasible \(G \subseteq \mathcal {K}_{n,n}\), i.e., \(G \in \mathcal {F}_{n,n}\), and \(\textbf{x}= (x_i)_{i=1}^n\), \(\textbf{y}= (y_j)_{j=1}^n \subseteq \Omega \), write

$$\begin{aligned} w_G(\textbf{x}, \textbf{y}) = \sum _{ \left\{ (1,i), (2,j) \right\} \in E_G } | x_i - y_j|^p. \end{aligned}$$

Since \(p \ge 1\), \(w_G\) is Lipschitz with a.e. derivative given by

$$\begin{aligned} \nabla _{x_i} w_G(\textbf{x}, \textbf{y}) = \sum _{ (2,j) \in \mathcal {N}_G((1,i))} p |x_i-y_j|^{p-2} (x_i-y_j), \end{aligned}$$

and

$$\begin{aligned} \nabla _{y_j} w_G(\textbf{x}, \textbf{y}) = -\sum _{ (1,i) \in \mathcal {N}_G((2,j))} p |x_i-y_j|^{p-2} (x_i-y_j). \end{aligned}$$

Notice also that \(w_G\) is differentiable at every \((\textbf{x},\textbf{y})\) such that \(x_i\ne y_j\) for every i, j. Since \(G \in \mathcal {F}_{n,n}\), assumption A3 yields that the sums above contain at most \(\textsf{c}_{{\text {A3}}}\) terms, hence we bound, using the Cauchy–Schwarz inequality,

$$\begin{aligned} \left| \nabla _{x_i} w_G(\textbf{x}, \textbf{y}) \right| ^2 \lesssim \sum _{ (2,j) \in \mathcal {N}_G((1,i))} |x_i-y_j|^{2(p-1)}, \end{aligned}$$

and similarly

$$\begin{aligned} \left| \nabla _{y_j} w_G(\textbf{x}, \textbf{y}) \right| ^2 \lesssim \sum _{ (1,i) \in \mathcal {N}_G((2,j))} |x_i-y_j|^{2(p-1)}. \end{aligned}$$

Summing over i and \(j\in \left\{ 1,\ldots , n \right\} \), we obtain, for the Euclidean norm of the gradient, the inequality

$$\begin{aligned} \left| \nabla w_G(\textbf{x}, \textbf{y}) \right| ^2 \lesssim \sum _{ \left\{ (1,i),(2,j) \right\} \in E_G} |x_i-y_j|^{2(p-1)}. \end{aligned}$$

If \(p \ge 2\), we simply bound each term \(|x_i-y_j|^{2(p-1)} \le {\text {diam}}(\Omega )^{p-2} |x_i-y_j|^{p}\), obtaining

$$\begin{aligned} \left| \nabla w_G(\textbf{x}, \textbf{y}) \right| ^2 \lesssim \sum _{ \left\{ (1,i),(2,j) \right\} \in E_G} |x_i-y_j|^{p} = w_G(\textbf{x}, \textbf{y}). \end{aligned}$$

If \(p \in [1,2)\), we use Hölder's inequality and the fact that \(|E_G| \lesssim n\) (again by assumption A3) to obtain

$$\begin{aligned} \left| \nabla w_G(\textbf{x}, \textbf{y}) \right| ^2 \lesssim \left( \sum _{ \left\{ (1,i),(2,j) \right\} \in E_G} |x_i-y_j|^{p} \right) ^{\frac{1}{r}} n^{1-\frac{1}{r}} = w_G(\textbf{x}, \textbf{y})^{\frac{1}{r}} n^{1-\frac{1}{r}}, \end{aligned}$$

with \(r = p/(2(p-1))\).

Using the trivial bound \(w_G(\textbf{x},\textbf{y}) \lesssim n\), it follows in particular that each \(w_G(\textbf{x}, \textbf{y})\) has a Lipschitz constant bounded independently of G (although the bound depends upon n). Therefore, also

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) = \inf _{G \in \mathcal {F}_{n,n}} w_G(\textbf{x}, \textbf{y}), \end{aligned}$$

is Lipschitz, hence differentiable at Lebesgue a.e. \((\textbf{x}, \textbf{y})\), by Rademacher's theorem. Let \((\textbf{x},\textbf{y})\) be a point of differentiability for both \(w_G\) and \(\mathcal {C}_{\textsf{P}}^p\) (which holds for Lebesgue a.e. point). Let \(G = G(\textbf{x}, \textbf{y}) \in \mathcal {F}_{n,n}\) be any minimizer for the problem on the graph \(\mathcal {K}(\textbf{x},\textbf{y})\) (which is a.e. unique if \(p>1\) by Remark 5.5). For every \((\textbf{x}', \textbf{y}')\), we have the inequality

$$\begin{aligned} \mathcal {C}_{\textsf{P}}^p(\textbf{x}',\textbf{y}')\le w_G(\textbf{x}', \textbf{y}'), \end{aligned}$$

with equality at \((\textbf{x},\textbf{y})\), hence we obtain the identities,

$$\begin{aligned} \nabla _{x_i} \mathcal {C}_{\textsf{P}}^p(\textbf{x},\textbf{y})= \nabla _{x_i} w_G(\textbf{x}, \textbf{y}), \quad \nabla _{y_j} \mathcal {C}_{\textsf{P}}^p(\textbf{x},\textbf{y})= \nabla _{y_j} w_G(\textbf{x}, \textbf{y}). \end{aligned}$$
(5.9)

Therefore,

$$\begin{aligned} \left| \nabla \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) \right| ^2 \lesssim {\left\{ \begin{array}{ll} \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) ^{\frac{1}{r}} n^{1-\frac{1}{r}} &{} \text {if}\,p \in [1,2),\\ \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) &{} \text {if}\,p \ge 2. \end{array}\right. } \end{aligned}$$
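The envelope identity (5.9) can be tested numerically in the special case of the assignment problem with \(p=2\) (our illustrative choice): the derivative of the minimum over matchings, computed by finite differences, should agree with the analytic derivative of an optimal \(w_G\).

```python
# Sketch of the envelope identity (5.9) for the assignment problem, p = 2:
# at a differentiability point, d/dx C^p equals d/dx w_G for an optimal
# matching G. Brute-force matchings; compare with a central finite difference
# in the first coordinate of x_0.
import itertools
import math
import random

p, n, d = 2, 4, 3
rng = random.Random(3)
x = [[rng.random() for _ in range(d)] for _ in range(n)]
y = [[rng.random() for _ in range(d)] for _ in range(n)]

def C(x, y):
    # C^p(x, y) = min over perfect matchings of the total p-th power cost
    return min(
        sum(math.dist(x[i], y[s[i]]) ** p for i in range(n))
        for s in itertools.permutations(range(n))
    )

# analytic derivative from the optimal matching:
# for p = 2, d/dx_{0,1} |x_0 - y_j|^2 = 2 (x_{0,1} - y_{j,1})
best = min(itertools.permutations(range(n)),
           key=lambda s: sum(math.dist(x[i], y[s[i]]) ** p for i in range(n)))
grad_analytic = p * (x[0][0] - y[best[0]][0])

h = 1e-6
xp = [row[:] for row in x]; xp[0][0] += h
xm = [row[:] for row in x]; xm[0][0] -= h
grad_fd = (C(xp, y) - C(xm, y)) / (2 * h)
print(grad_analytic, grad_fd)
```

For a generic random instance the optimal matching is unique with a positive cost gap, so the two derivatives coincide up to finite-difference error, as (5.9) predicts.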

If now \(\textbf{x}=(X_i)_{i=1}^n\) and \(\textbf{y}=(Y_j)_{j=1}^n\), combining this with (5.6) and (3.9) yields

$$\begin{aligned} \begin{aligned} \mathbb {E}&\left[ \left| \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) - \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) \right] \right| ^q \right] \lesssim _q \mathbb {E}\left[ \left| \nabla \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) \right| ^{q} \right] \\&\qquad \qquad \qquad \qquad \lesssim {\left\{ \begin{array}{ll} \vspace{1em} \mathbb {E}\left[ \left( n^{1-\frac{p}{d}}+ \textsf{M}^p\left( \textbf{x}, \textbf{y} \right) \right) ^{\frac{q}{2r}} \right] n^{(1-\frac{1}{r})\frac{q}{2}} &{} \text {if}\,p \in [1,2),\\ \mathbb {E}\left[ \left( n^{1-\frac{p}{d}}+ \textsf{M}^p\left( \textbf{x}, \textbf{y} \right) \right) ^{\frac{q}{2}} \right] &{} \text {if}\, p\ge 2. \end{array}\right. } \end{aligned} \end{aligned}$$

By the equivalence between \(\textsf{M}^p\) and \(\textsf{W}^p\) (recall (3.3)), the triangle inequality (2.18) and (2.20) and finally using (6.2) with qp instead of p, we bound from above

$$\begin{aligned} \mathbb {E}\left[ \left( \textsf{M}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^n \right) \right) ^{\frac{q}{2}} \right] \lesssim n^{(1-\frac{p}{d})\frac{q}{2}}. \end{aligned}$$

If \(p \ge 2\), we conclude at once that

$$\begin{aligned} \mathbb {E}\left[ \left| \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) - \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) \right] \right| ^q \right] \lesssim _q n^{(1-\frac{p}{d})\frac{q}{2}}, \end{aligned}$$

hence (5.5) by Markov inequality. If \(p \in [1,2)\), we bound similarly and obtain, after simple computations,

$$\begin{aligned} \mathbb {E}\left[ \left| \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) - \mathbb {E}\left[ \mathcal {C}_{\textsf{P}}^p\left( \textbf{x}, \textbf{y} \right) \right] \right| ^q \right] \lesssim n^{\left( (1-\frac{p}{d})(1-\frac{1}{p})+\frac{1}{p}-\frac{1}{2} \right) q}, \end{aligned}$$

which leads to the corresponding case of (5.5) by Markov inequality. \(\square \)

Remark 5.5

(uniqueness of minimizers) If \(p>1\), for Lebesgue a.e. \((\textbf{x}, \textbf{y})\), the minimizer \(G \in \mathcal {F}_{n,m}\) for the problem on \(\mathcal {K}(\textbf{x}, \textbf{y})\) is unique. This in particular yields that it is unique a.s., when \(\textbf{x}= (X_i)_{i=1}^n\), \(\textbf{y}= (Y_j)_{j=1}^m\) are random i.i.d. with a common density \(\rho \). For simplicity, we argue in the case of \(|\textbf{x}| = |\textbf{y}|\) only, but the same result holds in general.

Let \((\textbf{x}, \textbf{y})\) be a differentiability point for \(\mathcal {C}_{\textsf{P}}^p(\textbf{x}, \textbf{y})\) with \(X_i\ne Y_j\) for every i, j. Notice that by the previous proof this holds a.s. Let \(G, G' \in \mathcal {F}_{n,n}\) be minimizers for the problem on \(\mathcal {K}(\textbf{x}, \textbf{y})\), so that by (5.9) we obtain that, for every \(i\in [n]\), \(\nabla _{x_i} w_G (\textbf{x}, \textbf{y}) = \nabla _{x_i} w_{G'} (\textbf{x}, \textbf{y})\), i.e.,

$$\begin{aligned} \sum _{ (2,j) \in \mathcal {N}_G((1,i))} \left| x_i - y_j \right| ^{p-2} (x_i-y_j) = \sum _{ (2,j) \in \mathcal {N}_{G'}((1,i))} \left| x_i - y_j \right| ^{p-2} (x_i-y_j) \end{aligned}$$

Assuming that \(E_G \ne E_{G'}\), we can find \(i, j \in [n]\) such that \((2,j) \in \mathcal {N}_{G'}((1,i)) \setminus \mathcal {N}_G((1,i))\) (up to exchanging the roles of G and \(G'\)). Then,

$$\begin{aligned} \begin{aligned} \left| x_i-y_j \right| ^{p-2}(x_i-y_j)=&\sum _{(2,k) \in \mathcal {N}_{G}((1,i))} \left| x_i-y_k \right| ^{p-2}(x_i-y_k) \\&- \sum _{ (2,k)\in \mathcal {N}_{G'}((1,i)) \setminus \left\{ (2,j) \right\} } \left| x_i-y_k \right| ^{p-2}(x_i-y_k). \end{aligned} \end{aligned}$$
(5.10)

We notice that the right-hand side above is a function \(U(\textbf{x}, \textbf{y})\) which however does not depend on the variable \(y_j\). The map

$$\begin{aligned} z \in \mathbb {R}^d \mapsto \left| z \right| ^{p-2} z \in \mathbb {R}^d \end{aligned}$$

is invertible, with a Borel inverse which we denote by f, hence we can rewrite (5.10) equivalently as the identity

$$\begin{aligned} y_j = x_i - f\left( U (\textbf{x}, \textbf{y}) \right) , \end{aligned}$$

where the right-hand side is a Borel function of \((\textbf{x}, \textbf{y})\) which does not depend on \(y_j\). This identity, however, cannot hold on a set of positive Lebesgue measure.
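As a quick numerical sanity check of this inversion (an illustration only, not part of the argument; the helper names are ours), one verifies that the explicit formula \(f(w) = |w|^{(2-p)/(p-1)} w\) inverts \(z \mapsto |z|^{p-2} z\) for a sample value of \(p>1\):

```python
import numpy as np

p = 1.5  # any p > 1 works; this value is arbitrary

def g(z):
    # the map z -> |z|^{p-2} z from the text
    return np.linalg.norm(z) ** (p - 2) * z

def f(w):
    # explicit inverse: |g(z)| = |z|^{p-1}, hence f(w) = |w|^{(2-p)/(p-1)} w
    return np.linalg.norm(w) ** ((2 - p) / (p - 1)) * w

z = np.array([0.3, -1.2, 2.0])
roundtrip = f(g(z))  # recovers z
```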

6 Bounds for the Euclidean assignment problem

In this section we establish some novel upper bounds for the random Euclidean assignment problem, for points that are not necessarily i.i.d. and uniformly distributed.

6.1 Matching of i.i.d. points

We begin with a general upper bound for the Wasserstein distance between the empirical measure of i.i.d. points and their common law when \(d\ge 3\) and \(p\ge 1\). As a consequence, we also obtain a similar bound for the Euclidean assignment problem. We derive the general case of a Hölder continuous law, bounded above and below on an open connected set \(\Omega \) with Lipschitz boundary, from the case of the uniform law on a cube \(Q \subseteq \mathbb {R}^d\). In the latter case, this is a well-known result, marginally discussed in [1], where the focus is on the case \(d=2\). However, we point out that the case \(d\ge 3\), \(p\ge d/2\) was, to our knowledge, not explicitly covered in the literature until the proof provided by [35], which clearly extends to any \(p\ne 2\) (see also [25]).

Proposition 6.1

Let \(d \ge 3\), \(p \ge 1\) and \(\Omega \) be a bounded connected open set with Lipschitz boundary. For every Hölder continuous density \(\rho : \Omega \rightarrow \mathbb {R}\) bounded above and below, and for all independent sequences \((X_{i})_{i=1}^\infty \), \((Y_j)_{j=1}^\infty \) of i.i.d. random variables with common law \(\rho \),

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( \sum _{i=1}^n \delta _{X_i}, n\rho \right) \right] \lesssim |\Omega |^{\frac{p}{d}} n^{1-\frac{p}{d}}, \end{aligned}$$
(6.1)

and therefore

$$\begin{aligned} \mathbb {E}\left[ \textsf{M}^p\left( (X_i)_{i=1}^n, (Y_j)_{j=1}^m \right) \right] \lesssim |\Omega |^{\frac{p}{d}} \min \left\{ n,m \right\} ^{1-\frac{p}{d}}. \end{aligned}$$
(6.2)

Proof

Inequality (6.2) follows from (6.1), assuming e.g. \(n \le m\), together with (3.4) with \(\lambda = \rho \). Hence, we focus on the proof of (6.1). By Jensen's inequality (2.20), it is enough to prove this bound for large p, so we may assume without loss of generality that \(p>d/(d-1)\). We then set \(\mu = \frac{1}{n} \sum _{i=1}^n \delta _{X_i}\).

We first prove the statement in the case where \(\Omega =Q\) is a cube. By scaling we may assume that \(Q=(0,1)^d\) is the unit cube. By Proposition 2.11, there is a bi-Lipschitz map \(T: Q\rightarrow Q\) with Lipschitz constant depending only on \(\rho \) such that \(T\sharp \rho =1\). Then, \(X_i'=T(X_i)\) are i.i.d. uniformly distributed on Q. Letting \(\mu '=T\sharp \mu =\frac{1}{n}\sum _{i=1}^n \delta _{X'_i}\) we have

$$\begin{aligned} \textsf{W}^p(\mu , \rho )\lesssim \textsf{W}^p(\mu ',1) \end{aligned}$$

and the statement follows from [35].

Consider now a general bounded connected open set \(\Omega \) with Lipschitz boundary. We say that \(\Omega \) is well-partitioned if there exist convex polytopes \((\Omega _k)_{k=1}^K\) covering \(\Omega \), with \(|\Omega _k\cap \Omega _{k'}|=0\) for \(k\ne k'\) and such that each \(\Omega _k\) is bi-Lipschitz homeomorphic to a cube. By [49], every connected Lipschitz domain is bi-Lipschitz homeomorphic to a well-partitioned and smooth domain, so that arguing exactly as above we may assume that \(\Omega \) itself is smooth and well-partitioned. Let \(T_k: \Omega _k\rightarrow Q_k\) be bi-Lipschitz homeomorphisms between \(\Omega _k\) and some cubes \(Q_k\). We then define \(\rho _k=T_k\sharp \rho \), \(n_k=n\mu (\Omega _k)\) and \(\mu _k= \frac{n}{n_k}T_k\sharp \mu \). Notice in particular that we may write \(\mu _k=\frac{1}{n_k}\sum _{i=1}^{n_k} \delta _{Y_i}\) where \((Y_i)_{i=1}^\infty \) are i.i.d. with common law \(\rho _k/\rho _k(Q_k)\) and that \(n_k\) is a Binomial random variable with parameters n and \(\rho _k(Q_k)=\rho (\Omega _k)\). Using (2.21) with \(\varepsilon =1\) we thus find

$$\begin{aligned} \begin{aligned} \textsf{W}^p(\mu ,\rho )&\lesssim \sum _{k=1}^K \textsf{W}^p_{\Omega _k}\left( \mu , \frac{n_k}{n\rho (\Omega _k)}\rho \right) +\textsf{W}^p\left( \sum _{k=1}^K \frac{n_k}{n \rho (\Omega _k)} \rho I_{\Omega _k},\rho \right) \\&{\mathop {\lesssim }\limits ^{(2.22)}} \sum _{k=1}^K \frac{n_k}{n}\textsf{W}^p_{Q_k}\left( \mu _k, \rho _k \right) +\left\| \sum _{k=1}^K \left( \frac{n_k}{n \rho (\Omega _k)}-1 \right) I_{\Omega _k} \rho \right\| _{W^{-1,p}(\Omega )}^p\\&{\mathop {\lesssim }\limits ^{(2.13)}}\sum _{k=1}^K \frac{n_k}{n}\textsf{W}^p_{Q_k}\left( \mu _k, \rho _k \right) +\sum _{k=1}^K |\Omega _k| \left|\frac{n_k}{n \rho (\Omega _k)}-1\right|^p. \end{aligned}\end{aligned}$$

Taking the expectation and using the concentration properties of binomial random variables (2.28) we find

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p(\mu ,\rho ) \right] \lesssim \sum _{k=1}^K\mathbb {E}\left[ \frac{n_k}{n}\textsf{W}^p_{Q_k}\left( \mu _k, \rho _k \right) \right] + \frac{1}{n^{\frac{p}{2}}}. \end{aligned}$$

By the first part of the proof and the concentration properties of Binomial random variables we get

$$\begin{aligned} \mathbb {E}\left[ \frac{n_k}{n}\textsf{W}^p_{Q_k}\left( \mu _k, \rho _k \right) \right] \lesssim \frac{1}{n^{\frac{p}{d}}} \end{aligned}$$

which concludes the proof of (6.1), since \(p/2>p/d\). \(\square \)

Remark 6.2

By translation and scaling invariance, when \(\Omega =Q\) is a cube \(Q\subset \mathbb {R}^d\) and \(\rho = \frac{1}{|Q|} I_{Q}\) is the uniform measure on Q, the implicit constant in (6.2) does not depend on Q.
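The scaling in (6.2) can also be observed numerically. The sketch below (an illustration only, assuming `numpy` and `scipy` are available; the helper `assignment_cost` is ours) computes the optimal matching cost of two uniform samples in \((0,1)^3\) via the Hungarian algorithm and checks that the rescaled cost \(n^{\frac{p}{d}-1}\textsf{M}^p\) stays of order one as n grows:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def assignment_cost(n, d, p, seed):
    """Optimal bipartite matching cost M^p between two samples of n
    i.i.d. uniform points in the unit cube (0,1)^d."""
    rng = np.random.default_rng(seed)
    X, Y = rng.random((n, d)), rng.random((n, d))
    cost = cdist(X, Y) ** p               # cost[i, j] = |x_i - y_j|^p
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum()

# Rescaled by n^{p/d - 1}, the cost should remain of order one in n.
d, p = 3, 1
vals = {n: np.mean([n ** (p / d - 1) * assignment_cost(n, d, p, s)
                    for s in range(5)])
        for n in (100, 400)}
```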

6.2 Matching with a fraction of i.i.d. points

In this section we extend the matching bound (6.2) to the case where most of the points are still i.i.d. but essentially no assumption is made on the remaining points. This is used in Theorem 4.1 and in the de-Poissonization procedure (see Proposition 5.1). As in Theorem 4.1, we have to consider three different situations. Let us first set some common notation. Letting \(\mathcal {N}\), \(\mathcal {M}\), \(\mathcal {U}\) and \(\mathcal {V}\) be point processes on \(\Omega \) (\(\mathcal {N}\) and \(\mathcal {M}\) will contain the i.i.d. points), we want to estimate

$$\begin{aligned} \mathbb {E}\left[ \textsf{M}^p( \mathcal {U}\cup \mathcal {N}, \mathcal {V}\cup \mathcal {M}) \right] . \end{aligned}$$

Setting

$$\begin{aligned}Z=\min \left\{ |\mathcal {U}|+|\mathcal {N}|,|\mathcal {V}|+|\mathcal {M}| \right\} , \end{aligned}$$

we want to construct two (random) subsets \(\mathcal {S}\subseteq \mathcal {U}\cup \mathcal {N}\), \(\mathcal {T}\subseteq \mathcal {V}\cup \mathcal {M}\), both containing Z points, so that

$$\begin{aligned} \textsf{M}^p\left( \mathcal {U}\cup \mathcal {N}, \mathcal {V}\cup \mathcal {M} \right) \le \textsf{W}^p\left( \mu ^{\mathcal {S}}, \nu ^{\mathcal {T}} \right) {\mathop {\lesssim }\limits ^{(2.18)}} \textsf{W}^p\left( \mu ^{\mathcal {S}}, Z\rho \right) +\textsf{W}^p\left( \nu ^{\mathcal {T}}, Z\rho \right) , \nonumber \\ \end{aligned}$$
(6.3)

where \(\mu ^{\mathcal {S}}, \nu ^{\mathcal {T}}\) are the associated empirical measures. We then estimate separately the two terms on the right-hand side of (6.3). Since the construction is completely symmetric, we detail it only for \(\mathcal {S}\subseteq \mathcal {U}\cup \mathcal {N}\). It is given as the union of two sets, a “good” set \(\mathcal {G}\subseteq \mathcal {N}\) and a “bad” set \(\mathcal {B}\subseteq \mathcal {U}\). We first define the set \(\mathcal {G}\) by sampling without replacement

$$\begin{aligned} |\mathcal {G}|=\min \left\{ |\mathcal {N}|,Z \right\} \end{aligned}$$

points from \(\mathcal {N}\). Similarly, the set \(\mathcal {B}\) is constructed by sampling without replacement

$$\begin{aligned} |\mathcal {B}|=\max \left\{ Z-|\mathcal {N}|,0 \right\} \end{aligned}$$

points from \(\mathcal {U}\). Notice that

$$\begin{aligned} Z=|\mathcal {G}|+|\mathcal {B}| \end{aligned}$$
(6.4)

and that when conditioned on \(|\mathcal {G}|\), the points in \(\mathcal {G}\) are still i.i.d. with common law \(\rho \). We then write \(\mu ^{\mathcal {S}} = \mu ^{\mathcal {G}} + \mu ^{\mathcal {B}}\) for the associated empirical measure. Using the triangle inequality (2.18) and (2.19), we then split the estimate in two:

$$\begin{aligned} \begin{aligned} \textsf{W}^p(\mu ^{\mathcal {S}}, Z\rho )&\lesssim \textsf{W}^p\left( \mu ^{\mathcal {G}} + \mu ^{\mathcal {B}}, |\mathcal {G}|\rho + \mu ^{\mathcal {B}} \right) + \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) \\&\lesssim \textsf{W}^p\left( \mu ^{\mathcal {G}},|\mathcal {G}|\rho \right) + \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) . \end{aligned} \end{aligned}$$

Taking expectation we find

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p(\mu ^{\mathcal {S}}, Z\rho ) \right] \lesssim \mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {G}},|\mathcal {G}|\rho \right) \right] + \mathbb {E}\left[ \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) \right] . \end{aligned}$$
(6.5)

To estimate the first term on the right-hand side, we rely on (6.1). It is in the estimate of the last term that we need to argue differently depending on the case. In the first one (see Proposition 6.3), since we have good control on the moments of \(|\mathcal {U}|\), we can directly appeal to Proposition 2.9. In the other two cases (see Propositions 6.4 and 6.5) we need to combine it with a localization argument.
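The sampling construction of \(\mathcal {S}\) above can be sketched in a few lines (an illustration with hypothetical names, not the paper's code; `N_pts` and `U_pts` stand for realizations of \(\mathcal {N}\) and \(\mathcal {U}\)):

```python
import numpy as np

def build_S(N_pts, U_pts, Z, rng):
    """Sample S = G ∪ B: keep |G| = min(|N|, Z) points of the i.i.d.
    cloud N and, if Z exceeds |N|, draw the remaining
    |B| = max(Z - |N|, 0) points from U, both without replacement."""
    g = min(len(N_pts), Z)
    b = max(Z - len(N_pts), 0)
    G = N_pts[rng.choice(len(N_pts), size=g, replace=False)]
    B = U_pts[rng.choice(len(U_pts), size=b, replace=False)]
    return G, B

rng = np.random.default_rng(0)
N_pts = rng.random((50, 3))  # stands in for the i.i.d. process N
U_pts = rng.random((8, 3))   # stands in for the arbitrary process U
G, B = build_S(N_pts, U_pts, Z=55, rng=rng)
```

By exchangeability of sampling without replacement, conditioned on \(|\mathcal {G}|\) the kept points are still i.i.d. with the common law, which is the property used in the text.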

We start with the first case.

Proposition 6.3

Let \(d \ge 3\), \(p \ge 1\) and \(\Omega \) be a bounded domain with Lipschitz boundary, let \(\rho : \Omega \rightarrow \mathbb {R}\) be a Hölder continuous density bounded above and below, and let \((X_{i})_{i=1}^\infty \), \((Y_j)_{j=1}^\infty \) be independent sequences of i.i.d. random variables with common law \(\rho \).

Then, there exist \(\alpha =\alpha (p,d)<2\) and \(\beta =\beta (p)>0\) such that the following holds. Let \(M, N\in \mathbb {N}\) be random variables satisfying concentration (recall Definition 2.13) and set \(\mathcal {N}=(X_{i})_{i=1}^N\), \(\mathcal {M}= (Y_{j})_{j=1}^M\) and \(h=\min \left\{ \mathbb {E}\left[ M \right] ,\mathbb {E}\left[ N \right] \right\} \). Then, for all point processes \(\mathcal {U}\), \(\mathcal {V}\) for which there exists \(1\le H\le h\) such that for every \(q\ge 1\)

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}|^q+|\mathcal {V}|^q \right] \le C(q) H^q \end{aligned}$$
(6.6)

for some \(C(q)>0\), we have

$$\begin{aligned} \mathbb {E}\left[ \textsf{M}^p( \mathcal {U}\cup \mathcal {N}, \mathcal {V}\cup \mathcal {M}) \right] \lesssim h^{1-\frac{p}{d}}\left( 1+ \left( \frac{H^\alpha }{h} \right) ^\beta \right) . \end{aligned}$$

Here the implicit constant depends only on p, d, the constants involved in the concentration properties of M and N, and \((C(q))_{q\ge 1}\) from (6.6).

Proof

Starting from (6.3) and (6.5) we first estimate by (6.2) and Hölder inequality,

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {G}},|\mathcal {G}|\rho \right) \right] \lesssim \mathbb {E}\left[ |\mathcal {G}|^{1-\frac{p}{d}} \right] \le \mathbb {E}\left[ |\mathcal {G}| \right] ^{1-\frac{p}{d}}. \end{aligned}$$

Since \(|\mathcal {G}|\le \min \left\{ M,N \right\} +|\mathcal {V}|\), by (6.6) with \(q=1\) and \(H\le h\), we have \(\mathbb {E}\left[ |\mathcal {G}| \right] \lesssim h\) and thus

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {G}},|\mathcal {G}|\rho \right) \right] \lesssim h^{1-\frac{p}{d}}. \end{aligned}$$

We are then left with the proof of

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) \right] \lesssim h^{1-\frac{p}{d}} \left( \frac{H^\alpha }{h} \right) ^\beta . \end{aligned}$$
(6.7)

We first single out the event

$$\begin{aligned} A=\left\{ |\mathcal {G}|\ge h/2 \right\} \end{aligned}$$

and claim that for \(q\ge 1\)

$$\begin{aligned} \mathbb {P}\left[ A^c \right] \lesssim _q h^{-q}. \end{aligned}$$
(6.8)

Indeed, since \(A^c\subset \{N\le \mathbb {E}\left[ N \right] /2\}\cup \{M\le \mathbb {E}\left[ M \right] /2\}\), (6.8) follows by combining a union bound together with the concentration properties of M and N. Since

$$\begin{aligned} \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) \le \textsf{W}^p\left( \mu ^{\mathcal {B}}, |\mathcal {B}|\rho \right) \lesssim |\mathcal {B}|\le |\mathcal {U}|, \end{aligned}$$

we then find

$$ \begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) I_{A^c} \right] \lesssim \mathbb {E}\left[ |\mathcal {U}|^2 \right] ^{\frac{1}{2}} \mathbb {P}\left[ A^c \right] ^{\frac{1}{2}}{\mathop {\lesssim _q}\limits ^{(6.6) \& (6.8)}} h^{-q} H. \end{aligned}$$

By taking q large enough, in order to prove (6.7) it is therefore sufficient to show

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) I_A \right] \lesssim h^{1-\frac{p}{d}} \left( \frac{H^\alpha }{h} \right) ^\beta . \end{aligned}$$
(6.9)

We start with the case \(p>d/(d-1)\). By Proposition 2.9,

$$\begin{aligned}{} & {} \mathbb {E}\left[ \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) I_A \right] \\{} & {} \quad \lesssim \mathbb {E}\left[ \frac{|\mathcal {B}|^{1+\frac{p}{d}}}{|\mathcal {G}|^{\frac{p}{d}}} I_A \right] \lesssim h^{-\frac{p}{d}}\mathbb {E}\left[ |\mathcal {U}|^{1+\frac{p}{d}} I_A \right] {\mathop {\lesssim }\limits ^{(6.6)}} h^{1-\frac{p}{d}} \frac{H^{1+\frac{p}{d}}}{h}. \end{aligned}$$

This proves (6.9) in this case with \(\alpha =1+p/d\) and \(\beta =1\).

If now \(p\le d/(d-1)<2\), we use Jensen's inequality (2.20) to obtain

$$\begin{aligned}{} & {} \mathbb {E}\left[ \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) I_A \right] \lesssim \mathbb {E}\left[ Z^{1-\frac{p}{2}} \left( \textsf{W}^2\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) I_A \right) ^{\frac{p}{2}} \right] \\{} & {} \quad \le \mathbb {E}\left[ Z \right] ^{1-\frac{p}{2}} \mathbb {E}\left[ \textsf{W}^2\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) I_A \right] ^{\frac{p}{2}}. \end{aligned}$$

Recalling (6.4) we find \(\mathbb {E}\left[ Z \right] \lesssim h+H\lesssim h\). Using finally (6.9) with \(p=2\) we conclude that

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( |\mathcal {G}|\rho + \mu ^{\mathcal {B}}, Z\rho \right) I_A \right] \lesssim h^{1-\frac{p}{2}} \left( h^{1-\frac{2}{d}} \left( \frac{H^\alpha }{h} \right) ^\beta \right) ^{\frac{p}{2}}= h^{1-\frac{p}{d}} \left( \frac{H^\alpha }{h} \right) ^{\beta \frac{p}{2}}. \end{aligned}$$

This proves (6.9) also in this case. \(\square \)

We now consider the case when the moment bounds for \(\mathcal {U}\) and \(\mathcal {V}\) are only valid after restricting to the elements of a Whitney-type decomposition from Lemma 2.1.

Proposition 6.4

Let \(d \ge 3\), \(p \ge 1\) and \(\Omega \subseteq \mathbb {R}^d\) be a bounded connected open set with Lipschitz boundary and such that (2.14) holds. Fix a Whitney partition \(\mathcal {Q}= (Q_i)_i\), and for \(\delta >0\) let \((\Omega _k)_{k=1}^K = \mathcal {Q}_\delta \cup \mathcal {R}_\delta \) be given by Lemma 2.1. Let finally \(\rho \) be a Hölder continuous probability density on \(\Omega \), bounded above and below.

Then, there exist \(\alpha =\alpha (p,d)>0\) and \(\beta =\beta (p,d) >0\) such that the following holds. For every \(\eta \in (0,1)\), \(\varepsilon >0\) and \(\gamma \in ( 0, 1/d)\), there exists \(C(\eta ,\varepsilon ,\gamma )\) such that for all Poisson point processes \(\mathcal {N}^{\eta n \rho }, \mathcal {M}^{\eta n \rho }\) with intensity \(\eta n\rho \) and all point processes \(\mathcal {U}\) and \(\mathcal {V}\) on \(\Omega \) such that

$$\begin{aligned} \mathbb {E}\left[|\mathcal {U}_{\Omega _k}|^q+ |\mathcal {V}_{\Omega _k}|^q\right]\lesssim _q (n|\Omega _k|)^{\frac{q}{2}} \qquad \forall q>0, \end{aligned}$$
(6.10)

if \(\delta = n^{-\gamma }\) then

$$\begin{aligned}{} & {} n^{\frac{p}{d}-1}\mathbb {E}\left[ \textsf{M}^p( \mathcal {U}\cup \mathcal {N}^{\eta n\rho }, \mathcal {V}\cup \mathcal {M}^{\eta n\rho }) \right] \lesssim \eta ^{1-\frac{p}{d}}\\{} & {} \quad +C(\eta ,\varepsilon ,\gamma ) n^{\varepsilon }\left( \left( \max \left\{ n^{\frac{p}{d}}\delta ^{p+1},n^{\frac{2}{d}} \delta ^3 \right\} \right) ^{\alpha } +\left( n\delta ^d \right) ^{-\beta } \right) . \end{aligned}$$

Proof

Using the notation from the beginning of this section, we start as above from (6.3) and (6.5) and estimate by (6.2),

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {G}},|\mathcal {G}|\rho \right) \right] \lesssim \mathbb {E}\left[ |\mathcal {G}| \right] ^{1-\frac{p}{d}}. \end{aligned}$$

Since \(|\mathcal {G}|\le |\mathcal {N}^{n\eta \rho }|\) we get

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {G}},|\mathcal {G}|\rho \right) \right] \lesssim (\eta n)^{1-\frac{p}{d}}. \end{aligned}$$

In order to conclude the proof it is thus enough to show

$$\begin{aligned}{} & {} n^{\frac{p}{d}-1} \mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) \right] \nonumber \\ {}{} & {} \le C(\eta ,\varepsilon ,\gamma ) n^{\varepsilon }\left( \left( \max \left\{ n^{\frac{p}{d}}\delta ^{p+1},n^{\frac{2}{d}} \delta ^3 \right\} \right) ^{\alpha } +\left( n\delta ^d \right) ^{-\beta } \right) . \end{aligned}$$
(6.11)

Step 1. Reduction to a “good” event. We let

$$\begin{aligned} A=\{|\mathcal {G}|\in [\eta n/2, 3\eta n]\}\cap \bigcap _{k=1}^K \left\{ \max \left\{ |\mathcal {U}_{\Omega _k}|, |\mathcal {V}_{\Omega _k}| \right\} \le (n |\Omega _k|)^{\frac{1}{2}} \cdot n^{\varepsilon } \right\} . \end{aligned}$$

and claim that

$$\begin{aligned} n^{\frac{p}{d}-1}\mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) I_{A^c} \right] \le C(\eta , \varepsilon ,\gamma )\left( n\delta ^d \right) ^{-\beta }. \end{aligned}$$
(6.12)

We first prove that for every \(q>0\),

$$\begin{aligned} \mathbb {P}\left[ A^c \right] \lesssim _q C(\eta ) n^{-q} +\delta ^{1-d} n^{-\varepsilon q}. \end{aligned}$$
(6.13)

To prove this we use a union bound and split

$$\begin{aligned}{} & {} \mathbb {P}\left[ A^c \right] \le \mathbb {P}\left[ |\mathcal {G}|\notin [\eta n/2, 3\eta n] \right] \\{} & {} \quad +\sum _{k=1}^K \mathbb {P}\left[ |\mathcal {U}_{\Omega _k}|\ge (n |\Omega _k|)^{\frac{1}{2}} \cdot n^{\varepsilon } \right] + \sum _{k=1}^K \mathbb {P}\left[ |\mathcal {V}_{\Omega _k}|\ge (n |\Omega _k|)^{\frac{1}{2}} \cdot n^{\varepsilon } \right] . \end{aligned}$$

Regarding the first term we notice that

$$\begin{aligned}{} & {} \{|\mathcal {G}|\notin [\eta n/2, 3\eta n]\}\subset \{ |\mathcal {N}^{\eta n \rho }|< \eta n/2\}\cup \{ |\mathcal {N}^{\eta n \rho }|> 3\eta n\} \cup \{ |\mathcal {M}^{\eta n \rho }|< \eta n/2\}. \end{aligned}$$

Using once more a union bound and (2.27), we find

$$\begin{aligned} \mathbb {P}\left[ |\mathcal {G}|\notin [\eta n/2, 3\eta n] \right] \lesssim _q C(\eta ) n^{-q}. \end{aligned}$$

Regarding the two sums, by (6.10), we have for every \(k\in [1,K]\)

$$\begin{aligned} \mathbb {P}\left[ |\mathcal {U}_{\Omega _k}|\ge (n |\Omega _k|)^{\frac{1}{2}} \cdot n^{\varepsilon } \right] \lesssim _q n^{-\varepsilon q} \end{aligned}$$

and similarly for \(\mathcal {V}\). Since \(K\lesssim \delta ^{1-d}\) by (2.1) this concludes the proof of (6.13).

We now turn to (6.12). As above by the bound \(\textsf{W}^p\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) \lesssim |\mathcal {U}|\) and Cauchy-Schwarz, we have

$$\begin{aligned} \mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) I_{A^c} \right] \lesssim \mathbb {E}\left[ |\mathcal {U}|^2 \right] ^{\frac{1}{2}}\mathbb {P}\left[ A^c \right] ^{\frac{1}{2}}. \end{aligned}$$

Using once more Cauchy-Schwarz together with (6.10) with \(q=2\) we have

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}|^2 \right] ^{\frac{1}{2}}\lesssim K^{\frac{1}{2}} n^{\frac{1}{2}} \end{aligned}$$

so that by (6.13) and \(K\lesssim \delta ^{1-d}\)

$$\begin{aligned} n^{\frac{p}{d}-1} \mathbb {E}\left[ \textsf{W}^p\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) I_{A^c} \right] \lesssim _q C(\eta ) \delta ^{\frac{1}{2}(1-d)} (n^{-q} +\delta ^{1-d} n^{-\varepsilon q})^{\frac{1}{2}}n^{\frac{p}{d}-\frac{1}{2}}. \end{aligned}$$

Since \(\delta =n^{-\gamma }\), this concludes the proof of (6.12) provided we choose q large enough depending on \(\varepsilon \) and \(\gamma \).

In the remaining two steps we prove that in A,

$$\begin{aligned} n^{\frac{p}{d}-1}\textsf{W}^p\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) \lesssim _\eta n^{\varepsilon }\left( \left( \max \left\{ n^{\frac{p}{d}}\delta ^{p+1},n^{\frac{2}{d}} \delta ^3 \right\} \right) ^{\alpha } +\left( n\delta ^d \right) ^{-\beta } \right) .\nonumber \\ \end{aligned}$$
(6.14)

After taking expectation and in combination with (6.12) this would conclude the proof of (6.11). From this point all the estimates are deterministic.

Step 2. Estimate for \(p>d/(d-1)\). We first use (2.21), e.g. with \(\varepsilon =1\), to obtain

$$\begin{aligned} \textsf{W}^p_{\Omega } \left( \mu ^{\mathcal {B}}+ |\mathcal {G}|\rho , Z\rho \right) \lesssim \sum _{k=1}^K \textsf{W}^p_{\Omega _k}\left( \mu ^{\mathcal {B}} + |\mathcal {G}|\rho , \alpha _k \rho \right) + \textsf{W}^p_{\Omega }\left( \sum _{k=1}^K \alpha _k I_{\Omega _k} \rho , Z\rho \right) ,\nonumber \\ \end{aligned}$$
(6.15)

with

$$\begin{aligned} \alpha _k = \frac{ \mu ^{\mathcal {B}}(\Omega _k)}{\rho (\Omega _k)} + |\mathcal {G}|. \end{aligned}$$
(6.16)

We bound the terms on the right-hand side separately. For the sum of “local” terms, we estimate differently according to whether \(\Omega _k\in \mathcal {R}_\delta \) or \(\Omega _k\in \mathcal {Q}_\delta \). In the first case we use the naive bound

$$\begin{aligned} \begin{aligned} \textsf{W}^p_{\Omega _k}\left( {\mu ^{\mathcal {B}} + |\mathcal {G}|\rho }, \alpha _k \rho \right)&{\mathop {\le }\limits ^{(2.19)}} \textsf{W}^p_{\Omega _k}\left( \mu ^{\mathcal {B}}, \frac{\mu ^{\mathcal {B}}(\Omega _k)}{\rho (\Omega _k)} \rho \right) {\mathop {\le }\limits ^{(2.23)}} {\text {diam}}(\Omega _k)^p |\mathcal {U}_{\Omega _k}| \\&\lesssim n^{\frac{1}{2}+\varepsilon } \delta ^{p + \frac{d}{2}}. \end{aligned} \end{aligned}$$

Since \(K\lesssim \delta ^{1-d}\) we find

$$\begin{aligned} n^{\frac{p}{d}-1} \sum _{\Omega _k \in \mathcal {R}_\delta } \textsf{W}^p_{\Omega _k}\left( \mu ^{\mathcal {B}} + |\mathcal {G}| \rho , \alpha _k \rho \right) \lesssim n^\varepsilon n^{\frac{p}{d}} \delta ^{1+p} (n\delta ^{d})^{ - \frac{1}{2}}\le n^\varepsilon n^{\frac{p}{d}} \delta ^{1+p}.\qquad \end{aligned}$$
(6.17)

If \(\Omega _k \in \mathcal {Q}_\delta \) is a cube, we use instead Proposition 2.9 with \(\mu ^{\mathcal {B}}\) instead of \(\mu \) and \(|\mathcal {G}|\) instead of h, so that

$$\begin{aligned} \begin{aligned} \textsf{W}^p_{\Omega _k}\left( {\mu ^{\mathcal {B}} + |\mathcal {G}| \rho }, \alpha _k\rho \right)&\lesssim _\eta \frac{\mu ^{\mathcal {B}}(\Omega _k)^{1+\frac{p}{d}}}{n^{\frac{p}{d}}} \lesssim _\eta n^{-\frac{p}{d}} |\mathcal {U}_{\Omega _k}|^{1+\frac{p}{d}} \\&\lesssim _\eta n^{(1+\frac{p}{d})\varepsilon } n^{\frac{1}{2} (1-\frac{p}{d})} |\Omega _k|^{\frac{1}{2}(1+\frac{p}{d})}. \end{aligned} \end{aligned}$$

Summing this inequality yields

$$\begin{aligned} \begin{aligned} n^{\frac{p}{d}-1}\sum _{\Omega _k \in \mathcal {Q}_\delta } \textsf{W}^p_{\Omega _k}\left( \mu ^{\mathcal {B}} + |\mathcal {G}| \rho , \alpha _k \rho \right)&\lesssim _\eta n^{(1+\frac{p}{d})\varepsilon } n^{-\frac{1}{2} (1-\frac{p}{d})} \sum _{k=1}^K |\Omega _k|^{\frac{1}{2}(1+\frac{p}{d})} \\&{\mathop {\lesssim _\eta }\limits ^{(2.1)}} n^{2 \varepsilon } n^{-\frac{1}{2} (1-\frac{p}{d})}\max \left\{ 1,\delta ^{\frac{1}{2}(d-2-p)} \right\} . \end{aligned} \end{aligned}$$

Notice that since \(n \delta ^d\ge 1\),

$$\begin{aligned} n^{-\frac{1}{2} (1-\frac{p}{d})}\max \left\{ 1,\delta ^{\frac{1}{2}(d-2-p)} \right\} \le (n\delta ^d)^{-\frac{1}{2} (1-\frac{p}{d})} \end{aligned}$$

so that

$$\begin{aligned} n^{\frac{p}{d}-1}\sum _{\Omega _k \in \mathcal {Q}_\delta } \textsf{W}^p_{\Omega _k}\left( \mu ^{\mathcal {B}} + |\mathcal {G}| \rho , \alpha _k \rho \right) \lesssim _\eta n^{2 \varepsilon } (n\delta ^d)^{-\frac{1}{2} (1-\frac{p}{d})}. \end{aligned}$$
(6.18)

We then consider the last term in (6.15). Using Lemma 2.8 with \(Z \rho \gtrsim \eta n\) in place of \(\lambda \) (recall that we assume here that A holds), we get

$$\begin{aligned} \textsf{W}^p_{\Omega }\left( \sum _{k=1}^K \alpha _k I_{\Omega _k} \rho , Z\rho \right) \lesssim _\eta n^{1-p} \left\| \sum _{k=1}^K \alpha _kI_{\Omega _k} \rho - Z\rho \right\| _{W^{-1,p}(\Omega )}^p. \end{aligned}$$
(6.19)

Recalling (6.16) and that \(Z = \mu ^{\mathcal {B}}(\Omega ) + |\mathcal {G}|\), we can rewrite

$$\begin{aligned} \sum _{k=1}^K \alpha _kI_{\Omega _k} \rho - Z\rho = \sum _{k=1}^K \frac{ \mu ^{\mathcal {B}}(\Omega _k)}{\rho (\Omega _k)} \left( I_{\Omega _k}- \rho (\Omega _k) \right) \rho . \end{aligned}$$

By (2.15) of Lemma 2.6 with \(h= n^{1+2\varepsilon }\) we thus have in A

$$\begin{aligned} \left\| \sum _{k=1}^K \alpha _kI_{\Omega _k} \rho - Z\rho \right\| _{W^{-1,p}(\Omega )}\lesssim n^{\varepsilon } \delta ^{1-\frac{d}{2}} |\log (\delta )| n^{\frac{1}{2}}. \end{aligned}$$

Combining this with (6.19) we get that in A,

$$\begin{aligned} n^{\frac{p}{d}-1} \textsf{W}^p_{\Omega }\left( \sum _{k=1}^K \alpha _k I_{\Omega _k} \rho , Z\rho \right) \lesssim n^{p\varepsilon } (n\delta ^d)^{-p\frac{(d-2)}{2d}} |\log (\delta )|^p. \end{aligned}$$

Inserting this estimate, (6.17) and (6.18) in (6.15) we finally obtain (notice that \(p(d-2)>d-p\) for \(p>d/(d-1)\)) that in A,

$$\begin{aligned} n^{\frac{p}{d}-1}\textsf{W}^p\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) \lesssim _\eta n^{\max \left\{ p,2 \right\} \varepsilon } \left( n^{\frac{p}{d}} \delta ^{1+p}+(n\delta ^d)^{-\frac{(d-p)}{2d}} |\log (\delta )|^p \right) . \end{aligned}$$

Up to replacing \(\varepsilon \) by \(\max \left\{ p,2 \right\} \varepsilon \) and choosing \(\beta < \frac{(d-p)}{2d}\) this concludes the proof of (6.14) if \(p>d/(d-1)\).

Step 3. Estimate for \(p\le d/(d-1)\). Since \(2>d/(d-1)\ge p\), we may use Jensen’s inequality (2.20) to infer that in A,

$$\begin{aligned} \begin{aligned} n^{\frac{p}{d}-1} \textsf{W}^p\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right)&\le n^{\frac{p}{d}-1} Z^{1-\frac{p}{2}}\left( \textsf{W}^2\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) \right) ^{\frac{p}{2}}\\&\lesssim (\eta +n^{-\frac{1}{2}})^{1-\frac{p}{2}}\left( n^{\frac{2}{d}-1}\textsf{W}^2\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) \right) ^{\frac{p}{2}}\\&\lesssim \left( n^{\frac{2}{d}-1}\textsf{W}^2\left( \mu ^{\mathcal {B}}+|\mathcal {G}|\rho , Z\rho \right) \right) ^{\frac{p}{2}}. \end{aligned} \end{aligned}$$

Using (6.14) for \(p=2\) concludes the proof of (6.14) also in this case. \(\square \)
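The role of the choice \(\delta = n^{-\gamma }\) can be made explicit by tracking exponents in the error term of Proposition 6.4 (illustrative arithmetic only): with \(\delta = n^{-\gamma }\) one has \(n^{\frac{p}{d}}\delta ^{p+1} = n^{\frac{p}{d} - \gamma (p+1)}\), \(n^{\frac{2}{d}}\delta ^{3} = n^{\frac{2}{d} - 3\gamma }\) and \((n\delta ^d)^{-\beta } = n^{-\beta (1-\gamma d)}\), so (absorbing the \(n^{\varepsilon }\) factor by taking \(\varepsilon \) small) all error terms vanish exactly when \(\gamma \) lies in the window computed below:

```python
def gamma_window(p, d):
    """Open interval of exponents γ for which, with δ = n^{-γ}, all error
    terms in the bound of Proposition 6.4 tend to zero as n → ∞."""
    lo = max(p / (d * (p + 1)), 2 / (3 * d))  # n^{p/d}δ^{p+1} and n^{2/d}δ^3 vanish
    hi = 1 / d                                # (nδ^d)^{-β} vanishes
    return lo, hi

lo, hi = gamma_window(p=2, d=3)  # the window (2/9, 1/3) in this case
```

Since \(p/(d(p+1)) < 1/d\) and \(2/(3d) < 1/d\), the window is nonempty for every \(p \ge 1\).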

Finally, we consider the case of a cube \(Q_{mL}\) decomposed into cubes of sidelength L. The difficulty compared to the previous two cases is to obtain bounds which are independent of m. This is achieved using the additional independence of the point processes \(\mathcal {U}\), \(\mathcal {V}\). While we believe that a direct proof combining Green kernel bounds in the spirit of the proof of Lemma 2.6 together with a Rosenthal-type inequality for the (non-independent) random variables \(\mu ^{\mathcal {B}}(Q_i)\) should be possible, we give a more elementary proof based on subadditivity and concentration.

Proposition 6.5

Let \(d \ge 3\), \(\eta \in (0,1/2)\), \(L \ge 1\) and \(m \in \mathbb {N}{\setminus }\left\{ 0 \right\} \). Let \(\mathcal {U}\), \(\mathcal {V}\) be point processes on \(Q_{mL}\) such that the restrictions \((\mathcal {U}_{Q_i}, \mathcal {V}_{Q_i})_{i}\) on all sub-cubes \(Q_i= Q_L + Lz_i\subseteq Q_{mL}\), with \(z_i \in \mathbb {Z}^d\), are independent copies (translated by the vector \(Lz_i\)) of the pair of processes \((\mathcal {U}_{Q_L}, \mathcal {V}_{Q_L})\) and such that for every \(q\ge 1\), there exists \(C(q)>0\) such that

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}_{Q_i}|^q +|\mathcal {V}_{Q_i}|^q \right] \le C(q) L^{d\frac{q}{2}}. \end{aligned}$$
(6.20)

Let \(\mathcal {N}^{\eta }\), \(\mathcal {M}^{\eta }\) be independent Poisson point processes on \(Q_{mL}\) with constant intensity \(\eta \), also independent of \((\mathcal {U},\mathcal {V})\). Then, for every \(p \in [1, d)\), there exist \(C(\eta ) = C(\eta ,p,d, (C(q))_{q\ge 1})>0\) and \(\alpha = \alpha (p,d)>0\) such that, if \(L \ge C(\eta )\),

$$\begin{aligned} \mathbb {E}\left[ \frac{1}{|Q_{m L}|} \textsf{M}^p\left( \mathcal {U}\cup \mathcal {N}^\eta , \mathcal {V}\cup \mathcal {M}^{\eta } \right) \right] \lesssim \eta ^{1-\frac{p}{d}} + \frac{ C(\eta )}{L^\alpha }. \end{aligned}$$

Remark 6.6

Let us preliminarily notice that, for any \(R \subseteq Q_{mL}\) that is the disjoint union of k cubes among the cubes \(Q_i = Q_L + Lz_i\), \(z_i \in \mathbb {Z}^d\), we have, for \(q \ge 1\), the upper bound

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ |\mathcal {U}_R|^{q} \right]&= k^{q} \mathbb {E}\left[ \left( \frac{1}{k} \sum _{Q_i \subseteq R} |\mathcal {U}_{Q_i}| \right) ^{q} \right] \le k^{q} \mathbb {E}\left[ \frac{1}{k} \sum _{Q_i \subseteq R} |\mathcal {U}_{Q_i}|^{q} \right] \\&\lesssim k^{q} L^{d\frac{q}{2}} \lesssim (kL^d)^{q} = |R|^{q}. \end{aligned} \end{aligned}$$
(6.21)

In particular, we have

$$\begin{aligned} \mathbb {E}\left[ |\mathcal {U}|^{q} \right] \lesssim m^{dq} L^{d\frac{q}{2}} \lesssim |Q_{mL}|^q. \end{aligned}$$
(6.22)

Moreover, by the Rosenthal inequality [43], if \(q \ge 2\),

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \left| |\mathcal {U}_R| - \mathbb {E}\left[ |\mathcal {U}_R| \right] \right| ^q \right]&= \mathbb {E}\left[ \left| \sum _{Q_i\subseteq R} \left( |\mathcal {U}_{Q_i}| - \mathbb {E}\left[ |\mathcal {U}_{Q_i}| \right] \right) \right| ^q \right] \\&\lesssim k \mathbb {E}\left[ \left| |\mathcal {U}_{Q_L}| - \mathbb {E}\left[ |\mathcal {U}_{Q_L}| \right] \right| ^q \right] + k^{\frac{q}{2}} \mathbb {E}\left[ \left| |\mathcal {U}_{Q_L}| - \mathbb {E}\left[ |\mathcal {U}_{Q_L}| \right] \right| ^2 \right] ^{\frac{q}{2}} \\&\lesssim k L^{d\frac{q}{2}} + k^{\frac{q}{2}} L^{d\frac{q}{2}} \lesssim |R|^{\frac{q}{2}}. \end{aligned} \end{aligned}$$

We will use all these bounds in the proof below.
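The \(|R|^{\frac{q}{2}}\) scaling above can also be observed numerically. In the sketch below (an illustration only; Poisson cube counts are our assumption, standing in for \(|\mathcal {U}_{Q_i}|\), and all names are ours), the normalized q-th moment of the centred sum stabilizes as the number of cubes grows:

```python
import numpy as np

# Monte Carlo check of the Rosenthal-type scaling E|S_k|^q ≲ k^{q/2}:
# S_k is a sum of k i.i.d. centred counts (here Poisson(lam) - lam, a
# stand-in for |U_{Q_i}| - E[|U_{Q_i}|]); the normalized moment
# E|S_k|^q / k^{q/2} should stay bounded as the number of cubes k grows.
rng = np.random.default_rng(1)
lam, q, trials = 4.0, 4, 20000

def moment_ratio(k):
    S = (rng.poisson(lam, size=(trials, k)) - lam).sum(axis=1)
    return np.mean(np.abs(S) ** q) / k ** (q / 2)

r_small, r_large = moment_ratio(10), moment_ratio(160)
```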

Proof of Proposition 6.5

For simplicity, we write throughout the proof Q instead of \(Q_{mL}\). As in the previous two proofs, we start from (6.3) and (6.5) (with \(\rho =I_Q/|Q|\)) and estimate by (6.2), see also Remark 6.2,

$$\begin{aligned} \mathbb {E}\left[ \frac{1}{|Q|}\textsf{W}^p_Q \left( \mu ^{\mathcal {G}},\frac{|\mathcal {G}|}{|Q|} \right) \right] \lesssim \mathbb {E}\left[ |\mathcal {G}| \right] ^{1-\frac{p}{d}} |Q|^{\frac{p}{d}-1}. \end{aligned}$$

Since \(|\mathcal {G}|\le |\mathcal {N}^\eta |\) we get

$$\begin{aligned} \mathbb {E}\left[ \frac{1}{|Q|}\textsf{W}^p_Q \left( \mu ^{\mathcal {G}},\frac{|\mathcal {G}|}{|Q|} \right) \right] \lesssim \eta ^{1-\frac{p}{d}}. \end{aligned}$$

In order to conclude the proof it is thus enough to show

$$\begin{aligned} \mathbb {E}\left[ \frac{1}{|Q|} W_{Q}^p \left( \frac{ |\mathcal {G}|}{|Q|} + \mu ^{\mathcal {B}}, \frac{Z}{|Q|} \right) \right] \lesssim \frac{C(\eta )}{L^\alpha }. \end{aligned}$$
(6.23)

We split the proof into several steps. We first consider the case \(p\ge 2\ge d/(d-1)\).

Step 1. Concentration bounds for \(\mu ^\mathcal {B}\). In this intermediate step, we collect some facts about \(\mu ^{\mathcal {B}}(R)\), where \(R \subseteq Q\) is a disjoint union of k cubes \(Q_i = Q_L + Lz_i\), \(z_i \in \mathbb {Z}^d\). First of all, the construction of \(\mathcal {B}\) ensures that \(\mathbb {E}\left[ \mu ^{\mathcal {B}}(Q_i) \right] \) does not depend on \(Q_i\) (one could in fact prove that \((\mu ^{\mathcal {B}}(Q_i))_{i}\) is an exchangeable sequence). We deduce that

$$\begin{aligned} \mathbb {E}\left[ \mu ^{\mathcal {B}}(Q_i) \right] = \frac{ \mathbb {E}\left[ \mu ^{\mathcal {B}}(Q) \right] }{m^d}, \quad \text {hence} \quad \frac{\mathbb {E}\left[ \mu ^{\mathcal {B}}(R) \right] }{|R|} = \frac{ \mathbb {E}\left[ \mu ^{\mathcal {B}}(Q) \right] }{|Q|}. \end{aligned}$$
(6.24)

Indeed, conditionally on \(Z=z\), \(|\mathcal {N}^\eta |=n\), \(|\mathcal {U}| = u_Q\) and \(|\mathcal {U}_R|=u_R \le u_Q\), the variable \(\mu ^{\mathcal {B}}(R)\) counts the “successes” in the random sampling procedure without replacement that we used to define \(\mathcal {B}\), with \(b = \max \left\{ z-n,0 \right\} \) draws from an urn containing \(u_Q\) marbles, \(u_R\) of which have the desired feature (their extraction defines a success). Its conditional law is thus a hypergeometric distribution with parameters \(\left( u_Q, u_R, b \right) \): given \(s_R \le u_R\),

$$\begin{aligned} \mathbb {P}\left( \mu ^{\mathcal {B}}(R) = s_R \,|\, B \right) = \binom{u_R}{s_R} \binom{u_Q-u_R}{b - s_R} \bigg / \binom{u_Q}{b}, \end{aligned}$$

where for brevity we write

$$\begin{aligned} B = \left\{ Z=z, |\mathcal {N}^\eta |=n, |\mathcal {U}| = u_Q, |\mathcal {U}_R| = u_R \right\} . \end{aligned}$$

Specializing to \(R = Q_i\), we see that this quantity does not depend on \(Q_i\): indeed, the \(|\mathcal {U}_{Q_i}|\) are i.i.d. random variables, hence the joint law of the variables \((Z, |\mathcal {N}^\eta |, |\mathcal {U}|, |\mathcal {U}_{Q_i}| )\), which determines the law of \(\mu ^{\mathcal {B}}(Q_i)\), does not depend on i.
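For later reference, let us also record the first two conditional moments of this hypergeometric distribution (standard facts, stated here for the reader's convenience, for \(u_Q \ge 2\); the degenerate cases are trivial):

$$\begin{aligned} \mathbb {E}\left[ \mu ^{\mathcal {B}}(R) \,|\, B \right] = b\, \frac{u_R}{u_Q}, \qquad {\text {Var}}\left( \mu ^{\mathcal {B}}(R) \,|\, B \right) = b\, \frac{u_R}{u_Q}\left( 1-\frac{u_R}{u_Q} \right) \frac{u_Q-b}{u_Q-1} \le u_R, \end{aligned}$$

where the last bound follows from \(b \le u_Q\). The variance bound is consistent with the conditional \(p\)-th moment bound \(\left( u_R \right) ^{\frac{p}{2}}\) obtained below from (2.29).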

Using the concentration inequality (2.29) for hypergeometric random variables, we have

$$\begin{aligned} \mathbb {E}\left[ \left| \mu ^{\mathcal {B}}(R) - \mathbb {E}\left[ \mu ^{\mathcal {B}}(R)|B \right] \right| ^p |B \right] \lesssim \left( u_R \right) ^{\frac{p}{2}}, \end{aligned}$$

from which we find, thanks to (6.21) (recall that \(p\ge 2\)),

$$\begin{aligned} \mathbb {E}\left[ \left| \mu ^\mathcal {B}(R)- \mathbb {E}\left[ \mu ^\mathcal {B}(R) \right] \right| ^p \right] \lesssim |R|^{\frac{p}{2}}. \end{aligned}$$
(6.25)

Step 2. Subadditivity bound. Using (6.24) and (6.25) above, we are in a position to follow closely the main argument of [25, Proposition 5.4]. We define, for a rectangle \(R \subseteq Q\) that is a union of cubes \(Q_i\)’s,

$$\begin{aligned} f(R)= \mathbb {E}\left[\frac{1}{|R|}\textsf{W}^p_R\left( \mu ^\mathcal {B}+ \frac{|\mathcal {G}|}{|Q|}, \frac{\mu ^{\mathcal {B}}(R)}{|R|} + \frac{|\mathcal {G}|}{|Q|} \right) \right]. \end{aligned}$$

We say that \(\mathcal {R}\) is an admissible partition of R if it is made of rectangles satisfying the following conditions: each \(R_i\in \mathcal {R}\) is a union of cubes \(Q_j\), has moderate aspect ratio and satisfies \(3^{-d}|R|\le |R_i|\le |R|\). We claim that there exists \(C_\eta =C(d,p,\eta )>0\) such that for every admissible partition \(\mathcal {R}\) of R and every \(\varepsilon \in (0,1)\), we have

$$\begin{aligned} f(R)\le (1+\varepsilon )\sum _i \frac{|R_i|}{|R|} f(R_i) +\frac{C_\eta }{\varepsilon ^{p-1}} \frac{1}{|R|^{\frac{p(d-2)}{2d}}}. \end{aligned}$$
(6.26)

Setting

$$\begin{aligned} \alpha = \frac{ \mu ^{\mathcal {B}}(R)}{|R|}+\frac{ |\mathcal {G}|}{|Q|}, \quad \alpha _i = \frac{ \mu ^{\mathcal {B}}(R_i)}{|R_i|}+\frac{|\mathcal {G}|}{|Q|} \end{aligned}$$

and using (2.21), this reduces to

$$\begin{aligned} \mathbb {E}\left[ \frac{1}{|R|}\textsf{W}^p_{R}\left( \sum _{i} \alpha _i I_{R_i},\alpha \right) \right] \le \frac{C_\eta }{ |R|^{\frac{p(d-2)}{2d}}}. \end{aligned}$$
(6.27)

First, we single out the event

$$\begin{aligned} A= \left\{ \min \left\{ |\mathcal {N}^\eta |, |\mathcal {M}^\eta | \right\} \ge \eta |Q| /2 \right\} . \end{aligned}$$

Notice that on A, we have \(\alpha \gtrsim \eta \). By the concentration bound (2.27), for every \(q\ge 1\), \(\mathbb {P}(A^c) \lesssim _q (\eta |Q|)^{-q}\le (\eta |R|)^{-q}\). On the event \(A^c\), we can use the trivial bound

$$\begin{aligned} \begin{aligned} \frac{1}{|R|}\textsf{W}^p_{R}\left( \sum _{i} \alpha _i I_{R_i},\alpha \right)&\le \frac{1}{|R|}\textsf{W}^p_{R}\left( \sum _{i} \frac{ \mu ^{\mathcal {B}}(R_i)}{|R_i|} I_{R_i},\frac{ \mu ^{\mathcal {B}}(R)}{|R|} \right) \\&\le |R|^{\frac{p}{d}-1} \mu ^{\mathcal {B}}(R) \le |R|^{\frac{p}{d}-1} |\mathcal {U}_R|. \end{aligned} \end{aligned}$$

Using the Cauchy–Schwarz inequality, the above bound on \(\mathbb {P}(A^c)\), and (6.21) with \(q=2\), we get for any \(q \ge 1\),

$$\begin{aligned} \mathbb {E}\left[ \frac{1}{|R|}\textsf{W}^p_{R}\left( \sum _{i} \alpha _i I_{R_i},\alpha \right) I_{A^c} \right] \lesssim _{\eta ,q} |R|^{\frac{p}{d}-q}, \end{aligned}$$

which is estimated by the right-hand side of (6.27) provided we choose q large enough.

If A holds, we use (2.22) in combination with (2.13) (recall that for rectangles of moderate aspect ratio the Sobolev constant is uniformly bounded) to get

$$\begin{aligned} \frac{1}{|R|}\textsf{W}^p_{R}\left( \sum _{i} \alpha _i I_{R_i},\alpha \right)&\lesssim \frac{ |R|^{\frac{p}{d}-1} }{ \alpha ^{p-1}} \sum _{i} |R_i| \left| \alpha _i - \alpha \right| ^p\\&\lesssim \eta ^{1-p} |R|^{\frac{p}{d}}\sum _{i} \left| \alpha _i - \alpha \right| ^p. \end{aligned}$$

We thus have

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \frac{1}{|R|}\textsf{W}^p_{R}\left( \sum _{i} \alpha _i I_{R_i},\alpha \right) I_A \right]&\lesssim \eta ^{1-p} |R|^{\frac{p}{d}}\sum _{i}\mathbb {E}\left[ \left| \alpha _i - \alpha \right| ^pI_A \right] \\ {}&\le \eta ^{1-p} |R|^{\frac{p}{d}}\sum _{i}\mathbb {E}\left[ \left| \alpha _i - \alpha \right| ^p \right] .\end{aligned} \end{aligned}$$

Using that \(\alpha _i-\alpha =\frac{\mu ^{\mathcal {B}}(R_i)}{|R_i|}-\frac{\mu ^{\mathcal {B}}(R)}{|R|}\), the equality of the means (6.24) and the triangle inequality, we have

$$\begin{aligned} \begin{aligned} \sum _{i}\mathbb {E}\left[ \left| \alpha _i - \alpha \right| ^p \right]&\lesssim \sum _i \mathbb {E}\left[ \left| \frac{\mu ^{\mathcal {B}}(R_i)}{|R_i|}- \mathbb {E}\left[ \frac{\mu ^{\mathcal {B}}(R_i)}{|R_i|} \right] \right| ^p \right] \\ {}&\quad + \mathbb {E}\left[ \left| \frac{\mu ^{\mathcal {B}}(R)}{|R|}- \mathbb {E}\left[ \frac{\mu ^{\mathcal {B}}(R)}{|R|} \right] \right| ^p \right] \\&{\mathop {\lesssim }\limits ^{(6.25)}} |R|^{-\frac{p}{2}}. \end{aligned}\end{aligned}$$

This proves

$$\begin{aligned} \mathbb {E}\left[ \frac{1}{|R|}\textsf{W}^p_{R}\left( \sum _{i} \alpha _i I_{R_i},\alpha \right) I_A \right] \lesssim \frac{\eta ^{1-p}}{ |R|^{\frac{p(d-2)}{2d}}}, \end{aligned}$$

concluding the proof of (6.27).

Step 3. Dyadic approximation. Starting from the cube \(Q=Q_{mL}\), we build a sequence of finer and finer partitions of \(Q_{mL}\) into rectangles of moderate aspect ratio that are unions of sub-cubes \(Q_i\)’s. We let \(\mathcal {R}_0=\{Q_{mL}\}\) and define \(\mathcal {R}_k\) inductively as follows. Let \(R\in \mathcal {R}_k\). Up to translation we may assume that \(R=\prod _{i=1}^d (0, m_i L)\) for some \(m_i\in \mathbb {N}\). We then split each interval \((0,m_i L)\) into \((0,\lfloor \frac{m_i}{2}\rfloor L)\cup (\lfloor \frac{m_i}{2}\rfloor L, m_i L)\). It is readily seen that this induces an admissible partition of R. Let us point out that when \(m_i=1\) for some i, the corresponding interval \((0,\lfloor \frac{m_i}{2}\rfloor L)\) is empty. This procedure stops after a finite number of steps K, once \(\mathcal {R}_K=\{Q_L+Lz_i, z_i\in [0,m-1]^d \cap \mathbb {Z}^d\}\). It is also readily seen that \(2^{K-1}<m\le 2^K\) and that for every \(k\in [0,K]\) and every \(R\in \mathcal {R}_k\) we have \(|R|\sim (2^{K-k} L)^d\).
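The dyadic splitting just described is purely combinatorial, and can be sketched as a short routine (an illustration only, not part of the proof; the encoding of a box as a tuple of per-axis (offset, size) pairs, in units of L, and the function names are ours):

```python
def split_box(box):
    """Split a box, given as a tuple of per-axis (offset, size) pairs
    (sizes in units of L), cutting each side of size m_i into
    floor(m_i/2) and the rest; sides of size 1 are left untouched,
    matching the empty-interval case in the construction."""
    pieces = [()]
    for off, size in box:
        half = size // 2
        new = []
        for p in pieces:
            if half > 0:  # skip the empty interval when size == 1
                new.append(p + ((off, half),))
            new.append(p + ((off + half, size - half),))
        pieces = new
    return pieces


def dyadic_partitions(m, d):
    """Iterate the splitting on Q_{mL} until only unit cubes remain;
    return the number K of rounds and the final partition."""
    boxes = {tuple((0, m) for _ in range(d))}
    K = 0
    while any(size > 1 for b in boxes for _, size in b):
        boxes = {sub for b in boxes for sub in split_box(b)}
        K += 1
    return K, boxes
```

For instance, for \(m=5\) and \(d=2\) the procedure stops after \(K=3\) rounds, consistently with \(2^{K-1}<m\le 2^K\) (here \(2^2<5\le 2^3\)), and returns the 25 unit cubes.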

We prove via a downward induction the existence of \(\Lambda _\eta >0\) such that for every \(k\in [0,K]\) and every \(R\in \mathcal {R}_{k}\),

$$\begin{aligned} f(R)\le f(Q_L)+ \Lambda _\eta (1+f(Q_L)) L^{-\frac{d-2}{2}} \sum _{j=K-k}^K 2^{- j\frac{d-2}{2}}. \end{aligned}$$
(6.28)

The statement is clearly true for \(k=K\): the law of the point process is the same on each cube \(Q_i = Q_L + Lz_i\), hence \(f(Q_i) = f(Q_L)\). Assume now that it holds true for \(k+1\). Let \(R\in \mathcal {R}_{k}\). Applying (6.26) with \(\varepsilon = (2^{K-k} L)^{-(d-2)/2}\ll 1\), we get

$$\begin{aligned} \begin{aligned} f(R)&\le (1+ \varepsilon ) \sum _{R_i\in \mathcal {R}_{k+1}, R_i\subset R} \frac{|R_i|}{|R|} f(R_i) + \frac{C_\eta }{\varepsilon ^{p-1}} \frac{1}{|R|^{\frac{p(d-2)}{2d}}}\\&{\mathop {\le }\limits ^{(6.28)}} (1+\varepsilon ) \left(f(Q_L)+ \Lambda _\eta (1+f(Q_L))L^{-\frac{d-2}{2}} \sum _{j=K-k+1}^K 2^{- j\frac{d-2}{2}}\right) \\&\qquad \qquad + C_\eta (2^{K-k} L)^{-\frac{d-2}{2}}\\&\le f(Q_L)+ \Lambda _\eta (1+f(Q_L))L^{-\frac{d-2}{2}}\cdot \\&\qquad \qquad \cdot \left[\sum _{j=K-k+1}^K 2^{- j\frac{d-2}{2}}+2^{-(K-k)\frac{d-2}{2}}\left( \frac{C_\eta +1}{\Lambda _\eta }+L^{-\frac{d-2}{2}} \sum _{j=K-k+1}^K 2^{- j\frac{d-2}{2}} \right)\right]. \end{aligned}\end{aligned}$$

If L is large enough (depending on \(\eta \)) then

$$\begin{aligned} \left( \sum _{j=K-k+1}^K 2^{- j\frac{d-2}{2}} \right) \eta ^{1-3p} L^{-\frac{(d-2)}{2}}\lesssim _\eta \left( \sum _{j=0}^\infty 2^{- j\frac{d-2}{2}} \right) L^{-\frac{(d-2)}{2} } \le \frac{1}{2}. \end{aligned}$$

Finally, choosing \(\Lambda _\eta \ge 2(C_\eta +1)\) yields (6.28). Applying (6.28) to \(R=Q_{mL}\) and using that \(\sum _{j\ge 0} 2^{- j\frac{d-2}{2}}<\infty \), we get

$$\begin{aligned} f(Q_{mL})\le f(Q_L)+ \Lambda _\eta (1+f(Q_L)) \frac{1}{L^{\frac{d-2}{2}}}. \end{aligned}$$
(6.29)

Step 4. Conclusion in the case \(p\ge 2\). We finally claim that

$$\begin{aligned} f(Q_L)\le \frac{C_\eta }{L^{\frac{1}{2}(d-p)}}. \end{aligned}$$
(6.30)

Arguing verbatim as in the proof of (6.27) in Step 2, we see that it is enough to assume that we are in the event \(A= \left\{ \min \left\{ |\mathcal {N}^\eta |, |\mathcal {M}^\eta | \right\} \ge \eta |Q| /2 \right\} \). Since in this case \(|\mathcal {G}|/|Q| \gtrsim \eta \), Proposition 2.9 yields

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \frac{1}{|Q_L|} \textsf{W}^p_{Q_L}\left( \mu ^\mathcal {B}+ \frac{|\mathcal {G}|}{|Q|}, \frac{\mu ^\mathcal {B}(Q_L)}{|Q_L|}+ \frac{|\mathcal {G}|}{|Q|} \right) I_A \right]&\lesssim \frac{1}{|Q_L| \eta ^{\frac{p}{d}}} \mathbb {E}\left[ \left( \mu ^\mathcal {B}(Q_L) \right) ^{1+\frac{p}{d}} I_A \right] \\&\le \frac{1}{|Q_L| \eta ^{\frac{p}{d}}} \mathbb {E}\left[ \left( |\mathcal {U}_{Q_L}| \right) ^{1+\frac{p}{d}} \right] \\ {}&{\mathop {\lesssim }\limits ^{(6.20)}} \frac{L^{\frac{d+p}{2}}}{L^d \eta ^{\frac{p}{d}}} \lesssim \frac{1}{\eta ^{\frac{p}{d}}L^{\frac{d-p}{2}}}. \end{aligned} \end{aligned}$$

This proves (6.30). Inserting this into (6.29) finally gives (recall that \(p\ge 2\), hence \(\frac{d-p}{2}\le \frac{d-2}{2}\))

$$\begin{aligned} f(Q)\le \frac{C_\eta }{L^{\frac{1}{2}(d-p)}}. \end{aligned}$$

This concludes the proof of (6.23) with \(\alpha =(d-p)/2\) when \(p\ge 2\).

Step 5. The case \(p\le 2\). If \(p\le 2\), we argue as in the previous two proofs and use (2.20) to obtain (recall that \(Z=|\mathcal {G}|+|\mathcal {B}|\le |\mathcal {U}|+|\mathcal {N}^\eta |\))

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \frac{1}{|Q|} \textsf{W}^p_{Q} \left( \frac{ |\mathcal {G}|}{|Q|} + \mu ^{\mathcal {B}}, \frac{ Z}{|Q|} \right) \right]&\le \mathbb {E}\left[ \left( \frac{Z}{|Q|} \right) ^{1-\frac{p}{2}} \left( \frac{1}{|Q|} \textsf{W}_{Q}^2 \left( \frac{ |\mathcal {G}|}{|Q|} + \mu ^{\mathcal {B}}, \frac{ Z}{|Q|} \right) \right) ^{\frac{p}{2}} \right] \\&\le \left( \frac{ \mathbb {E}\left[ Z \right] }{|Q|} \right) ^{1-\frac{p}{2}} \mathbb {E}\left[ \frac{1}{|Q|} \textsf{W}_{Q}^2 \left( \frac{ |\mathcal {G}|}{|Q|} + \mu ^{\mathcal {B}}, \frac{ Z}{|Q|} \right) \right] ^{\frac{p}{2}}\\&\lesssim \left( L^{-\frac{d}{2}}+\eta \right) ^{1-\frac{p}{2}}\left( \frac{C(\eta )}{L^\alpha } \right) ^{\frac{p}{2}} \lesssim \left( \frac{C(\eta )}{L^\alpha } \right) ^{\frac{p}{2}}, \end{aligned} \end{aligned}$$

where in the last step we used (6.22) and (6.23) with \(p=2\). This concludes the proof of (6.23) for any \(p<d\). \(\square \)