Belief propagation for minimum weight many-to-one matchings in the random complete graph

In a complete bipartite graph with vertex sets of cardinalities $n$ and $m$, assign i.i.d. random weights, drawn from the exponential distribution with mean 1, to the edges. We show that, as $n\rightarrow\infty$, with $m = \lceil n/\alpha\rceil$ for any fixed $\alpha>1$, the expected minimum weight of a many-to-one matching converges to a constant (depending on $\alpha$). Many-to-one matching arises as an optimization step in an algorithm for genome sequencing and as a measure of distance between finite sets. We prove that a belief propagation (BP) algorithm converges asymptotically to the optimal solution. We use the objective method of Aldous to prove our results. We build on previous work on the minimum weight matching and minimum weight edge-cover problems to extend the objective method and to further the applicability of belief propagation to random combinatorial optimization problems.


Introduction
We address here an optimization problem over bipartite graphs related to the matching problem, called the many-to-one matching problem, which has applications in an algorithm for genome sequencing and as a measure of distance between finite sets. Given two sets A and B, with |A| ≥ |B|, consider a bipartite graph G with vertex set V = A ∪ B and edge set E ⊂ A × B. A many-to-one matching in G is a subgraph M such that each vertex in A, called the one side, has degree 1 in M, and each vertex in B, called the many side, has degree 1 or more in M. A many-to-one matching can be viewed as an onto function from A to B. Each edge e ∈ E has a weight ξ_e ∈ R_+ = [0, ∞). The cost of the matching M is the sum of the weights of the edges in M. We focus here on minimum cost many-to-one matchings on complete bipartite graphs where the edge-weights are independent and identically distributed (i.i.d.) random variables.
For several optimization problems over graphs with random edge-weights, where the objective is to minimize the sum weight of a subset of edges under some constraints, the expected optimal cost converges to a constant as n → ∞. Such results are known for the minimum spanning tree [12], matching [1,4], edge-cover [14,15], and traveling salesman problem (TSP) [23] over complete graphs or complete bipartite graphs with i.i.d. edge-weights. Furthermore, simple, iterative belief propagation algorithms can be used to find asymptotically optimal solutions for matching [21] and edge-cover [15].
We prove a similar result for many-to-one matching. Let us be a little more specific. Fix a real number α > 1. In the complete bipartite graph K_{n,m} with vertex sets A and B of cardinality n and m = ⌈n/α⌉ respectively, we take the edge-weights to be i.i.d. random variables having the exponential distribution with mean 1. Denote by M_n^α the minimum cost of a many-to-one matching on K_{n,m}. We show that the expected value of M_n^α converges to a constant (which depends on α and can be computed) as n → ∞. Further, we show that a belief propagation (BP) algorithm finds the asymptotically optimal many-to-one matching in O(n^2) steps.

[Figure 1: "Fraction of a configuration and one enzyme data. Top: the contigs; bold arrows denote the orientation of each contig. Vertical lines: restriction sites. Bottom: restriction fragments between neighboring sites." Image and the quoted caption from [7].]
We proceed to show these results via the objective method following [4] for matching and [15] for edge-cover. Before we give the background and overview of the methods, we describe two practical applications of the many-to-one matching problem.
Applications of many-to-one matching

Restriction scaffold problem
Ben-Dor et al. [7] introduced the restriction scaffold problem for assisting the finishing step in a shotgun genome sequencing project. Shotgun sequencing refers to the process of generating several short clones of a large target genome and then sequencing these individual clones. Sequencing a short genome is much easier than sequencing the entire genome, and the cloning process is automated and fast, but the locations of clones are random. This process, which is like a random sampling, leaves us with several sequenced segments of the target. It is possible to construct the target sequence by assembling these short sequences by matching their overlaps, provided that we have sufficient redundancy in the samples to cover the entire sequence with enough overlap.
Usually, asking for enough redundancy (so that this assembly based on overlaps reconstructs the whole sequence) will result in excessive cloning and sequencing tasks. As a tradeoff, we can give up on this high redundancy, and accept as the output of this assembly task large contiguous sequences (Ben-Dor et al. call these contigs), which together cover a large part of the target sequence. If the gaps (the positions in the target sequence that are not covered by these contigs) are small, those areas of the genome can be targeted manually to finish the sequencing project. However, the relative position and orientation of the contigs, and the sizes of gaps between successive contigs, are unknown. We need some information to figure out how these contigs align along the target genome. See the top part of figure 1.
Ben-Dor et al. propose the use of restriction enzymes to generate this information. Restriction enzymes cut the DNA at specific recognition sequences (differing according to the enzyme) called the restriction sites of the enzyme. These restriction sites form a scaffold over which the contigs can be assembled. Again, see figure 1. They measure the approximate sizes of the fragments obtained by digesting the target DNA with a restriction enzyme. On the other hand, by assuming a particular ordering and orientation of the contigs and the gaps between successive contigs, the sizes of the fragments can be computed by reading out the recognition sequence of the enzyme along the contigs (their sequences are known). Comparing these computed fragment sizes with the measured fragment sizes for several different restriction enzymes gives them a way to obtain a new estimate of the ordering, orientation and gaps. This is the basis of their iterative algorithm for solving the restriction scaffold problem.
The step of the iterative algorithm involving the comparison of the computed fragment sizes with the measured fragment sizes is formulated as a many-to-one matching problem, with the computed fragment sizes on the "one" side and the measured fragment sizes on the "many" side. The measurement process does not indicate the number of fragments corresponding to a particular measurement. The many-to-one nature of the association captures the notion that a measurement may correspond to one or more physical fragments, and each physical fragment must correspond to one measurement. The weight of an edge captures how much the two sizes differ. Specifically, the weight of an edge joining a computed fragment size s(c) to a measured fragment size s(m) is |log(s(c)) − log(s(m))|. They solve the many-to-one matching problem using this linear property: the vertices are represented as points on a line, a vertex corresponding to a size s is at position log(s) on the line, and the cost of matching two points is the distance between the points.

We do not assume this linear structure in our formulation of the many-to-one problem on the complete bipartite graph. However, the i.i.d. weight assumption is the mean-field approximation of a geometric structure in the following sense. Take a set A of n points and a set B of m = ⌈n/α⌉ points independently and uniformly at random in the unit sphere in R^d. When m and n are large, the probability that the distance between a typical point of A and a typical point of B is less than r is approximately r^d for small r. We can represent the distances between points of the two sets by forming a complete bipartite graph and taking the weight of an edge as the distance between the corresponding points. For a mean-field model we ignore the geometric structure of R^d (and the triangle inequality) and only take into account the interpoint distances, by taking the edge-weights to be i.i.d. copies of a nonnegative random variable ξ satisfying Pr(ξ < r) ≈ r^d as r → 0.

A measure of distance between sets of points
A concept of distance between finite sets of points is useful in many areas like machine learning, computational geometry, and comparison of theories. Such a distance is derived from a given distance function (or a metric) between the points. For example, in a clustering process over a set of examples, suppose we are given a function d such that d(e_1, e_2) corresponds to the distance between two examples e_1 and e_2. Clustering methods such as TIC [9] rely on a function d_1 that specifies the distance d_1(C_1, C_2) for two clusters (which are sets of examples) C_1 and C_2.
Eiter and Mannila [10] use the term surjection distance for a measure of similarity, based on optimal many-to-one matching, between two finite sets of points in a metric space. The surjection distance is the minimum cost of a many-to-one matching, with the points of the larger set forming the vertices of the one side, the points of the smaller set forming the vertices of the many side of a bipartite graph, and the edge-weights taken as the distance between the corresponding points. By reduction to the minimum weight matching problem they find an O(n^3) algorithm for many-to-one matching. Belief propagation (in the random setting we consider here) yields an asymptotically optimal solution in O(n^2) steps.
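For intuition, the surjection distance can be computed by brute force on tiny instances, by enumerating every onto map from the larger set to the smaller one. This is exponential in |A| and purely illustrative; the function name is hypothetical.

```python
from itertools import product

def surjection_distance(A, B, d):
    """Brute-force surjection distance between finite point sets A and B
    (|A| >= |B|): minimum total cost over all onto maps f: A -> B,
    where sending a in A to b in B costs d(a, b)."""
    best = float("inf")
    # Enumerate every function f: A -> B as a tuple of image indices.
    for images in product(range(len(B)), repeat=len(A)):
        if len(set(images)) < len(B):  # skip maps that are not onto
            continue
        cost = sum(d(A[i], B[j]) for i, j in enumerate(images))
        best = min(best, cost)
    return best

# Points on the real line with absolute-difference cost.
print(surjection_distance([0.0, 1.0, 5.0], [0.9, 5.2],
                          lambda a, b: abs(a - b)))
```

On this instance the optimal onto map sends the first two points to 0.9 and the third to 5.2, for total cost 1.2.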

Background and overview of the methods
Several of the results for random combinatorial optimization problems have originated as conjectures supported by heuristic statistical mechanical methods such as the replica method and the cavity method [20,16,17,18,19]. Establishing the validity of these methods is a challenge even for mathematically simple models.
Aldous [1,4] provided a rigorous proof of the ζ(2) limit conjecture for the random matching (or assignment) problem. The method he used is called the objective method. A survey of this method and its application to a few other problems was given in [3]. The objective method provides a rigorous counterpart to the cavity method, but has had success only in a few settings.
The crux of the objective method is to obtain a suitable distributional limit for a sequence of finite random weighted graphs of growing vertex set size. The space of weighted graphs is endowed with a metric that captures proximity based on the local neighborhood around a vertex. The weak convergence of probability measures on this space is called local weak convergence to emphasize that the convergence applies to events that are essentially characterized by the local structure around a typical vertex, and not to events that crucially depend on the global structure of the graph. This form of convergence has appeared in varying forms in [13,8,3]. Once we have this convergence, we can relate the optimization problem on finite graphs to an optimization problem on the limit object. For example, Aldous [4] showed that the limit object for the sequence of random weighted complete bipartite graphs K_{n,n} is the Poisson weighted infinite tree (PWIT). Aldous then writes the optimization problem in terms of local conditions at the vertices of the tree, and using the symmetry in the structure of the PWIT, these relations result in a distributional identity called the recursive distributional equation. A solution to this equation is then used to describe asymptotically optimal matchings on K_{n,n}.
One can exploit the recursive distributional equation to construct an iterative decentralized message passing algorithm, a version of which is called belief propagation (BP) in the computer science literature. BP algorithms are known to converge to the correct solution on graphs without cycles. For graphs with cycles, provable guarantees are known for BP only for certain problems. For example, Bayati, Shah and Sharma [6] proved that the BP algorithm for maximum weight matching on bipartite graphs converges to the correct value as long as the maximum weight matching is unique, and Sanghavi, Malioutov and Willsky [22] proved that BP for matching is as powerful as LP relaxation. Salez and Shah [21] studied belief propagation for the random assignment problem, and showed that a BP algorithm on K_{n,n} converges to an update rule on the limit object PWIT. The iterates on the PWIT converge in distribution to the minimum cost assignment. The iterates are near the optimal solution in O(n^2) steps, whereas the worst case optimal algorithm on bipartite graphs is O(n^3) (expected time O(n^2 log n) for i.i.d. edge capacities); see Salez and Shah [21] and references therein.
In message passing algorithms like belief propagation, the calculations at each vertex of a graph are performed using the messages received from its neighbors. This local nature makes the analysis via the objective method feasible. The author and Sundaresan, in [15], extended the objective method by combining local weak convergence and belief propagation to prove and characterize the limiting expected minimum cost for the edge-cover problem. In this paper, we implement this general program for the many-to-one matching problem. The proof relies on two properties: (1) the neighborhoods around most vertices are tree-like, and (2) the effect of the boundary on the belief propagation messages at the root of the tree vanishes as the boundary is taken to infinity. The second property is formalized as "endogeny" (see [5]) and is handled in Section 10.

Contributions of this paper
This paper extends the line of work on combinatorial optimization problems over graphs with i.i.d. edge-weights. Although the methods are similar to those in [4,21,15], there are key differences from earlier work in addressing the many-to-one matching problem. We now highlight them. For example, we must deal with two types of vertices corresponding to two different constraints on the degrees. The different viewpoints from a vertex in A and a vertex in B make the limit object for the sequence of graphs K_{n,m} different from the Poisson weighted infinite tree (PWIT) obtained as the limit of bipartite graphs K_{n,n} with vertex sets of the same cardinality. The difference is in the distribution of the weights of the edges at odd and even levels from the root. As a consequence, the recursive distributional equation (RDE), which guides the construction of an optimal many-to-one matching on the limit object, is expressed in terms of a two-step recursion.
We prove that this recursive distributional equation has a unique solution, and that the many-to-one matching on the limit object generated by the solution of this equation is optimal among all involution invariant many-to-one matchings. To prove correctness of BP, we establish that the fixed-point solution of the RDE has a full domain of attraction, and that the stationary process with the fixed-point marginal distribution on the limit object satisfies endogeny.

Main results
From now on we will write K_{n,n/α} to mean K_{n,m}, where m = ⌈n/α⌉. Recall that A and B denote the one side and the many side of the bipartite graph K_{n,n/α}. The weights of the edges are i.i.d. random variables with the exponential distribution, and for simplicity in the subsequent analysis, we take the mean of the edge-weights to be n. This only changes the cost of the optimal solution by a factor of n, and so we have a rescaling on the left-hand side of (1).
Our first result establishes the limit of the scaled expected minimum cost of the random many-to-one matching problem.
Theorem 1. For α > 1, the minimum cost M_n^α of a many-to-one matching on K_{n,n/α} satisfies

where Li_2 is the dilogarithm:

Our second result shows that a belief propagation algorithm gives a many-to-one matching that is asymptotically optimal as n → ∞. The BP algorithm is based on the corresponding BP algorithms developed in [21,6] for matching and in [15] for edge-cover. The proof uses the technique of showing that the update rule of BP converges to an update rule on a limit infinite tree.
We now define a BP algorithm on an arbitrary bipartite graph G = (A ∪ B, E) with edge-costs. First, some notation. We write w ∼ v to mean w is a neighbor of v, i.e., {v, w} ∈ E. For an edge e = {v, w} ∈ E, we write its cost as ξ_G(e) or ξ_G(v, w) (with the ordering irrelevant). For each vertex v ∈ V, we associate a nonempty subset π_G(v) of its neighbors such that π_G(v) has cardinality

The BP algorithm is an iterative message passing algorithm. In each iteration k ≥ 0, every vertex v ∈ V sends a message X_G^k(w, v) to each neighbor w ∼ v according to the following rules:

Update rule:

Decision rule:

Note that the subset M(π_G^k(v)) is not necessarily a many-to-one matching, but it can be modified to be a many-to-one matching M_n^k by some patching. We also remark that while ξ_G(v, u) = ξ_G(u, v), the messages X_G^k(w, v) and X_G^k(v, w) can differ.

We analyze the belief propagation algorithm for G = K_{n,n/α} with i.i.d. exponential random edge-costs, and prove that after a sufficiently large number of iterations, the expected cost of the many-to-one matching given by the BP algorithm is close to the limit value in Theorem 1.

Theorem 2. On K_{n,n/α}, the output M(π_{K_{n,n/α}}^k) of the BP algorithm can be patched to get a many-to-one matching M_n^k that satisfies
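The displayed update and decision rules are given as equations in the paper. The following sketch is an assumption patterned on the min-sum BP rules for matching [21] and edge-cover [15] cited above: an o-vertex sends the minimum, over its other edges, of (edge cost minus incoming message); an m-vertex sends the positive part of that minimum; and each o-vertex finally matches to the neighbor minimizing (cost minus incoming message). Function names are illustrative, not the paper's notation.

```python
INF = float("inf")

def _cost(xi, v, u):
    """Symmetric edge cost lookup: xi stores each edge under one ordering."""
    return xi[(v, u)] if (v, u) in xi else xi[(u, v)]

def bp_many_to_one(A, B, xi, iters=20):
    """Min-sum message-passing sketch for minimum cost many-to-one matching.
    A: o-labeled (one side) vertices; B: m-labeled (many side) vertices;
    xi[(a, b)]: cost of edge {a, b}. The rules below are an assumption
    modeled on BP for matching/edge-cover, mirroring the two-step recursion:
    plain minimum at o-vertices, positive part at m-vertices."""
    nbrs = {v: [] for v in A + B}
    for a, b in xi:
        nbrs[a].append(b)
        nbrs[b].append(a)
    X = {(v, w): 0.0 for a, b in xi for v, w in ((a, b), (b, a))}
    for _ in range(iters):
        newX = {}
        for v, w in X:
            m = min((_cost(xi, v, u) - X[(u, v)] for u in nbrs[v] if u != w),
                    default=INF)  # empty min: v has no alternative to w
            newX[(v, w)] = m if v in A else max(0.0, m)  # positive part at m
        X = newX
    # Decision: each o-vertex picks the neighbor minimizing cost - message.
    return {a: min(nbrs[a], key=lambda b: _cost(xi, a, b) - X[(b, a)])
            for a in A}

# Tiny tree-structured instance (min-sum BP is exact on trees): a1 and a2
# are forced, and a0 should pick the cheaper of b0, b1 (total cost 8).
xi = {("a0", "b0"): 1.0, ("a0", "b1"): 2.0,
      ("a1", "b0"): 3.0, ("a2", "b1"): 4.0}
print(bp_many_to_one(["a0", "a1", "a2"], ["b0", "b1"], xi))
```

On graphs with cycles the decisions need not form a valid many-to-one matching, which is exactly why the theorem above speaks of patching the BP output.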

Local weak convergence
First we recall the terminology for defining convergence of graphs, borrowed from [3].

Rooted geometric networks
A graph G = (V, E) along with a length function l : E → (0, ∞] and a label function label : V → L, where L is a finite set of labels, is called a network. The distance between two vertices in the network is the infimum of the sum of lengths of the edges of a path connecting the two vertices, the infimum being taken over all such paths. We call the network a geometric network if for each vertex v ∈ V and positive real ρ, the number of vertices within a distance ρ of v is finite. We denote the space of geometric networks by G. A geometric network with a distinguished vertex v is called a rooted geometric network with root v. We denote the space of all connected rooted geometric networks by G*. It is important to note that in G* we do not distinguish between rooted isomorphisms of the same network, i.e., isomorphisms that preserve the graph structure along with the root, vertex labels, and edge-lengths. We will use the notation (G, φ) to denote an element of G* which is the isomorphism class of rooted networks with underlying network G and root φ.

Local weak convergence
We call a positive real number ρ a continuity point of G if no vertex of G is exactly at a distance ρ from the root of G. Let N_ρ(G) denote the neighborhood of the root of G up to distance ρ: N_ρ(G) contains all vertices of G which are within a distance ρ from the root of G (Figure 2). We take N_ρ(G) to be an element of G* by inheriting the same length function l as G, the same label function as G, and the same root as that of G.
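As a small illustration of these definitions, the vertex set of N_ρ(G) can be computed with Dijkstra's algorithm over the edge-lengths; the adjacency-list representation and names here are illustrative.

```python
import heapq

def rho_neighborhood(adj, root, rho):
    """Return the vertices of N_rho(G): all vertices within network distance
    rho of the root, where the distance is the minimum total edge-length
    over connecting paths (Dijkstra). adj[v] is a list of (neighbor, length)
    pairs; lengths are positive, as in the definition of a network."""
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, l in adj.get(v, []):
            nd = d + l
            if nd <= rho and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return set(dist)

# A small network; rho = 2.5 is a continuity point (no vertex at distance
# exactly 2.5 from the root r): w is at distance 3 and is excluded.
adj = {
    "r": [("u", 1.0), ("v", 3.0)],
    "u": [("r", 1.0), ("v", 1.2), ("w", 2.0)],
    "v": [("r", 3.0), ("u", 1.2)],
    "w": [("u", 2.0)],
}
print(rho_neighborhood(adj, "r", 2.5))
```

Note that v is inside the neighborhood through the path r-u-v of length 2.2, even though the direct edge r-v has length 3: the distance is an infimum over all paths.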
We say that a sequence of rooted geometric networks G n , n ≥ 1 converges locally to an element G ∞ in G * if for each continuity point ρ of G ∞ , there is an n ρ such that for all n ≥ n ρ , there exists a graph isomorphism γ n,ρ from N ρ (G ∞ ) to N ρ (G n ) that maps the root of the former to the root of the latter, preserves the labels of the vertices, and for each edge e of N ρ (G ∞ ), the length of γ n,ρ (e) converges to the length of e as n → ∞.

The space G * can be suitably metrized to make it a separable and complete metric space. See [2] for details on the metric. One can then consider probability measures on this space and endow that space with the topology of weak convergence of measures. This notion of convergence is called local weak convergence.
In our setting of complete bipartite graphs K_{n,n/α} = (A ∪ B, E) with i.i.d. random edge-costs {ξ_e, e ∈ E}, we regard the edge-costs as the lengths of the edges, and declare a vertex φ of K_{n,n/α}, chosen uniformly at random, as the root of K_{n,n/α}. Assign the label o to the vertices in A, and the label m to the vertices in B; the label of a vertex will indicate its permitted degree in a many-to-one matching: o for one and m for many. This makes (K_{n,n/α}, φ), the isomorphism class of K_{n,n/α} with root φ, a random element of G*. We will show (Theorem 3 below) that the sequence of random rooted geometric networks K_{n,n/α} converges in the local weak sense to an element of G*, which we call the Poisson weighted infinite tree (PWIT).

Poisson weighted infinite tree
The graph that we define here is a variant of the Poisson weighted infinite tree that appears in [3]. We use the notation from [21] to define the variant. This object is a random element of G* whose distribution is a mixture of the distributions of two random trees: T_α^o and T_α^m.

Let V denote the set of all finite strings over the alphabet N = {1, 2, 3, . . .}. Let φ denote the empty string and "." the concatenation operator. Let |v| denote the length of the string v. For a string v with |v| ≥ 1, let v̄ denote the parent of v, i.e., the string with the last letter deleted. In T_α^o (respectively T_α^m), give the vertex v the label o if |v| is even (respectively odd), and the label m if |v| is odd (respectively even). In particular, the root φ has label o in T_α^o, and label m in T_α^m. Construct a family of independent Poisson processes, with intensity 1, on R_+:

If we denote the probability measures of T_α^o and T_α^m by ν^o and ν^m respectively, then the probability measure of T_α is the mixture

We now prove that this distribution is the local weak limit of the distributions of (K_{n,n/α}, φ), where φ is chosen uniformly at random from the vertices of K_{n,n/α}.

Theorem 3. The sequence (K_{n,n/α}, φ), having edge-weights that are independent random variables with the exponential mean-n distribution, converges to T_α (with root φ) as n → ∞ in the local weak sense.
Proof. In K_{n,n/α}, let A denote the larger set with cardinality n and let B denote the set with smaller cardinality m = ⌈n/α⌉. Assign the label o to vertices in A (the one side), and label m to the vertices in B (the many side). Condition the root φ to have label o: this occurs with probability n/(n + ⌈n/α⌉). We bound the likelihood ratio dν_{N,H,n}/dν_{N,H} for the restriction that each vertex of T_{N,H,n} corresponds to a unique vertex in K_{n,n/α}. For any vertex v, the vertex v.1 is taken to be the w that minimizes ξ(v, w) among all w ≠ v̄ with w ∼ v. Suppose that we have associated vertices up to v.i for some

where |T_{N,H}| is the number of vertices in T_{N,H}. To see this, observe that the term within square brackets is, using the memoryless property of the exponential distribution, the density of the minimum among the ⌈n/α⌉ − (i+1) exponential random variables corresponding to nodes on the many side (excluding {v, v.1, . . . , v.i}). The term outside the square brackets is a lower bound on the probability that the minimum is not at one of the vertices of K_{n,n/α} already picked. This establishes (6). The density of the corresponding difference in T_{N,H} is

The ratio of the conditional densities is at least

Doing this for all vertices of T_{N,H}, and using the fact that under the restriction all edges are distinct, we get

The lower bound converges to 1 as n → ∞. This implies that

as n → ∞. Now, to get the local weak convergence result, fix ρ > 0 and ε > 0, and take N and H large enough such that, uniformly for all n > |T_{N,H}|, the probability that the ρ neighborhood of the root of K_{n,n/α} is a subset of T_{N,H,n} is greater than 1 − ε; under this event the ρ neighborhood of the root of K_{n,n/α} converges weakly to the ρ neighborhood of the root of T_α^o by (7). This shows that conditional on the root having label o, T_α^o is the local weak limit of K_{n,n/α}. Similarly we can show that conditional on the root having label m, the local weak limit is T_α^m.

The probabilities of the label of the root being o and m converge respectively to α/(1 + α) and 1/(1 + α). This completes the proof.
The above theorem is a generalization of Aldous's result [1, Lemma 10] for the local weak limit of complete bipartite graphs in which both the vertex subsets have the same cardinality.
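The string-indexed construction above can be sampled directly: each vertex's child edge-lengths are the points of an independent rate-1 Poisson process, i.e., cumulative sums of i.i.d. Exp(1) gaps. The sketch below truncates to N children per vertex and depth H; the α-dependent rescaling of the point processes at o- and m-labeled levels (given by the displayed formulas, not reproduced here) is omitted, and the names and truncation parameters are illustrative.

```python
import random

def sample_truncated_pwit(root_label, N, H, seed=0):
    """Sample a PWIT-like tree truncated to the first N children per vertex
    and depth H. Vertices are dot-separated strings over {1..N} ('' is the
    root); labels alternate o/m along levels; the edge-lengths to the
    children of a vertex are the first N points of an independent rate-1
    Poisson process (cumulative sums of Exp(1) gaps)."""
    rng = random.Random(seed)
    other = {"o": "m", "m": "o"}
    label = {"": root_label}
    length = {}  # length[child] = edge-length from child to its parent
    frontier = [""]
    for _ in range(H):
        next_frontier = []
        for v in frontier:
            t = 0.0
            for i in range(1, N + 1):
                t += rng.expovariate(1.0)  # next Poisson point
                child = v + ("." if v else "") + str(i)
                length[child] = t
                label[child] = other[label[v]]
                next_frontier.append(child)
        frontier = next_frontier
    return label, length

label, length = sample_truncated_pwit("o", N=3, H=2, seed=42)
# Child edge-lengths at each vertex are increasing, e.g. for the root:
print([length[str(i)] for i in range(1, 4)])
```

By construction the child edge-lengths at every vertex are sorted in increasing order, matching the convention that v.1 is the child along the shortest edge.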

A similar result was earlier established by Hajek [13, Sec. IV] for a class of sparse Erdős-Rényi random graphs.
The above theorem says that if we look at an arbitrarily large but fixed neighborhood of the root of K_{n,n/α}, then for large n, it looks like the corresponding neighborhood of the root of T_α. This suggests that, if boundary conditions can be ignored, we may be able to relate optimal many-to-one matchings on K_{n,n/α} with an appropriate many-to-one matching on T_α (to be precise, an optimal involution invariant many-to-one matching on the PWIT). Furthermore, the local neighborhood of the root of K_{n,n/α} is a tree for large enough n (with high probability). So we may expect belief propagation on K_{n,n/α} to converge. Both these observations form the basis of the proof technique here, just as in the case of edge-cover in [15]. In the matching case these ideas were developed by Aldous in [1,4] to obtain the limit, and later by Salez and Shah [21] to show the applicability of belief propagation.
First, let us see that the expected cost of an optimal many-to-one matching on K_{n,n/α} can be written in terms of the weights of the edges at the root:

This is because the root, conditioned to have label o, is uniform among the n vertices of the subset A. We seek to relate the expectation on the right with the expectation corresponding to optimal many-to-one matchings on the PWIT.

Recursive distributional equation
For a labeled tree T with root φ, a many-to-one matching is a collection of edges such that vertices with label o have degree 1 and vertices with label m have degree at least 1. The cost is the sum of the weights of the edges in the collection. This is well defined for a finite T. Write C(T) for the minimum cost of a many-to-one matching on T, and write C(T \ φ) for the minimum cost of a many-to-one matching on T \ φ, the forest obtained after removing the root φ from T. Assume that the size of the tree is such that these many-to-one matchings exist. We derive a recursive relation involving these quantities.
If v is a child of the root φ of T, let T_v denote the subtree containing v and all its descendants, with v as the root. Irrespective of the label of φ, we have

C(T \ φ) = Σ_{v ∼ φ} C(T_v).   (8)

If φ has label o, then among all matchings which match φ to a vertex u the minimum cost is ξ(φ, u) + C(T_u \ u) + Σ_{v ∼ φ, v ≠ u} C(T_v), and C(T) is the minimum of the above over all children of the root:

C(T) = min_{u ∼ φ} [ ξ(φ, u) + C(T_u \ u) + Σ_{v ∼ φ, v ≠ u} C(T_v) ].   (9)

Subtracting (8) from (9), we get

C(T) − C(T \ φ) = min_{u ∼ φ} [ ξ(φ, u) − (C(T_u) − C(T_u \ u)) ].   (10)

If φ has label m, then among all matchings which match φ to the vertices in a nonempty set I the minimum cost is Σ_{u ∈ I} (ξ(φ, u) + C(T_u \ u)) + Σ_{v ∼ φ, v ∉ I} C(T_v), and C(T) is the minimum of the above over all nonempty I:

C(T) = min_{∅ ≠ I ⊆ {v : v ∼ φ}} [ Σ_{u ∈ I} (ξ(φ, u) + C(T_u \ u)) + Σ_{v ∼ φ, v ∉ I} C(T_v) ].   (11)

Subtracting (8) from (11), we get

C(T) − C(T \ φ) = min_{∅ ≠ I} Σ_{u ∈ I} [ ξ(φ, u) − (C(T_u) − C(T_u \ u)) ],   (12)

the positive part of which is min_{u ∼ φ} [ ξ(φ, u) − (C(T_u) − C(T_u \ u)) ]^+.

Now consider what happens when we take T to be the infinite tree T_α (PWIT). The quantities C(T_α) and C(T_α \ φ) become infinite, and the relations (10) and (12) are meaningless. Observe that, conditional on the label of the root φ, the subtrees T_{α,i}, i ∼ φ, are i.i.d., and the conditional distribution is the same as the distribution of T_α^m (T_α^o) given that φ has label o (m). With this distribution structure of the subtrees, (10) and (12) motivate us to write a two-step recursive distributional equation (RDE): an equation in two distributions, µ^o on R and µ^m on R_+, that satisfy

such that

We will call (µ^o, µ^m) the solution to the RDE (13).
A recursive distributional equation is the primary tool that leads to the optimal solution on the PWIT T_α. The limit value of the expected optimal cost is obtained by calculations from the solution of the RDE. Recursive distributional equations come up in the related problems of random matching [4] and edge-cover [15]. Aldous and Bandyopadhyay [5] present a survey of this type of RDE and introduce several RDE concepts that we will use later.
If P(S) denotes the space of probability measures on a space S, a recursive distributional equation (RDE) is defined in [5] as a fixed-point equation on P(S) of the form

where X_j, j ≥ 1, are i.i.d. S-valued random variables having the same distribution as X and are independent of the pair (ξ, N), ξ is a random variable on some space, N is a random variable on N ∪ {+∞}, and g is a given S-valued function. A solution to the RDE is a common distribution of X and the X_j, j ≥ 1, satisfying (14). The two-step equation (13) can easily be expressed in the form (14) by composing the two steps of (13) into a single step. One can then solve for a fixed point distribution, say µ^o, which then automatically yields µ^m via one of the steps of (13).
We can use the relation (14) to construct a tree indexed stochastic process, say X_i, i ∈ V, which is called a recursive tree process (RTP) [5]. Associate to each vertex i ∈ V an independent copy (ξ_i, N_i) of the pair (ξ, N), and require X_i to satisfy the recursion (14) at every vertex. If µ is a solution of the RDE (14), there exists a stationary RTP, i.e., one in which each X_i is distributed as µ. Such a process is called an invariant RTP with marginal distribution µ.

Solution to the RDE
Write F and G for the complementary cdfs of µ^o and µ^m respectively, i.e., F(t) = P{X_o > t} and G(t) = P{X_m > t}. F and G satisfy the following: for t ≥ 0, we can write

where

From (17), we have w_o = α e^{−w_m/α}. We can now rewrite (17) using w_o as

From (15) and (16), F and G are differentiable for almost all t ∈ R. Differentiating (15) and (16), we get

These two equations imply that

The validity of this equation can be extended to t = 0 because F is continuous for all t and G is right-continuous with left limits at t = 0. Taking t → ∞ in the above relation shows that the constant is equal to α, and so

Substituting this in (20) results in the differential equation

To get the boundary conditions, take t = 0 in (16), (18), and (21) to get

These imply G(0) = α − w_o and α = w_o + e^{−w_o}. Since α > 1, the latter equation determines a unique w_o > 0. Integration of the differential equation (22) gives

for t ≥ 0. Using (21), we get F for negative values of t as well. We thus obtain the following expressions for the distributions F and G:

where γ = w_o e^{w_o}. For future use, we note that the density of µ^o is:
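Since w ↦ w + e^{−w} is strictly increasing on (0, ∞) with value 1 at w = 0, the equation α = w_o + e^{−w_o} indeed has a unique positive root for each α > 1, and it can be found by bisection. A minimal numerical sketch (the function name is illustrative):

```python
import math

def solve_w_o(alpha, tol=1e-12):
    """Solve alpha = w + exp(-w) for the unique w > 0 (requires alpha > 1).
    Bisection on [0, alpha]: the left-hand side is strictly increasing in w,
    equals 1 < alpha at w = 0, and exceeds alpha at w = alpha."""
    assert alpha > 1
    lo, hi = 0.0, alpha  # w_o < alpha because e^{-w} > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + math.exp(-mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w = solve_w_o(2.0)
print(w, w + math.exp(-w))  # the second value should be ~2.0
```

As α decreases to 1, the root w_o decreases to 0, consistent with the many-to-one problem degenerating toward the standard matching problem.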

Optimal involution invariant many-to-one matching on the infinite tree
The RDE (13) guides us in constructing a many-to-one matching on T_α; the following lemma forms the first step.

Lemma 1. There exists a process
where T_α is a modified PWIT with edge-lengths {ξ_e, e ∈ E(T)}, and {X(u, v)} is a stochastic process satisfying the following properties.
(a) For each directed edge (u, v) ∈ E⃗(T), if (u, v) is directed away from the root of T, then X(u, v) has distribution F conditioned on v having label o, and distribution G conditioned on v having label m; F and G are as in (23) and (24).

The random variables X(u, v) and X(v, u) are independent.

(d) For a fixed z > 0, conditional on the event that there exists an edge of length z at the root, say {φ, v_z}, and conditional on the root having label m, the random variables X(φ, v_z) and X(v_z, φ) are independent random variables having distributions F and G respectively; if the root is conditioned to have label o, the distributions are reversed.
Proof. This Lemma is the analogue of Lemma 5.8 of [3] and Lemma 1 of [15]. It has essentially the same proof, with appropriate changes to handle vertex labels. We omit the details.
We use the process {X(e⃗)} to construct a many-to-one matching M_opt on T_α. For each

By continuity of the edge-weight distribution, when v has label o, almost surely M_opt(v) will be a singleton, whereas when v has label m, M_opt(v) may have more than one element, e.g., all vertices y for which the difference ξ

The following lemma proves that the collection M_opt yields a consistent many-to-one matching: if v with label o picks w with label m, then w is the only m-labeled vertex that picks v.
Lemma 2. For any two vertices v, w of T_α, we have w ∈ M_opt(v) if and only if ξ(v, w) < X(v, w) + X(w, v). As a consequence, w ∈ M_opt(v) if and only if v ∈ M_opt(w).
Proof. Let w ∈ M_opt(v). Suppose v has label o. Then w = arg min{ξ(v, y) − X(v, y), y ∼ v}, which implies ξ(v, w) < X(v, w) + X(w, v). Conversely, if ξ(v, w) < X(v, w) + X(w, v), then using the definition of X(w, v) it follows that w = arg min{ξ(v, y) − X(v, y), y ∼ v}, and so w ∈ M_opt(v). Now suppose v has label m. Then there are two cases.

which then yields ξ(v, w) < X(v, w) + X(w, v). Again, it is easy to see that if ξ(v, w) < X(v, w) + X(w, v), then w ∈ arg min{(ξ(v, y) − X(v, y))^+, y ∼ v}, which implies w ∈ M_opt(v). Thus we have established the first statement of the lemma: w ∈ M_opt(v) if and only if ξ(v, w) < X(v, w) + X(w, v). The condition on the right-hand side is symmetric in v and w, and hence the second statement of the lemma is proved.

Evaluating the cost
The next result is merely to demonstrate that the limiting expected minimum cost can be computed using numerical methods; its expression involves the dilogarithm Li_2(x) = Σ_{k≥1} x^k/k². Proof. Condition on the event that the root φ has label o. Now fix a z > 0, and condition on the event that there is a neighbor v_z of φ with ξ(φ, v_z) = z; call this event E_z. If we condition a Poisson process to have a point at some location, then the conditional process obtained on removing this point is again a Poisson process with the same intensity. This shows that under E_z, X(φ, v_z) and X(v_z, φ) have the distributions G and F respectively; they are also independent. Using these observations and the expression (25) for the density of µ_o, the conditional expected cost can be written down and integrated. As α → 1, the expected cost converges to π²/6, which is the limiting expected minimum cost of a standard matching on K_{n,n}.
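A quick numerical sanity check of the α → 1 boundary case is easy, since Li_2(1) = Σ_{k≥1} 1/k² = π²/6 is exactly the classical limiting matching cost on K_{n,n}. A small sketch (the truncation level is an arbitrary choice of ours):

```python
import math

def li2(x, terms=200_000):
    """Partial sum of the dilogarithm series Li_2(x) = sum_{k>=1} x^k / k^2,
    valid for |x| <= 1; at x = 1 the truncation error is about 1/terms."""
    return sum(x ** k / (k * k) for k in range(1, terms + 1))

approx = li2(1.0)   # should be close to pi^2 / 6 = 1.6449...
```

For |x| < 1 the series converges geometrically, so far fewer terms suffice there; only the endpoint x = 1 needs a large truncation level.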

Optimality in the class of involution invariant many-to-one matchings
The local weak limit of a sequence of uniformly rooted random networks satisfies a property called involution invariance [3,2]. Informally, involution invariance and its equivalent property called unimodularity specify that a distribution on the space of rooted networks G * does not depend on the choice of the root.
Refer to Section 5 of [15] for a discussion on involution invariance as it relates to our setting; a very general study of involution invariance is central to [2].
Let G** be the space of isomorphism classes of connected geometric networks with a distinguished directed edge. An element (G, o, x) of G** will denote the isomorphism class corresponding to a network G and a distinguished vertex pair (o, x), where {o, x} is an edge in G. A probability measure µ on G* is called involution invariant if it satisfies the following for all Borel f: We can represent a many-to-one matching M on a graph G by a membership map on the edge-set of G: e ↦ 1_{e∈M}. We call a random many-to-one matching on a random graph G involution invariant if the distribution of G with the above map on its edges is involution invariant. Following the argument in the last paragraph of Section 5 of [15], we can restrict our attention to involution invariant many-to-one matchings on the PWIT. We now show that our candidate many-to-one matching M_opt has the minimum expected cost among such many-to-one matchings.
Before we start to prove Theorem 5, we examine a consequence of it. Using that P{label(φ) = o} = α/(1+α) and P{label(φ) = m} = 1/(1+α), we get the required identity. The same holds for M_opt too, and so the corollary follows from Theorem 5.
We will need some definitions for the proof of Theorem 5. For each directed edge (v, w) of T_α, if v has label o and w has label m, define a random variable Y(v, w), where N_w is the set of neighbors of w. It is easy to see that this random variable can be written as Σ_{y∼w, y≠v} (ξ(w, y) − X(w, y)) 1_{ξ(w,y)−X(w,y)<0} otherwise.
Note that (Y(v, w))^+ = X(v, w). If v has label o and w has label m, define the corresponding quantity. Suppose that E[Σ_{v∈M(φ)} ξ(φ, v)] < ∞. Then M(φ) is a finite set with probability 1 because {ξ(φ, v), v ∼ φ} are the points of a Poisson process of rate 1 or 1/α. For such a many-to-one matching M, define A(M). The following two lemmas will be used to prove Theorem 5.
Lemma 3. Let M be a many-to-one matching on the PWIT such that E[Σ_{v∈M(φ)} ξ(φ, v)] < ∞. Then the stated conclusion holds almost surely. Lemma 4. Let M be a many-to-one matching on the PWIT such that E[Σ_{v∈M(φ)} ξ(φ, v)] < ∞. Then (34) and (35) hold. Proof of Theorem 5. If E[Σ_{v∈M(φ)} ξ(φ, v)] = ∞, the statement of the theorem is trivially true. Assume, then, that it is finite. We are now in a position to apply Lemmas 3 and 4 to get the result. Let us now complete the proofs of Lemmas 3 and 4.
If v = u, replacing y with u on the right-hand side, we get

This implies max
Thanks to the finite expectation assumption in the lemma, M(φ) is a finite set almost surely, and so Σ_{y∈M(φ)} X(φ, y) is finite almost surely. Rearrangement of (31) then yields the required bound, and we can write M_opt(φ) accordingly. From (28) and (32), for any v ∉ M_opt(φ), we have the corresponding inequality, and the conclusion follows by rearrangement.

Proof of Lemma 4. Define Ã(M) as follows.
We first prove (34). First, by involution invariance of M, we have (36). Indeed, the left-hand side equals an expectation with respect to µ_M, the probability measure on G* corresponding to M; by involution invariance, this equals an expression which is the right-hand side of (36). Thanks to the finite expectation assumption of the lemma, we saw in the proof of Lemma 3 that the term max_{v∉M(φ), v∼φ} Y(v, φ) is finite almost surely. Now observe that A(M) (respectively Ã(M)) is obtained by adding the almost surely finite random variable max_{v∉M(φ), v∼φ} Y(v, φ) to the random variable which is the argument of the expectation on the left-hand side (respectively the right-hand side) of (36). Taking expectations and using the equality in (36), we get (34). Now we will prove (35). Suppose that φ has label o. Both M(φ) and M_opt(φ) are then singleton sets: suppose M(φ) = {u} and M_opt(φ) = {u*}. Consequently, this implies that Ã(M_opt) = X(u*, φ) + X(u, φ), and thus Ã(M) = Ã(M_opt). Now suppose that φ has label m. First condition on the event L_1. Observe that, under L_1, ξ(φ, y) − X(φ, y) < 0 for y ∼ φ if and only if y ∈ M_opt(φ), and there are at least two such y. Then, by (26), we obtain (37). Also, from (28) and (32), for any v ∼ φ, we have the corresponding bound; in particular, (38) holds. Combining (37) and (38) gives (39), where min^(2) stands for the second minimum. Therefore (40) holds. Adding (39) and (40), and canceling the X terms, we get the final chain of inequalities, where the last inequality follows because there exists a v ∈ M(φ) \ M_opt(φ) by virtue of our assumption that M(φ) ≠ M_opt(φ). Thus Ã(M) ≥ Ã(M_opt) under L_2 as well.

Completing the lower bound
Theorem 6. Let M*_n be the optimal many-to-one matching on K_{n,n/α}. Then the stated lower bound holds. Proof. First observe that the initial identity holds because the root, conditioned to have label o, is uniform among the n vertices of the subset A.
The theorem can now be proved from Corollary 1 by following the exact steps of the proof of Theorem 9 in [15].

Endogeny
As we now move to establish the upper bound, our aim will be to show that BP run for sufficiently many iterations on K_{n,n/α} gives us a many-to-one matching whose cost is close to the optimal for large n. We will show this by relating BP on K_{n,n/α} with BP on the PWIT through local weak convergence. It turns out that the X process of Lemma 1 generated from the RDE (13) arises as the limit of the message process on the PWIT (Theorem 10). This result depends on an important property of the X process called endogeny, which was defined in [5].
Definition. An invariant RTP with marginal distribution µ is said to be endogenous if the root variable X φ is almost surely measurable with respect to the σ-algebra σ ({(ξ i , N i ) | i ∈ V}).
Conceptually this means that the random variable at the root X φ is a function of the "data" {(ξ i , N i ) | i ∈ V}, and there is no external randomness. This measurability also implies that if we restrict attention to a large neighborhood of the root, the root variable X φ is almost deterministic given this neighborhood: the effect of the boundary vanishes.
This sort of long-range independence allows the use of belief propagation in conjunction with local weak convergence. When we run BP updates on a graph, the number of iterations corresponds to the size of the neighborhood around a vertex involved in the computation of the messages at that vertex. Endogeny gives us a way to say that this information is sufficient to make decisions at each vertex and jointly yield an almost optimal solution.
For a general RDE (14), write T : P(S) → P(S) for the map induced by the function g on the space of probability measures on S. We now define a bivariate map T^(2) : P(S × S) → P(S × S), which maps a distribution µ^(2) ∈ P(S × S) to the joint distribution of the pair obtained by applying g to each coordinate, where the input pairs are independent with joint distribution µ^(2) on S × S, and the family of pairs is independent of (ξ, N).
It is easy to see that if µ is a fixed point of the RDE, then the associated diagonal measure Law(X, X), where X ∼ µ, is a fixed point of the operator T^(2). Theorem 11c of [5], reproduced below, provides a way of proving endogeny.
Theorem 7 (Theorem 11c of [5]). For a Polish space S, an invariant recursive tree process with marginal distribution µ is endogenous if and only if the iterates of T^(2) started from µ ⊗ µ converge weakly to the diagonal measure Law(X, X), where µ ⊗ µ is the product measure.
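Theorem 7's criterion suggests a simulation check: run the recursion twice on the same realization of the innovations, with independent leaf values drawn from the product measure µ ⊗ µ, and watch whether the two root values merge as the depth grows. The sketch below does this for a generic min-type recursion with Poisson-point innovations truncated to three points per vertex; the truncation and all names are our illustrative choices, not the paper's exact RDE (13).

```python
import random

def coupled_roots(depth, branch=3, rng=random):
    """One sample of the bivariate iteration behind Theorem 7: identical
    innovations (edge weights), independent leaf conditions (product measure).
    Returns (root_a, root_b, max_leaf_gap).  Edge weights are the first
    `branch` points of a rate-1 Poisson process -- a truncation we impose
    for simulation only."""
    n_leaves = branch ** depth
    xa = [rng.expovariate(1.0) for _ in range(n_leaves)]  # leaf copy A
    xb = [rng.expovariate(1.0) for _ in range(n_leaves)]  # independent copy B
    max_leaf_gap = max(abs(a - b) for a, b in zip(xa, xb))
    for _ in range(depth):
        new_a, new_b = [], []
        for j in range(len(xa) // branch):
            pts, s = [], 0.0
            for _ in range(branch):      # Poisson points via cumulative Exp(1)
                s += rng.expovariate(1.0)
                pts.append(s)
            new_a.append(min(pts[i] - xa[branch * j + i] for i in range(branch)))
            new_b.append(min(pts[i] - xb[branch * j + i] for i in range(branch)))
        xa, xb = new_a, new_b
    return xa[0], xb[0], max_leaf_gap
```

Because the min update is 1-Lipschitz in the sup norm, the root discrepancy never exceeds the largest leaf discrepancy; endogeny is the stronger statement that the root discrepancy vanishes as the depth grows, which can be monitored by averaging |root_a − root_b| over many trials at increasing depths.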
We will use the above theorem to prove the following.
Theorem 8. The invariant recursive tree process of Lemma 1 arising from the solution to the RDE (13) is endogenous.
Proof. We first write an RDE for X^o.
We now have the RDE where (a) (ξ_i)_{i≥1} is a Poisson process of rate 1/α on R_+, (b) for each i ≥ 1, (ξ_{ij})_{j≥1} is a Poisson process of rate 1 on R_+, (c) X^o_{ij}, i, j ≥ 1, are independent random variables having the same distribution as X^o, and (d) the Poisson processes in (a) and (b) and the random variables X^o_{ij} in (c) are independent of each other.
Call the induced mapping from P(R_+) to P(R_+) T. The fixed point of this map is the distribution µ with complementary cdf F as in (23). We shall argue that (41) holds; thus the invariant RTP associated with g_o is endogenous. A similar argument shows that one can define an RDE with map g_m and show that the corresponding invariant RTP associated with g_m is endogenous. Recognizing that the layers of the two RTPs interlace, we easily see that the invariant RTP of Lemma 1 is endogenous. Let us now show (41). Consider the bivariate distributional equations

Let P^(2) denote the space of joint complementary cdfs of R × R valued random variables. Write Γ^(2) : P^(2) → P^(2) and ϕ^(2) : P^(2) → P^(2) for the maps defined by the above equations. Let F_0(x, y) = F(x)F(y) for all x, y ∈ R, and define two sequences of joint distributions G^(2)_k, k ≥ 0, and F^(2)_k, k ≥ 0. It is clear that the marginal distributions corresponding to F^(2)_k and G^(2)_k are F and G respectively for each k. By Theorem 7, the invariant RTP associated with g_o is endogenous if F^(2)_k, k ≥ 1, converges to the joint distribution of the degenerate random vector that has both components equal; this is equivalent to a condition on the corresponding joint complementary cdfs. From the bivariate RDE one obtains the behavior of the joint distribution for x < 0.

Similar analysis with the equation for
Define, for k ≥ 0, the functions β_k and γ_k. We now prove certain relations for the functions β_k and γ_k.
Lemma 5. For all k ≥ 0: Proof. We will prove the lemma by induction. f_0(z) = F²(z) ≤ F(z) for all z implies that β_0(x) ≥ 0 for all x. This implies g_0(z) ≤ G(z) for all z, and hence γ_0(x) ≥ 0 for all x. By (42), we have (43) and (45). We have shown that the equations (47) hold for k = 0; now assume that they hold for some k ≥ 0.
By induction, the properties (47) hold for all k ≥ 0.
Because for each x, the sequences {β k (x), k ≥ 0} and {γ k (x), k ≥ 0} are decreasing, the sequences of functions {β k , k ≥ 0} and {γ k , k ≥ 0} converge pointwise to functions, say β and γ respectively, as k → ∞. The requirement for endogeny that f k (x) → F (x) for all x as k → ∞ reduces to showing that γ(x) = 0 for all x. We will show that both β and γ are identically 0.

Domain of attraction of the RDE map
Consider the maps Γ and ϕ, and write T = Γϕ. We have F = ϕG and G = ΓF, so that G = TG, and G is the unique fixed point of T.
We are interested in the domain of attraction of G. Specifically, we investigate the convergence of T^k G_0 for an arbitrary complementary cdf G_0. We will see that the domain of attraction is all those G_0 satisfying ∫_0^∞ G_0(z) dz < ∞. We proceed via a sequence of lemmas. To get (e), observe that F_0 = ϕG_0 satisfies F_0(z) ≥ F_0(0) > 0 for z ≤ 0, by the assumption that ∫_0^∞ G_0(z) dz < ∞. This implies, for t ≥ 0, a bound which is integrable.
Since, after one step of the iteration, T G_0 has properties (a), (b), (c), (d), and (e), we may assume, without loss of generality, that G_0 itself has these properties.
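The same kind of iteration on complementary cdfs can be illustrated on the classical bipartite matching RDE, whose one-step map is (T F̄)(x) = exp(−∫_{−x}^∞ F̄(s) ds) and whose fixed point is the logistic ccdf 1/(1 + e^x). The sketch below verifies the fixed-point property on a grid; it illustrates this type of map only, and is not the paper's ϕ and Γ from (51)-(52).

```python
import math

def T_map(Fbar, grid, dx):
    """One step of the classical matching RDE map on complementary cdfs:
    (T Fbar)(x) = exp(-integral_{-x}^{infinity} Fbar(s) ds),
    with Fbar given as values on the symmetric grid and the tail beyond
    the grid treated as 0 (so the grid must be wide)."""
    n = len(grid)
    tail = [0.0] * n            # tail[i] = trapezoid int_{grid[i]}^{grid[-1]} Fbar
    for i in range(n - 2, -1, -1):
        tail[i] = tail[i + 1] + 0.5 * (Fbar[i] + Fbar[i + 1]) * dx
    # the grid is symmetric, so the index of -grid[i] is n - 1 - i
    return [math.exp(-tail[n - 1 - i]) for i in range(n)]

L, n = 20.0, 4001
dx = 2 * L / (n - 1)
grid = [-L + k * dx for k in range(n)]
logistic = [1.0 / (1.0 + math.exp(x)) for x in grid]   # known fixed point
gap = max(abs(a - b) for a, b in zip(logistic, T_map(logistic, grid, dx)))
```

Here `gap` is small (discretization error only), while a non-fixed starting ccdf is visibly moved by one application of the map; iterating from such a start is the numerical analogue of the domain-of-attraction question studied in this section.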
Define the transformation shown above. It has the following properties.
Proof. (a) Clearly G_0(x) ≤ x, because the second term that is subtracted in (53) is ≥ 0. Equality holds if and only if the logarithm in (53) is ≤ 0. (b) When G_0(x) ≥ G(0), we saw in (a) that G_0(x) = x. Hence in this case G_0(x) ≥ G(0) = G(x − G_0(x)). We deal with the other case and the second part of (a) simultaneously; the claim follows from the chain of implications. Lemma 8. Let G_0 satisfy the properties in Lemma 6. Let x ≥ 0.
The above statements also hold when all the inequalities are replaced by strict inequalities.
The inequality extends to x = M by right-continuity of G 0 and G at M and 0 respectively.
Also, by strict monotonicity of G_0(x) for x ≥ 0, all the inequalities can be made strict. Proof. By Lemma 8, the bound holds for all z ∈ R as well, because for z < 0, G_0(z) = G(z − M) = 1. Using these in (51), for all t ∈ R, we have the corresponding bound. Then by (52), for x ≥ 0, we obtain the result, where the last inequality follows from (16) to handle the case x < M. Lemma 8 now completes the proof. The above lemma shows that if G_0 is bounded then inf T^k G_0 is increasing in k and sup (T^k G_0)_+ is decreasing in k, and so they converge to some m* and M* respectively (m* ≤ 0 ≤ M*): inf T^k G_0 ↑ m* and sup (T^k G_0)_+ ↓ M*. We will show that m* = M* = 0. (This will prove that both inf T^k G_0 and sup T^k G_0 converge to 0.) First we show that the terms inf T^k G_0 and sup (T^k G_0)_+ are strictly monotone in k unless they are 0.
where κ > 1 (because G 0 (z) < 1 for z ≥ 0). Let x ≥ M . Using (55) to bound ϕG 0 (z) for z ∈ [−x, 0) and (57) to bound ϕG 0 (z) for z ≥ 0 in (52), we get where κ < 1. Thus T G 0 (x) < G(x−M ), and so by Lemma 8(b) for strict inequalities, T G 0 (x) < M . For x ∈ [0, M ), by Lemma 7(a), we have T G 0 (x) ≤ x < M . By continuity of T G 0 , the supremum of T G 0 (x) over any compact subset of [0, ∞) is strictly less than M . We now need to verify the strict inequality as x → ∞. To show this, we have the following: We use the same technique for the infimum. Suppose m = inf G 0 < 0. Fix b ∈ (m, 0). Using where δ < 1. Using (54) to bound ϕG 0 (z) for z < 0 and (59) to bound ϕG 0 (z) for z ≥ 0 in (52), we get for x ≥ 0 where δ > 1. This implies the strict inequality T G 0 (x) > m for all x ≥ 0 (Lemma 8(a)). We now verify the inequality for x → ∞, as in the case of supremum: Lemma 11. Suppose G 0 satisfies the properties in Lemma 6. Then T 2 G 0 is bounded.
For future reference, let us also note the identity implied by (51). Lemma 12. Suppose G_0 is bounded and G_0 satisfies the properties in Lemma 6. Then T^k G_0 is differentiable on (0, ∞) except possibly at one point, and the derivatives are uniformly integrable for k ≥ 3; moreover, T^k G_0(x) = 1 for x ∈ (0, x^k_0). The boundedness of G_0 by M, say, implies by Lemma 9 that the T^k G_0(x) are uniformly upper bounded by M for all x, and by Lemma 7(a), x^k_0 ≤ M. For x > x^k_0, by definition of x^k_0, we have T^k G_0(x) ≤ G(0), in which case the displayed bound on the derivative holds.
By (52), and as (61) and (62) hold respectively for T^k G_0 and ϕT^k G_0 for all k ≥ 2, the above equation gives the required bound. This gives the uniform integrability of the derivatives.
Theorem 9. For any complementary cdf G_0 satisfying ∫_0^∞ G_0(z) dz < ∞, T^k G_0 converges to G. The expectations also converge. Proof. By Lemmas 6, 9, 11, and 12, for k ≥ 5, the T^k G_0 are uniformly bounded and have derivatives that are uniformly integrable. We prove that the limits in (56) satisfy m* = M* = 0. The functions T^k G_0, k ≥ 1, are bounded and 1-Lipschitz on (0, ∞). By the Arzelà-Ascoli theorem this sequence is relatively compact with respect to compact convergence, so there exists a subsequence that converges. The uniform continuity of the function y ↦ log((α − y)/(γy)) on every compact subset of (0, 1) implies that the transforms (which are uniformly bounded) converge in the sense of compact convergence. The uniform integrability of the derivatives in Lemma 12 says that for any ε > 0, there exists M sufficiently large such that T^{n_k} G_0(x) ∈ (T^{n_k} G_0(M) − ε, T^{n_k} G_0(M) + ε) for all x > M and for all k ≥ 3. This extends the uniform convergence of T^{n_k} G_0 → G_∞ from the compact set [0, M] to [0, ∞). Using the dominated convergence theorem in (51) and (52), the convergence (63) results in the corresponding compact convergence. The same arguments now conclude that the limit satisfies the fixed-point relations; Lemma 10 allows this only if m* = M* = 0, and so T^k G_0(x) → G(x) for all x ≥ 0 (Lemma 8), which by the dominated convergence theorem gives convergence of the expectations.

Belief propagation
We now show that the belief propagation (BP) algorithm on K n,n/α converges to a many-to-one matching solution that is asymptotically optimal as n → ∞.

Convergence of BP on the PWIT
In this section we will prove that the messages on T converge, and relate the resulting matching with the matching M opt of Section 8.
The message process can essentially be written as the recursion (64), where the initial messages X^0_T are i.i.d. random variables (zero in the case of our algorithm; see (2)).
By the structure of T it is clear that -the initial distribution being fixed -the distribution of the messages X^k_T(v, ·), v ∈ V, depends only on k and the label of the corresponding vertex v. Also, it can be seen from the analysis of the RDE (13) in Section 7 that if we denote the complementary cdf of the distribution of messages at some step k by F_0 for vertices with label o and G_0 for vertices with label m, then after one update the respective complementary cdfs will be given by the maps (51) and (52): ϕG_0 and ΓF_0. Considering the messages X^k_T(v, ·) only for vertices v having label m, the common distribution G_0 at step k evolves to T G_0 at step k + 2 (T = Γϕ); the distribution ΓF_0 at step k + 1 changes to T ΓF_0 at k + 3. The distribution sequence generated by the BP iterations (for k beyond some fixed step) is obtained by interleaving the two sequences generated by applying the map T iteratively to F_0 and ΓF_0. By Section 11, the distributions along both the odd and even subsequences converge to G, and consequently, from the nature of the two-step RDE, we conclude that the common distribution of X^k_T(v, ·) for v having label m converges to G, and the common distribution of X^k_T(v, ·) for v having label o converges to F as k → ∞.
By the endogeny (Section 10) of the RDE -considering the evolution of messages at vertices of the same label -the following result follows in the same way as Lemma 6 in [15]. We now prove that if the initial values are i.i.d. random variables with some arbitrary distribution (not necessarily one with complementary cdf G), then the message process (64) does indeed converge to the unique stationary configuration. Of course, the initial condition of particular interest to us is the all zero initial condition (2), but we will prove a more general result.
The following lemma will allow us to interchange limit and minimization while working with the updates on T . The proof of this Lemma follows from the proofs of Lemma 7 of [15] and Lemma 5.4 of [21].
We are now in a position to prove the required convergence.
Theorem 10. The recursive tree process defined by (64) with i.i.d. initial messages (according to labels) converges to the unique stationary configuration in the following sense. For every edge the stated convergence holds; also, the decisions at the root converge, i.e., P{π^k_T(φ) ≠ M_opt(φ)} → 0 as k → ∞. This can be proved by applying the proofs of Theorem 13 of [15] and Theorem 5.2 of [21] separately to the vertices having labels o and m.

Convergence of the update rule on K_{n,n/α} to the update rule on T

We use from [21] the modified definition of local convergence applied to geometric networks with edge labels, i.e., networks in which each directed edge (v, w) has a label λ(v, w) taking values in some Polish space. For local convergence of a sequence of such labeled networks G_1, G_2, . . . to a labeled geometric network G_∞, we add the additional requirement that the rooted graph isomorphisms γ_{n,ρ} satisfy lim_{n→∞} λ_{G_n}(γ_{n,ρ}(v, w)) = λ_{G_∞}(v, w) for each directed edge (v, w) in N_ρ(G_∞). We now view the configuration of BP on a graph G at the k-th iteration as a labeled geometric network with the label on edge (v, w) given by the pair (X^k_G(v, w), 1_{v∈π^k_G(w)}). With this definition, our convergence result can be written as the following theorem.
Theorem 11. For every fixed k ≥ 0, the k th step configuration of BP on K n,n/α converges in the local weak sense to the k th step configuration of BP on T .
That is, (K_{n,n/α}, (X^k_{K_{n,n/α}}(v, w), 1_{v∈π^k_{K_{n,n/α}}(w)})) converges in the local weak sense to (T, (X^k_T(v, w), 1_{v∈π^k_T(w)})). Proof. The proof of this theorem proceeds along the lines of the proof of Theorem 4.1 of [21]. Consider an almost sure realization of the convergence K_{n,n/α} → T. For such convergence to hold, the labels of the roots must match eventually, i.e., almost surely the label of the root of K_{n,n/α} equals the label of the root of T for sufficiently large n.
Recall from Section 6 the enumeration of the vertices of T from the set V. We now recursively enumerate the vertices of K_{n,n/α} with multiple members of V. Denote the root by φ. If v ∈ V denotes a vertex of K_{n,n/α}, then v.1, v.2, . . . , v.(h_v − 1) denote the neighbors of v in K_{n,n/α}, ordered by increasing lengths of the corresponding edges with v; h_v is the number of neighbors of v (n/α if v has label o, and n if v has label m). Then the convergence in (65) is shown if we argue that X^k_{K_{n,n/α}}(v, w) → X^k_T(v, w) in probability for every edge {v, w} ∈ E, and π^k_{K_{n,n/α}}(v) converges in probability for every v ∈ V. It can be verified that the sum on the right-hand side in the above equation is an integrable random variable. Equation (66) and the dominated convergence theorem give the corresponding limit. Involution invariance gives (see Corollary 1) the identity whose last equality follows from Theorem 4. By Theorem 11 and Lemma 15, using the definition of local weak convergence, we have convergence of the sums over v ∈ π^k_{K_{n,n/α}}(φ). We now apply the arguments that lead to (67) to π^k_{K_{n,n/α}}(φ), and obtain: v ∈ π^k_{K_{n,n/α}}(φ) implies ξ_{K_{n,n/α}}(φ, v) ≤ ξ_{K_{n,n/α}}(φ, 1) + ξ_{K_{n,n/α}}(v, v.1).
Using local weak convergence, we can see that the corresponding limit holds. Since ε is arbitrary, we get the upper bound lim sup_{n→∞} n^{−1} E M_n ≤ c*_α.
We have the lower bound from Theorem 6. This completes the proof of Theorem 1.
Observe that for any ε > 0, there exist K and N such that for all k ≥ K and n ≥ N, we have n^{−1} E[Σ_{e∈M(π^k_{K_{n,n/α}})} ξ_{K_{n,n/α}}(e)] ≤ c*_α + ε.
Thus for large n the BP algorithm gives a solution with cost within ε of the optimal value in K iterations. In each iteration, the algorithm requires O(n) computations at every vertex. This gives an O(Kn²) running time for the BP algorithm to compute an ε-approximate solution.
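For concreteness, here is a minimal min-sum BP sketch for the classical one-to-one matching case on a complete bipartite graph. It illustrates only the update structure; the many-to-one algorithm of this paper modifies the update at m-labeled vertices (via the (·)^+ operation), which we do not reproduce here.

```python
def bp_matching(w, iters):
    """Min-sum BP for minimum-cost perfect matching on a complete bipartite
    graph with weight matrix w (rows = A side, columns = B side).
    m_ab[i][j]: message from row i to column j; m_ba[j][i]: column j to row i.
    Update: m_ab[i][j] <- min over j' != j of (w[i][j'] - m_ba[j'][i]),
    and symmetrically for m_ba.  All messages are updated simultaneously."""
    n = len(w)
    m_ab = [[0.0] * n for _ in range(n)]
    m_ba = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        new_ab = [[min(w[i][jp] - m_ba[jp][i] for jp in range(n) if jp != j)
                   for j in range(n)] for i in range(n)]
        new_ba = [[min(w[ip][j] - m_ab[ip][j] for ip in range(n) if ip != i)
                   for i in range(n)] for j in range(n)]
        m_ab, m_ba = new_ab, new_ba
    # decision at row i: the column minimizing w[i][j] - m_ba[j][i]
    return [min(range(n), key=lambda j: w[i][j] - m_ba[j][i]) for i in range(n)]
```

Computed naively as above, each message costs O(n); keeping only the smallest and second-smallest values of w[i][j'] − m_ba[j'][i] at each vertex reduces an iteration to the O(n) computations per vertex mentioned in the text, giving the overall O(Kn²) running time.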