Abstract
We consider a set of Euclidean optimization problems in one dimension, where the cost associated to a pair of points x and y is the Euclidean distance between them raised to an arbitrary power p > 1, and the points are chosen at random with uniform measure. We derive the exact average cost for the random assignment problem, for any number of points, by using Selberg's integrals. Some variants of these integrals enable the exact average cost for the bipartite travelling salesman problem to be derived as well.
1. Selberg integrals
Euler Beta integrals

$$B(\alpha,\beta)=\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,\mathrm{d}t=\frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}, \tag{1.1}$$

with $\alpha,\beta\in\mathbb{C}$, $\operatorname{Re}\alpha>0$ and $\operatorname{Re}\beta>0$, were generalized in the 1940s by Selberg [1]

$$S_n(\alpha,\beta,\gamma)\equiv\int_{[0,1]^n}\prod_{i=1}^{n} t_i^{\alpha-1}(1-t_i)^{\beta-1}\prod_{1\leqslant i<j\leqslant n}|t_i-t_j|^{2\gamma}\,\mathrm{d}t_1\cdots\mathrm{d}t_n, \tag{1.2}$$

where

$$S_n(\alpha,\beta,\gamma)=\prod_{j=0}^{n-1}\frac{\Gamma(\alpha+j\gamma)\,\Gamma(\beta+j\gamma)\,\Gamma\bigl(1+(j+1)\gamma\bigr)}{\Gamma\bigl(\alpha+\beta+(n+j-1)\gamma\bigr)\,\Gamma(1+\gamma)}, \tag{1.3}$$

with $\operatorname{Re}\alpha>0$, $\operatorname{Re}\beta>0$ and $\operatorname{Re}\gamma>-\min\bigl(\tfrac{1}{n},\tfrac{\operatorname{Re}\alpha}{n-1},\tfrac{\operatorname{Re}\beta}{n-1}\bigr)$, see chapter 8 of [2]. Indeed (1.2) reduces to (1.1) when n = 1.
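As a quick numerical sanity check of (1.2) and its closed form, the n = 2 case can be compared against direct quadrature. The following Python sketch (assuming SciPy is available; the function names are ours) does this:

```python
# Numerical sanity check of the Selberg integral for n = 2: the double
# integral of t1^(a-1)(1-t1)^(b-1) t2^(a-1)(1-t2)^(b-1) |t1-t2|^(2g)
# over [0,1]^2 is compared with the product of Gamma functions.
from math import gamma
from scipy.integrate import dblquad

def selberg_closed_form(n, a, b, g):
    """Product formula for S_n(a, b, g)."""
    val = 1.0
    for j in range(n):
        val *= (gamma(a + j * g) * gamma(b + j * g) * gamma(1 + (j + 1) * g)
                / (gamma(a + b + (n + j - 1) * g) * gamma(1 + g)))
    return val

def selberg_numeric_n2(a, b, g):
    """Direct quadrature of the n = 2 Selberg integrand."""
    integrand = lambda t2, t1: (t1**(a - 1) * (1 - t1)**(b - 1)
                                * t2**(a - 1) * (1 - t2)**(b - 1)
                                * abs(t1 - t2)**(2 * g))
    val, _ = dblquad(integrand, 0, 1, 0, 1)
    return val

a, b, g = 2.0, 3.0, 0.75   # arbitrary test values with Re a, Re b > 0
print(selberg_numeric_n2(a, b, g), selberg_closed_form(2, a, b, g))
```

For n = 1 the product formula is independent of the coupling and reduces to the Beta function, as stated in the text.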
These integrals have been used by Bombieri [3] to prove what is referred to as the Mehta–Dyson conjecture [4–7] in random matrix theory, which is

$$\frac{1}{(2\pi)^{n/2}}\int_{\mathbb{R}^n}\prod_{1\leqslant i<j\leqslant n}|t_i-t_j|^{2\gamma}\,e^{-\frac{1}{2}\sum_{i=1}^{n}t_i^2}\,\mathrm{d}t_1\cdots\mathrm{d}t_n=\prod_{j=1}^{n}\frac{\Gamma(1+j\gamma)}{\Gamma(1+\gamma)}, \tag{1.4}$$
but many applications have been found in the context of conformal field theories [8] and exactly solvable models, for example in the evaluation of the norm of Jack polynomials [9]. In the following we shall also need an extension of the Selberg integrals [2, section 8.3]

$$\int_{[0,1]^n}\prod_{i=1}^{k}t_i\;\prod_{i=1}^{n}t_i^{\alpha-1}(1-t_i)^{\beta-1}\prod_{1\leqslant i<j\leqslant n}|t_i-t_j|^{2\gamma}\,\mathrm{d}t_1\cdots\mathrm{d}t_n=S_n(\alpha,\beta,\gamma)\prod_{j=1}^{k}\frac{\alpha+(n-j)\gamma}{\alpha+\beta+(2n-j-1)\gamma}. \tag{1.5}$$
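The extension (1.5) can also be checked numerically in the smallest non-trivial case, n = 2 and k = 1, where inserting a factor t1 into the Selberg integrand multiplies the result by (α + γ)/(α + β + 2γ). A Python sketch (SciPy assumed; names are ours):

```python
# Check of the extended Selberg integral for n = 2, k = 1: the average of
# t1 with respect to the n = 2 Selberg density equals (a + g)/(a + b + 2g).
from scipy.integrate import dblquad

def aomoto_ratio_numeric(a, b, g):
    """<t1> with respect to the n = 2 Selberg density."""
    w = lambda t2, t1: (t1**(a - 1) * (1 - t1)**(b - 1)
                        * t2**(a - 1) * (1 - t2)**(b - 1)
                        * abs(t1 - t2)**(2 * g))
    num, _ = dblquad(lambda t2, t1: t1 * w(t2, t1), 0, 1, 0, 1)
    den, _ = dblquad(w, 0, 1, 0, 1)
    return num / den

a, b, g = 1.5, 2.5, 0.5
print(aomoto_ratio_numeric(a, b, g), (a + g) / (a + b + 2 * g))
```

For γ = 0 the two variables decouple and the ratio reduces to the Beta-distribution mean α/(α + β), a useful consistency check.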
In this paper we present one more application of Selberg integrals, in the context of Euclidean random combinatorial optimization problems in one dimension. These problems are well understood in the case in which the entries of the cost matrix, that is the costs between pairs of points, are independent and identically distributed random variables, thanks to the methods developed in the context of the statistical mechanics of disordered systems [10]. The physically interesting case in which the cost matrix is a Euclidean random matrix [11] is much less studied, because of the difficulties induced by the correlations among the entries. The possibility of dealing with correlations as a perturbation has been investigated [12] and is effective for very large dimensions of the Euclidean space. By contrast, detailed results, at least in the limit of an asymptotically large number of points, have recently been obtained in one dimension [13–18] and in two dimensions [19–23]. Note that related problems in dimensions higher than one have also been studied rigorously [24–26].
We shall concentrate here on the one-dimensional case, with the aim of proving new exact results, which we have obtained by exploiting the Selberg integrals. Let us consider N points chosen at random in the interval [0, 1] with uniform measure. Let us order them in such a way that $x_k \leqslant x_{k+1}$ for $k=1,\dots,N-1$. The probability of finding the kth point in the interval $[x, x+\mathrm{d}x]$ is

$$\rho_k(x)\,\mathrm{d}x=\frac{x^{k-1}(1-x)^{N-k}}{B(k,N-k+1)}\,\mathrm{d}x=N\binom{N-1}{k-1}x^{k-1}(1-x)^{N-k}\,\mathrm{d}x. \tag{1.6}$$
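In other words, the kth of N sorted uniform points is Beta(k, N − k + 1) distributed, with mean k/(N + 1). A short empirical check (a sketch; the parameter values are arbitrary):

```python
# Empirical check of the order-statistic density: the k-th of N sorted
# uniform points has mean k / (N + 1).
import random

random.seed(0)
N, k, trials = 10, 3, 200_000
acc = 0.0
for _ in range(trials):
    pts = sorted(random.random() for _ in range(N))
    acc += pts[k - 1]
print(acc / trials, k / (N + 1))  # both close to k / (N + 1) = 3/11
```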
To every pair of points $x_i$ and $x_j$ we associate a cost which depends only on their Euclidean distance,

$$c_{ij}=f(|x_i-x_j|),\qquad f(z)=z^p, \tag{1.7}$$

for some p > 1, so that the function f is an increasing and convex function of its argument.
The new results presented in this paper regard finite-size systems. The paper is organized as follows. In section 2 and section 3 we present the evaluation of the average cost for the assignment and the travelling salesman problem (TSP), respectively, for all values of the parameter p > 1 and for all numbers of points. In section 4 we evaluate, under the same conditions, what is lost in the average cost if we do not match the kth point in the best way, for each k. This is needed to compute upper bounds for the average cost of the two-factor problem.
2. Average cost in the assignment
Consider two sets of N points chosen at random in the interval [0, 1] with uniform measure: the red points $x_1,\dots,x_N$ and the blue points $y_1,\dots,y_N$. In the assignment problem we need to find the one-to-one correspondence between the red and blue points, that is a permutation $\sigma$ in the symmetric group of N elements $\mathcal{S}_N$, such that the total cost

$$E_N[\sigma]=\sum_{i=1}^{N}\bigl|x_i-y_{\sigma(i)}\bigr|^p \tag{2.1}$$

is minimized.
Once both sets of points have been ordered, for p > 1 the optimal solution is the identity permutation [13, 27]. It follows that the optimal cost is

$$E_N=\sum_{k=1}^{N}\bigl|x_k-y_k\bigr|^p. \tag{2.2}$$
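The optimality of the ordered matching for p > 1 is easy to confirm by brute force on small instances (a sketch; the instance is random but seeded):

```python
# For p > 1 and sorted points, the optimal assignment should be the identity
# permutation; brute force over all N! permutations for small N agrees.
import itertools
import random

def assignment_cost(xs, ys, perm, p):
    return sum(abs(x - ys[j])**p for x, j in zip(xs, perm))

random.seed(1)
p, N = 1.5, 6
xs = sorted(random.random() for _ in range(N))
ys = sorted(random.random() for _ in range(N))
best = min(itertools.permutations(range(N)),
           key=lambda perm: assignment_cost(xs, ys, perm, p))
print(best)  # the ordered (identity) matching
```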
By using (1.6) and the Selberg integral (1.2) with n = 2, we get that the average of the kth contribution is given by

$$\overline{|x_k-y_k|^p}=\frac{S_2\bigl(k,N-k+1,\tfrac{p}{2}\bigr)}{B^2(k,N-k+1)}=\frac{\Gamma(1+p)\,\Gamma^2(N+1)}{\Gamma\bigl(1+\tfrac{p}{2}\bigr)\,\Gamma\bigl(N+1+\tfrac{p}{2}\bigr)\,\Gamma(N+1+p)}\,\frac{\Gamma\bigl(k+\tfrac{p}{2}\bigr)\,\Gamma\bigl(N-k+1+\tfrac{p}{2}\bigr)}{\Gamma(k)\,\Gamma(N-k+1)}, \tag{2.3}$$

and therefore, summing over k, we get the exact result

$$\overline{E_N}=\frac{\Gamma\bigl(1+\tfrac{p}{2}\bigr)}{p+1}\,\frac{N\,\Gamma(N+1)}{\Gamma\bigl(N+1+\tfrac{p}{2}\bigr)}, \tag{2.4}$$

where we made repeated use of the duplication and Euler's inversion formulas for $\Gamma$-functions

$$\Gamma(2z)=\frac{2^{2z-1}}{\sqrt{\pi}}\,\Gamma(z)\,\Gamma\bigl(z+\tfrac12\bigr),\qquad \Gamma(z)\,\Gamma(1-z)=\frac{\pi}{\sin\pi z}. \tag{2.5}$$
The exact result (2.4) was previously known only in the cases in which the computation can be carried out by using the Euler Beta function (1.1) alone, see [15]. For generic p, the average optimal cost had been computed only in the limit of large N [15], at order o(N^{-1}). From (2.4) we easily get the next correction in 1/N,

$$\overline{E_N}=\frac{\Gamma\bigl(1+\tfrac{p}{2}\bigr)}{p+1}\,N^{1-\frac{p}{2}}\left[1-\frac{p(p+2)}{8N}+O\!\left(N^{-2}\right)\right]. \tag{2.6}$$
3. Average cost in the TSP
Again consider two sets of N points chosen at random in the interval [0, 1]: the red points $x_1,\dots,x_N$ and the blue points $y_1,\dots,y_N$. In the TSP we have to choose a closed path that visits each of the 2N points exactly once, alternating red and blue points along the way [16], that is, two permutations $\sigma$ and $\pi$ in the symmetric group of N elements $\mathcal{S}_N$, such that the total cost

$$E_N[\sigma,\pi]=\sum_{i=1}^{N}\Bigl[\bigl|x_{\sigma(i)}-y_{\pi(i)}\bigr|^p+\bigl|x_{\sigma(S(i))}-y_{\pi(i)}\bigr|^p\Bigr] \tag{3.1}$$

is minimal, where S is the shifting permutation, S(i) = i + 1 for $1\leqslant i\leqslant N-1$ and S(N) = 1.
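For small N this minimization over pairs of permutations can be carried out by brute force. The cost function below is a sketch of the alternating-tour cost described in the text (each blue point $y_{\pi(i)}$ is joined to the red points $x_{\sigma(i)}$ and $x_{\sigma(S(i))}$); the comparison tour with both permutations equal to the identity is just one admissible tour, so the optimum can only improve on it:

```python
# Brute-force minimization of the alternating-tour cost (our sketch of the
# (sigma, pi) parametrization) for small N, with S(i) = i + 1 cyclically.
import itertools
import random

def tour_cost(xs, ys, sigma, pi, p):
    N = len(xs)
    return sum(abs(xs[sigma[i]] - ys[pi[i]])**p
               + abs(xs[sigma[(i + 1) % N]] - ys[pi[i]])**p
               for i in range(N))

random.seed(3)
p, N = 2.0, 5
xs = sorted(random.random() for _ in range(N))
ys = sorted(random.random() for _ in range(N))
perms = list(itertools.permutations(range(N)))
best = min(tour_cost(xs, ys, s, t, p) for s in perms for t in perms)
naive = tour_cost(xs, ys, tuple(range(N)), tuple(range(N)), p)
print(best, naive)  # the optimum is no worse than the naive ordered tour
```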
When p > 1 the optimal solution is given by the permutations [16]
and
once both sets of points have been ordered. It follows that the optimal cost is
By using (1.6) and the generalized Selberg integral (1.5)
we get
from which we obtain
In addition
Finally, the average optimal cost for every N and every p > 1 is
For p = 2, this reduces to
which was already known [16]. For generic p > 1 only the asymptotic behaviour for large N was obtained in [16]
which agrees with the large N limit of equation (3.9).
4. Cutting shoelaces: the two-factor problem
In the two-factor, or two-assignment, problem we carry out the minimization over the set of all spanning subgraphs in which every vertex belongs to exactly two edges. In other words, this problem corresponds to a loop covering of the graph, i.e. we relax the unique-cycle condition that we had in the TSP. As shown in [17], for every value of N, the optimal two-factor solution is always composed of a union of shoelace loops with only two or three points of each colour. As a consequence, unlike in the assignment and TSP cases, different instances of the disorder can have different spanning subgraphs that minimize the cost function. In particular, these spanning subgraphs can always be obtained by 'cutting' the optimal TSP cycle (see figure 1) in a way that depends on the specific instance. This instance dependence makes the computation of the average optimal cost particularly difficult.

However, one can show that the average optimal cost of the two-factor problem is bounded from above by the TSP average optimal cost and from below by twice the assignment average optimal cost. Since in the large N limit these two quantities coincide, one immediately obtains the large N limit of the average optimal cost of the two-factor problem. Unfortunately, this approach is not useful for a finite-size system, but we can use Selberg integrals to obtain an upper bound, by computing the average cost obtained by 'cutting' the TSP optimal cycle in specific ways. When we cut the optimal TSP cycle into two cycles at position k, we gain an average cost
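The sandwich bound just described (twice the assignment optimum from below, the TSP optimum from above) can be verified by brute force on small instances. The formalization below is our own sketch: a bipartite two-factor is a union of two edge-disjoint perfect matchings, and it is a single Hamiltonian cycle (a TSP tour) precisely when the relative permutation of the two matchings is a single N-cycle.

```python
# Numerical illustration of the sandwich bound
#   2 * (optimal assignment) <= optimal two-factor <= optimal TSP
# for a small random instance, by exhaustive enumeration.
import itertools
import random

def matching_cost(xs, ys, mu, p):
    return sum(abs(x - ys[j])**p for x, j in zip(xs, mu))

def is_n_cycle(perm):
    """True iff the permutation consists of a single cycle of full length."""
    seen, i, steps = {0}, perm[0], 1
    while i != 0:
        seen.add(i)
        i = perm[i]
        steps += 1
    return steps == len(perm)

random.seed(4)
p, N = 2.0, 5
xs = sorted(random.random() for _ in range(N))
ys = sorted(random.random() for _ in range(N))
perms = list(itertools.permutations(range(N)))
assign = min(matching_cost(xs, ys, mu, p) for mu in perms)
# two-factor: two edge-disjoint matchings (no common edge x_i - y_j)
two_factor = min(matching_cost(xs, ys, m1, p) + matching_cost(xs, ys, m2, p)
                 for m1 in perms for m2 in perms
                 if all(a != b for a, b in zip(m1, m2)))
# TSP: additionally require the union to be one Hamiltonian cycle
tsp = min(matching_cost(xs, ys, m1, p) + matching_cost(xs, ys, m2, p)
          for m1 in perms for m2 in perms
          if is_n_cycle(tuple(m2[m1.index(i)] for i in range(N))))
print(2 * assign, two_factor, tsp)
```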
Once more, by using (1.6) and the generalized Selberg integral (1.5), we obtain
and similarly
Their sum is
For p = 2 this quantity is in agreement with what we got in [17]
For $p \neq 2$, $E_k$ depends on k. In particular, for 1 < p < 2 the cuts near 0 and 1 are (on average) more convenient than those near the center, while for p > 2 the reverse is true (see figure 2). Notice that for p = 2 the best upper bound for the average optimal cost is obtained by summing the maximum number of cuts that can be performed on the optimal TSP cycle. For $p \neq 2$, however, this sum does not give a simple formula.
5. Conclusions
In this work we have obtained some finite-size properties of a set of bipartite Euclidean optimization problems, namely the assignment, the bipartite TSP and the bipartite two-factor problems, by extensive use of the Selberg integrals. This confirms once more their importance and their wide range of application. Interestingly, we have confirmed that the one-dimensional two-factor problem is more subtle to deal with than the other problems considered here: even using Selberg integrals, we have not been able to find a finite-size upper bound for the average optimal cost in the generic case. However, our approach allowed us to understand the source of this difficulty: for 1 < p < 2 the shortest loops tend to concentrate at the borders of the interval, while for p > 2 they tend to concentrate in its center.