Paper: Interdisciplinary statistical mechanics

Selberg integrals in 1D random Euclidean optimization problems


Published 6 June 2019 © 2019 IOP Publishing Ltd and SISSA Medialab srl
Citation: Sergio Caracciolo et al J. Stat. Mech. (2019) 063401. DOI 10.1088/1742-5468/ab11d7


Abstract

We consider a set of Euclidean optimization problems in one dimension, where the cost associated to the couple of points x and y is the Euclidean distance between them raised to an arbitrary power $p$, and the points are chosen at random with uniform measure. We derive the exact average cost for the random assignment problem, for any number of points, by using Selberg's integrals. Some variants of these integrals enable the exact average cost for the bipartite travelling salesman problem to be derived as well.


1. Selberg integrals

Euler Beta integrals

$$\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,{\rm d}x=\frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)},\qquad(1.1)$$

with $\alpha, \beta \in \mathbb{C}$ and $\Re(\alpha)>0$ and $\Re(\beta)>0$ , were generalized in the 1940s by Selberg [1]

$$S_n(\alpha,\beta,\gamma)\equiv\int_{[0,1]^n}\prod_{i=1}^{n}x_i^{\alpha-1}(1-x_i)^{\beta-1}\,|\Delta(\mathbf{x})|^{2\gamma}\,{\rm d}^n x=\prod_{j=0}^{n-1}\frac{\Gamma(\alpha+j\gamma)\,\Gamma(\beta+j\gamma)\,\Gamma(1+(j+1)\gamma)}{\Gamma(\alpha+\beta+(n+j-1)\gamma)\,\Gamma(1+\gamma)},\qquad(1.2)$$

where $\Delta(\mathbf{x})$ is the Vandermonde determinant

$$\Delta(\mathbf{x})\equiv\prod_{1\leqslant i<j\leqslant n}(x_i-x_j),\qquad(1.3)$$

with $\alpha, \beta, \gamma \in \mathbb{C}$ and $\Re(\alpha)>0$ and $\Re(\beta)>0$ , $\Re(\gamma) > -\min(1/n, \Re(\alpha)/(n-1),$ $\Re(\beta)/(n-1))$ , see chapter 8 of [2]. Indeed (1.2) reduces to (1.1) when n  =  1.
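As a sanity check on (1.2), the n = 2 case can be verified numerically. The following sketch (our own illustration, not part of the paper) compares a midpoint-rule evaluation of the double integral with the Gamma-function product on the right-hand side, for the smooth polynomial case $\alpha=\beta=2$, $\gamma=1$, where the exact value is 1/360.

```python
from math import gamma

def selberg_rhs(n, a, b, g):
    """Gamma-function product on the right-hand side of Selberg's formula (1.2)."""
    prod = 1.0
    for j in range(n):
        prod *= (gamma(a + j * g) * gamma(b + j * g) * gamma(1 + (j + 1) * g)
                 / (gamma(a + b + (n + j - 1) * g) * gamma(1 + g)))
    return prod

def selberg_lhs_n2(a, b, g, m=400):
    """Midpoint-rule evaluation of the n = 2 Selberg integral on [0,1]^2."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * h
        wx = x ** (a - 1) * (1 - x) ** (b - 1)
        for j in range(m):
            y = (j + 0.5) * h
            wy = y ** (a - 1) * (1 - y) ** (b - 1)
            total += wx * wy * abs(x - y) ** (2 * g)
    return total * h * h

# alpha = beta = 2, gamma = 1: the integrand is the polynomial
# x(1-x) y(1-y) (x-y)^2, whose exact integral is 1/360.
print(selberg_lhs_n2(2, 2, 1))      # close to 1/360 ≈ 0.0027778
print(selberg_rhs(2, 2, 2, 1))
```

With $\gamma=1$ the integrand is a polynomial, so the midpoint rule converges rapidly; for non-integer $\gamma$ the factor $|x-y|^{2\gamma}$ is not smooth on the diagonal and a sharper quadrature would be needed.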

These integrals have been used by Bombieri [3] to prove what is referred to as the Mehta–Dyson conjecture [4–7] in random matrix theory, which is

$$\frac{1}{(2\pi)^{n/2}}\int_{\mathbb{R}^n}|\Delta(\mathbf{x})|^{2\gamma}\,\prod_{i=1}^{n}{\rm e}^{-x_i^2/2}\,{\rm d}^n x=\prod_{j=1}^{n}\frac{\Gamma(1+j\gamma)}{\Gamma(1+\gamma)},\qquad(1.4)$$

but many applications were found in the context of conformal field theories [8] and exactly solvable models, for example in the evaluation of the norm of Jack polynomials [9]. In the following we shall also have need of an extension of Selberg integrals [2, section 8.3]

Equation (1.5)

In this paper we present one more application of Selberg integrals, in the context of Euclidean random combinatorial optimization problems in one dimension. These problems are well understood in the case in which the entries of the cost matrix, that is the costs between couples of points, are independent and identically distributed random variables, thanks to the methods developed in the context of the statistical mechanics of disordered systems [10]. The interesting physical case in which the cost matrix is a Euclidean random matrix [11] is much less studied, because of the difficulties induced by the correlations among the entries. The possibility of dealing with correlations as a perturbation has been investigated [12] and is effective for very large dimensions of the Euclidean space. On the contrary, detailed results, at least in the limit of an asymptotic number of points, have recently been obtained in one dimension [13–18] and in two dimensions [19–23]. Note that related problems in dimensions higher than one have also been rigorously studied [24–26].

We shall concentrate here on the one-dimensional case, with the aim of proving new exact results, which we have obtained by exploiting the Selberg integrals. Let us consider N points chosen at random in the interval $[0,1]$, ordered in such a way that $x_i < x_{i+1}$ for $i=1, \dots, N-1$. The probability of finding the kth point in the interval $[x, x+{\rm d}x]$ is

$$\rho_k(x)\,{\rm d}x=\frac{\Gamma(N+1)}{\Gamma(k)\,\Gamma(N-k+1)}\,x^{k-1}(1-x)^{N-k}\,{\rm d}x.\qquad(1.6)$$
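Equation (1.6) is the Beta(k, N−k+1) density of the kth order statistic of N uniform random points. A quick Monte Carlo check of its first two moments (an illustration with arbitrarily chosen parameters):

```python
import random

def kth_order_statistic_samples(N, k, trials, seed=0):
    """Sample the kth smallest of N independent uniform points in [0, 1]."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(N))[k - 1] for _ in range(trials)]

N, k, trials = 5, 2, 200_000
samples = kth_order_statistic_samples(N, k, trials)
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials

# Moments of the Beta(k, N-k+1) distribution implied by (1.6):
exact_mean = k / (N + 1)
exact_var = k * (N - k + 1) / ((N + 1) ** 2 * (N + 2))
print(mean, exact_mean)
print(var, exact_var)
```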

To every couple $x_i$ and $x_j$ we associate a cost which depends only on their Euclidean distance, in the form

$$c(x_i,x_j)=f(|x_i-x_j|)=|x_i-x_j|^{\,p},\qquad(1.7)$$

for some p   >  1, so that the function f  is an increasing and convex function of its argument.

The new results presented in this paper regard finite-size systems. The paper is organized as follows. In sections 2 and 3 we present the evaluation of the average cost for the assignment and the travelling salesman problem (TSP), respectively, for all values of the parameter p  >  1 and for all numbers of points. In section 4 we evaluate, under the same conditions, what is lost in the average cost if we do not match the kth point in the best way, for each k. This is needed if we want to compute upper bounds for the average cost of the two-factor problem.

2. Average cost in the assignment

Consider two sets of N points chosen at random in the interval $[0,1]$: the red points $\{r_i\}_{i\in[N]}$ and the blue points $\{b_i\}_{i\in[N]}$. In the assignment problem we need to find the one-to-one correspondence between the red and the blue points, which is a permutation $\sigma$ in the symmetric group ${{\mathcal S}}_N$ of N elements, such that the total cost

$$E_N(\sigma)=\sum_{i=1}^{N}|r_{\sigma(i)}-b_i|^{\,p}\qquad(2.1)$$

is minimized.

When p  >  1, once both sets of points have been ordered, the optimal solution is the identity permutation [13, 27]. It follows that the optimal cost is

$$E_N=\sum_{k=1}^{N}|r_k-b_k|^{\,p}.\qquad(2.2)$$

By using (1.6) and the Selberg integral (1.2)

$$\langle|r_k-b_k|^{\,p}\rangle=\left[\frac{\Gamma(N+1)}{\Gamma(k)\,\Gamma(N-k+1)}\right]^{2}\int_0^1\!\!\int_0^1(xy)^{k-1}\left[(1-x)(1-y)\right]^{N-k}|x-y|^{\,p}\,{\rm d}x\,{\rm d}y=\left[\frac{\Gamma(N+1)}{\Gamma(k)\,\Gamma(N-k+1)}\right]^{2}S_2\!\left(k,N-k+1,\frac{p}{2}\right),\qquad(2.3)$$

we get that the average of the kth contribution is given by

$$\langle|r_k-b_k|^{\,p}\rangle=\frac{\Gamma(N+1)^2\,\Gamma(1+p)}{\Gamma\!\left(1+\frac{p}{2}\right)\Gamma\!\left(N+1+\frac{p}{2}\right)\Gamma(N+1+p)}\,\frac{\Gamma\!\left(k+\frac{p}{2}\right)\Gamma\!\left(N-k+1+\frac{p}{2}\right)}{\Gamma(k)\,\Gamma(N-k+1)},\qquad(2.4)$$

and therefore we get the exact result

$$\langle E_N\rangle=\sum_{k=1}^{N}\langle|r_k-b_k|^{\,p}\rangle=\frac{N}{p+1}\,\frac{\Gamma\!\left(1+\frac{p}{2}\right)\Gamma(N+1)}{\Gamma\!\left(N+1+\frac{p}{2}\right)},\qquad(2.5)$$

where we made repeated use of the duplication formula and of Euler's reflection formula for $\Gamma$-functions,

$$\Gamma(2z)=\frac{2^{2z-1}}{\sqrt{\pi}}\,\Gamma(z)\,\Gamma\!\left(z+\frac{1}{2}\right),\qquad(2.6a)$$

$$\Gamma(z)\,\Gamma(1-z)=\frac{\pi}{\sin(\pi z)}.\qquad(2.6b)$$

The exact result (2.4) was known only in the cases $p=2, 4$ where the computation can be carried out by using the Euler Beta function (1.1), see [15]. With $p \neq 2, 4$ , the average optimal cost has been computed only in the limit of large N [15] at the order o(N−1). From (2.4) we easily get the next correction in 1/N

$$\langle E_N\rangle=\frac{\Gamma\!\left(1+\frac{p}{2}\right)}{p+1}\,N^{1-\frac{p}{2}}\left[1-\frac{p(p+2)}{8N}+o\!\left(\frac{1}{N}\right)\right].\qquad(2.7)$$
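The closed form for the average optimal cost, in the form reconstructed here from the Selberg computation, $\langle E_N\rangle = N\,\Gamma(1+p/2)\,\Gamma(N+1)/[(p+1)\,\Gamma(N+1+p/2)]$, reduces to $N/[3(N+1)]$ at p  =  2, and can be cross-checked by simulating the sorted (identity) matching, which is optimal for p  >  1. A sketch (illustrative; the formula as coded is our reading of the result, not a verbatim transcription):

```python
import random
from math import gamma

def avg_cost_formula(N, p):
    """Closed form for the average optimal assignment cost (as reconstructed here)."""
    return N * gamma(1 + p / 2) * gamma(N + 1) / ((p + 1) * gamma(N + 1 + p / 2))

def mc_avg_cost(N, p, trials, seed=1):
    """Monte Carlo average cost of the sorted (identity) matching, optimal for p > 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        r = sorted(rng.random() for _ in range(N))
        b = sorted(rng.random() for _ in range(N))
        total += sum(abs(x - y) ** p for x, y in zip(r, b))
    return total / trials

N, p = 4, 2.0
exact = avg_cost_formula(N, p)      # N/(3(N+1)) = 4/15 for p = 2
est = mc_avg_cost(N, p, 100_000)
print(exact, est)
```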

3. Average cost in the TSP

Again consider two sets of N points chosen at random in the interval $[0,1]$: the red points $\{r_i\}_{i\in[N]}$ and the blue points $\{b_i\}_{i\in[N]}$. In the TSP we have to choose a closed path that visits each of the 2N points exactly once, alternating red and blue points along the way [16], that is, two permutations $\sigma$ and $\pi$ in the symmetric group ${{\mathcal S}}_N$ of N elements, such that the total cost

$$E_N(\sigma,\pi)=\sum_{i=1}^{N}\left[\,|r_{\sigma(i)}-b_{\pi(i)}|^{\,p}+|b_{\pi(i)}-r_{\sigma(\tau(i))}|^{\,p}\,\right]\qquad(3.1)$$

is minimal, where $\tau$ is the shifting permutation, $\tau(i) = i+1$ for $i\in[N-1]$ and $\tau(N)=1$ .

When p   >  1 the optimal solution is given by the permutations [16]

Equation (3.2)

and

Equation (3.3)

once both sets of points have been ordered. It follows that the optimal cost is

$$E_N=|r_1-b_1|^{\,p}+|r_N-b_N|^{\,p}+\sum_{i=1}^{N-1}\left[\,|r_{i+1}-b_i|^{\,p}+|b_{i+1}-r_i|^{\,p}\,\right].\qquad(3.4)$$
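For small N, the optimality of this 'shoelace' tour, whose edge set is $(r_1,b_1)$, $(r_N,b_N)$ together with $(r_i,b_{i+1})$ and $(b_i,r_{i+1})$ for $i<N$ as described in [16], can be checked by brute force over all alternating Hamiltonian cycles. A sketch (our own illustration, not from the paper):

```python
import random
from itertools import permutations

def tour_cost(r, b, sigma, pi, p):
    """Cost of the alternating cycle r_{sigma(1)}, b_{pi(1)}, r_{sigma(2)}, ..."""
    N = len(r)
    return sum(abs(r[sigma[i]] - b[pi[i]]) ** p
               + abs(b[pi[i]] - r[sigma[(i + 1) % N]]) ** p
               for i in range(N))

def brute_force_tsp(r, b, p):
    """Minimum cost over all alternating Hamiltonian cycles of K_{N,N}."""
    N = len(r)
    return min(tour_cost(r, b, s, q, p)
               for s in permutations(range(N)) for q in permutations(range(N)))

def shoelace_cost(r, b, p):
    """Cost of the shoelace tour built on the sorted point sets."""
    r, b = sorted(r), sorted(b)
    N = len(r)
    c = abs(r[0] - b[0]) ** p + abs(r[-1] - b[-1]) ** p
    c += sum(abs(r[i] - b[i + 1]) ** p + abs(b[i] - r[i + 1]) ** p
             for i in range(N - 1))
    return c

rng = random.Random(42)
for p in (1.5, 2.0):
    for _ in range(10):
        r = [rng.random() for _ in range(4)]
        b = [rng.random() for _ in range(4)]
        assert abs(brute_force_tsp(r, b, p) - shoelace_cost(r, b, p)) < 1e-9
print("shoelace tour optimal on all test instances")
```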

By using (1.6) and the generalized Selberg integral (1.5)

Equation (3.5)

we get

Equation (3.6)

from which we obtain

Equation (3.7)

In addition

Equation (3.8)

Finally, the average optimal cost for every N and every p   >  1 is

Equation (3.9)

For p   =  2, this reduces to

Equation (3.10)

which was already known [16]. For generic p   >  1 only the asymptotic behaviour for large N was obtained in [16]

$$\langle E_N\rangle=\frac{2\,\Gamma\!\left(1+\frac{p}{2}\right)}{p+1}\,N^{1-\frac{p}{2}}\left[1+o(1)\right],\qquad(3.11)$$

which agrees with the large N limit of equation (3.9).

4. Cutting shoelaces: the two-factor problem

In the two-factor, or two-assignment, problem we carry out the minimization on the set of all the spanning subgraphs in which each vertex is shared by two edges. In other words, this problem corresponds to a loop covering of the graph, i.e. we relax the unique-cycle condition that we had in the TSP. As shown in [17], for every value of N, the optimal two-factor solution is always composed of a union of shoelace loops with only two or three points of each colour. As a consequence, unlike in the assignment and TSP cases, different instances of the disorder can have different spanning subgraphs that minimize the cost function. In particular, these spanning subgraphs can always be obtained by 'cutting' the optimal TSP cycle (see figure 1) in a way that depends on the specific instance. This instance dependence makes the computation of the average optimal cost particularly difficult. However, one can show that the average optimal cost of the two-factor problem is bounded from above by the TSP average optimal cost and from below by twice the assignment average optimal cost. Since in the large-N limit these two quantities coincide, one immediately obtains the large-N limit of the average optimal cost of the two-factor problem. Unfortunately, this approach is not useful for a finite-size system, but we can use Selberg integrals to obtain an upper bound, by computing the average cost obtained by 'cutting' the TSP optimal cycle in specific ways. When we cut the optimal TSP cycle into two different cycles at position k, we gain an average cost

Equation (4.1)
Figure 1.

Figure 1. Graphical representation of the cutting operation which produces, for the optimal TSP cycle (top), a possible optimal solution for the two-factor problem (bottom). Here we have represented the N  =  4 case, where the cutting operation is unique. Notice that the blue and red points lie in the interval $[0,1]$, but here they are drawn equally spaced on two parallel lines to improve visualization.

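The chain of bounds described above, twice the optimal assignment cost ≤ optimal two-factor cost ≤ optimal TSP cost, actually holds instance by instance, since the TSP cycle is itself a two-factor and every two-factor splits into two perfect matchings. For small N this can be verified by exhaustive enumeration; an illustrative sketch (not from the paper):

```python
import random
from itertools import permutations

def matching_cost(r, b, sigma, p):
    """Cost of the perfect matching r_i -> b_{sigma(i)}."""
    return sum(abs(r[i] - b[sigma[i]]) ** p for i in range(len(r)))

def optimal_assignment(r, b, p):
    return min(matching_cost(r, b, s, p) for s in permutations(range(len(r))))

def optimal_two_factor(r, b, p):
    """Minimum over unions of two edge-disjoint perfect matchings of K_{N,N}."""
    N = len(r)
    return min(matching_cost(r, b, s, p) + matching_cost(r, b, q, p)
               for s in permutations(range(N)) for q in permutations(range(N))
               if all(s[i] != q[i] for i in range(N)))

def optimal_tsp(r, b, p):
    """Minimum over all alternating Hamiltonian cycles."""
    N = len(r)
    return min(sum(abs(r[s[i]] - b[q[i]]) ** p
                   + abs(b[q[i]] - r[s[(i + 1) % N]]) ** p for i in range(N))
               for s in permutations(range(N)) for q in permutations(range(N)))

rng = random.Random(7)
p, N = 1.5, 4
r = [rng.random() for _ in range(N)]
b = [rng.random() for _ in range(N)]
lo = 2 * optimal_assignment(r, b, p)
mid = optimal_two_factor(r, b, p)
hi = optimal_tsp(r, b, p)
print(lo, mid, hi)   # lo <= mid <= hi
```

Note that for N  =  3 every admissible two-factor is already a Hamiltonian cycle, so N  =  4 is the smallest interesting case.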

Once more, by using (1.6) and the generalized Selberg integral (1.5), we obtain

Equation (4.2)

and similarly

Equation (4.3)

Their sum is

Equation (4.4)

For p   =  2 this quantity is in agreement with what we got in [17]

$$E_k^{(2)}=\frac{2}{(N+1)^2}.\qquad(4.5)$$

For $p\neq2$, Ek depends on k. In particular, for 1  <  p   <  2 the cuts near 0 and 1 are (on average) more convenient than those near the center, while for p   >  2 the reverse is true (see figure 2). Notice that for p   =  2 the best upper bound for the average optimal cost is obtained by summing the maximum number of cuts that can be performed on the optimal TSP cycle. For $p\neq2$, however, this sum does not give a simple formula.

Figure 2.

Figure 2. Plot of $E_k^{(\,p)}$ given in equation (4.4) for various values of p: the green line corresponds to p   =  2.1, the orange one to p   =  2, and the blue one to p   =  1.9. In all cases N  =  100.

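Under the cut convention we assume here (cutting between positions k and k+1 replaces the TSP edges $(r_k,b_{k+1})$ and $(b_k,r_{k+1})$ with the matching edges $(r_k,b_k)$ and $(r_{k+1},b_{k+1})$), a Monte Carlo estimate of the average gain is easy to set up; for p  =  2 it should come out independent of k, consistently with the discussion above. The parameters below are arbitrary illustrations:

```python
import random

def cut_gain(N, k, p, trials, seed=3):
    """Average cost saved by cutting the shoelace TSP cycle between k and k+1:
    edges (r_k, b_{k+1}) and (b_k, r_{k+1}) are replaced by (r_k, b_k) and
    (r_{k+1}, b_{k+1}).  Indices are 1-based, with 1 <= k <= N-1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        r = sorted(rng.random() for _ in range(N))
        b = sorted(rng.random() for _ in range(N))
        old = abs(r[k - 1] - b[k]) ** p + abs(b[k - 1] - r[k]) ** p
        new = abs(r[k - 1] - b[k - 1]) ** p + abs(r[k] - b[k]) ** p
        total += old - new
    return total / trials

N, trials = 10, 200_000
# For p = 2 the gain is k-independent under this convention:
# a short computation with Beta moments gives 2/(N+1)^2 ≈ 0.0165 here.
gains = [cut_gain(N, k, 2.0, trials) for k in (1, 5, 9)]
print(gains)
```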

5. Conclusions

In this work we have obtained some finite-size properties of a set of bipartite Euclidean optimization problems, namely the assignment, the bipartite TSP and the bipartite two-factor problems, by extensive use of the Selberg integrals. This confirms once more their importance and their wide range of application. Interestingly, we have confirmed that the one-dimensional two-factor problem is more subtle to deal with than the other problems considered here: even using Selberg integrals, we have not been able to find a finite-size upper bound for the average optimal cost in the generic $p\neq2$ case. However, our approach allowed us to understand the source of this difficulty: for 1  <  p   <  2, the shortest loops tend to concentrate at the border of the $\left[ 0, 1\right]$ interval, while for p   >  2 they tend to concentrate in its center.
