Journal of Graph Algorithms and Applications, 20(2) 411-434 (2016)
A Binomial Distribution Model for the Traveling Salesman Problem Based on Frequency Quadrilaterals
Wang & Remmel

We study the symmetric traveling salesman problem via frequency graphs. The frequency of an edge is the number of times the edge occurs in an optimal path involving four vertices. The edges that are in the optimal Hamiltonian cycle (OHC) have a higher frequency than most edges that are not in the OHC, and thus edges with a low frequency can safely be ignored when searching for the optimal solution. A binomial distribution model is introduced for the symmetric traveling salesman problem based on frequency quadrilaterals. When the frequency of each edge is computed with $N$ frequency quadrilaterals, our model suggests that the minimum frequency of an edge in the OHC is $F_{\min} = (\phi_{\min} + 1)N$ where $\frac{4}{3} + \frac{4}{3(n-2)} < \phi_{\min} < 4$. This suggests a heuristic to reduce the number of edges that need to be considered in the search for the OHC, namely to keep only those edges whose frequencies are at least $F_{\min}$. We explore this heuristic in several real-world examples.


Introduction
We consider the symmetric traveling salesman problem (TSP). That is, we are given the complete graph $K_n$ on the vertices $\{1, \ldots, n\}$ together with a distance function $d$ such that for any $x, y \in \{1, \ldots, n\}$ with $x \neq y$, $d(x, y) = d(y, x)$ is the distance between $x$ and $y$. The goal is to find the optimal Hamiltonian cycle (OHC) with respect to this distance function. That is, we want to find a permutation $\sigma = (\sigma_1 \ldots \sigma_n)$ of $1, \ldots, n$ such that $\sigma_1 = 1$ and $d(\sigma) := d(\sigma_n, 1) + \sum_{i=1}^{n-1} d(\sigma_i, \sigma_{i+1})$ is as small as possible. The TSP has been extensively studied to find special classes of graphs where polynomial-time algorithms exist for either finding an exact solution, that is, finding an OHC, or finding an approximate solution, that is, finding a permutation $\tau$ of the vertices which gives a Hamiltonian cycle such that $d(\tau) \le c\,d(\sigma)$, where $\sigma$ is the OHC and $c$ is some fixed constant. We will call algorithms that find exact solutions exact algorithms and algorithms that find approximate solutions approximation algorithms. There are a number of special classes of graphs where one can find the OHC in a reasonable computation time, see [13].
Karp [17] has shown that the question of whether a graph has a Hamiltonian cycle is NP-complete, which implies that the TSP is NP-hard.
The computation time of exact algorithms is $O(a^n)$ for some $a > 1$ for the general TSP. For example, Held and Karp [14] and, independently, Bellman [3] gave a dynamic programming approach that solves the TSP in $O(n^2 2^n)$ time. Integer programming techniques, such as branch-and-bound [7,10] and cutting-plane methods [18,2], have been able to solve instances of the TSP with thousands of nodes. In 2006, a VLSI application (Euclidean TSP) with 85,900 nodes was solved with an improved cutting-plane method on a computer system with 128 nodes [2].
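To make the $O(n^2 2^n)$ bound concrete, the following is a minimal sketch of the Held-Karp dynamic program. It is an illustration under our own conventions (a list-of-lists distance matrix, vertex 0 as the start, and the hypothetical function name `held_karp`), not code from the cited papers.

```python
from itertools import combinations


def held_karp(dist):
    """Length of an optimal Hamiltonian cycle via Held-Karp dynamic programming.

    `dist` is a symmetric n x n matrix (list of lists) and vertex 0 is the
    start. The table has O(n * 2^n) entries and each is filled in O(n) time,
    so this is only practical for small n.
    """
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: shortest path that starts at 0, visits exactly the
    # vertices encoded in `mask`, and ends at vertex j (j is in mask).
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = sum(1 << j for j in range(1, n))
    # Close the cycle by returning to vertex 0.
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))
```

For a unit square with shortest-path distances along the sides, `held_karp([[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]])` gives the perimeter.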
On the other hand, the computation times of approximation algorithms and heuristics have been significantly improved [20]. For example, the MST approximation algorithm [8] and Christofides' approximation algorithm [16] find a 2-approximation and a $\frac{3}{2}$-approximation in time $O(n^2)$ and $O(n^3)$, respectively, for metric TSP. In 2011, Mömke and Svensson [22] gave a 1.461-approximation algorithm for metric graphs with respect to the Held-Karp lower bound. In most cases, the LKH heuristic [15] can generate "high quality" solutions within 5% of the optimum in nearly $O(n^{2.2})$ time. However, these approximation algorithms and heuristics are not guaranteed to find the OHC in polynomial time.
In recent years, researchers have developed polynomial-time algorithms to solve the TSP for sparse graphs. In sparse graphs, the number of Hamiltonian cycles (HC) is greatly reduced. For example, Sharir and Welzl [25] proved that in a sparse graph of average degree $d$, the number of HCs is less than $e\,(\frac{d}{2})^n$, where $e$ is the base of the natural logarithm. Gebauer [12] gave a lower bound for the number of HCs of roughly $(\frac{d}{2})^n$ for a sparse graph of average degree $d$. In addition, Björklund [4] proved that the TSP in graphs with bounded degree can be solved in time $O((2 - \epsilon)^n)$, where $\epsilon$ depends on the maximum degree of a vertex in the graph. For a cubic graph, Eppstein [11] introduced an algorithm to solve the TSP with running time $O(1.260^n)$. This running time was improved by Liśkiewicz and Schuster [19] to $O(1.2553^n)$. Aggarwal, Garg and Gupta [1] and Boyd, Sitters, Van der Ster and Stougie [6] independently gave two approximation algorithms for the TSP with approximation factor $\frac{4}{3}$ for metric cubic graphs. Mömke and Svensson [22] also gave a $\frac{4}{3}$-approximation for graphs of maximum degree three and for claw-free graphs with respect to the Held-Karp lower bound. For cubic connected graphs, Correa, Larré and Soto [9] proved that the approximation threshold of the TSP is strictly below $\frac{4}{3}$. For general bounded-genus graphs, Borradaile, Demaine and Tazari [5] gave a polynomial-time approximation scheme for the TSP. In the case of the asymmetric version of the TSP, Gharan and Saberi [23] designed constant-factor approximation algorithms for planar graphs and graphs with bounded genus, where the constant factor is $22.51(1 + \frac{1}{n})$. Thus, whether one is trying to find exact solutions or approximate solutions to the TSP, one has a variety of more efficient algorithms available if one can reduce a given instance of the TSP to finding the OHC in a sparse graph.
In this paper, we use a binomial distribution model based on frequency quadrilaterals to convert a complete graph (or dense graph) into a sparse graph for the TSP. The sparse graphs generally have $O(|V|)$ or $O(|V| \ln(|V|))$ edges. In addition, if the resulting graph has bounded degree or genus, or is planar or $k$-edge connected, then there are even more efficient algorithms available to find exact or approximate solutions to the TSP.
In previous work [26,27,30], the first author introduced frequency graphs as a way to reduce the number of edges that one has to consider to find the OHC. The basic idea of frequency graphs is the following. Suppose that we are given a sequence of $k$ vertices $\vec{v} = (v_1, \ldots, v_k)$. For a permutation $\sigma$ of $1, \ldots, k$ with $\sigma_1 = 1$ and $\sigma_k = k$, let $\vec{v}_\sigma = (v_{\sigma_1}, \ldots, v_{\sigma_k})$ and $d(\vec{v}_\sigma) = \sum_{i=1}^{k-1} d(v_{\sigma_i}, v_{\sigma_{i+1}})$. We assume that the $d(\vec{v}_\sigma)$ all have different values, so there is an optimal path $\vec{v}_\sigma = (v_{\sigma_1}, \ldots, v_{\sigma_k})$ on the $k$ vertices (or $k - 1$ edges) connecting $v_1$ and $v_k$ using the intermediate vertices $v_2, \ldots, v_{k-1}$ which makes $d(\vec{v}_\sigma)$ as small as possible. We will call such a path the optimal $k$-vertex path for $(v_1, \ldots, v_k)$. In general, if we are given a set of $k$ vertices $\{v_1, \ldots, v_k\}$, we have $\binom{k}{2}$ ways to pick the end points of a $k$-vertex path using these vertices, so that there are $\binom{k}{2}$ optimal $k$-vertex paths that arise from the set $\{v_1, \ldots, v_k\}$. Let $OP_k$ denote the set of all optimal $k$-vertex paths. Then the frequency $f(x, y)$ of an edge $(x, y)$ in $K_n$ with the distance function $d(x, y)$ is the number of optimal $k$-vertex paths which contain $(x, y)$ as an edge. Our intuition is that if $(x, y)$ is an edge in the OHC for $K_n$ with the distance function $d(x, y)$, then its frequency is likely to be much higher than the average frequency. This has been borne out by studying many real-world TSP instances, see [26,27,30]. This suggests that we can safely eliminate the edges whose frequency is below the average frequency and still keep the OHC intact. The hope is that by eliminating the edges of low frequency, we are left with a sparse graph which has $O(n \log(n))$ edges, so that the techniques for either finding or approximating the OHC for sparse graphs can be applied. The questions then become what

The outline of this paper is as follows. First, in Section 2, we introduce the concept of frequency quadrilaterals in the case where $k = 4$. In Section 3, we first discuss the combinatorics of frequency quadrilaterals. Then we introduce our binomial distribution model and study the combinatorics of frequency quadrilaterals for edges in the OHC. In Section 4, we discuss some heuristic estimates of various parameters of frequency graphs under our binomial distribution model. In Section 5, we compare our heuristic estimates of such parameters to the actual values of those parameters computed for some graphs in the database [24].

The frequency quadrilateral
Suppose that we are given 4 vertices $\{A, B, C, D\}$ in $K_n$. Since we are assuming that the vertex set of $K_n$ is $\{1, 2, \ldots, n\}$, there is a total order on the elements in $\{A, B, C, D\}$ induced by the natural ordering on $\{1, \ldots, n\}$, which we will assume to be $A < B < C < D$. $K_n$ restricted to $\{A, B, C, D\}$ gives us the graph pictured in Figure 1. We will list the pairs of end points according to their lexicographic order to find the six optimal 4-vertex paths.
For any pair of vertices $U, V \in \{A, B, C, D\}$, we shall write $UV$ for $d(U, V)$. There are $\binom{4}{2} = 6$ ways to pick end points of 4-vertex paths using $\{A, B, C, D\}$, namely, (I) $A, B$, (II) $A, C$, (III) $A, D$, (IV) $B, C$, (V) $B, D$, and (VI) $C, D$. Then, for example, there are two 4-vertex paths with end points $A, B$, namely, $(A, C, D, B)$ and $(A, D, C, B)$. To pick the optimal 4-vertex path with end points $A, B$, we must compare $AC + CD + DB$ and $AD + DC + CB$. Since we are in the symmetric TSP, we know that $CD = DC$. Thus, we must compare $AC + BD$ and $AD + BC$ to determine which one of the two 4-vertex paths $(A, C, D, B)$ and $(A, D, C, B)$ is the optimal 4-vertex path. Carrying out the same cancellation for each of the cases (I)-(VI), the comparisons that we must make are: (I) $AC + BD$ versus $AD + BC$; (II) $AB + CD$ versus $AD + BC$; (III) $AB + CD$ versus $AC + BD$; (IV) $AB + CD$ versus $AC + BD$; (V) $AB + CD$ versus $AD + BC$; (VI) $AC + BD$ versus $AD + BC$. It is easy to see that our comparisons involve only three sums of distances, namely, (1) $AB + CD$, (2) $AC + BD$, and (3) $AD + BC$. Thus, the relative frequency graph for the quadrilateral $ABCD$ depends only on the relative order of (1), (2), and (3). For example, if $AB + CD < AC + BD < AD + BC$, then the optimal 4-vertex paths for our six possible pairs of end points are given in the following table.

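Case    End points    Inequality formula     Optimal 4-vertex path
(I)     A, B          AC + BD < AD + BC      (A, C, D, B)
(II)    A, C          AB + CD < AD + BC      (A, B, D, C)
(III)   A, D          AB + CD < AC + BD      (A, B, C, D)
(IV)    B, C          AB + CD < AC + BD      (B, A, D, C)
(V)     B, D          AB + CD < AD + BC      (B, A, C, D)
(VI)    C, D          AC + BD < AD + BC      (C, A, B, D)

These choices lead to the relative frequency graph for the quadrilateral $ABCD$ pictured in Figure 2 (a).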
On the other hand, if AB + CD < AD + BC < AC + BD, then the optimal 4-vertex paths for our six possible end points are given in the following table.

Case    End points    Inequality formula     Optimal 4-vertex path
(I)     A, B          AD + BC < AC + BD      (A, D, C, B)
(II)    A, C          AB + CD < AD + BC      (A, B, D, C)
(III)   A, D          AB + CD < AC + BD      (A, B, C, D)
(IV)    B, C          AB + CD < AC + BD      (B, A, D, C)
(V)     B, D          AB + CD < AD + BC      (B, A, C, D)
(VI)    C, D          AD + BC < AC + BD      (C, B, A, D)

These choices lead to the relative frequency graph for the quadrilateral $ABCD$ pictured in Figure 2 (b). One can carry out similar computations for the other 4 possible orderings of $AB + CD$, $AC + BD$, and $AD + BC$. These orderings are called four-vertex and three-line inequalities [29]; they determine the optimal 4-vertex paths for any given four vertices $A, B, C, D$ in $K_n$. We list the resulting frequency graphs in each case according to the corresponding four-vertex and three-line inequalities, see Figure 2 (c)-(f).
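The construction of a frequency quadrilateral can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `frequency_quadrilateral`, the use of `frozenset` keys for undirected edges, and the lexicographic tie-break are our own assumptions.

```python
from itertools import combinations


def frequency_quadrilateral(d, quad):
    """Return {edge: frequency} for the four vertices in `quad`.

    `d(u, v)` is a symmetric distance function. For each of the six pairs of
    end points, the shorter of the two 4-vertex paths is taken as optimal and
    each of its three edges receives one count. Ties are broken in favour of
    the lexicographically smaller path (an assumption of this sketch).
    """
    vertices = sorted(quad)
    freq = {frozenset(e): 0 for e in combinations(vertices, 2)}
    for s, t in combinations(vertices, 2):
        x, y = [v for v in vertices if v not in (s, t)]
        candidates = [(s, x, y, t), (s, y, x, t)]
        best = min(
            candidates,
            key=lambda p: (sum(d(p[i], p[i + 1]) for i in range(3)), p),
        )
        for i in range(3):
            freq[frozenset(best[i:i + 2])] += 1
    return freq
```

With distances satisfying $AB + CD < AC + BD < AD + BC$, the edges $(A, B)$ and $(C, D)$ receive frequency 5, $(A, C)$ and $(B, D)$ receive 3, and $(A, D)$ and $(B, C)$ receive 1, so the six frequencies always add up to 18.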
We will study the properties of frequency graphs on $K_n$. Let us denote by $N_z$ the number of frequency quadrilaterals in $K_n$ whose relative frequency graph is of type $(z)$, for $(z) \in \{(a), (b), (c), (d), (e), (f)\}$ in Figure 2, and note that $N_a + N_b + N_c + N_d + N_e + N_f = \binom{n}{4}$. In the next section we consider the model in which, for each set of four vertices, one of the six types is picked at random. We shall call this model the binomial distribution model. As we will see, this binomial distribution model suggests a heuristic for what bounds on the frequency should be used to eliminate edges according to their frequency.

The binomial distribution model
Our binomial distribution model for frequency graphs is to pick, for each set of four vertices $A, B, C, D$ in $K_n$, a total order on the sums of the distances $AD + BC$, $AB + CD$, and $AC + BD$ at random. Then we want to study the properties of frequency graphs that arise from such random choices. For example, in cases (a)-(f) of Figure 2, we picture the six relative frequency graphs that arise from the six ways to put a total order on $AD + BC$, $AB + CD$, and $AC + BD$. Note that over the six frequency quadrilaterals that are possible for $ABCD$, the possible frequency for any given edge $e$ is 1, 3, or 5. Moreover, $e$ is assigned frequency 1 in 2 cases, frequency 3 in 2 cases, and frequency 5 in 2 cases. Thus, the average frequency assigned to $e$ over these six frequency graphs is 3. For $i \in \{1, 3, 5\}$, let $p_i(e)$ be the probability that edge $e$ is assigned frequency $i$ in a frequency quadrilateral $ABCD$ containing the edge $e$. Clearly, $p_1(e) = p_3(e) = p_5(e) = \frac{1}{3}$. More generally, for any subset $S \subseteq \{1, 3, 5\}$, we let $p_S(e)$ denote the probability that edge $e$ is assigned any frequency $i$ with $i \in S$ in a frequency quadrilateral $ABCD$ containing the edge $e$. Then $p_{\{1,3\}}(e) = p_{\{1,5\}}(e) = p_{\{3,5\}}(e) = \frac{2}{3}$ and $p_{\{1,3,5\}}(e) = 1$. Next we want to study the expected frequency of edges $e$ that are in the OHC under such a probability model.
First, suppose that $A, B, C, D$ are consecutive vertices in the OHC. In that case, we know that the path $(A, B, C, D)$ must have smaller weight than the path $(A, C, B, D)$, which implies that $AB + BC + CD < AC + CB + BD$ and hence $AB + CD < AC + BD$. In the three frequency quadrilaterals in Figure 2 for the quadrilateral $ABCD$ where $AB + CD < AC + BD$, we see that the frequencies assigned to the edges $(A, B)$, $(B, C)$, and $(C, D)$ are $5, 1, 5$; $5, 3, 5$; and $3, 5, 3$, respectively. Thus, the average frequency of $(A, B)$ is $\frac{13}{3}$, the average frequency of $(B, C)$ is 3, and the average frequency of $(C, D)$ is $\frac{13}{3}$. Any given edge $e$ in the OHC is an edge in three optimal 4-vertex paths formed by consecutive edges of the OHC, so that the total contribution to its frequency from these three optimal 4-vertex paths is $\frac{13}{3} + 3 + \frac{13}{3} = \frac{35}{3} = 11\frac{2}{3}$, as opposed to the expected value of 9 for an edge that appears in 3 random quadrilaterals.
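The averages $\frac{13}{3}$, $3$, $\frac{13}{3}$ can be checked by enumerating the orderings of the three sums. The sketch below is our own illustration: it realises each ordering with artificial edge weights (a choice made only for this check) and then counts edges in the six optimal 4-vertex paths.

```python
from fractions import Fraction
from itertools import combinations, permutations

PAIRINGS = [("AB", "CD"), ("AC", "BD"), ("AD", "BC")]


def edge_frequencies(order):
    """Edge frequencies when the pairings in `order` run from smallest to largest sum."""
    # Artificial weights realising the ordering: both edges of the pairing at
    # rank r get weight r + 1, so the three sums are 2 < 4 < 6.
    w = {frozenset(e): r + 1 for r, pairing in enumerate(order) for e in pairing}
    freq = dict.fromkeys(w, 0)
    for s, t in combinations("ABCD", 2):
        x, y = [v for v in "ABCD" if v not in (s, t)]
        best = min(
            [(s, x, y, t), (s, y, x, t)],
            key=lambda p: sum(w[frozenset(p[i:i + 2])] for i in range(3)),
        )
        for i in range(3):
            freq[frozenset(best[i:i + 2])] += 1
    return freq


# The three orderings in which AB + CD comes before AC + BD.
cases = [o for o in permutations(PAIRINGS)
         if o.index(PAIRINGS[0]) < o.index(PAIRINGS[1])]
avg = {
    e: Fraction(sum(edge_frequencies(o)[frozenset(e)] for o in cases), len(cases))
    for e in ("AB", "BC", "CD")
}
total = sum(avg.values())  # contribution from three consecutive OHC edges
```

Running the same enumeration over all six orderings shows that every edge receives each of the frequencies 1, 3, and 5 exactly twice, which is where $p_1(e) = p_3(e) = p_5(e) = \frac{1}{3}$ comes from.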
Second, suppose that $(A, B)$ and $(C, D)$ are two vertex-disjoint edges on the OHC. That is, we have the situation pictured in Figure 3. In this situation, we note that there is a Hamiltonian cycle which starts at vertex $A$ and follows the OHC in the clockwise direction to vertex $D$, then uses the edge $(B, D)$, then follows the OHC in the counter-clockwise direction to vertex $C$, and then uses the edge $(A, C)$. Since this Hamiltonian cycle is not the OHC, it must be the case that $AB + CD < AC + BD$.
If one looks at the three quadrilaterals $ABCD$ for which this inequality holds in Figure 2, one finds that the frequencies for the edge $(A, B)$ are 5, 5 and 3, respectively, so that the summed frequency for the 3 quadrilaterals of this form is 13 as opposed to the expected value of 9. Note that there are $n$ edges in the OHC. Since we are assuming that the edges $(A, B)$ and $(C, D)$ have no vertices in common, $(C, D)$ cannot be one of the edges adjacent to $(A, B)$ in the OHC. Hence we have $n - 3$ choices for $(C, D)$. Thus, an edge $(A, B) \in$ OHC is contained in at least $n - 3$ such quadrilaterals, which are composed of $(A, B)$ and the other $n - 3$ non-adjacent edges in the OHC.

Figure 3: Two vertex-disjoint edges (A, B) and (C, D) on the OHC.
Note that any given edge $(A, B)$ is part of $\binom{n-2}{2}$ quadrilaterals in $K_n$. Using the OHC, we have found at least $n - 3$ pairs $(C, D)$ where the frequency of $(A, B)$ relative to the quadrilateral $ABCD$ is either 3 or 5. Assuming that for the remaining choices of quadrilaterals, the probability $p_1(e)$ that $e = (A, B)$ has frequency 1 in each quadrilateral is $\frac{1}{3}$, the probability $p_3(e)$ that $(A, B)$ has frequency 3 in each quadrilateral is $\frac{1}{3}$, and the probability $p_5(e)$ that $(A, B)$ has frequency 5 in each quadrilateral is $\frac{1}{3}$, we see that

$$p_{\{3,5\}}(e) \ge \frac{(n-3) + \frac{2}{3}\left[\binom{n-2}{2} - (n-3)\right]}{\binom{n-2}{2}} = \frac{2}{3} + \frac{2}{3(n-2)}.$$
Note that this is a very conservative lower bound since we did not take into account the 3 possibilities where $e$ is part of three consecutive edges in the OHC. Thus, we shall assume that the probabilities $p_{\{3,5\}}(e)$ and $p_{\{1\}}(e)$ for an edge $e$ in the OHC are given by formula (1):

$$p_{\{3,5\}}(e) = \frac{2}{3} + \frac{2}{3(n-2)} \quad \text{and} \quad p_{\{1\}}(e) = \frac{1}{3} - \frac{2}{3(n-2)}. \qquad (1)$$

We let $X$ denote the random variable which gives the number of frequency quadrilaterals where the frequency of edge $e = (A, B)$ is either 5 or 3. Our assumptions mean that if we select $N$ quadrilaterals from the $\binom{n-2}{2}$ quadrilaterals which contain the pair $(A, B)$, then $X$ has the binomial distribution $X \sim B(N, p_{\{3,5\}}(e))$. In such a situation, the probability $P(X = m)$ that $X = m$ is given by formula (2):

$$P(X = m) = \binom{N}{m}\, p_{\{3,5\}}(e)^m \left(1 - p_{\{3,5\}}(e)\right)^{N-m}. \qquad (2)$$
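The closed form $\frac{2}{3} + \frac{2}{3(n-2)}$ follows from counting quadrilaterals, and the algebra can be checked with exact rational arithmetic. The helper name `p35_lower_bound` below is hypothetical and used only for this check.

```python
from fractions import Fraction
from math import comb


def p35_lower_bound(n):
    """Lower bound on the probability that an OHC edge has frequency 3 or 5.

    Of the C(n-2, 2) quadrilaterals containing a fixed OHC edge, n - 3 pair it
    with a non-adjacent OHC edge and force frequency 3 or 5; the remaining
    ones are assumed to give frequency 3 or 5 with probability 2/3.
    """
    total = comb(n - 2, 2)
    forced = n - 3
    return Fraction(forced, total) + Fraction(2, 3) * Fraction(total - forced, total)


def closed_form(n):
    return Fraction(2, 3) + Fraction(2, 3 * (n - 2))
```

For example, with $n = 5$ both expressions equal $\frac{8}{9}$, and the excess over $\frac{2}{3}$ shrinks like $\frac{1}{n}$ as the instance grows.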
For a binomial distribution model, the function $P(X = m)$ is monotone increasing if $m < (N + 1)p_{\{3,5\}} - 1$ and is monotone decreasing if $m > (N + 1)p_{\{3,5\}}$. Thus, the maximum probability $P_0$ is achieved for an integer $m$ when $m$ equals $m_0 = (N + 1)p_{\{3,5\}}$ or $m_0 = (N + 1)p_{\{3,5\}} - 1$. Thus, if we select $N$ frequency quadrilaterals containing the edge $(A, B)$ at random, the case where there are $m_0$ frequency quadrilaterals with the frequency of edge $(A, B)$ being greater than or equal to 3 has the maximum probability. In these $m_0$ frequency quadrilaterals, we assume that the number of frequency quadrilaterals with the frequency of $(A, B)$ equal to 5 also has a binomial distribution $X \sim B(m_0, \delta_0)$, where $0 \le \delta_0 \le 1$ is the ratio between the number of frequency quadrilaterals with the frequency of $(A, B)$ equal to 5 and $m_0$. Thus, if $X = \delta_0(m_0 + 1)$ or $\delta_0(m_0 + 1) - 1$, the maximum probability will be obtained. Any edge $e$ in the OHC is contained in $n - 3$ frequency quadrilaterals consisting of the vertices of $e$ and the vertices of another edge $f$ in the OHC, and we assume that in those frequency quadrilaterals, $e$ has equal probability of having frequency 5 or 3. Given the possible relative frequency graphs pictured in Figure 2, it is easy to see that $\delta_0 = \frac{1}{2}$ on average in our binomial distribution model.
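The location of the most likely value of a binomial variable can be checked numerically. The following sketch uses our own helper names `binom_pmf` and `most_likely_count`; when $(N + 1)p$ is not an integer, the maximiser is its integer part.

```python
from math import comb, floor


def binom_pmf(N, p, m):
    """P(X = m) for X ~ B(N, p)."""
    return comb(N, m) * p**m * (1 - p) ** (N - m)


def most_likely_count(N, p):
    """The m in 0..N that maximises P(X = m), found by direct search."""
    return max(range(N + 1), key=lambda m: binom_pmf(N, p, m))


# Example with n = 100 vertices and N = 50 sampled quadrilaterals per edge.
n, N = 100, 50
p35 = 2 / 3 + 2 / (3 * (n - 2))
m0 = most_likely_count(N, p35)
```

Here `m0` agrees with `floor((N + 1) * p35)`, which is the integer lying between $(N + 1)p_{\{3,5\}} - 1$ and $(N + 1)p_{\{3,5\}}$.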
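If we use $N$ random quadrilaterals to compute the frequency of $e = (A, B)$ in the OHC, its total frequency will be given by formula (3):

$$F = 5\delta_0 m_0 + 3(1 - \delta_0)m_0 + (N - m_0) = N + 2(1 + \delta_0)m_0. \qquad (3)$$

Taking $m_0 = N p_{\{3,5\}}(e)$ with $p_{\{3,5\}}(e)$ as in formula (1), the minimum frequency $F_{\min}$ of an OHC edge is given by formula (4):

$$F_{\min} = (\phi_{\min} + 1)N, \qquad (4)$$

where $\phi_{\min} = \frac{4(1 + \delta_{\min})(n - 1)}{3(n - 2)}$.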
In the worst case, $\delta_{\min} = 0$, which would mean that all $m_0$ frequency quadrilaterals assign the frequency of $(A, B)$ to be 3, so that $\left(\frac{7}{3} + \frac{4}{3(n-2)}\right)N$ is a lower bound for $F_{\min}$. For edges $(A, B)$ in the OHC, computational evidence suggests that $\delta_{\min}$ is approximately $\frac{1}{2}$ or larger. If $\delta_{\min} = \frac{1}{2}$, then $\phi_{\min} = 2 + \frac{2}{n-2}$, so that $F_{\min} = \left(3 + \frac{2}{n-2}\right)N$, which is bigger than the expected frequency $F_{avg} = 3N$. This is the intrinsic reason that the frequencies of the OHC edges computed with optimal 4-vertex paths for the examples in [30] are much bigger than those of most of the other edges. One can construct examples of the TSP where the minimum frequency of an OHC edge attains the lower bound. However, the probability that the minimum frequency of $(A, B) \in$ OHC is equal to $\left(\frac{7}{3} + \frac{4}{3(n-2)}\right)N$ approaches 0 as $n$ approaches infinity. For reasonable-sized graphs, such as the instances appearing in the database [24], one can compute the probabilities of the frequency of a given edge $e$ using all $\binom{n-2}{2}$ quadrilaterals containing $e$. If $N_i$ is the number of quadrilaterals containing $e$ where the frequency is $i$ for $i \in \{1, 3, 5\}$, then $p_1(e) = N_1 / \binom{n-2}{2}$, $p_3(e) = N_3 / \binom{n-2}{2}$, and $p_5(e) = N_5 / \binom{n-2}{2}$.
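Thus, when $N$ frequency quadrilaterals are chosen at random, the total frequency $F$ of $e$ is given by formula (5):

$$F = \left(p_1(e) + 3p_3(e) + 5p_5(e)\right)N = (\phi + 1)N, \quad \text{where } \phi = 2p_3(e) + 4p_5(e). \qquad (5)$$

Note that $0 \le \phi \le 4$. Under our model, the probability that all quadrilaterals containing $e$ assign it the same frequency, $(\frac{1}{3})^{\binom{n-2}{2}}$, approaches zero for big $n$. This means that edges with $\phi \approx 0$ or $\phi \approx 4$ have a very small probability. On average, when $p_3(e) = p_5(e) = \frac{1}{3} + \frac{1}{3(n-2)}$ for an edge $e$ in the OHC, it follows that the $\phi$s of the OHC edges will be at least $2 + \frac{2}{n-2}$. Computational evidence from the graphs in [30] suggests that $\phi_{\min}$ is bigger than $2 + \frac{2}{n-2}$ due to the fact that $N_5$ is large.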
This suggests the following criterion to determine whether a given edge $e$ is likely to be in the OHC. That is, we should compare $\phi$ and $\phi_{\min}$. For any given edge $e$, if $\phi > \phi_{\min} = \frac{4(1 + \delta_{\min})(n - 1)}{3(n - 2)}$, then $F > F_{\min}$ and $e$ is more likely to be in the OHC. For example, if $\delta_{\min} \ge 0.5$, then the criterion becomes $\phi > 2$. This suggests that we can trim the edges $e$ with $\phi < 2$ and still keep the edges in the OHC. In Section 5, we give several examples to show what happens using this criterion for TSP instances in the database [24].
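The criterion can be tried on a toy instance. The sketch below is illustrative only and rests on several assumptions of ours: we write $\phi = F/N - 1$ for the quantity compared against the threshold, we use all $N = \binom{n-2}{2}$ quadrilaterals per edge, we assign frequencies 5, 3, 1 to the pairings with the smallest, middle, and largest sum (as in the six cases of Figure 2), and we find the OHC by brute force, which limits the example to very small $n$. All function names are hypothetical.

```python
import random
from itertools import combinations, permutations
from math import comb, dist


def phi_values(points):
    """phi(e) = F(e)/N - 1 for every edge, using all N = C(n-2, 2) quadrilaterals."""
    n = len(points)
    d = lambda u, v: dist(points[u], points[v])
    freq = {e: 0 for e in combinations(range(n), 2)}
    for a, b, c, q in combinations(range(n), 4):
        pairings = [((a, b), (c, q)), ((a, c), (b, q)), ((a, q), (b, c))]
        ranked = sorted(pairings, key=lambda pr: d(*pr[0]) + d(*pr[1]))
        # Smallest sum -> frequency 5, middle -> 3, largest -> 1.
        for f, pairing in zip((5, 3, 1), ranked):
            for edge in pairing:
                freq[edge] += f
    N = comb(n - 2, 2)
    return {e: F / N - 1 for e, F in freq.items()}


def phi_min(n, delta_min=0.5):
    """Threshold 4(1 + delta_min)(n - 1) / (3(n - 2)) from the model."""
    return 4 * (1 + delta_min) * (n - 1) / (3 * (n - 2))


def optimal_cycle_edges(points):
    """Edges of the OHC found by brute force (small n only)."""
    n = len(points)
    tours = ((0,) + p for p in permutations(range(1, n)))
    best = min(tours, key=lambda t: sum(
        dist(points[t[i]], points[t[(i + 1) % n]]) for i in range(n)))
    return {tuple(sorted((best[i], best[(i + 1) % n]))) for i in range(n)}


def trim(points, threshold=2.0):
    """Keep edges with phi >= threshold; report OHC edges that were lost."""
    phi = phi_values(points)
    kept = {e for e, x in phi.items() if x >= threshold}
    lost = optimal_cycle_edges(points) - kept
    return kept, lost


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(8)]
kept, lost = trim(points, threshold=2.0)
```

Whether `lost` is empty depends on the instance; the model only suggests that OHC edges tend to have $\phi$ above the threshold, and very small instances are the least favourable case.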

Some heuristics for the binomial distribution model
Recall that under our binomial distribution model, for each set of four vertices $A, B, C, D$ of $K_n$, we are essentially picking one of the six relative frequency quadrilaterals pictured in Figure 2 (a)-(f) at random. If we compute many frequency graphs where each frequency graph is computed with $N$ random frequency quadrilaterals containing edge $e$, then the cumulative probability $P(X \le m)$ that at most $m$ frequency quadrilaterals assign $e$ a frequency $f$ of either 5 or 3 is computed with formula (6):

$$P(X \le m) = \sum_{i=0}^{m} \binom{N}{i}\, p_{\{3,5\}}^{\,i} \left(1 - p_{\{3,5\}}\right)^{N-i}. \qquad (6)$$
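For the edges $e$ with $p_{\{3,5\}} > \frac{2}{3} + \frac{2}{3(n-2)}$, the number of frequency quadrilaterals where the frequency of $e$ is either 5 or 3 will be bigger than $m_0$. Their frequencies $F$ computed with $N$ frequency quadrilaterals will be bigger than $F_{\min}$. The cumulative probability $P(X \ge m_0)$ is computed with formula (7):

$$P(X \ge m_0) = \sum_{i=m_0}^{N} \binom{N}{i}\, p_{\{3,5\}}^{\,i} \left(1 - p_{\{3,5\}}\right)^{N-i}. \qquad (7)$$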
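The upper tail $P(X \ge m_0)$ of a binomial variable is easy to evaluate directly, which makes it possible to see how sharply it separates edges with different $p_{\{3,5\}}$. The function name `tail_probability` and the choice $m_0 = \mathrm{round}(N p)$ in the example are assumptions of this sketch.

```python
from math import comb


def tail_probability(N, p, m0):
    """P(X >= m0) for X ~ B(N, p), summed term by term."""
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(m0, N + 1))


# Example: n = 100 vertices, N = 60 sampled quadrilaterals per edge.
n, N = 100, 60
p_threshold = 2 / 3 + 2 / (3 * (n - 2))
m0 = round(N * p_threshold)

high = tail_probability(N, 0.80, m0)        # edge well above the threshold
borderline = tail_probability(N, p_threshold, m0)
low = tail_probability(N, 0.50, m0)         # edge well below the threshold
```

The three values are ordered `high > borderline > low`, with `high` close to 1 and `low` close to 0, matching the qualitative statement that the tail probability moves toward 1 or 0 as $p_{\{3,5\}}$ moves away from the threshold.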
The bigger the difference between $p_{\{3,5\}}$ and $\frac{2}{3} + \frac{2}{3(n-2)}$, the closer the probability $P(X \ge m_0)$ is to 1. For the edges $e$ with $p_{\{3,5\}}$ above $\frac{2}{3} + \frac{2}{3(n-2)}$, $F$ has a high probability of being above $F_{\min}$ if it is computed with the same number of random frequency quadrilaterals. Meanwhile, these edges with big $p_{\{3,5\}}$ will have a small probability according to the binomial distribution (2).
We have seen that for edges $e$ in the OHC, the values of $p_{\{3,5\}}$ are on average bigger than the expected value of $p_{\{3,5\}}$, which is $\frac{2}{3}$. On the other hand, the edges $e$ with $p_{\{3,5\}}$ below $\frac{2}{3} + \frac{2}{3(n-2)}$ have a small probability that their frequency $F$ is above $F_{\min}$. The bigger the difference between $\frac{2}{3} + \frac{2}{3(n-2)}$ and $p_{\{3,5\}}$ in such cases, the closer the probability $P(X \ge m_0)$ is to 0. For most of the edges not in the OHC, $p_{\{3,5\}}$ is generally smaller than the average probability $\frac{2}{3}$.

Next, we consider the edges $e$ with frequency above the average frequency. In view of the six frequency quadrilaterals, we know that the expected value of $p_{\{3,5\}}$ is $\frac{2}{3}$. In other words, every edge has probability $\frac{2}{3}$ that its frequency is at least the average frequency 3 in a frequency quadrilateral in $K_n$. Consider the event that the total frequency $F$ of $e$ is greater than $3N$, where $N$ represents the number of random frequency quadrilaterals with edge $e$, and denote its probability by $P(F > 3N)$. The expected value of $P(F > 3N)$ is $\frac{2}{3}$ over all the $\binom{n}{2}$ edges. This suggests that we can throw away $\frac{1}{3}$ of the edges.

We also want to estimate the number of edges $e$ whose cumulative frequencies $F_e$ satisfy $F_e > F_{\min}$ when we compute such frequencies with $N$ random frequency quadrilaterals containing the edge $e$. If there are $K$ such edges $e$ where $F_e > F_{\min}$, the number of edges $e$ with $F_e \le F_{\min}$ will be $R = \binom{n}{2} - K$. Note that the total number of frequency quadrilaterals chosen is $\frac{n(n-1)}{12}N$. Let $F_K$ and $F_R$ denote the average frequency of the $K$ edges $e$ with $F_e > F_{\min}$ and the average frequency of the $R$ edges $e$ with $F_e \le F_{\min}$. Note that each of the six possible quadrilaterals containing vertices $A, B, C, D$ contributes a cumulative frequency of 18. It follows that formula (8) holds:

$$K F_K + R F_R = \frac{18\, n(n-1)}{12} N. \qquad (8)$$

When the frequencies are computed with frequency quadrilaterals, the expected value of $\phi$, $\mu(\phi)$, will be 2 and the variance $\sigma^2(\phi)$ can also be determined. One would expect that the $\phi$s of the $\binom{n}{2}$ edges will approximately conform to the normal distribution $N(\mu(\phi), \sigma^2(\phi))$ according to the central limit theorem. The probability is $P(\phi \ge \mu(\phi) + t\sigma(\phi)) = \frac{1}{2}\left[1 - \mathrm{erf}\left(\frac{t}{\sqrt{2}}\right)\right]$, where $\mathrm{erf}$ is the Gauss error function. Expanding $\mathrm{erf}$ as a power series, it follows that $P(\phi \ge \mu(\phi) + t\sigma(\phi))$ is given by formula (11):

$$P(\phi \ge \mu(\phi) + t\sigma(\phi)) = \frac{1}{2} - \frac{1}{\sqrt{\pi}} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,(2k+1)} \left(\frac{t}{\sqrt{2}}\right)^{2k+1}. \qquad (11)$$
Thus, $P(\phi \ge \mu(\phi) + t\sigma(\phi))$ is a function of the variable $t$, and $t$ has a maximum value $t_{\max}$. In our frequency graphs, the maximum $\phi$ is 4. This means that $P(\phi \ge 4)$ approaches 0, which is not consistent with a normal distribution. When $t$ reaches the maximum value $t_{\max}$, $t_{\max}\sigma(\phi) = 2$ holds for a given $\sigma(\phi)$. Therefore, we can compute $P$ for a range of values of $t$ to determine $t_{\max}$ and then compute the corresponding $\sigma(\phi)$. For example, suppose one uses the first 14 terms of formula (11) to compute the probability. Then the change in this probability $P$ according to $t_{\max}$ is shown in Figure 4. If we take 0.0025 as the threshold for a small probability (which is reasonable considering the $3\sigma$ rule for the normal distribution), then $t_{\max} = 2.819$ and $\sigma(\phi) \approx 0.7094$, which is bigger than the theoretical value 0.5443 (or $(\frac{2}{3})^{3/2}$) of the ideal case. If we want to compute a more accurate approximation of $\sigma(\phi)$, then we must use more terms in the expansion (11). We tried 22 terms of formula (11) to compute the other small values of $P(\phi \ge \mu(\phi) + t\sigma(\phi))$ and $t_{\max}$ and found that the corresponding graph did not differ much from the graph pictured in Figure 4. If we choose $\sigma(\phi) = 0.7094$, the probability density function (PDF) of the $\phi$s is approximated by formula (12):

$$f(\phi) = \frac{1}{0.7094\sqrt{2\pi}} \exp\left(-\frac{(\phi - 2)^2}{2 \times 0.7094^2}\right). \qquad (12)$$
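The computation of $t_{\max}$ from a truncated series can be sketched as follows. We assume here that formula (11) is the Maclaurin expansion of the Gauss error function, so that the tail probability is $\frac{1}{2} - \frac{1}{\sqrt{\pi}} \sum_k \frac{(-1)^k}{k!(2k+1)} (t/\sqrt{2})^{2k+1}$; the function names and the bisection bracket $[2.0, 2.9]$ are our own choices, and the bracket is kept narrow because a truncated series stops being reliable for larger $t$.

```python
from math import factorial, pi, sqrt


def tail_series(t, terms=14):
    """Approximate P(Z >= t) for a standard normal Z with a truncated erf series."""
    x = t / sqrt(2)
    s = sum(
        (-1) ** k * x ** (2 * k + 1) / (factorial(k) * (2 * k + 1))
        for k in range(terms)
    )
    return 0.5 - s / sqrt(pi)


def find_t_max(threshold=0.0025, terms=14, lo=2.0, hi=2.9):
    """Bisection for the t at which the truncated tail drops to `threshold`."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if tail_series(mid, terms) > threshold:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_max = find_t_max()
sigma = 2 / t_max  # from t_max * sigma(phi) = 2
```

The resulting `t_max` and `sigma` can be compared with the values $t_{\max} = 2.819$ and $\sigma(\phi) \approx 0.7094$ quoted in the text; increasing `terms` shows how sensitive the estimate is to the truncation.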
Since we are assuming that the distribution of the $\phi$s nearly conforms to the normal distribution with the exception that $P(\phi > 4) = 0$, we can use some characteristics of the normal distribution to approximately analyze their distribution. For example, we can use the $3\sigma$ rule of the normal distribution with $t_{\max} = 3$ to compute the distribution of $P(\phi \ge \mu(\phi) + t\sigma(\phi))$, in which case we find that $\sigma(\phi) = \frac{2}{3}$. The number of edges with $\phi$ above $\phi_{\min}$ decreases exponentially in proportion to the difference between $\phi_{\min}$ and $\mu(\phi) = 2$. For TSP instances of large size, our results suggest that $\phi_{\min}$ will be close to 4 and the number of edges with $\phi$ above $\phi_{\min}$ is close to $n$. For TSP instances of medium size, our computer experiments described in the next section suggest that we will end up with a sparse graph if we keep only the edges with $\phi$ above $\mu(\phi) + 2\sigma(\phi)$ or $\mu(\phi) + 2.5\sigma(\phi)$. For TSP instances of small size, our computer experiments described in the next section suggest that we will end up with a sparse graph if we keep only the edges with $\phi$ above $\mu(\phi) + \sigma(\phi)$ or $\mu(\phi) + 1.5\sigma(\phi)$. The number of edges with $\phi \in [\mu(\phi) + t\sigma(\phi), 4]$ can be approximated according to formulas (11) and (12).
For the OHC edges, the distribution of their $\phi$s will conform to another normal distribution based on the central limit theorem. The expected value is $\lim_{n \to \infty} \mu_o(\phi) = 4$ and the standard deviation is $\lim_{n \to \infty} \sigma_o(\phi) = 0$. Thus, the probability density function becomes a Dirac delta function. That is, it is zero everywhere except at $\mu_o(\phi) = 4$, with an integral of one over the span $[0, 4]$.

Examples and analysis
The Concorde package on-line (NEOS Server for Concorde) [21] has computed the OHC for several families of TSP instances. In this section, we report on some computer experiments where we used the OHC that had been computed for such TSP instances to compute the corresponding $\phi_{\min}$, $\sigma(\phi)$, $\mu_o(\phi)$, $\sigma_o(\phi)$, $\phi_K$, and $\phi_R$, where $\phi_K$ and $\phi_R$ are the average $\phi$ of the edges that are kept and of the edges that are removed, respectively. In each case, we keep only those edges whose corresponding $\phi$ is larger than $\phi_{\min}$. Let $r = \frac{2 - \phi_R}{\phi_K - \phi_R}$ be the ratio between $K$ and $\binom{n}{2}$, which shows how sparse the graph is. The smaller the value of $r$, the sparser the graph. We also compute $c = \frac{r(n-1)}{2\log_2(n)}$, so that $K = c\,n\log_2(n)$, for comparison. If $c$ is much smaller than the number of vertices $n$ of the TSP instance, then we are reduced to considering graphs with only $O(n \log_2(n))$ edges, and we can use various efficient algorithms which work on sparse graphs to search for solutions to our given TSP instance.
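The sparsity ratio can be computed directly from a list of per-edge values. In the sketch below we write $\phi$ for the per-edge quantity, $\phi_K$ and $\phi_R$ for its averages over kept and removed edges, and we take $c = K / (n \log_2 n)$; these symbols and the function name `sparsity_statistics` are our notation and should be read as assumptions. The identity $r = (2 - \phi_R)/(\phi_K - \phi_R)$ relies on the overall mean of $\phi$ being exactly 2.

```python
from math import log2


def sparsity_statistics(phis, threshold, n):
    """Return (r_direct, r_from_means, c) for the edges with phi > threshold.

    `phis` holds one value per edge of K_n; both the kept and the removed
    set are assumed to be non-empty.
    """
    kept = [x for x in phis if x > threshold]
    removed = [x for x in phis if x <= threshold]
    phi_K = sum(kept) / len(kept)
    phi_R = sum(removed) / len(removed)
    r_direct = len(kept) / len(phis)
    # Valid when the mean of all phis is 2: K*phi_K + R*phi_R = 2*(K + R).
    r_from_means = (2 - phi_R) / (phi_K - phi_R)
    c = len(kept) / (n * log2(n))
    return r_direct, r_from_means, c


# Toy data for n = 4 (six edges) with mean phi equal to 2.
example = [0.5, 1.5, 2.0, 2.0, 2.5, 3.5]
r_direct, r_from_means, c = sparsity_statistics(example, threshold=2.2, n=4)
```

On the toy data two of the six edges are kept, so both expressions for $r$ give $\frac{1}{3}$.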
The results are listed in Table 1, ordered by $r$ from big to small values. Six digits after the decimal point are kept. In most cases, we found that $\phi_{\min}$ is bigger than $\mu(\phi) = 2$. As the number of vertices gets larger, $\phi_{\min}$ seems to approach or exceed 3 and $\sigma(\phi)$ is close to 0.7094. Similarly, as the number of vertices gets larger, the average value of $\phi$ over the OHC edges, which we call $\mu_o(\phi)$, seems to approach 4, and the corresponding deviation $\sigma_o(\phi)$ is much smaller than $\sigma(\phi)$. Similarly, we see that $\phi_K$ is much bigger than $\phi_R$. In general, we found that $\phi_K + \phi_R > 4$ and $r$ is less than 0.5, except for the instance brg180. The deviation of brg180 from the other examples that we computed is probably due to the fact that brg180 has a lot of equal-weight edges, so that the distribution of the computed frequency quadrilaterals does not conform to our binomial distribution model. As expected, our examples also show that $r$ decreases quickly as $\phi_{\min}$ grows. Our results nearly conform to the results predicted by formula (11) and Figure 4. The number of edges with $\phi$ above $\phi_{\min}$ approximately conforms to the normal distribution formula (12). In the last column, we see that $c$ is much smaller than $n$ and is smaller than 6.5 for all the TSP instances in Table 1. In such cases, the graph that remains after keeping only those edges with $\phi > \phi_{\min}$ is sparse enough to be solved with the current exact algorithms that work only under the assumption that the underlying graph is sparse.

We note that one of the basic assumptions is that for all vertices $A, B, C, D$, the sum of the distances for the path $(A, B, C, D)$ is always different from the sum of the distances for the path $(A, C, B, D)$. This allows us to always pick the optimal 4-vertex path for each of the 6 possible pairs of end points in our frequency quadrilateral. Thus, a natural question arises of what should be done when there are many sets of four vertices $A, B, C, D$ such that $AC + CB + BD = AB + BC + CD$. In such a situation, we have no criterion to determine which of $(A, B, C, D)$ and $(A, C, B, D)$ should be used as the optimal 4-vertex path. In our computer experiments, this issue is resolved by numbering the vertices $1, \ldots, n$ and then choosing between $(A, B, C, D)$ and $(A, C, B, D)$ the one that is smallest in lexicographic order based on our labeling of the vertices. For example, for four given vertices $A < B < C < D$ with $AC + CB + BD = AB + BC + CD$, we choose the path $(A, B, C, D)$ rather than $(A, C, B, D)$ as an optimal 4-vertex path in our computer experiments. In such a situation, the frequency or $\phi$ of edges computed with them will deviate from our binomial distribution model. The problem instance brg180 is an example of a graph where there are many such sets of 4 vertices. In such a situation, some edges in the OHC may have small frequency in their frequency quadrilaterals due to our selection strategy for the optimal 4-vertex paths. This seems to produce a smaller $\phi_{\min}$ and bigger values of $r$ and $c$.
In another computer experiment, we computed the number of edges $e$ whose $\phi$s are greater than $\phi_{\min}$ as $\phi_{\min}$ varied from 2.0 to 3.4 for the instances pr144, brg180, kroA200, pr226, fl417 and gr431. We shall call the resulting graph in each case the residual graph. The number of edges in the original graphs equals $\binom{n}{2}$. The experimental results are shown in Table 2. One sees that as the threshold $\phi_{\min}$ grows, the number of edges in the residual graph decreases sharply. The numbers in parentheses in Table 2 represent the number of edges from the OHC whose corresponding $\phi$ is less than $\phi_{\min}$. For $\phi_{\min} \le 2.7$, we see that the number of lost OHC edges does not change much. Indeed, when $\phi_{\min} = 2.7$ is taken as the threshold, only a few OHC edges are lost whereas the number of edges that we keep is sharply reduced. Based on our experimental results, we suggest that one should use $\phi_{\min} = 2.7$ as the frequency threshold to compute the residual graph for most small TSP instances. In most cases, the residual graph has less than 15.86% of the total number of edges in the original graph. If the residual graph includes the OHC, then significant computation time will be saved in solving the TSP. Of course, in theory, the residual graph may not even have an HC if we use a big threshold, such as $\phi_{\min} > 2.7$. We used the improved genetic algorithm [28] to search for a new OHC in the sparse graphs computed with $\phi_{\min} > 2.7$, but in nearly all of the cases we failed to find any HC. For the small instances of the TSP, $\mu(\phi) + 2\sigma(\phi)$ is too big to take as $\phi_{\min}$. In such cases, many of the OHC edges are not included in the residual graph.
There is another possibility for dealing with graphs where there are many sets of vertices $A, B, C, D$ with $AB + BC + CD = AC + CB + BD$, so that we cannot choose between the paths $(A, B, C, D)$ and $(A, C, B, D)$ based on the sum of the distances of their edges. One way to resolve this problem is to add a small random distance $rd \in [0, 1]$ to the distance of every edge, i.e., the distance $d(A, B)$ of an edge $(A, B)$ becomes $d(A, B) + rd(A, B)$. For the symmetric TSP, $rd(A, B) = rd(B, A) \in [0, 1]$ for an arbitrary edge $(A, B)$. The random distance $rd$ is so small that it does not change the OHC. However, the small random distance converts the "special" TSP into a general TSP so that our binomial distribution model can work well. In addition, the $rd$s are generated at random for every edge. Therefore, the random distance $rd$ has a nearly equivalent impact on the probability $p_{\{3,5\}}$ for any edge $e$. Meanwhile, the $\phi_k$ ($1 \le k \le \binom{n}{2}$) of every edge also complies with the binomial distribution model. We carried out experiments using this idea to generate the same kind of statistics as shown in Table 2. These results are given in Table 3.
Our computer experiments focused on the instances brg180, pr144, pr226 and fl417, which have many equal-weight edges. We added a random distance rd ∈ [0, 1] to the distance of every edge in order to compute the 6 optimal 4-vertex paths in each weighted quadrilateral. Because the values rd are random variables, the results may vary between trials. That is, for any given quadrilateral, we may not compute the same six optimal 4-vertex paths. However, our experiments showed that the final result for the frequency graphs does not change very much. On average, the random distances added to the edges allowed us to determine exactly 6 optimal 4-vertex paths for most of the weighted quadrilaterals. For some parameters of brg180, refer to Table 4. The number of edges in the sparse graphs is computed with min varying from 2.0 to 3.4. The number of lost OHC edges is also recorded in parentheses. Our results showed that the frequency values of the edges change only slightly when we add a random distance rd to their distances, as compared to the results in Table 1. However, the number of edges with frequency values above min changes to some extent. For example, the min of brg180 becomes 2.337671, which is bigger than that in Table 1. The value r is computed as 0.407138. This means that 2068 more edges are removed compared with the results in Table 1. Finally, we carried out a few more experiments of this type for brg180, which includes a lot of equal-weight edges. In each experiment, we multiplied the small random distance rd ∈ [0, 1] by a different coefficient co, i.e., we added co × rd with rd ∈ [0, 1]. The number of equal-weight edges is then greatly reduced, so that we can compute just 6 optimal 4-vertex paths for nearly every given quadrilateral. The min is recorded and r is computed for each coefficient co. The results are given in Table 4. We note that the results in Table 3 for brg180 were computed according to co = 1.0 in Table 4.
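To make the phrase "the 6 optimal 4-vertex paths in each weighted quadrilateral" concrete, the following Python sketch enumerates, for each of the six endpoint pairs of a quadrilateral, the two paths that visit all four vertices, keeps the shorter one, and counts how often each edge is used. The function name `quad_frequencies` and the toy distance matrix are hypothetical. Ties are broken arbitrarily here (the first candidate wins), which is exactly the situation the random distances are meant to avoid.

```python
from itertools import combinations


def quad_frequencies(d, quad):
    """For the four vertices in `quad`, return a dict mapping each of the
    six edges (u, v), u < v, to the number of optimal 4-vertex paths
    that contain it. d is a symmetric distance matrix."""
    freq = {tuple(sorted(e)): 0 for e in combinations(quad, 2)}
    for s, t in combinations(quad, 2):           # six endpoint pairs
        x, y = [v for v in quad if v not in (s, t)]
        candidates = [(s, x, y, t), (s, y, x, t)]
        # Shorter of the two paths from s to t through the other two vertices.
        best = min(candidates,
                   key=lambda p: sum(d[p[i]][p[i + 1]] for i in range(3)))
        for i in range(3):
            freq[tuple(sorted((best[i], best[i + 1])))] += 1
    return freq


# Toy quadrilateral: opposite edges have weights (1, 1), (2, 2) and (3, 3).
toy_d = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
]
print(quad_frequencies(toy_d, (0, 1, 2, 3)))
# {(0, 1): 5, (0, 2): 3, (0, 3): 1, (1, 2): 1, (1, 3): 3, (2, 3): 5}
```

Each of the six paths contributes three edges, so the frequencies in one quadrilateral always sum to 18. In the toy example the pair of opposite edges with the smallest total weight receives frequency 5, the middle pair 3, and the largest pair 1.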
The values of min are not equal for different coefficients co. The number of edges with frequency values above min changes according to co · rd. In the worst experiment, the residual graph includes $0.565988 \times \frac{180 \times 179}{2} = 9118$ edges. In the best experiment, the residual graph includes $0.307886 \times \frac{180 \times 179}{2} = 4960$ edges. For the coefficients co = 1.5, 2.0, 2.5 and 2.8, the values of min are less than $2 + \frac{2}{n-2}$, which is probably due to the fact that we still have many quadrilaterals where we cannot compute the right optimal 4-vertex paths. One reason is that there are still many edges with equal weights. The other reason is that adding random distances leads to many inappropriate optimal 4-vertex paths for brg180. Although we compute 6 optimal 4-vertex paths for a given quadrilateral, the frequency of some OHC edges may not be as big as the frequency of the other OHC edges in their frequency quadrilaterals.
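For reference, the two edge counts quoted above can be checked directly from the number of edges of the complete graph on 180 vertices. The block below only spells out the arithmetic; the difference in the last line is derived here and is not a figure stated in the experiments.

```latex
\[
\binom{180}{2} = \frac{180 \times 179}{2} = 16110,
\qquad
0.565988 \times 16110 \approx 9118,
\qquad
0.307886 \times 16110 \approx 4960,
\]
\[
9118 - 4960 = 4158 .
\]
```

So the worst and best runs in Table 4 differ by roughly a quarter of all edges of brg180.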
With the other coefficients, the corresponding values of min are much bigger than $2 + \frac{2}{n-2}$. This suggests that these coefficients are able to change brg180 into a weighted graph to which our binomial distribution model applies. Thus, adding random increments to the distances of edges can still allow the binomial distribution model to work well. Because the values rd are generated at random, we cannot expect to obtain the best results with just one experiment. We found that by running many experiments, we were able to obtain some results where min was much bigger than $2 + \frac{2}{n-2}$. In each of the experiments represented in Table 4, we extracted the 180 frequency values of the OHC edges and ordered them from largest to smallest. In Table 4, one sees that the values of min vary quite a bit. However, the second smallest value was approximately 2.674499 in the 11 experiments with different coefficients co. In addition, the third, fourth, fifth and sixth smallest values, which were approximately 2.680589, did not change much in the 11 experiments. Moreover, the seventh smallest value was bigger than 2.7. Thus, when min = 2.7 is taken as the frequency threshold, the number of lost OHC edges is at most 6 in each of the experiments. When one adds random distances to the edges for a TSP with a lot of equal-weight edges, the number of edges in the residual graph will vary from experiment to experiment. This suggests that one should run multiple experiments when adding random distances to the edges until one finds a min which is bigger than $2 + \frac{2}{n-2}$. In this way, we can compute a residual graph with a relatively small number of edges.

Conclusions
The main result of this paper is a heuristic to cut down the number of edges in the search for an OHC in a symmetric TSP based on computing frequency graphs. That is, first one adds a small increment of distance to each edge to ensure that one can distinguish between the distances of the path (A, B, C, D) versus the path (A, C, B, D) for all sets of 4 vertices A, B, C, D. Next, one computes the frequency graph based on randomly choosing N frequency quadrilaterals with each edge and then eliminates those edges e whose corresponding frequency value is less than a pre-specified min. The analysis of our binomial distribution model for such randomly chosen frequency quadrilaterals and our computer experiments suggest that min = 2.7 is a good first choice. In this case, the residual graph has less than 15.86% of the total number of edges in the original graph. The cost of computing a frequency graph is $O(n^4)$.

One could ask whether we can produce a similar analysis by working with optimal 5-vertex paths and pentalaterals instead of optimal 4-vertex paths and quadrilaterals. In this case, there are 32 different frequency graphs that are possible using five vertices A, B, C, D, E, and the distribution of frequencies is not uniform as it is in the case of frequency quadrilaterals using optimal 4-vertex paths. Thus, it is much harder to analyze. Another drawback is that the cost of computing a frequency graph is $O(n^5)$ in this case.

We end with two questions for further research. The first question is what happens if we iterate the procedure of computing the residual graphs. In theory, we can throw away more and more edges. We shall pursue such an analysis in a subsequent paper. The second question is to estimate the complexity of exact or approximation algorithms that solve the TSP based on the residual graphs that we produce. This will be the next focus of our future research.

Figure 2: The six frequency quadrilaterals ABCD in view of four-vertex and three-line inequality arrays in quadrilaterals ABCD.

... edges with small frequency. As n → ∞, the number of edges with F above 3N conforms to the normal distribution $N\left(\frac{2}{3}\binom{n}{2}, \frac{2}{9}\binom{n}{2}\right)$. This suggests that we should select only the $\frac{2}{3}\binom{n}{2}$ edges with top frequency in our search for a solution to the TSP.

Table 1: The computational results of some TSP instances (n is the TSP scale)

Table 2: The number of edges in the sparse graphs, with the number of lost OHC edges in parentheses, according to min

Table 3: The experiments for the TSP with many equal-weight edges

Table 4: The experiments with different co · rd for brg180