Approximation Algorithms for Not Necessarily Disjoint Clustered TSP

Let $G = (V, E)$ be a complete undirected graph with vertex set $V$ and edge set $E$, and let $H = \langle G, S \rangle$ be a hypergraph, where $S$ is a set of not necessarily disjoint clusters $S_1, \dots, S_m$, $S_i \subseteq V$ for all $i \in \{1, \dots, m\}$. The clustered traveling salesman problem (CTSP) is to compute a shortest Hamiltonian path that visits each vertex exactly once, such that the vertices of each cluster are visited consecutively. In this paper, we present a 4-approximation algorithm for the general case. When the intersection graph is a path, we present a 5/3-approximation algorithm. When the clusters' sizes are all bounded by a constant and the intersection graph is connected, we present a polynomial time algorithm that finds an optimal solution.


Introduction
Let $G = (V, E)$ be a complete undirected graph with vertex set $V$, edge set $E$ and edge lengths $l(e)$. Let $H = \langle G, S \rangle$ be a hypergraph, where $S$ is a set of not necessarily disjoint clusters $S_1, \dots, S_m$, $S_i \subseteq V$ for all $i \in \{1, \dots, m\}$, such that $\cup_{i=1}^{m} S_i = V$. The clustered traveling salesman problem (CTSP) is to compute a shortest Hamiltonian path that visits each vertex exactly once, such that the vertices of each cluster are visited consecutively.
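The feasibility condition in this definition can be made concrete with a small check. The following is a minimal sketch, not part of the algorithms of this paper; the function names `is_feasible` and `path_length`, and the representation of clusters as Python sets, are assumptions made for illustration only.

```python
def is_feasible(path, clusters):
    """Return True if `path` visits every vertex exactly once and the
    vertices of every cluster occupy consecutive positions on the path."""
    position = {v: i for i, v in enumerate(path)}
    if len(position) != len(path):
        return False  # some vertex is visited twice
    universe = set().union(*clusters) if clusters else set()
    if set(path) != universe:
        return False  # the path must cover exactly the union of the clusters
    for cluster in clusters:
        if not cluster:
            continue
        indices = [position[v] for v in cluster]
        # Consecutive means the occupied index range has no gaps.
        if max(indices) - min(indices) + 1 != len(cluster):
            return False
    return True


def path_length(path, length):
    """Sum the edge lengths l(u, v) along consecutive vertices of the path."""
    return sum(length(u, v) for u, v in zip(path, path[1:]))
```

For example, with clusters {1, 2, 3}, {3, 4} and {4, 5}, the path 1, 2, 3, 4, 5 is feasible, while 1, 3, 2, 4, 5 is not, because vertices 3 and 4 of the second cluster are separated by vertex 2.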
One of the main results of this paper is a 4-approximation algorithm for the general case of the CTSP. When the intersection graph is connected and the clusters satisfy $S_i \not\subseteq (S_j \cup S_k)$ for every $i \notin \{j, k\}$, the problem has a feasible solution only if the intersection graph is a path. For this case, we present a 5/3-approximation algorithm. For both approximation algorithms, we assume that edge lengths satisfy the triangle inequality. Another main result of this paper is for the case where the intersection graph is connected and the clusters' sizes are all bounded by a constant, without any additional constraints on the sizes of the intersections. For this case, we present a polynomial time algorithm which finds an optimal solution. When a feasible solution does not exist, all of the above algorithms report this.
The CTSP may be considered as a generalization of the classic traveling salesman problem (TSP), where $S_i = \{v_i\}$ and $m = |V|$. The CTSP may also be considered as a generalization of the problem where the clusters are disjoint. While for every instance of the problem with disjoint clusters there exists a feasible solution, the intersections between clusters may impose additional constraints, creating instances with no feasible solution. The TSP and the disjoint-clustered TSP are known to be NP-hard [6]. Information about different versions of TSP can be found in [14] and [9]. In [6] Christofides gives a well known 1.5-approximation algorithm for the TSP. In [7] Frederickson, Hecht and Kim present approximation algorithms for some routing problems, including the Stacker-Crane problem, which searches for a TSP path that must include a pre-defined set of arcs. In [11] Hoogeveen presents approximation algorithms for minimum length TSP paths, where some or all of the endpoints are known.
Much research has also been conducted on the clustered TSP where the clusters are disjoint. A heuristic for this problem is presented in [4], a branch and bound algorithm for solving this problem is presented in [15], and bounded-approximation algorithms are given in [2] and [10]. In [1] the ordered disjoint clustered TSP is considered and a 5/3-approximation algorithm is offered. In [16] a genetic algorithm for solving this problem is presented. None of these algorithms can be applied to the CTSP, as the clusters' intersections impose additional restrictions on the required TSP path. However, in one case of the problem (Subsection 3.2) we manage to convert the problem into ordered disjoint clusters, and we use the algorithm in [1] for the last step of the solution.
Related work introduces a polynomial time algorithm where the offered solution is a tree (instead of a path) and each cluster is spanned by a sub-tree [12]. In [13] the case where each cluster is spanned by a complete star is solved in polynomial time.
The unweighted version of the CTSP may be solved in linear time by the PQ-tree data structure presented by Booth and Lueker in [3]. A solution of the unweighted version is a feasible solution for the weighted CTSP. Thus, the PQ-tree data structure offers an initial solution in some of the algorithms presented in this work. Related work for the unweighted version is the recognition of interval graphs: given a graph $G$, decide whether there exist a path and a collection of sub-paths (clusters) of the path, such that the intersection graph of the collection of clusters is the given graph $G$. The recognition of interval graphs may be solved in linear time [8].
A possible application of the problem, described in [4] and [15], arises in robotics. In a warehouse, an order for goods contains several sub-orders, not necessarily disjoint, each of which calls for several goods. A motorized robot is dispatched through the warehouse to pick up the goods for each sub-order. The robot may deal with parallel sub-orders, but once it starts handling a sub-order it must complete it consecutively. The aim is to find a minimum length route for the robot, such that the goods of each sub-order are picked consecutively.
A similar application described in [15] is the Numerically Controlled machine. In this application, there is a set of customers that require one or more operations with different tools. A machine containing the different tools travels among the customers. Since operating each tool is costly, the path for operating each tool must be consecutive.
Another application is the Physical Mapping of Chromosomes in bio-informatics, described in [5]. Each chromosome is mapped by $n$ probes $P_1, \dots, P_n$ and $m$ clones $C_1, \dots, C_m$. The problem is to reconstruct the order of the probes such that all the probes which hybridize to one clone appear consecutively in the solution. Using the Hamming distance, the problem is in fact the CTSP.
In Section 2 we survey known algorithms which we use throughout the paper. In Section 3 we present two approximation algorithms. The first is Algorithm NDC, which is a 4-approximation algorithm for the general case of CTSP. The second is Algorithm IGP, which is a 5/3-approximation algorithm for the case where the intersection graph is a path. We also prove that this requirement is not too strict, as in many cases either the intersection graph is a path or there is no feasible solution. In Section 4 Algorithm BC is presented. This is a polynomial time algorithm which finds an optimal solution when the intersection graph is connected and the clusters' sizes are bounded by a constant $C$.

Preliminaries
We present here some known notations and results that will be used throughout the paper.

Christofides' Algorithm for the TSP Path
In [6] Christofides presents the following well known algorithm to approximate the minimum length TSP path:
1. Find $T$, a minimum spanning tree of the graph.
2. Find $M$, a minimum weight perfect matching for the odd degree vertices of $T$.
3. The union of the edges in $T$ and $M$ forms an Eulerian path of the graph.
4. Use the triangle inequality to create a TSP path whose length is no bigger than the length of the Eulerian path.
Theorem 1 ([6]) Christofides' Algorithm returns a TSP path whose length is at most 3/2 the length of the optimal solution in $O(n^3)$ time complexity, assuming the edges' lengths satisfy the triangle inequality.

Shortest TSP Path with Known Endpoints
Hoogeveen in [11] studies approximation algorithms for finding a shortest TSP path for three variants of the problem, varying according to the number of known endpoints: zero, one, or two. The algorithm is a slight variation of Christofides' Algorithm. When the number of known endpoints is zero or one, the approximation bound is 3/2. When two endpoints are given, Hoogeveen's Algorithm achieves a 5/3 approximation bound. The main steps of the algorithm are:
1. Find $T$, a minimum spanning tree of the graph.
2. Find a set $W \subseteq V$ containing all the fixed endpoints of even degree and all other vertices of odd degree. Find $M$, a minimum weight perfect matching on $W$, leaving $2 - k$ vertices exposed, where $k$ is the number of fixed endpoints.
3. The graph which is the union of the edges in $T$ and $M$ is connected and has either two or zero odd degree vertices. In the latter case, there is one fixed endpoint which is exposed in $M$. Delete an arbitrary edge touching this vertex. Find an Eulerian path using the two odd-degree vertices as its endpoints.
4. Use the triangle inequality to create a TSP path whose length is no bigger than the length of the Eulerian path.
Theorem 2 ([11]) Hoogeveen's Algorithm returns a TSP path whose length is at most 5/3 the length of the optimal solution when the two endpoints are known, or at most 3/2 the length of the optimal solution when one endpoint is known, assuming the edges' lengths satisfy the triangle inequality. The algorithm works in $O(n^3)$ time complexity.

Stacker Crane
The Stacker Crane problem is presented in [7]. This is a version of the TSP path where a set of arcs must be traversed. Given a graph $G = (V, E, A)$ with $E$ a set of undirected edges and $A$ a set of directed arcs, find a minimum length tour visiting all the vertices in $V$ and traversing all the arcs in $A$. Assuming the edges' lengths satisfy the triangle inequality, the authors offer a 1.8-approximation algorithm. This algorithm is based on running two Procedures, LARGEARC and LARGEEDGE, and returning the better solution. Procedure LARGEARC is more suitable when the arcs in $A$ are long (compared with $E$) and Procedure LARGEEDGE is more suitable when the edges in $E$ are long. We use Procedure LARGEARC as part of one of the algorithms presented in this paper. The main steps of Procedure LARGEARC are:
1. Find $M$, a minimum length matching on the endpoints of $A$ using edges from $E$.
2. The union of $M$ and $A$ creates disjoint cycles, since the degree of each vertex is even.
3. Represent each cycle as a vertex and find a minimum spanning tree using edges from $E$.
4. Double the edges of the spanning tree.
5. Create a TSP tour which traverses all the arcs in $A$. Use the triangle inequality to find a tour whose length is bounded by $3l(E) + l(A)$.

Theorem 3 ([7]) Procedure LARGEARC returns a TSP path whose length is at most $3l(E) + l(A)$, assuming the edges' lengths satisfy the triangle inequality.

Ordered Disjoint Clusters
For the special case of disjoint clusters, when the order of the clusters in the TSP path is given, Anily, Bramel and Hertz in [1] present an approximation algorithm yielding a 5/3-approximation with time complexity of $O(n^3)$. The main steps of the algorithm are:
1. Find a set of edges that represent the shortest connections between consecutive clusters, and denote by $a_i, b_i \in S_i$ the endpoints of these edges.
2. Find a minimum spanning tree within each cluster and let $F$ be the union of these trees.
3. Augment the graph by duplicating each vertex $a_i$ and $b_i$ with even degree in $F$. Add a zero length edge between each vertex and its duplicate.
4. Define a symmetric weight function on the set of vertices (including the new duplicates) and find a minimum weight perfect matching using this weight function.
5. Combine all the above edges to construct a feasible solution.
Theorem 4 ([1]) Anily, Bramel and Hertz' Algorithm returns a TSP path whose length is at most 5/3 the length of the optimal solution in $O(n^3)$ time complexity, assuming the edges' lengths satisfy the triangle inequality.

PQ-tree
The PQ-tree is a data structure introduced by Booth and Lueker [3] for checking the consecutive ones property (COP) in linear time. A binary matrix satisfies the COP if there exists a permutation of its rows such that in each column the ones appear consecutively. In our application, each row represents a vertex from $V$ and each column represents a cluster in $S$. A matrix cell contains 1 if the row's vertex belongs to the column's cluster. The PQ-tree data structure represents all the permutations of vertices in $V$ that satisfy the clusters' constraints.
A PQ-tree is a rooted tree with internal nodes of two types, P and Q. The children of a P-node occur in no particular order, while the children of a Q-node occur in a preserved order, up to reversal. The frontier of a PQ-tree is the permutation of the tree's leaves obtained by reading the labels of the leaves from left to right. Two PQ-trees are equivalent if one is obtained from the other either by arbitrarily permuting the order of all the children of a P-node or by reversing the order of the children of a Q-node.
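To illustrate how P-nodes and Q-nodes generate the frontiers of all equivalent trees, the following minimal sketch enumerates them by brute force. The nested-tuple encoding `("P", [...])` / `("Q", [...])` and the function name `frontiers` are assumptions made for this example; this is not the Booth-Lueker template-matching procedure, and it is exponential in the size of the P-nodes.

```python
from itertools import permutations, product


def frontiers(node):
    """Yield every frontier (tuple of leaves) over all PQ-trees equivalent to `node`.

    A leaf is any non-tuple value; an internal node is ("P", children) or
    ("Q", children), where `children` is a list of nodes.
    """
    if not isinstance(node, tuple):
        yield (node,)
        return
    kind, children = node
    if kind == "P":
        # Children of a P-node may appear in any order.
        orders = permutations(children)
    else:
        # Children of a Q-node keep their order, up to reversal.
        orders = [tuple(children), tuple(reversed(children))]
    seen = set()
    for order in orders:
        # Combine one frontier of each child, in the chosen child order.
        for parts in product(*(list(frontiers(child)) for child in order)):
            frontier = tuple(leaf for part in parts for leaf in part)
            if frontier not in seen:
                seen.add(frontier)
                yield frontier
```

For the tree `("Q", ["a", ("P", ["b", "c", "d"]), "e"])` the sketch yields 12 frontiers: the six orders of b, c, d placed between a and e, and their reversals.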
The Booth-Lueker Algorithm, henceforth denoted as BL-Algorithm, uses a pattern matching routine based on 11 templates. Each template consists of a pattern matching a possible sub-tree of the current PQ-tree and a replacement of this pattern. The BL-Algorithm works on the tree from bottom to top, replacing the appropriate patterns. After applying the algorithm, either the frontier of the tree satisfies the COP or a 'no feasible solution' message is returned.
In our application, the leaves of the tree represent all the vertices in $V$. After applying BL-Algorithm, each permutation of the leaves created by the tree represents a possible order of the vertices in a TSP path. When the COP is satisfied, the TSP path visits the vertices of each cluster consecutively.
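The correspondence between the COP and cluster feasibility can be checked directly on small instances. The sketch below builds the vertex-by-cluster matrix described above and searches all row orders by brute force; it is only an illustration of the property, not the linear-time BL-Algorithm, and the function names are hypothetical.

```python
from itertools import permutations


def membership_matrix(vertices, clusters):
    """One row per vertex, one column per cluster; 1 marks membership."""
    return [[1 if v in cluster else 0 for cluster in clusters] for v in vertices]


def ones_are_consecutive(column):
    """True if the ones of a column form a single contiguous block."""
    ones = [i for i, bit in enumerate(column) if bit]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def find_cop_order(vertices, clusters):
    """Return a vertex order whose matrix satisfies the COP, or None.

    Exhaustive search over all row permutations: factorial time, usable
    only on very small examples.
    """
    for order in permutations(vertices):
        matrix = membership_matrix(order, clusters)
        if all(ones_are_consecutive(column) for column in zip(*matrix)):
            return list(order)
    return None
```

With clusters {1, 2}, {1, 3} and {1, 4} no order exists, since vertex 1 would need three neighbours on a path; this is an instance where the cluster intersections rule out every feasible solution.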
Note that for the restricted case of disjoint clusters, an appropriate PQ-tree contains 3 levels. The leaves level represents the vertices in $V$. The middle level contains one P-node for each cluster, whose children are the vertices contained in this cluster, in no particular order. The top level contains one P-node as a root, whose children are all the P-nodes of the middle level.
The structure of a PQ-tree and the use of P-nodes in the tree allow every order of the vertices inside each cluster and every order of the clusters in the TSP path, thus creating all the feasible solutions of the problem. In a PQ-tree which represents an ordered disjoint clustered TSP we simply replace the root P-node by a Q-node which defines the order of the clusters.

Definition 5 In a hypergraph $H = \langle G, S \rangle$ with a PQ-tree representation, a node in the PQ-tree spans $v \in V$ if it is an ancestor of the leaf in the tree which represents $v$.

JGAA, 22(4) 555-575 (2018) 561
Remark 6 In this paper we make the following adjustments to the PQ-tree before applying the BL-Algorithm:
• Every original vertex is represented by a leaf, and we denote each leaf as a V-node (vertex node). We add a father node which is a P-node that spans only this leaf. In this manner we ensure that every node has at least one ancestor node which is a P-node. This will be used later in Lemma 25.
• When a node has exactly two children there is no difference between a P-node and a Q-node. We change every P-node which has exactly two children into a Q-node. Therefore, in our representation, a P-node spans at least 3 vertices.

G int
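Let $H = \langle G, S \rangle$ be a hypergraph with $G = (V, E)$ and $S = \{S_1, \dots, S_m\}$. The intersection graph $G_{int}$ has a node $s_i$ for every cluster $S_i$ and an edge $(s_i, s_j)$ whenever $S_i \cap S_j \neq \emptyset$.
Theorem 7 Whether $G_{int}$ is a path can be verified in $O(mn)$ time.
Proof: When $G_{int}$ is a path the following two conditions are satisfied:
1. Each vertex $v \in V$ is contained in at most two clusters from $S$.
2. There are two nodes in $G_{int}$ with degree 1, and all the other nodes in $G_{int}$ have degree 2.
Assume the hypergraph is presented in a table where each row represents a vertex from $V$ and each column represents a cluster in $S$. A table cell contains 1 if the row's vertex belongs to the column's cluster. Perform the following steps:
1. Traverse the input table and create for each vertex $v \in V$ the list of clusters containing this vertex. If a vertex is contained in more than two clusters, indicate that $G_{int}$ is not a path and stop.
2. Create an $m \times m$ neighbourhood table for $G_{int}$ using the lists of clusters containing each vertex.
3. Verify that $G_{int}$ is a path using the degree of each cluster in the neighbourhood table.
The above steps require $O(mn)$ time complexity and either indicate that $G_{int}$ is not a path, or create the neighbourhood table which represents $G_{int}$ when it is a path.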
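The check that $G_{int}$ is a path can be sketched as follows. This is an illustrative implementation under stated assumptions: clusters are given as Python sets, the function name `intersection_graph_path` is hypothetical, and the final walk along the path (which also rules out a path plus a disjoint cycle) is an addition of this sketch rather than a step taken from the text. No attempt is made to match the stated time bound.

```python
def intersection_graph_path(vertices, clusters):
    """Return the cluster indices in path order, or None if G_int is not a path."""
    m = len(clusters)
    # List the clusters containing each vertex; more than two means "not a path".
    containing = {v: [j for j, c in enumerate(clusters) if v in c] for v in vertices}
    if any(len(js) > 2 for js in containing.values()):
        return None
    # Build the m x m neighbourhood table of the intersection graph.
    adjacent = [[False] * m for _ in range(m)]
    for js in containing.values():
        if len(js) == 2:
            a, b = js
            adjacent[a][b] = adjacent[b][a] = True
    if m == 1:
        return [0]
    # Degree test: exactly two nodes of degree 1, all others of degree 2.
    degree = [sum(row) for row in adjacent]
    ends = [j for j in range(m) if degree[j] == 1]
    if len(ends) != 2 or any(d not in (1, 2) for d in degree):
        return None
    # Extra step (assumption of this sketch): walk from one end to confirm
    # that all clusters lie on a single path, and recover their order.
    order, prev, cur = [ends[0]], None, ends[0]
    while True:
        nxt = [j for j in range(m) if adjacent[cur][j] and j != prev]
        if not nxt:
            break
        prev, cur = cur, nxt[0]
        order.append(cur)
    return order if len(order) == m else None
```

The returned order is the order of the clusters along the path, which is what an algorithm for this special case needs as its starting point.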

Approximation Algorithms
In this section we offer two approximation algorithms. The first algorithm is a bounded 4-approximation algorithm which works on any instance of the CTSP, even when the intersection graph is not connected. The second approximation algorithm is appropriate for the special case when the intersection graph is connected and forms a path. In this case we achieve a bounded 5/3-approximation algorithm.

The General Case
In this section we address the general case, where the clusters' sizes are not bounded and the intersection graph is not necessarily connected. We call the algorithm for approximating this general case NDC, "Non-Disjoint Clusters" (see Figure 1). First, the algorithm creates one PQ-tree for each connected component of the intersection graph. Next, it adds a root P-node whose children are the roots of the previously created PQ-trees, thus combining all the trees into one PQ-tree which represents the whole hypergraph. Then the algorithm scans the tree from bottom to top, creating a path to represent each scanned tree node. The path is created either by the order defined by Q-nodes in the PQ-tree, or by using Procedure LARGEARC from Stacker Crane (see [7]) to approximate the path representing the vertices that correspond to all the descendants of a P-node.
Remark 8 By the PQ-tree properties, all the V-nodes that are descendants of the same PQ-tree node are on a consecutive sub-path in any feasible solution.
Remark 9 During the algorithm, at the end of every step which handles a P-node, the node is replaced by a Q-node. So when the algorithm reaches a node in a higher level, all its children are either V-nodes or Q-nodes.
Definition 10 We use the following notation:
• $opt$: the value of an optimal solution.
• $P_{OPT}(u)$: the sub-path spanning all descendants of $u$ in an optimal solution.
• $P_{NDC}(u)$: the sub-path spanning all descendants of $u$ in the solution returned by Algorithm NDC (see Figure 1).
• $P_{NDC}$: the path returned by Algorithm NDC.
• $l_{NDC}$: the distances defined during Algorithm NDC.
Lemma 11 In Algorithm NDC, for every node $u$ in $T_{PQ}$, $l_{NDC}(P_{NDC}(u)) \le 3\, l_{NDC}(P_{OPT}(u))$.

Once a PQ-tree has been calculated for each connected component of $G_{int}$, Algorithm NDC (see Figure 1) continues as follows:
Add a P-node as a root, and connect each PQ-tree to it by an edge as a subtree, creating one PQ-tree denoted by $T_{PQ}$.
Initialize $l_{NDC}(e) = l(e)$ for every $e \in E$. All the following steps of the algorithm use the $l_{NDC}$ distances.
Scan the tree from bottom to top.
for every node $u$ which is not a leaf in $T_{PQ}$:
    if $u$ is a Q-node then
        Use the order defined by the Q-node to create a path $P_u$ in $G$.
    else ($u$ is a P-node)
        if all the children of $u$ are V-nodes then
            Approximate $P_u$, a TSP path spanning all the children of $u$, using Christofides' Algorithm [6].
        else ($u$ is a P-node with at least one Q-node child defined by this algorithm in a lower level of $T_{PQ}$)
            Let $\{v^u_1, \dots, v^u_k\} \subset V$ be the children of $u$ which are V-nodes, and let $\{q^u_1, \dots, q^u_j\}$ be the children of $u$ which are Q-nodes (defined in a lower level of $T_{PQ}$). Each Q-node $q^u_j$ represents an edge $E^u_j$ in $G$.
            Create $P_u$, a Stacker Crane path on $\{v^u_1, \dots, v^u_k\}$ and $\{E^u_1, \dots, E^u_j\}$, using Procedure LARGEARC ([7]).
        end if
    end if
    Represent the path $P_u$:
    • By an edge $E_u$ in $G$ whose length is the length of the path.
    • By a Q-node $q_u$ in $T_{PQ}$, where all the vertices of the path are children of $q_u$. $q_u$ replaces $u$ in $T_{PQ}$.

Proof: The proof of the lemma is by induction on the level of node $u$ in $T_{PQ}$. The induction is carried out according to the different cases of $u$ in the algorithm.
We note that the lengths used in the algorithm, $l_{NDC}$ (which are minimal distances), are not longer than the original lengths used by the optimal solution.
1. $u$ is a Q-node: If all the children of $u$ are V-nodes, then $P_{NDC}(u)$ is $P_{OPT}(u)$, hence $l_{NDC}(P_{NDC}(u)) = l_{NDC}(P_{OPT}(u))$. Otherwise, every child $w$ of $u$ that is not a V-node is a Q-node created during the algorithm and is represented in $G$ by an edge created in a lower level of $T_{PQ}$ (at an earlier stage of the algorithm). By the induction hypothesis, $l_{NDC}(P_{NDC}(w)) \le 3\, l_{NDC}(P_{OPT}(w))$. Since the order of the children of $u$ is uniquely defined, the same order also exists in the optimal solution, giving that $l_{NDC}(P_{NDC}(u)) \le 3\, l_{NDC}(P_{OPT}(u))$.
2. $u$ is a P-node and all the children of $u$ are V-nodes: By Theorem 1, Christofides' Algorithm gives $l_{NDC}(P_{NDC}(u)) \le 1.5\, l_{NDC}(P_{OPT}(u))$.
3. $u$ is a P-node with at least one child which is a Q-node defined in a lower level of $T_{PQ}$: Let $w_1, \dots, w_k$ be the children of $u$ which are Q-nodes defined in a lower level of $T_{PQ}$. Let $E_1, \dots, E_k$ be the corresponding edges defined in $G$ by the algorithm. By the construction of the algorithm, each edge $E_j$ is created to represent the path $P_{NDC}(w_j)$ with $l(E_j) = l(P_{NDC}(w_j))$. Let $A$ be the union of all these edges, $A = \cup_{j=1}^{k} E_j$, giving that $l_{NDC}(A) = \sum_{j=1}^{k} l_{NDC}(E_j) = \sum_{j=1}^{k} l_{NDC}(P_{NDC}(w_j))$. Let $A'$ be the union of the corresponding optimal sub-paths: $A' = \cup_{j=1}^{k} P_{OPT}(w_j)$. Hence $l_{NDC}(A') = \sum_{j=1}^{k} l_{NDC}(P_{OPT}(w_j))$.
Using Procedure LARGEARC and by Theorem 3, the length of the returned $P_{NDC}(u)$ is bounded by $3\,(l_{NDC}(P_{OPT}(u)) - l_{NDC}(A')) + l_{NDC}(A)$.
By the induction hypothesis, for every child $w_j$ of $u$, $l_{NDC}(P_{NDC}(w_j)) \le 3\, l_{NDC}(P_{OPT}(w_j))$. Hence $l_{NDC}(A) \le 3 \sum_{j=1}^{k} l_{NDC}(P_{OPT}(w_j)) = 3\, l_{NDC}(A')$. Therefore,
$$l_{NDC}(P_{NDC}(u)) \le 3\,\bigl(l_{NDC}(P_{OPT}(u)) - l_{NDC}(A')\bigr) + l_{NDC}(A) \le 3\, l_{NDC}(P_{OPT}(u)) - 3\, l_{NDC}(A') + 3\, l_{NDC}(A') = 3\, l_{NDC}(P_{OPT}(u)).$$
Corollary 12 $l_{NDC}(P_{NDC}) \le 3\, opt$.
Proof: The length of $P_{NDC}$, calculated by the lengths defined in the algorithm, uses minimal distances which might use inner vertices of the sub-paths. We note that for every sub-cluster, this may happen exactly once for entering the sub-cluster and once for leaving the sub-cluster. The true lengths may be larger but, using the triangle inequality, the length added to the final TSP solution is bounded by the lengths of the optimal TSP sub-path inside each sub-cluster. Since the optimal solution contains a TSP path inside each sub-cluster, the total added length is bounded by $opt$.
Corollary 14 Algorithm NDC, for the general not necessarily disjoint clusters CTSP, returns a 4-approximated solution in polynomial time when a feasible solution exists, or reports that there is no feasible solution.

Intersection Graph Path
In this section we present Algorithm IGP (Intersection Graph Path), which is an approximation algorithm for the case when $G_{int}$ is a path. The path representing $G_{int}$ uniquely defines the order of the clusters in the solution TSP path. The algorithm (see Figure 2) first verifies that $G_{int}$ is a path. In this case, the algorithm uses the order of the clusters in this path to partition the graph vertices into $2m - 1$ disjoint sub-clusters $B_1, \dots, B_{2m-1}$. In the final step we use the algorithm presented in [1] to find the required TSP path. The approximation ratio of the algorithm in this case is 5/3. We also prove that when $S_i \not\subseteq (S_j \cup S_k)$ for every $i \notin \{j, k\}$, a feasible solution exists only when $G_{int}$ is a path. Hence, requiring that $G_{int}$ is a path is relevant for most interesting instances of the CTSP.

Figure 2: Algorithm IGP
returns A TSP path $P$, or a statement "$G_{int}$ is not a path".
begin
If there is only one cluster in $G$, return an approximated TSP path using Christofides' Algorithm [6].
Check whether $G_{int}$ (defined in 2.6) is a path.
if ($G_{int}$ is not a path) then
    return "$G_{int}$ is not a path".
else
    Create $G_{int}$ as a path.
    Use the order of the nodes in the path representing $G_{int}$ to define an order on the clusters: $S_1, S_2, \dots, S_m$.
    Identify the partition of $V$ into the sub-clusters $B_1, \dots, B_{2m-1}$.
    Calculate and return a TSP path using the algorithm of Anily et al. ([1]).
end if

Theorem 15 Algorithm IGP (see Figure 2), for CTSP with an intersection graph which is a path, returns a 5/3-approximated solution in $O(n^3)$ time complexity.
Proof: If there is only one cluster, we use Christofides' Algorithm. By Theorem 1 we find a 3/2-approximated TSP path in $O(n^3)$ time. Otherwise, suppose that there are at least two clusters. If $G_{int}$ is not a path, the algorithm reports an appropriate message. If $G_{int}$ is a path, it uses the order of the clusters to define the sub-clusters $B_1, \dots, B_{2m-1}$.
The last step of Algorithm IGP uses the algorithm of Anily et al., hence by Theorem 4 the approximation ratio of Algorithm IGP is 5/3. By Theorems 1, 4 and 7 the time complexity for the whole algorithm is $O(n^3)$.
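The partition into $2m - 1$ sub-clusters can be illustrated with a short sketch. The exact definition of $B_1, \dots, B_{2m-1}$ is not reproduced in the text above, so the rule used here is an assumption: odd-indexed blocks hold the vertices belonging to a single cluster only, and even-indexed blocks hold the intersections of consecutive clusters. The function name `igp_partition` is hypothetical.

```python
def igp_partition(ordered_clusters):
    """Split V into 2m - 1 ordered blocks, given clusters ordered along G_int.

    Assumed rule (not taken verbatim from the paper):
      block 2i     (0-based) = vertices of S_i that lie in no neighbouring cluster
      block 2i + 1 (0-based) = S_i intersected with S_{i+1}
    Some blocks may be empty and can be dropped by the caller.
    """
    m = len(ordered_clusters)
    blocks = []
    for i, cluster in enumerate(ordered_clusters):
        before = ordered_clusters[i - 1] if i > 0 else set()
        after = ordered_clusters[i + 1] if i < m - 1 else set()
        blocks.append(set(cluster) - before - after)  # private part of S_i
        if i < m - 1:
            blocks.append(set(cluster) & after)  # shared with the next cluster
    return blocks
```

Because each vertex lies in at most two clusters, and these are consecutive on the path, every vertex falls in exactly one block, which gives an ordered family of disjoint sub-clusters of the kind the final step of the algorithm requires.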
The next theorem justifies our interest in the special case where $G_{int}$ is a path. It proves that when no cluster is contained in the union of two other clusters, a feasible solution exists only when $G_{int}$ is a path.

Theorem 16 In a hypergraph $H = \langle G, S \rangle$, suppose that $S_i \not\subseteq (S_j \cup S_k)$ for every $i \notin \{j, k\}$ and that there exists a feasible CTSP path for $H$. Then the corresponding intersection graph is a path.
Proof: Consider a feasible solution. This solution is a path on the vertices in $V$. This path defines an order of the clusters in $S$ and therefore implies a path in $G_{int}$. Hence, $G_{int}$ includes a path: $s_{p_1}, \dots, s_{p_m}$. Suppose $G_{int}$ includes an edge outside the path: $(s_{p_i}, s_{p_j})$ with $j > i + 1$. Therefore, $S_{p_i} \cap S_{p_j} \neq \emptyset$. Since $j > i + 1$ there is another index $k$ satisfying $i < k < j$. The feasible solution contains, in the following order, all the vertices of $S_{p_i}$, the vertices of $S_{p_k} \setminus (S_{p_i} \cup S_{p_j})$, and only after that all the vertices of $S_{p_j}$. Since $S_{p_i} \cap S_{p_j} \neq \emptyset$ we get that $S_{p_k} \setminus (S_{p_i} \cup S_{p_j}) = \emptyset$, giving that $S_{p_k} \subseteq (S_{p_i} \cup S_{p_j})$, contradicting the assumption of the theorem.
Remark 17 When the intersection graph of $H$ is a path and the size of the intersection of every two clusters is bounded by a constant, it is possible to obtain a bounded 5/3-approximation algorithm which works in $O(\sum_{j=1}^{m} |S_j|^3 + mn)$ time complexity, using dynamic programming.

A Polynomial Algorithm for Bounded Size Clusters
In this section we assume that the intersection graph is connected and that $|S_i| \le C$ for every $i \in \{1, \dots, m\}$, for a constant $C$, but we pose no additional constraints on the size of the intersections. For this case we present a polynomial time algorithm, denoted by BC (Bounded size Clusters), which finds an optimal solution, even when the edge lengths do not satisfy the triangle inequality. Note that in this case we profit from the additional constraints posed by the clusters, as they enable us to obtain a polynomial algorithm.
In this case, $G_{int}$ is not necessarily a path, even when a feasible solution exists. Therefore, instead of $G_{int}$, we use the structure of the appropriate PQ-tree to define ordered disjoint sub-clusters $B_1, \dots, B_q$. Note that these sub-clusters are different from the ones defined and used in Algorithm IGP. On these sub-clusters we run a specially defined dynamic programming procedure, denoted by DPBC (Dynamic Programming for Bounded size Clusters), to find a TSP path which satisfies all constraints imposed by the clusters.
In Procedure DPBC:
1. For every $u \in B_i$, $2 \le i \le q$, we calculate $f(u)$: the length of the shortest TSP path which ends at $u$, includes all the vertices in $B_1 \cup \dots \cup B_{i-1} \cup \{u\}$ and spans consecutively every contained-cluster of these sub-clusters.
2. For every $v \in B_q$, we calculate $d(v)$: the length of the shortest TSP path which starts at $v$, spans all the vertices in $B_q$ and spans consecutively every contained-cluster of $B_q$.
In the end, the algorithm uses the above function values and a concatenation of the corresponding TSP paths to create one TSP path which spans all the vertices of the graph in an appropriate order. We start with some definitions.
Definition 18 In a PQ-tree:
• An ancestor-P-node is a P-node which has only Q-nodes as ancestors.
• A high-Q-node is a Q-node with only Q-nodes as its ancestors. (A high-Q-node does not have an ancestor which is a P-node.)
Definition 19 In a hypergraph $H = \langle G, S \rangle$ with a PQ-tree representation, a cluster $S_i \in S$ is:
• a P-nested-cluster if there is an ancestor-P-node which spans all the vertices of $S_i$ and at least one vertex which is not in $S_i$.
• a non-contained-cluster if there is no cluster $S_j$ such that $S_i \subset S_j$.
Note that there are clusters which may be neither P-nested-clusters nor non-contained-clusters.
Lemma 20 In a hypergraph $H = \langle G, S \rangle$, the order of non-contained-clusters in every feasible solution is unique.
Proof: First, we identify a partition of $V$ (the vertices of $G$) into disjoint sub-clusters defined by the intersections of the non-contained-clusters, such that every non-empty intersection of at least two non-contained-clusters defines a sub-cluster. Every cluster contains at least two sub-clusters, since it must intersect with at least one other non-contained-cluster by the connectivity of the intersection graph. We claim that the order of these sub-clusters is unique in every feasible solution and that the unique order of the sub-clusters implies a unique order of the non-contained-clusters. Suppose, by contradiction, that there is a non-contained-cluster $S_i$ with two different feasible orderings of its sub-clusters, $F_1$ and $F_2$, such that $F_2$ is not the reversal of $F_1$. Since its sub-clusters must appear consecutively in every order, for $S_i$ to have two different orderings of its sub-clusters, it must contain at least three disjoint sub-clusters: $S_{i1}, S_{i2}, S_{i3}$. Without loss of generality, suppose that in $F_1$ they are ordered $(S_{i1}, S_{i2}, S_{i3})$ and in $F_2$ they are ordered $(S_{i1}, S_{i3}, S_{i2})$.
Since the ordering $(S_{i1}, S_{i2}, S_{i3})$ is feasible, there is a non-contained-cluster $S_j \neq S_i$ such that $S_{i3}$ is contained in $S_i \cap S_j$ and $S_{i2}$ is not contained in $S_j$. Since $S_j$ is a non-contained-cluster, $S_j \setminus S_i \neq \emptyset$. In $F_1$ the vertices of the sub-clusters contained in $S_j \setminus S_i$ must appear adjacent to $S_{i3}$, to ensure that all the vertices of $S_j$ appear consecutively in any feasible solution. However, in $F_2$ the vertices of $S_j$ do not appear consecutively, since $S_{i3} \subset S_j$ and $S_{i2} \not\subset S_j$, a contradiction. This proves that the order of the sub-clusters contained in $S_i$ is unique in any feasible solution. Since each sub-cluster is defined by an intersection with other non-contained-clusters, the order of $S_i$ with respect to other non-contained-clusters is uniquely defined by the order of its sub-clusters, which implies a unique order among all non-contained-clusters.
Lemma 21 In a hypergraph $H = \langle G, S \rangle$ with a PQ-tree representation and $|V| > C$, a P-node spans at most $C - 1$ vertices.
Proof: Since $|V| > C$ there are at least two non-contained-clusters. Clearly, if a P-node spans more than $C$ vertices, then it spans vertices of at least two non-contained-clusters. Under the assumption of a connected intersection graph, the same also holds when a P-node spans exactly $C$ vertices. This contradicts Lemma 20, which states that the order between any two non-contained-clusters is unique and therefore cannot be defined by a P-node.
Corollary 22 In a hypergraph $H = \langle G, S \rangle$ with a PQ-tree representation, if $S_i$ is a P-nested-cluster, it cannot be a non-contained-cluster; therefore there is a cluster $S_j$ such that $S_i \subset S_j$.
Proof: Suppose $S_i$ is a P-nested-cluster and a non-contained-cluster. Since it is a P-nested-cluster, there is a P-node $p$ which spans all the vertices of $S_i$ and at least one vertex which is not in $S_i$. Since all the vertices of $S_i$ must appear consecutively, there is a node $t$ in the PQ-tree which spans all the vertices of $S_i$ and is a child of $p$. According to Remark 6, a P-node has at least 3 children, therefore $p$ has another two children $t_1, t_2$, each of which spans vertices which are not in $S_i$. Since $p$ is a P-node, both orderings $t_1, t, t_2$ and $t, t_1, t_2$ are allowed, which gives two allowed orderings between $S_i$ and another non-contained-cluster, contradicting Lemma 20, which states that the order of the non-contained-clusters is unique.
Theorem 27 Algorithm BC, for CTSP with clusters' sizes which satisfy $|S_i| \le C$ for a constant $C$ (see Figures 3, 4 and 5), returns an optimal solution in polynomial time when a feasible solution exists, or reports that there is no feasible solution.

BP (Between Path)
Proof: If there is only one cluster, an optimal TSP path can be found in constant time, under the assumption that $C$ is constant. Otherwise, suppose that there are at least two clusters. Each sub-cluster in $B_1, \dots, B_q$, $q \ge 3$, is defined by an ancestor-P-node. By Lemma 21 every sub-cluster contains at most $C - 1$ vertices. By definition, all the ancestors of ancestor-P-nodes are high-Q-nodes. Therefore, the order of $B_1, \dots, B_q$ in an optimal solution is uniquely defined by the order of the high-Q-nodes of the PQ-tree. This is the same order imposed on the non-contained-clusters, which is uniquely defined in Corollary 23.
The correctness of the dynamic programming (which can be trivially proved by induction) guarantees the return of an optimal solution. For the complexity, the algorithm contains the following steps:
1. Find a PQ-tree using BL-Algorithm (defined in [3]).
2. For every $v \in B_{i-1}$ and $u \in B_i$, calculate the optimal TSP path which starts at $v$, ends at $u$ and spans all the vertices of $B_{i-1}$. The optimal solution can be found in polynomial time, since the clusters' sizes are bounded by $C$. Note that we calculate the paths inside $B_{i-1}$, whose size is bounded by $C - 1$. Therefore, we can also demand that the paths satisfy all the constraints imposed by the contained clusters, in polynomial time complexity.
4. Calculate $f(v)$ for every $v \in B_2$ and $d(v)$ for every $v \in B_q$.
All the above steps are polynomial, assuming that $C$ is constant.
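The per-block computation in step 2 can be sketched by exhaustive search, which is affordable because each block has fewer than $C$ vertices. The following is an illustration under simplifying assumptions: the function name `best_block_path` is hypothetical, `length` is any callable returning the edge length between two vertices, and the constraints imposed by contained clusters are not enforced here.

```python
from itertools import permutations


def best_block_path(block, start, end, length):
    """Shortest path that starts at `start`, visits every vertex of `block`,
    and then moves to `end` (a vertex of the next block).

    Tries every order of the remaining block vertices, so the running time
    is factorial in the block size, a constant when the size is bounded.
    """
    others = [v for v in block if v != start]
    best_cost, best_path = float("inf"), None
    for middle in permutations(others):
        path = (start, *middle, end)
        cost = sum(length(a, b) for a, b in zip(path, path[1:]))
        if cost < best_cost:
            best_cost, best_path = cost, list(path)
    return best_cost, best_path
```

A dynamic program over the ordered blocks could then combine such values for consecutive blocks; how the values $f$ and $d$ are combined is described in the paper's figures and is not reproduced by this sketch.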

Summary and Future Research
Our main result is the bounded approximation algorithm for finding a TSP path when not necessarily disjoint clusters are defined on the vertex set. We also present a better approximation when certain restrictions are imposed on the structure of the intersection graph. When the clusters' sizes are bounded and the intersection graph is connected, we present a polynomial time algorithm for finding an optimal solution. It would be interesting to research the general case further, trying to improve the approximation bound for this case. Furthermore, additional special cases of the problem may be defined and solved.

Figure 5: Procedure BP

Checking whether $G_{int}$ is a path: when $G_{int}$ is a path, there are two nodes in $G_{int}$ with degree 1 and all the other nodes in $G_{int}$ have degree 2. Assume the hypergraph is presented in a table where each row represents a vertex from $V$ and each column represents a cluster in $S$. A table cell contains 1 if the row's vertex belongs to the column's cluster. Perform the following steps:
1. Traverse the input table and create for each vertex $v \in V$ the list of clusters containing this vertex. If a vertex is contained in more than two clusters, indicate that $G_{int}$ is not a path and stop.
2. Create an $m \times m$ neighbourhood table for $G_{int}$ using the lists of clusters containing each vertex.
Figure 1: Algorithm NDC
input A hypergraph $H = \langle G, S \rangle$, where $G = (V, E)$ with edge lengths $l(e)$, $\forall e \in E$, and $S = \{S_1, \dots, S_m\}$.
assumption The edge lengths satisfy the triangle inequality.
returns A clustered TSP path $P$, or a statement "No feasible solution".
begin
Find $G_{int}$ (defined in 2.6).
Calculate a PQ-tree for each connected component of $G_{int}$, using BL-Algorithm ([3]).
if BL-Algorithm returns "No feasible solution" on any of the connected components then
    Return "No feasible solution".
end if