Local improvement algorithms for a path packing problem: A performance analysis based on linear programming

Introduction
Given a graph G = (V , E), we wish to find a maximum number of vertex-disjoint paths of length 2. For this NP-hard problem [7], we propose a series of local improvement algorithms. The basic heuristic, denoted H 0, applies the greedy algorithm to obtain a maximal path packing. For k ≥ 1, the kth heuristic in the series, denoted H k, starts from a maximal path packing and attempts to improve it by replacing any k paths of length 2 by k + 1 paths of length 2; when no further improvements are found, H k terminates. For fixed k, H k runs in polynomial time, but H k does not run in time polynomial in k unless P = NP [2].
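The heuristics described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation analyzed in this paper: the adjacency-dictionary representation, the function names (`two_edge_paths`, `greedy_packing`, `find_packing`, `improve`, `heuristic`) and the brute-force search for k + 1 replacement paths are all assumptions made for the example.

```python
from itertools import combinations

def two_edge_paths(adj, allowed):
    """Yield every path (a, b, c) with middle vertex b whose vertices all lie in `allowed`."""
    for b in sorted(allowed):
        nbrs = sorted(v for v in adj[b] if v in allowed)
        for a, c in combinations(nbrs, 2):
            yield (a, b, c)

def greedy_packing(adj):
    """H_0: scan all paths once and keep each one that is disjoint from those kept so far."""
    free, packing = set(adj), []
    for path in two_edge_paths(adj, set(adj)):
        if free.issuperset(path):
            packing.append(path)
            free -= set(path)
    return packing  # maximal: any remaining free path would have been kept

def find_packing(adj, allowed, target):
    """Backtracking search for `target` disjoint paths inside `allowed`; None if impossible."""
    if target == 0:
        return []
    for path in two_edge_paths(adj, allowed):
        rest = find_packing(adj, allowed - set(path), target - 1)
        if rest is not None:
            return [path] + rest
    return None

def improve(adj, packing, k):
    """Try to replace some k paths of the packing by k + 1 paths; return the result or None."""
    covered = {v for path in packing for v in path}
    uncovered = set(adj) - covered
    for removed in combinations(packing, k):
        allowed = uncovered | {v for path in removed for v in path}
        new_paths = find_packing(adj, allowed, k + 1)
        if new_paths is not None:
            return [p for p in packing if p not in removed] + new_paths
    return None

def heuristic(adj, k):
    """H_k: start from the greedy packing and apply improvements until none is found."""
    packing = greedy_packing(adj)
    while k >= 1:
        better = improve(adj, packing, k)
        if better is None:
            break
        packing = better
    return packing

# Hypothetical instance: the six-vertex path 3 - 1 - 0 - 2 - 4 - 5.
adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 4}, 3: {1}, 4: {2, 5}, 5: {4}}
```

On this instance the greedy scan first keeps the path with middle vertex 0, which blocks everything else, so `heuristic(adj, 0)` returns a single path, while `heuristic(adj, 1)` trades that path for two disjoint paths. For fixed k the search in `improve` examines polynomially many candidates, but its cost grows exponentially in k.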
In Section 2 we review related work. In Section 3 we present examples that provide upper bounds on the performance ratios ρ 0, . . . , ρ 4 of the heuristics H 0, . . . , H 4. In Sections 4-6 we establish lower bounds on ρ 0, . . . , ρ 4 that match the upper bounds provided by these examples. The lower bounds are obtained by solving linear programs, where the inverse of the performance ratio is the objective and an analysis of certain configurations that can or cannot be locally improved yields the constraints.
Our research was motivated by the relation between the path packing problem and the test cover problem with tests of size at most 2; see [2].

Earlier work
Hurkens and Schrijver [6] consider a series of analogous local improvement algorithms for the more general problem of packing vertex-disjoint subgraphs on t vertices in a given graph. They derive a lower bound φ k on the performance ratio of their kth heuristic, and prove that it is tight if the subgraph is a clique. Since a path of length 2 is a subgraph on three vertices, their bound for t = 3 applies, and we know that ρ k ≥ φ k. Table 1 lists the values of φ k and ρ k for k = 0, . . . , 4. Note that lim k→∞ φ k = ρ 4. The limiting value of ρ k is unknown, but it is likely to be strictly smaller than 1, since the problem of packing paths of length 2 is APX-hard [2].
https://doi.org/10.1016/j.orl.2020.11.005 0167-6377/© 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
For k = 0, 1, 2, the lower bound φ k on ρ k matches the upper bound provided by the examples given in Section 3, which proves part of our result. For k = 3, 4, the proof is more involved.
Hassin and Rubinstein [4] study packing 3-edge paths of maximum total weight and give a 3/4-approximation algorithm. They present a (89/169 − ϵ)-approximation for the weighted triangle packing problem in [5], and show that this algorithm also gives a performance ratio of (35/67 − ϵ) for the maximum weight packing of 2-edge paths. Bafna et al. [1] analyze the k-local search algorithm for computing maximum weighted independent sets in (k + 1)-claw free graphs to show a performance ratio of (k − 1 + 1/k). Fernau and Raible [3] give a parameterized algorithm for deciding whether a 2-edge path packing of size k exists in time O*(2.448^{3k}), where the O* hides polynomial bounds on the size of the instance (like n^c for some constant c).
From now on, any path consists of three vertices and two edges.
The examples of Section 3 are instances on which the heuristics H 0, . . . , H 4 perform badly and meet the performance ratios claimed above. For each k, the solid edges form an optimal solution, and the dashed edges form a locally optimal solution, produced by H k. The latter solution can be improved by sacrificing k + 1 paths in the top-left, in return for k + 2 or more alternative paths, indicated by solid lines. Sacrificing only k paths does not give room for improvement. We leave it to the reader to verify this assertion. Note that we obtain infinite classes of such examples by creating multiple copies of the graphs.

Lower bounds: outline of approach
Before getting into details, we outline our approach to proving lower bounds on the performance of the heuristics H 0, . . . , H 4.
For a given value of k, 0 ≤ k ≤ 4, let P denote the path packing found by H k , and let Q denote an arbitrary maximal packing. To investigate the relation between P and Q , we will give each vertex in P a label that quantifies the interaction between the path in Q to which it belongs and the paths in P. When the middle vertex of a path in P has label β and its end vertices have labels α and γ with α ≤ γ , then the path is said to be of type αβγ . A combinatorial analysis enables us to reduce the number of different path types to 49.
Next, for each path type αβγ , we define a variable x(αβγ ), whose value is equal to the fraction of paths in P of that type. We then formulate a linear program in these variables. The objective will be given by |Q |/|P|. The constraints capture conditions that are implied by the local optimality of the solutions. These conditions are rather straightforward for H 0 , H 1 and H 2 but ask for some case analysis for H 3 and H 4 . It will be seen that the optimum solution values match the inverses of the performance ratios ρ 0 , . . . , ρ 4 given in Table 1.

Lower bounds: vertex labels
For a given k ≥ 0, let P denote the path packing found by H k , and let Q denote an arbitrary maximal packing. A P-path is a path in P, a Q -path is a path in Q . For a vertex v, p(v) denotes the P-path containing v, if it exists, and q(v) denotes the Q -path containing v, if it exists.
Without loss of generality, we make the following assumptions.
(1) G contains no other edges than those in P and Q . Otherwise, delete the edges that are neither in P nor in Q from G. In the new graph, P is still locally optimal and Q remains maximal.
(3) Each P-path intersects a Q -path. Otherwise, Q is not maximal.
(4) Each Q -path intersects a P-path. Otherwise, P is not locally optimal.
(5) No P-path and Q -path cover the same three vertices. Otherwise, delete those three vertices. In the new graph, P is locally optimal, Q is maximal, and |P|/|Q | is smaller.
(6) Each node of degree > 1 is in both P and Q . For a vertex v that is an endpoint of p(v) or q(v), the statement is evidently true, using (1). The case remains where v = b is the midpoint of a path a-b-c, that is either in P or in Q . Then a or c or both must be covered twice, as otherwise P or Q is not maximal. Assume that c is covered twice. By (1), b has no neighbors besides a and c. Modify the graph by changing the path a-b-c to a-c-b (removing the edge ab and adding if needed the edge ac). In the new setting b has only one neighbor, c. Let P ′ and Q ′ denote the new packings and suppose P ′ can be improved to P ′′ . P ′′ must involve the edge a-c, as it would otherwise also exist in the original graph. If P ′′ contained the path a-c-d, we can change it to a-c-b, since then b is not covered in P ′′ . But then, by changing a-c-b to a-b-c, the improvement is also possible in the original graph.
We give each vertex v in P a label, which expresses the interaction of its Q -path q(v) with the P-paths. Vertices not covered by P do not get a label. Vertex v in P receives label
0 if v is not on any Q -path;
1 if q(v) intersects the P-paths in vertex v only;
2 if q(v) intersects exactly two P-paths p(v) and p(w) in the two vertices v and w only, with v being the middle vertex of q(v) (w receives label 3); note that, by assumption (6), the third vertex of q(v) has degree 1, so that either v or w is the middle vertex of q(v);
3 if q(v) intersects exactly two P-paths p(v) and p(w) in vertices v and w, where w receives label 2 or 5;
4 if q(v) intersects three P-paths;
5 if q(v) and p(v) intersect in two vertices v and w, and the third vertex z on q(v) is on another P-path p(z); w receives label 5 as well; z receives label 3;
6 if q(v) and p(v) intersect in two vertices v and w, and the third vertex on q(v) is not on another P-path; w receives label 6 as well.
The case that q(v) and p(v) intersect in three vertices is excluded by assumption (5). Note that the labeling depends on the triples of vertices that form a P-path. It is irrelevant which edges are used in a P-path. Further note that label 5 comes in pairs, and so does label 6. Fig. 2 illustrates some typical labelings.
We make some further assumptions.
(7) Vertex pairs labeled 5 or 6 have a Q -path and a P-path with coinciding middle vertices. Consider a P-path on a, b, c and a Q -path on a, c, d, with a and c labeled 5 or 6. Change G, replacing the paths by b-a-c and d-a-c, respectively. Any improvement in the new graph must sacrifice b-a-c and use, w.l.o.g., d-a-c. This improvement applies to the old graph too, replacing the P-path on a, b, c by the Q -path on a, c, d.
(8) No vertex has label 6. Consider a P-path a-b-c with labels 6 on b and c, intersecting a Q -path d-b-c. Change G, introducing a new vertex c ′ and replacing the P-path by a-b-c ′ , with c ′ labeled 0 and b labeled 1. Any improvement of P in the new graph implies an improvement in the old graph.
(9) No P-path is labeled 155. Consider a P-path a-b-c with labels 1, 5, 5, intersecting a Q -path d-b-c, with d labeled 3. Change G, introducing a new vertex a ′ and replacing the P-path by a ′ -a-b, which is labeled 012. Vertex c becomes unlabeled. If the new graph allows for an improvement of P, the new solution uses w.l.o.g. the paths q(a) and q(b), and hence exists in the old graph as well.
(10) If a P-path has a vertex labeled 1, its middle vertex has label 1. Otherwise, consider a P-path a-b-c with c labeled 1. Change G as under assumption (6), and replace a-b-c in P by a-c-b. Any improvement of P in the new graph must sacrifice this path and use, w.l.o.g., the path q(c), and hence not the edges {a, c} and {c, b}. This improvement could have been made in the old graph as well. Note that by assumption (9), vertices a and b are not labeled 5. Hence, assumption (7) is maintained.
A P-path is said to be of type αβγ if its middle vertex has label β and its end vertices have labels α and γ , with α ≤ γ . A P-path is said to be of form {αβγ } if its type is a permutation of αβγ . A wildcard * may be used to match any label. As a consequence of the assumptions made, there exist 49 different labelings.
A vertex v that is labeled 1, 2, 3, or 4 is called black if p(v) \ v contains a vertex with label 1, or if v is an endpoint of p(v) and p(v)\v contains a vertex labeled 2. In terms of local improvement, a black vertex v can be 'set free' by replacing p(v) by a path not covering v. The replacing path uses one or two edges from a Q -path, the middle vertex of which lies on p(v) \ v, and one or two endpoints not on any P-path. We will mark a black vertex by an overscore.
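It follows from the labeling of p(v) whether vertex v is black or not. We may, for instance, have paths labeled 111, 214, 233, 424, 434 and 255; in a path of type 214, the end vertices labeled 2 and 4 are black because the middle vertex has label 1, whereas the middle vertex itself is not black. From assumptions (6) and (10) we have that a vertex labeled 0, or a black vertex labeled 2, 3 or 4, is never the middle vertex of a P-path.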
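The rule for black vertices can be checked mechanically. The following sketch is an illustration only; the tuple representation of a path type and the function names `normalize` and `black_positions` are assumptions made for this example. It applies the definition given above: a vertex labeled 1, 2, 3 or 4 is black if another vertex of its P-path has label 1, or if it is an endpoint and another vertex of its P-path has label 2.

```python
def normalize(end1, middle, end2):
    """Return the type (alpha, beta, gamma) with middle label beta and alpha <= gamma."""
    alpha, gamma = sorted((end1, end2))
    return (alpha, middle, gamma)

def black_positions(path_type):
    """Return the positions (0 and 2 are endpoints, 1 is the middle) of the black vertices."""
    black = []
    for i, label in enumerate(path_type):
        if label not in (1, 2, 3, 4):
            continue  # only labels 1, 2, 3, 4 can be black
        others = [l for j, l in enumerate(path_type) if j != i]
        is_endpoint = i != 1
        if 1 in others or (is_endpoint and 2 in others):
            black.append(i)
    return black

for t in [(1, 1, 1), (2, 1, 4), (2, 3, 3), (4, 2, 4), (4, 3, 4), (2, 5, 5)]:
    print(t, black_positions(t))
```

Under this reading, all three vertices of a path of type 111 are black, both endpoints are black in types 214 and 424, only the endpoint labeled 3 is black in type 233, and types 434 and 255 contain no black vertex.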

Lower bounds: LP formulations
We define x(αβγ) as the fraction of P-paths of type αβγ, n(σ) as the number of occurrences of label σ, and |αβγ| as the number of paths of type αβγ. Then
\[ n(\sigma) = \sum_{\alpha\beta\gamma} \bigl( \chi(\sigma,\alpha) + \chi(\sigma,\beta) + \chi(\sigma,\gamma) \bigr)\, |\alpha\beta\gamma| , \]
where χ(σ, τ) = 1 if σ = τ, and 0 otherwise. If σ has an overscore, we restrict the count to occurrences of a black σ.
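The label counts n(σ) and the ratio |Q |/|P| can be recomputed from a distribution x over path types. The sketch below is an illustration under stated assumptions, not the LP of this paper: the names `label_counts` and `q_over_p` are hypothetical, and the counting rule |Q | = n(1) + n(2) + n(4)/3 + n(5)/2 is derived here from the label definitions (one Q -path per label 1, one per label 2, one per triple of labels 4, one per pair of labels 5), assuming that label 1 marks a Q -path meeting P in a single vertex and that label 6 does not occur. The two distributions are the optimal solutions reported for H 2 and H 3 in this section.

```python
from fractions import Fraction as F

def label_counts(x):
    """Return n(sigma) / |P| for each label, given x: path type (string) -> fraction of P-paths."""
    n = {}
    for path_type, frac in x.items():
        for label in path_type[:3]:  # ignore a trailing '+' or '-' marker
            n[label] = n.get(label, F(0)) + frac
    return n

def q_over_p(x):
    """|Q| / |P| under the counting rule assumed in the lead-in."""
    n = label_counts(x)
    get = lambda s: n.get(s, F(0))
    return get("1") + get("2") + get("4") / 3 + get("5") / 2

x_h2 = {"212": F(3, 5), "333": F(2, 5)}                    # reported optimum for H_2
x_h3 = {"212": F(1, 7), "414": F(4, 7), "434+": F(2, 7)}   # reported optimum for H_3

for x in (x_h2, x_h3):
    n = label_counts(x)
    # every label 3 is matched with a label 2 or a pair of labels 5
    assert n.get("3", F(0)) == n.get("2", F(0)) + n.get("5", F(0)) / 2
    print(q_over_p(x))
```

Both distributions sum to 1, and the computed ratios are 9/5 and 11/7, in agreement with the optimum values stated for H 2 and H 3.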

Heuristic H 2
No two black nodes are on a short path: To justify this constraint, consider black vertices u, v on a black short path q(u) = q(v). Heuristic H 2 would replace the two paths p(u) and p(v) by disjoint paths only incident to p(u) \ u and p(v) \ v, and then add q(u) as an additional P-path. The LP further extended by these constraints achieves its optimum |Q |/|P| = 9/5 for x(212) = 3/5 and x(333) = 2/5.

Heuristics H 3 and H 4
As is clear from the model, the larger the fraction of nodes labeled 1 or 2, the larger is the ratio |Q |/|P|. This makes sense, as these nodes are neighbors to vertices covered by Q and not by P. Black nodes can transfer this degree of freedom, at the cost of rearranging one existing P-path. The fact that H 2 fails to improve the ratio indicates that one cannot have too many black nodes or nodes with label -or 0 close together. To see how this works out for heuristics H 3 and H 4 , we now investigate how black short paths are involved in transferring the degree of freedom.
We need a more detailed description of how short paths interact. For each αβγ -path with at least one non-black vertex labeled 2, 3 or 5, we distinguish between types αβγ + and αβγ − .
A P-path has a + label if it intersects a black short path in a non-black vertex, and a − label otherwise. It is called a P + -path or a P − -path, respectively. Non-black vertices labeled 2, 3 or 5 on a P − -path are colored white and denoted 2, 3, and 5, respectively. Vertices labeled 1, 2, 3, 4 or 5 that are neither black nor white are colored gray and denoted 1, 2, 3, 4, 5. We sometimes use the label 55 to denote the pair of vertices labeled 5 on a P + -path. Notice that, from the (extended) path-type, we can deduce the color of each vertex.
Vertices labeled 0 remain blank. Informally speaking, a vertex v is colored gray, if it lies on a black short path or if it might be 'freed up' by sacrificing two P-paths; one path to set free a nearby black vertex, and the path p(v) to create a P-path without v. For a vertex v labeled 2, 3 or 5, such a nearby black vertex is available only if We observe that, by definition of P − -paths, no short path has a white and a black vertex.
Replace constraint (2) for heuristic H 2 by constraints (2a) and (2b). We justify these constraints as follows:
(2a) A black vertex v labeled 2 lies on q(v) containing a gray vertex labeled 3.
(2b) A black vertex v labeled 3 lies on q(v) that either contains a gray vertex w labeled 2 or two gray vertices labeled 5.

Heuristic H 3
The following additional constraints hold: The LP defined by the preceding constraints achieves its optimum |Q |/|P| = 11/7 for x(212) = 1/7, x(414) = 4/7, and x(434 +) = 2/7. We justify the additional constraints as follows: Note that by construction, each black vertex labeled 2 or 3 has a corresponding P + -path, but multiple 2 and 3 may share the same P + -path.
(3a) A black vertex v labeled 2 lies on q(v) containing a gray vertex w labeled 3. If another black vertex v ′ labeled 2 lies on q(v ′ ) containing a gray vertex w ′ labeled 3, with p(w) = p(w ′ ), heuristic H 3 will improve this configuration unless the vertices v and v ′ lie on the same P-path labeled 212, 222, 232, or 242.
The path cannot have labels 212 since heuristic H 3 would give an improvement. For the same reason, paths of type 222 and 232 must have a − label.
(3b) A black vertex v labeled 2 or 3 lies on q(v) containing a gray vertex w labeled 3, 2 or 5. If another black vertex v ′ labeled 2 or 3 lies on q(v ′ ) containing a gray vertex w ′ labeled 3, 2 or 5, with p(w) = p(w ′ ), heuristic H 3 will improve this configuration unless the vertices v and v ′ lie on the same P-path labeled 212, … In the first case we have an extra contribution of +1 to the term n(2) + n(3) + 1/2 n(5), so the total contribution to the right-hand side becomes +2 = 2 − 1 + 1.
In the latter case, the corresponding black vertices v and v ′ must be connected by a mutual P-path p … 232 −, 242, 223 − or 323 −, otherwise heuristic H 4 can improve upon this solution. We then count on the left-hand side a contribution to 2n(2) + 2n(3) of +4 and a contribution of −3 for the path connecting v and v ′. On the right-hand side we have a contribution to 2|P + | of +4 and also a contribution of −1 for the paths connecting w and w ′.
If the + labeled path involved contains more than two gray vertices 2, 3, or 55, and hence is labeled 333 +, it contains an extra gray vertex z ′ not lying on a black short path. It follows that the counterpart of z ′ on q(z ′ ) is a white vertex. This adds a contribution of +1 to the term n(2) …
(4b) Note that adding inequalities (3a) and (3c) and multiplying the sum by 2 yields 2n(2) + 2n(4) − 2(|222 − | + |232 − | + |242|) ≤ 4n(4) + 2|{3 * * } + |, which is stronger than inequality (4b), unless we take into account P + -paths containing both a 3 and a 4 labeled vertex. Such a path cannot contain a vertex labeled 1, 2 or 5, and hence at least one vertex labeled 3 is part of a black short path.
Consider a black vertex v labeled 2. As in the configuration left by heuristic H 3 it lies on a short path q(w) with w labeled 3 and with p(w) having a + label. Possibly there is another vertex v ′ labeled 2 on a short path with w ′, with p(w ′ ) = p(w), in which case v and v ′ have a common P-path labeled 222 −, 232 − or 242.
If p(w) contains a single 3 and no 4, we neglect the labels on p(w) \ w and have v contribute 2 to n(2) on the left-hand side, and p(w) contribute 2 to 2|{3 * * } + | on the right-hand side of (4b).
If p(w) contains a single 3 lying on a black short path and a single 4, that is, p(w) has type 034 +, 043 +, 334 + or 343 +, then vertex v contributes 2 to the left-hand side, and p(w) with type 334 + or 343 + contributes −1 to the right-hand side.
If the vertex z labeled 4 lies on q(z) with three gray vertices, we let z contribute 1 to n(4) on the right-hand side, in which case the inequality becomes 2 ≤ 4 · 1 − 1.
If q(z) contains one black vertex z ′ labeled 4, we have z ′ contribute 1/2 to n(4) on the left-hand side, and let z contribute 1 to n(4) on the right-hand side, by which the inequality becomes … If z is the only gray vertex of q(z), with other nodes z ′ and … contributes −3 to the left-hand side, and p(w) contributes −1 or 0 to the right-hand side of (4b).
What remains is the case (33*) where p(w) contains two gray vertices labeled 3, both lying on a black short path, and the case (344), where p(w) is of type 344 + or 434 +. In the case (33*), as described before, the black short paths are connected by a path p(v) of type 222 −, 232 − or 242, which contributes −2 to the left-hand side. Should the path p(w) contain a gray vertex z labeled 4, we let the vertex contribute to (4b). We do this in the same way as above, with a contribution of 4 to the right-hand side, and potentially a contribution of 1 to 2n(4) on the left-hand side. Note that z cannot be the only gray vertex on q(z). If that were the case, the above reasoning would force p(v) to intersect q(z) and have type 224.
In the case (344) we would have p(w) contain gray vertices z and z ′, say, both labeled 4. It follows from the above reasoning that at most one of z, z ′ can be the single gray vertex on q(z) or q(z ′ ) respectively, as p(v) cannot intersect both q(z) and q(z ′ ).
Note that p(w) contributes −4 to the right-hand side, which is balanced by the gray nodes labeled 4, that contribute +8.
Hence in all cases, a vertex z labeled 4 contributes 4 to the right-hand side if q(z) consists of three gray vertices; it contributes 1 to the left-hand side and 4 to the right-hand side if q(z) has one black vertex; and it contributes 4 to both left-hand side and right-hand side if q(z) contains two black vertices. We have to do so to prevent double counting. For instance, a Q -path labeled 444 can be adjacent to two distinct black short paths.
… Q -path with three vertices labeled 4, in which at least one third of the nodes is gray. Each configuration satisfies the constraint, and almost all do so with equality.