Stronger MIP formulations for the Steiner forest problem

The Steiner forest problem asks for a minimum weight forest that spans a given number of terminal sets. We propose new cut- and flow-based integer linear programming formulations for the problem which yield stronger linear programming bounds than the two previous strongest formulations: The directed cut formulation (Balakrishnan et al. in Oper Res 37(5):716–740, 1989; Chopra and Rao in Math Prog 64(1):209–229, 1994) and the advanced flow formulation by Magnanti and Raghavan (Networks 45:61–79, 2005). We further introduce strengthening constraints and provide an example where the integrality gap of our models is 1.5. In an experimental evaluation, we show that the linear programming bounds of the new formulations are indeed strong on practical instances and that the related branch-and-cut algorithm outperforms algorithms based on the previous formulations.


Introduction
The Steiner forest problem (SFP) is one of the fundamental network design problems.Given an edge-weighted undirected graph G = (V , E) and K terminal sets T 1 , . . ., T K ⊆ V , it asks for a minimum weight forest in G such that the nodes inside each terminal set are connected.Its decision version is NP-complete and it is inapproximable within 96/95 unless NP = P [7].In the literature, the SFP was mostly studied in the context of approximation algorithms [1,3,16,[18][19][20]. Surprisingly, only few publications deal with integer linear programming (ILP) formulations, even though the known formulations either yield weak linear programming bounds or are too large to be practically viable.
For the primal-dual 2-approximation algorithms by Agrawal et al. [1] and Goemans and Williamson [16] the classical undirected cut-based formulation is considered.However, this formulation has an integrality gap of 2 even on simple instances.The same is true for the lifted cut relaxation introduced by Könemann et al. [23].
Moreover, the directed cut formulation for the Steiner tree problem [2,8,9,22] can be easily extended to the Steiner forest case.This model cuts off fractional solutions by imposing a direction on each edge, looking for a rooted directed tree that connects all terminals.In the Steiner tree case, where only one terminal set exists, this process is straight-forward and the formulation has an integrality gap between 36/31 ≈ 1.161 and 2, as was shown in [4]; It is widely believed that the true gap of the formulation lies close to 1.161.When multiple sets are present, however, one directed tree per set is needed and these, in general, can impose conflicting orientations to the edges.This is a major additional difficulty in solving the Steiner forest problem.Consequently, there are Steiner forest instances where the directed cut formulation has an integrality gap of 2. Magnanti and Raghavan [25] show how to consolidate the conflicts with an improved flow formulation.This formulation yields strong bounds in computational experiments on small instances, but is too large to be solved on a larger scale.
Lastly, the issues with conflicting orientations can be avoided altogether by using strong undirected formulations.Goemans [14], Lucena [24], as well as Margot et al. [26] independently propose an ILP formulation for the Steiner tree problem that builds on Edmond's complete description of the tree polytope [12].This tree-based formulation has a straight-forward extension to the Steiner forest problem and its LP-relaxation can be solved efficiently.However, its linear programming bounds are identical to the ones from the directed cut formulation.
A more extensive literature can be found for the Steiner tree problem as a special case of the SFP with K = 1: Several surveys compare ILP formulations and their polyhedral properties [9,10,15,27,28].They are the basis for successful branch-andcut (B&C) algorithms [8,22].
Our contribution We propose two new formulations for the Steiner forest problem that combine the strong bounds of the improved flow formulation with the practical usefulness of the simpler cut models.Their corresponding LP relaxations are stronger than the improved flow relaxation by [25] and the directed cut relaxation, and therefore, as the undirected cut relaxation as well.In contrast to the improved flow formulation it can be solved in polynomial time.This answers an open problem in [25] which asks for a cut-based ILP formulation that is at least as strong as the improved flow formulation.
We introduce additional valid constraints that further strengthen our new models.Moreover, we are able to construct an instance with an integrality gap of 1.5; this is in particular interesting since the integrality gap of the directed model for the Steiner Fig. 1 A comparison of lower bounds from LP relaxations.The terminal sets of the three Steiner forest instances are depicted in different shapes ( , , , and ).All edges have unit cost tree problem is a long-standing open problem and its best known lower bound is 1.161 [4].
Finally, we present the results of an experimental study in which all discussed models are compared against each other-both the LP relaxations as well as the related B&C or branch-and-bound (B&B) algorithms.We show that the LP bounds of our models are stronger than what can be achieved from any of the previous relaxations and that they can also be computed quickly and reliably; Fig. 1 shows a comparison of the formulations on widely-used small example instances.The resulting B&C algorithm of our models outperform B&B algorithms based on the previous formulations.
Overview In the remainder of this section we introduce the notations used in this article and give the formal definition of the Steiner forest problem.Section 2 recalls important ILP formulations from the literature along with main results concerning the strength.The main part is Sect.3. Here, our new cut-based models along with their flow-based analogons are described.We prove the strength of the new models with respect to the improved flow formulation [25] and the directed formulation.Moreover, additional strengthening constraints are introduced and an example with integrality gap 1.5 is shown.Section 4 contains the computational study.
Notation Throughout, let G = (V , E) be an undirected, simple graph and let A = {(i, j), ( j, i) | {i, j} ∈ E} be the arcs of the bidirection of G.A cut-set in G is a subset S ⊆ V .Any cut-set S ⊆ V induces a cut δ(S) := {{i, j} ∈ E | |{i, j} ∩ S| = 1}.We abbreviate δ(i) := δ({i}) if S = {i}.If D = (V , A) is a directed graph, we distinguish the outgoing cut δ + (S) = {(i, j) ∈ A | i ∈ S and j / ∈ S} and the incoming cut ∈ S and j ∈ S}.Given a vector x ∈ X d , d ∈ Z ≥0 , and an index set I ⊆ {1, . . ., d} we write x(I ) to abbreviate i∈I x i .Moreover, for k y) ∈ P} be the projection of P onto the x variables.

The Steiner forest problem Consider the undirected graph
if it minimizes the total cost e∈E F c e .Assume without loss of generality that the terminal sets are pairwise disjoint: If T k and T share at least one node, then any forest is feasible for T 1 , . . ., T K if and only if it is feasible for the instance where T k and T are replaced by T k ∪ T .We denote the set of all terminal nodes by T := T 1 ∪ • • • ∪ T K and write τ (t) := k if t ∈ T k .Furthermore, we say that the non-terminal nodes N := V \T are Steiner nodes.For each terminal set T k , k ∈ [K ], we select an arbitrary node r k ∈ T k as a fixed root node and define T k r := T k \{r k } and R := {r 1 , . . ., r K }.To make it easier to state the formulations, we define T i... j as T i ∪ • • • ∪ T j and let T i... j r := T i... j \{r i } be the same set without the ith root node (all other root nodes are still included).
A cut-set S ⊆ V is relevant for the terminal set T k if it separates r k from some terminal t ∈ T k r , i.e., if r k ∈ S but t / ∈ S. We write S k for the set of all cut-sets that are relevant for T k and S := S 1 ∪ • • • ∪ S K for the set of all relevant cut-sets.

Eliminating cycles from the linear programming relaxation
Let us briefly review the existing ILP formulations for the Steiner forest problem.A forest F in G = (V , E) is feasible if and only if any relevant cut-set S ⊂ V contains at least one edge of F, i.e., if |δ F (S)| ≥ 1 for all S ∈ S. Thus, since c ≥ 0, the undirected cut formulation min c T x x ∈ LP uc and integer (IP uc ) where is a valid ILP formulation.While it can be solved efficiently, it yields weak bounds even on trivial instances (see Fig. 1).The reason for the weak bounds becomes apparent when we see formulation (IP uc ) as a set cover problem: We look for a choice of edges such that each cut δ(S) in G is covered by at least one edge.Consider any cycle C of length s in G. Any set cover needs s − 1 edges to cover C. On the other hand, we obtain a fractional solution of value s 2 by setting x e = 0.5 for all edges e ∈ C. Figure 2 shows an example.
As for the Steiner tree problem there exists a model based on flows which is equivalent to the undirected cut-based model. where Thereby, x models the solution edges and f constitutes a flow of value one from the root nodes to each terminal in the same set, cf.(2b).
The undirected formulations can be improved with a standard construction [2,9].Recall that we choose r k ∈ T k as an arbitrary root node of set T k and consider the bi-directed graph underlying G.For all k ∈ [K ], we now look for an arborescence (a directed tree) rooted at r k .If any cut-set S is relevant for T k , then at least one arc must leave S: where Since any solution (x, y) of (IP dc ) can be turned into a feasible Steiner forest  1 where LP dc yields a stronger LP bound than LP uc .The instance has unit costs and a single terminal set that contains all four nodes of the graph.Node a has been chosen as the root.a Shows a feasible fractional solution for LP uc with cost 2.It is impossible to orient this solution such that it is feasible for LP dc which implies an optimum integer solution of cost 3. b Slightly modified instance (B from Fig. 1) with two terminal sets ( 1, 2, root nodes a and b).The red and blue arcs form a solution for relaxation LP dc for the red ( ) and blue ( ) terminal set.The gray edges show the values of the x variables.Looking for a Steiner arborescence for each terminal set does not cut off a fractional optimum of cost 2. c A solution that roots both terminal sets at the root node a of the red ( ) terminal set.The fractional optimum is cut off Hence, we have the following well-known observation (see e.g., [15]).
Observation 2 Proj x (LP dc ) = Proj x (LP df ) and LP uc Proj x (LP dc ).
The directed formulations eliminate directed cycles from the basic optima of its LP relaxation and indeed the bound of the relaxation coincides with the integer optimum on instance A from Fig. 1.However, a slightly modified instance makes the problem reappear, see instance B in Figs. 1, or 2: While the support of any y k is free of directed cycles, the union of the supports is not.This is the reason why the formulation works exceptionally well for the Steiner tree problem where K = 1.If K > 1, however, the LP relaxation of (IP dc ) is again weak.Still, for practical purposes no better formulation was known prior to this work.
These directed cycles potentially appear whenever two terminal sets T k and Tand thus their roots r k and r -end up in the same connected component of the solution, i.e., of the support of x.If we knew beforehand that T k and T lie in the same connected component of an optimum solution, we could simplify the instance, replacing T k and T by their union T k ∪ T .Iterating this idea would yield a solution where all the arborescences are disjoint, eliminating the directed cycles.Unfortunately, we cannot know the connected components of a Steiner forest a priori.Magnanti and Raghavan [25] instead propose to compute the connected components of a solution on-the-fly in the ILP formulation.Then, whenever T k and T , k ≤ , lie in the same connected component, they look for a common arborescence that is rooted at r k and connects all terminals in T k ∪ T .We recall their model IP mr -translated to our notation-in the following.
For each k ∈ , i.e., the set O(r k ) contains a "commodity" (or a terminal pair) for each terminal node that can be connected to r k .We define D := O(r 1 ) ∪ • • • ∪ O(r K ) as the union of the O(r k ), i.e., the set of all commodities.Let H := T 1...K r × • • • × T K ...K r ; any choice h ∈ H assigns exactly one suitable terminal to each root node r 1 , . . ., r K .min c T x (x, y, f ) ∈ LP mr and integer (IP mr ) where LP mr := (x, y, f ) The constraints (5b) ensure that for each ∈ [K ] and each terminal t ∈ T r , there is a unique k ≤ for which the solution contains a directed r k -t-path.In other words, each terminal receives at least one unit of flow from one root node r k .If in the above condition we have k < , then the constraints (5c) ensure that there is a directed r k -r -path, too.
Magnanti and Raghavan show that the improved formulation (IP mr ) is stronger than the undirected cut formulation (IP uc ).Unfortunately, their formulation has a size of Ω( K k=1 K =k |T |), i.e., it is exponential in the number of terminal sets K .We shall see in the next section how we achieve the same effect with a much smaller ILP formulation.

A new ILP formulation for the Steiner forest problem
Our new formulation contains three kinds of variables.As before, we use a variable x i j for each edge {i, j} ∈ E to determine if {i, j} is included in the forest F and two corresponding directed variables y i j , y ji .Likewise, the variables y k i j and y k ji for each k ∈ [K ] and each {i, j} ∈ E determine if the arcs (i, j) and ( j, i), respectively, are included in the arborescence rooted at r k .Finally, we introduce an additional variable z k for each k ∈ [K ] and each ≥ k, with the interpretation that z k = 1 iff T k and T both lie in the arborescence spanned by y k .In the latter case, we say that r k is responsible for the terminals in T .Recall the definition of T i... j as T i ∪ • • • ∪ T j and T i... j r := T i... j \{r i }; In particular, the set T •••K r contains all the terminal nodes that can potentially be connected to r .We extend our previous notion and say that a cut-set S ⊆ V is relevant for r k and T if r k ∈ S and some terminal t ∈ T is not in S. The set of all cut-sets that are relevant for r k and T is written by S k in the sequel.Then, our cut-based formulation reads as follows.min c T x (x, y, z) ∈ LP sedc and integer (IP sedc ) where For any k, , the left hand side of the directed cut-set constraint (6a) is non-negative and the constraint is trivially satisfied if z k = 0.If otherwise z k = 1, we need to connect all terminals from T to the k-th root r k .Then, any cut-set S separating r k from some terminal in T must have at least one outgoing edge.This is exactly the condition modeled by (6a).For each k ∈ [K ], the constraints (6b) ensure that exactly one root r is responsible for T k (and r 1 is always responsible for T 1 , i.e., z 11 = 1).We use constraints (6c) to enforce that each edge {i, j} is part of at most one arborescence.We also want to make sure that no transitive responsibilities exist: If r k is responsible for T , then r cannot be responsible for some T m , m = .This is modeled by the symmetry breaking constraints (6d).They make sure that if root r k is responsible for some terminal set T , then r k must be responsible for T k as well.The capacity constraints (6e) say that if an edge {i, j} is used in any arborescence, then it must be included in the tree.Moreover, no node in any arborescence should have more than one incoming arc, as modeled by the indegree constraints (6f).Finally, the terminals in T  Proof Let Ẽ ⊆ E be an optimal solution to the SFP.Start with z := 0. Now, for each is the root node with lowest index contained in C and for all other root nodes r j ∈ C, j = i, set zi j = 1.The variables z satisfy (6b) and (6d).Moreover, each terminal is assigned exactly one responsible root node.After fixing the z variables the remaining part of the model describes a union of disjoint Steiner trees, one for each connected component.Thereby, Ẽ can be oriented such that each connected component is an arborescence rooted at its responsible root node giving values to variables y 1 , . . ., y K , y, and x.Since the arborescences are disjoint it follows that constraints (6e), (6c) are satisfied.Hence, we obtain a feasible solution to (IP sedc ) with the same objective value.
On the other hand, an optimum solution ( x, ỹ, z) to (IP sedc ) implies a valid hierarchy of the terminal sets.Moreover, constraints (6a) ensure that each terminal set is connected to its responsible root node.Hence, Ẽ := {e ∈ E | xe = 1} is a feasible solution to the SFP with the same cost.
The separation problem for the cut-set inequalities (6a) is polynomial time solveable with standard techniques (see Sect. 4 for details).

Strength of the new formulation
Instead of comparing the models directly we compare their equivalent flow-based variants.To obtain model (IP sedf ) with its relaxation LP sedf from (IP sedc ) we replace the cut-conditions by flow-balance constraints and we also introduce additional flow variables f .Then, any feasible solution to LP sedf defines a flow f kt from r k to any terminal t ∈ T k•••K r and ensures that the flow value of f kt is exactly z k , if t ∈ T . 123 ) The constraints (7c) prohibit f kt from leaving t and facilitate the comparison to LP mr .
Proof The constraints concerning the z variables are identical in both models, as are (6c) and (6e)-(6g).When considering one particular terminal set k ∈ [K ] constraints (7b) model a flow of value z k from r k to each terminal t ∈ T , for each ∈ {k, . . ., K } (except r k itself); and we can assume without loss of generality that this flow satisfies (7c).On the other hand, the directed cuts (6a) ensure that each directed cut separating r k and t has a value of at least z k .This is equivalent.
For better overview we divide the proof into several parts.Parts (A)-(D) show that Proj x (LP sedf ) ⊆ Proj x (LP df ) and (E) gives an example where the strict inequality holds.In particular, in part (D) we construct a solution ( x, f ) ∈ LP df with x = x.
A. Flows are 2-acyclic W.l.o.g.we assume that any flow Otherwise, one can modify the flow f as follows such that the assumption is satisfied.Consider an edge {i, j} ∈ E, let a 1 ∈ {(i, j), ( j, i)} and let a 2 be the reverse arc, and w.l.o.g.let decrease by f kt a 2 for i and j) with the same value and all constraints in LP sedf are still satisfied.

B. Reverse flow
We first introduce additional flow variables f kr , ∀ ∈ [K − 1], ∀ k ∈ { + 1, . . ., K }, i.e., k > .Notice that these flow variables do not exist since we have only flow variables f kt for a set k and terminal t ∈ T k...K r , i.e., τ (t) ≥ k.The values of the new variables are set such that the flow from r to r k is simply reversed: C. Flow from r k to t over r .Now, we construct a flow , and a terminal t ∈ T k r .This flow will send z k from r k to t (over r ) by using the reverse flow from r to r k , i.e., f k t := f t + f kr .

C.1. Feasibility and value
We show that f k t is a feasible flow from r k to t with value We have: Hence, the sum is 0.
• Otherwise : Since f t and f kr are flows the sum is 0.
Hence, f k t is a feasible flow from r k to t with value z k .

C.2. 2-Acyclic
Otherwise, we modify the flow similar to before.Consider an edge {i, j} ∈ E. Again, let a 1 ∈ {(i, j), ( j, i)} with reverse arc a 2 and with r from the same terminal set, and an edge {i, j} ∈ E with the two related arcs a 1 ∈ {(i, j), ( j, i)} and the reverse arc a 2 .We argue that f k s a 1 + f k t a 2 ≤ x i j .If one flow is zero the inequality holds: E.g., if f k t a 2 = 0 we have: The last inequality is true due to constraint (7a * ).The part with f k s a 1 = 0 works analogously.
(a) Otherwise, if both parts are > 0 we have: D. Solution to LP df Due to the previous discussion we are now able to construct a solution ( x, f ) ∈ LP df with the same objective value.See Fig. 3 for an sketch of the construction.

D.1. Variable assignment
We use the same values for the undirected edges by assigning The flow variables f t , ∀ t ∈ T\R, with k = τ (t), are assigned the following values: f k t .Obviously, it holds f t ≥ 0; the upper bound of 1 follows from part D.3.

D.2. Flow conservation and flow value 1
Consider a terminal t ∈ T\R with k = τ (t) and a vertex i ∈ V .By inserting the definition we have:

), and overall we have
Example for strict inequality Figure 4 gives an example with x ∈ Proj x (LP df ) but x / ∈ Proj x (LP sedf ).The instance has unit edge costs and the two terminal sets T 1 = {a, d} and T 2 = {b, c} with r 1 = a, r 2 = b.The optimum solution to LP df sets x i j := 0.5, ∀ {i, j} ∈ E, and the flows are given by Figure (b) and (c) with the depicted arcs routing a flow of value 0.5.Hence, the optimum solution value of LP df is 2.
On the other hand, this solution is not valid for model LP sedf .A value of 0.5 for each edge implies a flow for the first terminal set as depicted in Figure (b).Then, it is not possible to route any flow for the second set (from node b to c) without increasing the x variables.Hence, it has to hold z 12 = 1.However, sending a flow with value 1 from a to nodes b and c while using the same arcs as in (b) is not possible.It is easy to see that the optimum solution to the LP relaxation of LP sedf has a value of 3 by picking any three edges.
Our next theoretical result is that the new relaxation LP sedc is strictly stronger than the relaxation of Magnanti and Raghavan [25].The major difference between LP sedf and LP mr is this: While in LP sedf , any two flows f kt and f kt for t, t ∈ T must have the same flow value z k , the same flows can have different values in LP mr .In that sense, LP sedf is more restricted and it makes sense that any flow that is feasible in LP sedf is feasible in LP mr , too, whereas the converse is not necessarily true (see Fig. 5).

A smaller cut-based formulation
We remark that a directed cut-based model can be written in the slightly different form below.While this formulation is smaller and less involved, it turns out that its linear programming bounds are potentially weaker than the ones from (IP sedc ).Here, we only need two variables y i j , y ji , and a variable x i j for each edge {i, j} ∈ E. As before, for all k ∈ [K ] and all ≥ k, we have a decision variable z k that tells us whether the terminals in T should be connected to the root r k .min c T x (x, y, z) ∈ LP edc and integer (IP edc ) where To see why the formulation is correct, consider a cut-set S ⊆ V with t / ∈ S for some terminal t ∈ T .If S contains a root node r k with z k = 1, then S must have at least one outgoing arc and the right-hand side of (8a) evaluates to 1 (because of (8b) the right-hand side never exceeds 1).Otherwise, the right-hand side of (8a) evaluates to 0 and the constraint is trivially satisfied.The LP relaxation of (IP edc ) can be solved in polynomial time using standard methods to separate the inequalities of type (8a).We sketch the separation algorithm in Sect. 4.
Proof Let ( x, ỹ, ỹ1 , . . ., ỹK , z) ∈ LP sedc .We argue that ( x, ỹ, z) ∈ LP edc .The constraints (8b)-(8d) are trivially satisfied.Now, consider a directed cut S ⊆ V : S ∩T = ∅, for some set ∈ [K ].Any cut S is relevant to the sum in the right-hand side of constraint (8a) if and only if it is a valid cut for constraint (6a), hence On the other hand, the model is stronger than the directed model without the z variables.The following arguments and the used flow construction are similar to the proof of Theorem 1.
Then, for each terminal t ∈ T k , and each root r with ≤ k construct a flow f t from r to t of value z k (except for t = r ).Notice that if k > 1 we also have a flow from r to r k .Similar to the proof of Theorem 1 we also consider the reversed flow f kr (k > ) and combine the flows to f kt := f kt + <k ( f kr + f t ).Due to the directed cuts (8a) and capacity constraints (8d) it is valid to assume that f exists satisfying the following properties: (i) f kt i j ≤ ỹi j and f kt ji ≤ ỹ ji , ∀ {i, j} ∈ E, (ii) f kt is 2-acyclic (as discussed in Theorem 1), (iii) f kt is a feasible flow, and (iv) the flow value of f kt is 1.Using this flow we set ŷk i j := max t∈T k { f kt i j }, ∀ (i, j) ∈ A, ∀k ∈ [K ].Due to properties (i)+(ii) it holds ŷk i j + ŷk ji ≤ xi j , ∀ {i, j} ∈ E, and due to (iii)+(iv) ŷ satisfies the directed cuts (3a).Hence, ( x, ŷ) is a feasible solution to LP dc with the same solution value.
An instance showing the strict inequality is given by Fig. 1.
We summarize the results of the discussion in Fig. 6 and remark that the relationship of LP mr to the models LP dc and LP edc is an open problem.Our conjecture is that it holds Proj x (LP mr ) Proj x (LP edc ) Proj x (LP dc ).

Redundancy in the models and additional valid constraints
Interestingly, the constraints 3) and unitary edge costs 1; the integer optimum is 15.b Optimum solution of LP sedf , LP sedc without indegree constraints (6f) with cost 13.5; the dashed arcs are set to 0.5 and the indegree at the central vertex is violated.c Optimum solution of LP sedf , LP sedc with indegree constraints (6f) and with cost 14; dashed arcs are assigned value 0.5 and solid arcs value 1 are all binding in the formulations LP sedc and LP sedf .Examples are given by Figs. 7, 8, and 5.In particular, this may be surprising for the first inequality since every terminal requires only one path (or a flow of value 1) and moreover, this constraint is nonbinding for the Steiner tree problem.
In the following, we discuss additional constraints for the two models (IP sedc ) and (IP sedf ), respectively.These constraints strengthen the models further and we denote the expanded models by (IP sedc * ) and (IP sedf * ), respectively.Again, we focus on the cut-based model.
a Instance with two terminal sets ( 1, 2) and unitary edge costs 1; the integer optimum is 10.b Optimum solution of LP sedf , LP sedc without indegree constraints y k (δ − (t)) = 0 with cost 9; again, dashed arcs are set to 0.5 and the indegree is violated at the root node of the red set.c Optimum solution of LP sedf , LP sedc with indegree constraints which has cost 9.5; dashed arcs are set to 0.5 and solid arcs to 1 The constraints (9a) and (9b) are the well-known flow-balance constraints from the Steiner tree problem: (9a) affects the overall solution and (9b) each subtree independently.They state that the indegree of a non-terminal vertex is not larger than its outdegree.Since the flow-balance constraints are strengthening for the Steiner tree problem, see e.g., [28], both constraints are strengthening for the SFP, too.We can also incorporate (9a) into LP mr and LP edc , strengthening these models, too.However, this does not hold for constraints (9b).
The latter fact is interesting since it is possible to construct instances where (9b) is violated, but (9a) is not.Such an instance can be constructed by joining two Steiner tree instances-while each instance implies one terminal set-at a non-terminal v. Thereby, the constraint for y 1 and v is violated whereas y 2 has a larger outdegree such that the aggregated constraint is not violated.The first instance is described in [11,27] and is due to Goemans; with k = 4 and r 1 = a 0 the optimum solution sets all arcs to 0.25, and with v = c 34 we have y 1 (δ − (v)) = 0.5 and y 1 (δ + (v)) = 0.25.The second instance is the classical instance with integrality gap 10/9 which can be found in, e.g., [10] Fig. 8.1.With r 2 being the topmost terminal u 1 and v the left non-terminal v 3 the optimum solution sets all arcs to 0.5, and y 2 (δ − (v)) = 0.5 and y 2 (δ + (v)) = 1.The whole example is depicted in Fig. 9.
An example for the strength of (9c) is given by Fig. 8 if the two sets are interchanged, i.e., if the blue terminal set (diamonds) is the first set and the red set (rectangles) the second set.Without these constraints the optimum LP solution has cost 9 and is depicted in Fig. 8b.Adding the constraints increases the optimum solution to 9.5 as, e.g., in Fig. 8c.

Integrality gap
For the Steiner tree problem the integrality gap of the undirected models is 2 and for the directed models the gap is still unknown.Byrka, Grandoni, Rothvoß, and Sanità [4] were able to show that the gap is at least 36/31 ≈ 1.161, but the upper bound is still 2 through the undirected model.Although our Steiner forest models LP sedc * , LP sedf * coincide with the directed models for the case K = 1 we give a series of instances where the gap approaches 3/2 = 1.5 for larger K .
Such an instance depends on an integer M > 0 and consists of M + 1 terminal sets; an example with M = 3 is depicted in Fig. 10.Thereby, the graph consists of M identical subgraphs, one for each set T 1 , . . ., T M .Here, the two terminals of each set are connected by M paths.Each path has a length of 2 with one internal nonterminal vertex.Finally, set T M+1 contains M terminals which are connected to the corresponding non-terminals of each subgraph by zero-cost edges, cf.Fig. 10a.
In the optimum integer solution T M+1 needs to be connected to another set, say T 1 ; hence, the tree containing T 1 ∪ T M+1 induces cost M + 1.All other sets T 2 , . . ., T M can be connected independently by choosing one of the paths.Hence, the overall cost is M +1+(M −1)•2 = 3M −1.On the other hand, the LP relaxation sets z kk = 1 and z k(M+1) = 1/M, ∀k ∈ {1, . . ., M}.Then, each root node r 1 , . . ., r M sends 1/M over each path to its terminal and also 1/M to each terminal in T M+1 .This LP solution

Experimental results
Settings.All experiments were performed on a Debian 10.1 machine with an Intel(R) Xeon(R) CPU E5-2643 running at 3.30GHz.Our code is written in C++ using ILOG CPLEX 12.6.3and the 2012.07 release of the Open Graph Drawing Framework [6].We compiled with g++-8.3 and -O2 flags.Automatic symmetry breaking and presolving was disabled in CPLEX, as well as all general integer cuts.
Instances For the JMP instance set, we generated 580 random graphs with a frequently used method by Johnson et al. [21]: First, distribute n nodes uniformly at random in a unit square.Then, insert an edge {i, j} if the Euclidean distance between i and j is less than α/ √ n, where α is a parameter for the random generator.The cost of the edge {i, j} is proportional to the Euclidean distance.Finally, connect all nodes with a minimum Euclidean spanning tree to ensure that the instance is connected.
To determine K random terminal sets, we first select t • |V | nodes uniformly at random (the number K ∈ [n/2] of terminal sets and the terminal percentage t ∈ [0, 1] are again parameters).We then bring the selected nodes into a random order and draw K − 1 distinct split points from {2, . . ., t • |V | − 1}, thus splitting the random node order into K distinct terminal sets.For each n ∈ {25, 50, 150, 200, 500}, we choose a small, a medium, and a large number of terminal sets K .The percentage t of terminal nodes is picked from {0.25, 0.5, 0.75, 1.0} unless a combination of n, K , and t results in a terminal set size of less than two.For each choice of n, K , and t, we generate five instances with α = 1.6 and five instances with α = 2.0; leading to 580 JMP instances.The MR instance set is generated based on [25] and contains 85 instances.

Solving the LP-relaxations
Separating cut-set inequalities No separation procedures are known for the inequalities of (IP mr ).The cut-set inequalities in the three other formulations can be separated with standard techniques, however: -We separate a point (x, y 1 , . . ., y k ) from (IP dc ) with inequalities of type (3a) in the following way.We compute a maximum r k -t-flow f in the support graph of y k , for each k ∈ [K ] and each t ∈ T k \{r k }.If the value of f is strictly less than one, we derive a violated inequality of type (3a) from the r k -t-cut S := {v ∈ V | there is a v − t-path in the residual network of f }. -For (IP edc ) we want to separate a point (x, y, z) from the feasible region with inequalities of type (8a).For a fixed ∈ [K ] we augment the support graph of y with a super source s and insert an arc (s, r k ) with capacity z k for all k ≤ .We then look for a maximum s-t-flow f for all t ∈ T \{r }.Analogously to the previous case, the corresponding minimum s-t-cut induces a violated inequality of type (8a) if f has a value of strictly less than k=1 z k .To check that r is connected to r 1 , . . ., r −1 as well, we remove (s, r ) from the augmented support graph in a second step and look for a maximum s-r -flow f of value at most −1 k=1 z k .-For (IP sedc ), we want to separate a point (x, y 1 , . . ., y K , z) with inequalities of type (6a).For each ∈ [K ] and each k ≤ , we compute a maximum r -t-flow f in the support graph of y for each t ∈ T k .If the value of f is strictly less than z k the corresponding minimum r -t-cut induces an inequality of type (6a) that separates (x, y 1 , . . ., y K , z).
Some algorithmic techniques have the potential to improve this on-the-fly generation [22]: Back cuts Additionally add the cut-set inequality corresponding to S where v ∈ V is included in S if and only if there is a directed s-v-path in the residual network of f .Nested cuts Assign an infinite capacity to all saturated edges in the residual network of f and iterate.Nested cuts can be combined with back cuts: We first compute S and S and then compute nested cuts on both sets.Creep flows Add a small ε = 10 −8 to all capacities.This lets us find a minimum weight cut that cuts few edges.The creep flow variant works together with both nested cuts and back cuts.Cut purging Finally, it can be beneficial to remove cut-set inequalities from the relaxation if they have not been binding for a number of iterations.
It is not clear a priori which combination of these variants leads to the best performance of the algorithm.In a preliminary experiment, we evaluated all 16 combinations for all the formulations under consideration (the results are shown in Fig. 15 in the Appendix.
To avoid overfitting, we tested on a random subset of the instances only.Back cuts were beneficial in all cases.The LP sedc relaxation benefited from additional creep flows, while LP dc worked best with additional nested cuts and purging.In all cases, we compute the maximum s-t-flows with a custom implementation of the push-relabel algorithm with the highest-label strategy and the gap heuristic [5,17].
Additional valid inequalities Our analysis in Sect.3.3 shows that LP sedc can be strengthened with additional flow-balance and indegree constraints.Similar improvements can be made for the other LP-relaxations.To allow for a fair comparison, we incorporate these improvements and compare the (theoretically) strongest known versions of the LP-relaxations in the sequel: We obtain LP mr * by adding the flow-balance constraints to LP mr .Likewise, we obtain a strengthened version LP dc * of LP dc by adding the flow-balance constraints Analogously, we strengthen LP edc with and obtain LP edc * .Finally, we compare against LP sedc * as defined in Sect.3.3.
Order of the terminal sets.The size of (IP mr ) depends on the order of the terminal sets and is minimized if-without loss of generality-the sets are sorted by decreasing size, i.e., such that The same holds for the running time of the separation procedures for the cut-set inequalities (6a) of (IP sedc ) and (8a) of (IP edc ), respectively.Therefore, in our experiments we index the terminal sets satisfying this decreasing order.A preliminary comparison to a version with the default terminal set order shows that this initial optimization makes solving the LP-relaxation of (IP mr ) more consistent and yields small improvements over the number of instances that could be solved to optimality; e.g., about 7% more instances could be solved.We remark that the order of the terminal sets might have an impact on the LP-bound as well, even though we did not observe significant changes in our experiments.Time to solve the LP-relaxations One important factor for the practical usefulness of an IP formulation is the speed at which its LP-relaxation can be solved to optimality.We evaluate this speed in a computational experiment, comparing the state-of-the-art to our new formulations on the 580 JMP instances.Figure 11 shows how many LPrelaxations were solved to optimality after x ∈ [0, 3600] seconds.After 3600 seconds, the relaxations LP edc * , LP sedc * , LP dc * , and LP mr * were solved to optimality on 567, 554, 292, and 140 instances, respectively; moreover, the bulk of these instances is solved in the first 300 seconds.As observed before, LP mr * has exponential size and has to be solved as a static model, so that its poor performance is not surprising (in fact, it is in line with what Magnanti and Raghavan predict [25]).On the other hand, we would have expected a better performance of the LP dc * model.The LP edc * relaxation solves slightly more instances than the LP sedc * relaxation.This was to be expected, given the smaller size of LP edc * .
Although not shown here, solving the non-starred variants of the formulations has had no significant impact on the solution times in our experiments.Furthermore, the relaxations LP dc * , LP edc * , and LP sedc * can all be solved in less than a second on the 85 instances of the MR set whereas the optimum of LP mr * was reached on 46 MR instances in less than a second of time.We conclude that reliably solving the LP-relaxation is a major hurdle in some cases.

Quality of the LP-bounds
Solving the LP-relaxation to optimality is not necessary as long as a "good-enough" bound is obtained.For instance, it is conceivable that a suboptimum bound from LP mr * is better than an optimum bound from LP dc * and further investigation is needed.To that aim, we solve the LP-relaxations with a time limit of 3600 seconds and take the best bound L found up to that point.We then compare L to the optimum LP bound of LP uc , i.e., L uc .Figure 12 shows the improvement L/L uc in a box plot diagram (maximum, minimum, and quantiles).As the integrality gap of LP uc is two, the maximum improvement is bounded by two as well.Our experiments complement the theoretical analysis from the previous section by quantifying how much stronger the new formulations are.
For the MR instance set, we observe that the bound from LP sedc * is comparable to the one from LP mr * on the smallest instances.For the largest instances, fewer optimum bounds are obtained from LP mr * so that LP sedc * has a smaller spread.The bounds from LP dc * are inferior to the ones from the other relaxations.
Being a large static model, LP mr * did not fit into the memory limit of 3 GB for the majority of the JMP instances.No bound could be obtained in these cases and we thus had to remove LP mr * from the comparison.On this instance set, the new relaxations LP edc * and LP sedc * provide comparable bounds (with LP sedc * seeming slightly stronger) and dominate the bounds from LP dc * .A decrease in quality of the LP dc * bound can be observed for the larger instances.This is in part because fewer and fewer LP dc * -relaxations are solved to optimality.Here, the plotted bound is suboptimum.In an additional experiment, we evaluated the bounds from the lifted-cut relaxation [23] and found them to be identical to the bounds from LP uc on both the JMP and the MR instance set.
Overall, we find that LP-bounds from LP sedc * are at least as good as the ones from the previously strongest relaxation LP mr * .Yet, they can be computed more reliably.

Integrality gaps
We evaluate the integrality gap (O PT I − L P)/O PT I (where O PT I is the integer optimum and L P is the optimum of the LP-relaxation) of the formulations computationally in Fig. 13.The figure is coherent with Fig. 12: The integrality gap of the relaxations LP mr * and LP sedc * disappears on almost all instances.We also see that the bounds obtained from LP edc * indeed are weaker than the ones from LP sedc * .The relaxation LP dc * has significantly larger integrality gaps than the other three relaxations, even for smaller instances where it can be solved to optimality.

Branch-and-bound
As a proof of concept, we implemented a branch-and-bound (B&B) scheme by letting CPLEX solve IP mr * , IP dc * , IP edc * , and IP sedc * on the MR and the JMP instance set.We set a time and memory limit of 3600 seconds and 3 GB, respectively.In each B&B node, we solve the LP-relaxations as discussed previously, in particular, we separate cut-set-inequalities for the cut based formulations IP dc * , IP edc * , and IP sedc * in a branch-and-cut manner using CPLEX callbacks.Solution progress Figure 14 gives an overview over the computational results.It shows how many of the 580 JMP instances were solved to optimality after x seconds.We observe that using IP sedc * leads to the largest number of instances solved.This is surprising when we compare to the results from the LP-experiment where the bounds of LP edc * and LP sedc * seemed on par while LP edc * was solved more reliably.Yet, IP sedc * seems better suited for a B&B scheme.The formulations IP mr * and IP dc * struggle to solve the instances to optimality.This observation agrees with the LP-experiment where already the LP-relaxations LP mr * and LP dc * were difficult to solve.
Layout of the detailed tables More detailed results are given in Tables 1 and 2. Each row of the tables is grouped in three parts and corresponds to a combination of an IP formulation and an instance class in which each instance has |V| nodes and K terminal sets, as shown in the first group of the row.The last column (#) in the first group contains the size of the instance class.The second group shows average values over those instances in each class that were solved to optimality.We show in the first column (#) of the second group how many instances were solved to optimality.The CPU column shows the average CPU time required for optimality to be proven while CPUR gives the cpu time required to solve the root node.The RG column provides the average root gap (O PT − L P r )/O PT where O PT is the optimum integer solution of an instance and L P r is the dual bound at the end of the root node.As usual, the dual bound L P r may be different from the optimum value of the LP-relaxation if CPLEX decides to branch early in view of the time limit or tailing off effects.Finally, BN shows the average number of processed branch-and-bound nodes.Again, all averages are over solved instances only.The third column group gives averages for those instances that could not be solved to optimality.Its first column (#) shows how many instances could not be solved, but still provided a non-trivial dual bound (for this reason, the number of solved/unsolved instances does not add up to the total number of instances in some cases).The second column GAP provides the average gap (O PT − L P)/O PT where L P is the global dual bound after 3600 seconds.The CPUR and BN columns again show the root gap and number of B&B nodes processed.We do not know the optima for 13 of the largest instances and removed those instances from the comparison.
Details on the MR instances We see in Table 1 that the cut-based IP formulations solve all MR instances with ease.For IP sedc * , the root relaxation is integral in all cases.For IP edc * , we need to process a small B&B tree, whereas IP dc * needs to close a much larger gap and considerably more branching is needed.We fail to solve all the instances to optimality with IP mr * : The memory limit is not always sufficient to build the IP model.However, wherever IP mr * is successful, little branching is needed and the root gap is small.Similar observations where made in [25].Details on the JMP instances Table 2 provides detailed B&B results on the JMP instances.As before, the B&B based on IP mr * struggles with the larger instances but seems to profit from tight bounds and small B&B trees wherever it is successful.The IP dc * -based B&B shows the opposite behaviour: In comparison, it needs to close larger gaps and processes larger B&B trees.However, it is more successful than IP mr * .In part, this is due to the high throughput of the algorithm: It processes more B&B nodes per second than any other algorithm in the comparison-at least on the small and medium sized instances.On the larger instances, IP dc * struggles to solve the root relaxations and consequently has little opportunity to close the significant gaps.
The B&B based on IP edc * solves instances with up to 200 nodes and up to 10 terminal sets reliably.Despite the relatively small root gap, many of the larger instances pose a challenge for the algorithm.We observe that even though IP edc * spends little time at the root node, it processes few B&B nodes.This seems to prohibit closing the gap entirely on the large instances, even though the algorithm gets close (within 5%) to the optimum solution-as opposed to IP dc * with a final gap of 20-40%.
Finally, the IP sedc * based B&B solves all instances with up to 200 nodes in less than a minute.We confirm that the root relaxation on these instances is tight, as the algorithm requires little branching (less than 3 nodes on average).However, we observe some failures on the larger instances; in particular, the algorithm fails to solve the root relaxation on some of the instances with 500 nodes and 35/50 terminal sets.On the unsolved instances with 500 nodes, a large part of the computation time is spent at the

Conclusion and outlook
We answer a long-standing open problem by Magnanti and Raghavan [25] and give a cut-based ILP formulation for the Steiner forest problem which is stronger than the classical undirected and directed models.Actually, our new model is even stronger than the improved flow model by [25] and hence, it is the strongest known model for the SFP.The computational study shows that our new branch-and-bound algorithm works very well and its performance seems to be due to the strong bounds obtained from the new formulation IP sedc * .While its relaxation LP sedc * is solved less quickly than the simplified relaxation LP edc * , its stronger bounds seem to pay off overall.
On the theoretical side, we would like to obtain an LP relaxation with an integrality gap of less than 2. This problem is not solved by LP sedc * : We observe that it coincides with LP dc if K = 1.On the other hand, we are able to give a stronger lower bound of 1.5 for the integrality gap.This is a clear improvement over the Steiner tree problem where the gap of the directed models is somewhere between 1.161 and 2.
The relationship to the Steiner tree problem raises some further questions and directions for future research.Since both the Steiner tree problem [30] and the Steiner forest problem [3] are solvable in polynomial time on series-parallel graphs (graphs of treewidth at most 2, partial 2-trees) and there exists a full description of the Steiner tree polytope for this type of graphs [13,26], the existence of such a model for the SFP is an open problem.Notice that LP sedc * does not have the property: inserting an edge between the terminals of the second set in instance B of Fig. 1 gives an example where LP sedc * selects all edges at 0.5.We remark that this instance was already given by [25].
Finally, the polyhedra of our new models and the constraints should be investigated.For example, are the directed cuts facet-defining and are there further strengthening and facet-defining constraints?

Fig. 2 a
Fig. 2 aExample A from Fig.1where LP dc yields a stronger LP bound than LP uc .The instance has unit costs and a single terminal set that contains all four nodes of the graph.Node a has been chosen as the root.a Shows a feasible fractional solution for LP uc with cost 2.It is impossible to orient this solution such that it is feasible for LP dc which implies an optimum integer solution of cost 3. b Slightly modified instance (B from Fig.1) with two terminal sets ( 1, 2, root nodes a and b).The red and blue arcs form a solution for relaxation LP dc for the red ( ) and blue ( ) terminal set.The gray edges show the values of the x variables.Looking for a Steiner arborescence for each terminal set does not cut off a fractional optimum of cost 2. c A solution that roots both terminal sets at the root node a of the red ( ) terminal set.The fractional optimum is cut off

Fig. 3
Fig. 3 Schematic view on the involved flows in the proof of Theorem 1. r k and r are root nodes for sets T k and T , with < k and t ∈ T k r .a The original flows.b The reverse flow f kr from r k to r , cf. part B in the proof, and the combined flow f k t from r k to t over r , cf. part C

Fig. 4
Fig. 4 An instance where the LP relaxation of the extended directed flow formulation gives a better bound than the one of the directed flow formulation, cf.part (E) in the proof of Theorem 1. a Depicts the input graph and b, c give valid flows for sets 1 and 2 for LP df

Fig. 5
Fig. 5 Example instance where LP sedf gives a stronger bound than LP mr .a Instance with three terminal sets ( 1, 2, 3) and unitary edge costs 1. b Optimum solution of LP mr with overall cost 4.5.This solution is infeasible for LP sedf since here we would have z 22 = 0.5 and z 23 = 1.0, conflicting (6d).c Optimum solution of LP sedf which is integer and has cost 5.Here, non-0 z-variables are z 11 = z 22 = z 23 = 1.0

3 Fig. 6
Fig. 6 Relationship of the LP relaxations.The arrows point to the stronger relaxation

Fig. 7
Fig. 7 Example instance where LP sedf and LP sedc are strengthened by y(δ − (v)) ≤ 1. a Instance with three terminal sets ( 1, 2,3) and unitary edge costs 1; the integer optimum is 15.b Optimum solution of LP sedf , LP sedc without indegree constraints (6f) with cost 13.5; the dashed arcs are set to 0.5 and the indegree at the central vertex is violated.c Optimum solution of LP sedf , LP sedc with indegree constraints (6f) and with cost 14; dashed arcs are assigned value 0.5 and solid arcs value 1

Fig. 8
Fig.8 Example instance where LP sedf and LP sedc are strengthened by y k (δ− (t)) = 0, ∀ k ∈ [K ]\{1}, ∀ t ∈ T 1•••k−1 .a Instance with two terminal sets ( 1,2) and unitary edge costs 1; the integer optimum is 10.b Optimum solution of LP sedf , LP sedc without indegree constraints y k (δ − (t)) = 0 with cost 9; again, dashed arcs are set to 0.5 and the indegree is violated at the root node of the red set.c Optimum solution of LP sedf , LP sedc with indegree constraints which has cost 9.5; dashed arcs are set to 0.5 and solid arcs to 1

Fig. 9 Observation 7
Fig.9 Example instance where (9b) is violated, but (9a) is not.All edge costs are one and there are two terminal sets( 1,2).Due to the size of the graph we already show the optimum solution to LP sedc : thin solid arcs are assigned 0.25, dashed arcs are set to 0.5, and z 11 = z 22 = 1.(9b) is violated at v for y 1 but not for y

Fig. 10 a
Fig. 10 a Example instance where the integrality gap of LP sedf * and LP sedc * is 4/3.We have four terminal sets ( 1, 2, 3, 4) and the edge cost for the thick edges is 1 and for the thin edges 0. Since T 4 can only be connected via the other sets the optimum solution constructs three trees, e.g., T 1 and T 4 are connected in one tree and T 2 and T 3 get stand-alone trees.Hence, the cost is 4 + 2 + 2 = 8. b Shows the optimum solution to the LP relaxation which sets z 11 = z 22 = z 33 = 1 and z 14 = z 24 = z 34 = 1/3 and all dashed arcs are set to 1/3.The overall cost is 6 has cost 1/M • 2M • M = 2M.Hence, with arbitrarily large M the integrality gap approaches 1.5.

Fig. 11 Fig. 12
Fig. 11 Number of LP-relaxations (out of 580) solved after x seconds of cpu time

Fig. 13 Fig. 14
Fig.13 Integrality gap of the LP-relaxation after 3600 seconds leaving little time for branching.Comparing the root gaps to the integrality gaps in Fig.13it becomes appearent that CPLEX branches prematurly.

Fig. 15
Fig. 15 Preliminary experiment on a random subset of the instances.The plots show all possible parameter combinations of back cuts (B), creep flows (C), nested cuts (N) and cut purging (P).Each row corresponds to one LP relaxation.The last picture plots the 6 best variants from each previous plot against each other

Table 1
B&B perfomance on the MR