Many Visits TSP Revisited

We study the Many Visits TSP problem, where given a number $k(v)$ for each of $n$ cities and pairwise (possibly asymmetric) integer distances, one has to find an optimal tour that visits each city $v$ exactly $k(v)$ times. The currently fastest algorithm is due to Berger, Kozma, Mnich and Vincze [SODA 2019, TALG 2020] and runs in time and space $\mathcal{O}^*(5^n)$. They also show a polynomial space algorithm running in time $\mathcal{O}^*(16^{n+o(n)})$. In this work, we show three main results: (i) A randomized polynomial space algorithm in time $\mathcal{O}^*(2^nD)$, where $D$ is the maximum distance between two cities. By using standard methods, this results in a $(1+\epsilon)$-approximation in time $\mathcal{O}^*(2^n\epsilon^{-1})$. Improving the constant $2$ in these results would be a major breakthrough, as it would result in improving the $\mathcal{O}^*(2^n)$-time algorithm for Directed Hamiltonian Cycle, which has been open for 50 years. (ii) A tight analysis of Berger et al.'s exponential space algorithm, resulting in an $\mathcal{O}^*(4^n)$ running time bound. (iii) A new polynomial space algorithm, running in time $\mathcal{O}(7.88^n)$.


Introduction
In the Many Visits TSP (MVTSP) we are given a set $V$ of $n$ vertices, with pairwise distances (or costs) $d : V^2 \to \mathbb{Z}_{\ge 0} \cup \{\infty\}$. We are also given a function $k : V \to \mathbb{Z}_+$. A valid tour of length $\ell$ is a sequence of vertices $(x_1, \ldots, x_\ell)$, where $\ell = \sum_{v \in V} k(v)$, such that each $v \in V$ appears in the sequence exactly $k(v)$ times. The cost of the tour is $\sum_{i=1}^{\ell} d(x_i, x_{i+1})$, where we put $x_{\ell+1} = x_1$. Our goal is to find a valid tour with minimum cost. Many Visits TSP is a natural generalization of the classical (asymmetric) Traveling Salesman Problem (TSP), which corresponds to the case when $k(v) = 1$ for every vertex $v$. Similarly to its special case, MVTSP arises in a variety of applications, including scheduling [24,17,6,27,12], computational geometry [20] and parameterized complexity [21].

Related work
The standard dynamic programming for TSP of Bellman [1] and Held and Karp [16], running in time $\mathcal{O}^*(2^n)$, can be easily generalized to MVTSP, resulting in an algorithm with running time $\mathcal{O}^*(\prod_{v\in V}(k(v) + 1))$, as noted by Psaraftis [24]. A breakthrough came in the work of Cosmadakis and Papadimitriou [7], who presented an algorithm running in time $2^{\mathcal{O}(n \log n)} + \mathcal{O}(n^3 \log \ell)$ and space $2^{\mathcal{O}(n \log n)}$, thus essentially removing the dependence on the function $k$ from the bound (the $\log \ell$ factor can actually be dropped if we support the original algorithm with a state-of-the-art minimum cost flow algorithm). This may be surprising since the length of the output sequence is $\ell$. However, beginning from the work of Cosmadakis and Papadimitriou, we consider MVTSP with compressed output, namely the output is a multiplicity function which encodes the number of times every edge is visited by the tour. By using a standard Eulerian tour algorithm we can compute an explicit tour from this output.
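To make the state space of this generalization concrete, below is a minimal sketch in Python (our illustration, not Psaraftis' original formulation) of such a DP: the state consists of the vector of remaining visit counts together with the last visited city, which directly gives the $\mathcal{O}^*(\prod_{v\in V}(k(v)+1))$ bound; the function name and the toy instance are made up for this example.

from functools import lru_cache

# Bellman-Held-Karp style DP generalized to many visits:
# the state is (remaining visit counts, last city); the tour starts and ends at city 0.
def mvtsp_dp(d, k):
    n = len(k)
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(remaining, last):
        if all(r == 0 for r in remaining):
            return d[last][0]              # close the tour at the starting city
        res = INF
        for v in range(n):
            if remaining[v] > 0:
                nxt = list(remaining)
                nxt[v] -= 1
                res = min(res, d[last][v] + best(tuple(nxt), v))
        return res

    start = list(k)
    start[0] -= 1                          # the first visit to city 0 is the start of the tour
    return best(tuple(start), 0)

# Example: 3 cities, asymmetric distances, visit counts k = (1, 2, 1).
d = [[0, 2, 9], [1, 0, 6], [7, 3, 0]]
print(mvtsp_dp(d, (1, 2, 1)))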
The crux of the approach of Cosmadakis and Papadimitriou [7] was an observation that every solution can be decomposed into a minimal connected spanning Eulerian subgraph (which enforces connectivity of the solution) and a subgraph satisfying appropriate degree constraints (which completes the tour so that the numbers of visits agree). Moreover, once we guess the degree sequence δ of the Eulerian subgraph, our task splits into two separate tasks: finding a cheapest minimal connected Eulerian subgraph consistent with δ (which is computationally hard) and finding a cheapest subgraph satisfying the degree constraints (which can be solved in polynomial time by a reduction to minimum cost flow).
Yet another breakthrough came only recently, namely Berger, Kozma, Mnich and Vincze [3,2] improved the running time to $\mathcal{O}^*(5^n)$. Their main contribution is the idea that it is more convenient to use outbranchings (i.e. spanning trees oriented out of the root) to force connectivity of the solution. The result of Berger et al. is the first algorithm for MVTSP which is optimal assuming the Exponential Time Hypothesis (ETH) [18], i.e., there is no algorithm running in time $2^{o(n)}$, unless ETH fails. Moreover, by applying the divide and conquer approach of Gurevich and Shelah [15] they design a polynomial space algorithm, running in time $\mathcal{O}^*(16^{n+o(n)})$.

Our results
In this work, we take the next step in the exploration of the Many Visits TSP problem: we aim at algorithms which are optimal at a more fine-grained level, namely with running times of the form $\mathcal{O}(c^n)$, such that an improvement to $\mathcal{O}((c - \epsilon)^n)$ for any $\epsilon > 0$ meets a kind of natural barrier, for example contradicts the Strong Exponential Time Hypothesis (SETH) [19] or the Set Cover Conjecture (SCC) [8]. Our main result is the following theorem.

Theorem 1.1.
There is a randomized polynomial space algorithm that solves Many Visits TSP in time $\mathcal{O}^*(2^n D)$, where $D$ is the maximum distance between two cities.

By using standard methods, Theorem 1.1 yields the following approximation result.

Theorem 1.2.
For every $\epsilon > 0$, there is a randomized polynomial space algorithm that finds a $(1+\epsilon)$-approximate solution to Many Visits TSP in time $\mathcal{O}^*(2^n \epsilon^{-1})$.
In Theorems 1.1 and 1.2 the better exponential dependence in the running time was achieved at the cost of sacrificing an $\mathcal{O}(D)$ factor in the running time, or the optimality of the solution. What if we do not want to sacrifice anything? While we are not able to get an $\mathcal{O}^*(2^n)$ algorithm yet, we are able to report progress compared to the $\mathcal{O}^*(5^n)$-time algorithm of Berger et al. In fact we do not show a new algorithm, but we provide a refined analysis of the previous one. The new analysis is tight (up to a polynomial factor).

Theorem 1.3.
There is an algorithm that solves Many Visits TSP in time and space $\mathcal{O}^*(4^n)$.
In short, Berger et al.'s polynomial space $\mathcal{O}^*(16^{n+o(n)})$-time algorithm iterates through all $\mathcal{O}(4^n)$ degree sequences of an outbranching, finds the cheapest outbranching for each sequence in time $\mathcal{O}(4^{n+o(n)})$, and completes it to satisfy the degree constraints using a polynomial time flow computation. Note that it is hard to speed up the cheapest outbranching routine, because for the sequence consisting of $n - 1$ ones and one zero we get essentially the TSP, for which the best known polynomial space algorithm takes time $\mathcal{O}(4^{n+o(n)})$ [15]. However, we are still able to get a significant speed-up of their algorithm, roughly, by using a more powerful minimum cost flow network, which allows for computing the cheapest outbranchings in smaller subgraphs.

Theorem 1.4.
There is an algorithm that solves Many Visits TSP in time $\mathcal{O}^*(7.88^n)$ and polynomial space.
Organization of the paper. In Section 3 we show that, essentially, using a polynomial time preprocessing step we can reduce an instance of Many Visits TSP to an equivalent one with demands $k$ bounded by $\mathcal{O}(n^2)$. This reduction is a crucial prerequisite for Section 4, where we prove Theorem 1.1. Next, in Section 5 we prove Theorem 1.3 and in Section 6 we prove Theorem 1.4. We note that in these two sections we do not need the reduction from Section 3; however, in practice, applying it should speed up the flow computations used in both algorithms described there. Finally, in Section 7 we show Theorem 1.2 and we discuss further research in Section 8.

Preliminaries
We use the Iverson bracket, i.e., if $\alpha$ is a logical proposition, then the expression $[\alpha]$ evaluates to 1 when $\alpha$ is true and to 0 otherwise. For two integer-valued functions $f, g$ on the same domain $D$, we write $f \le g$ when $f(x) \le g(x)$ for every $x \in D$. Similarly, $f + g$ (resp. $f - g$) denotes the pointwise sum (difference) of $f$ and $g$. This generalizes to functions on different domains $D_f, D_g$ by extending the functions to $D_f \cup D_g$ so that the values outside the original domain are 0.
For a cost function $d : V^2 \to \mathbb{Z}_{\ge 0} \cup \{\infty\}$ and a multiplicity function $m : V^2 \to \mathbb{Z}_{\ge 0}$, we write $d(m) = \sum_{u,v \in V} m(u,v) \cdot d(u,v)$ for the cost of $m$.

Multisets. Recall that a multiset $A$ can be identified with its multiplicity function $m_A : U \to \mathbb{Z}_{\ge 0}$, where $U$ is a set. We write $e \in A$ when $e \in U$ and $m_A(e) > 0$. Consider two multisets $A$ and $B$. Operations on $A$ and $B$, such as union, difference and sum, are defined through their multiplicity functions in the natural way.

Directed graphs. Directed graphs (also called digraphs) in this paper can have multiple edges and multiple loops, so the sets $E(G)$ will in fact be multisets. We call a directed graph simple if it has no multiple edges or loops. We call it weakly simple if it has no multiple edges or multiple loops (but single loops are allowed). For a digraph $G$, by $G^{\downarrow}$ we denote the support of $G$, i.e., the weakly simple digraph on the vertex set $V(G)$ such that $E(G^{\downarrow}) = \{(u, v) \mid G \text{ has an edge from } u \text{ to } v\}$.
Given a digraph $G = (V, E)$ we define its multiplicity function $m_G : V^2 \to \mathbb{Z}_{\ge 0}$ as the multiplicity function of its edge multiset, i.e., for any pair $u, v \in V$, we put $m_G(u, v) = m_E((u, v))$. Conversely, for a function $m : V^2 \to \mathbb{Z}_{\ge 0}$ we define the thick graph $G_m = (V, E)$ so that $m_{G_m} = m$. Abusing notation slightly, we will identify $m$ and $G_m$, e.g., we can say that $m$ is strongly connected, contains a subgraph, etc.
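As a small concrete illustration of this correspondence (the helper below is ours, not notation from the paper), a multigraph can be stored directly as its multiplicity function, with the support operation recovering $G^{\downarrow}$:

from collections import Counter

def support(m):
    # Support of a multiplicity function: keep each pair (u, v) with m(u, v) > 0 once.
    return {e for e, cnt in m.items() if cnt > 0}

edges = [(0, 1), (0, 1), (1, 2), (2, 2)]   # a multigraph with a double edge and a loop
m = Counter(edges)                         # its multiplicity function m_G
print(m[(0, 1)], support(m))               # prints 2 and the three distinct edges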
We call a directed graph connected if the underlying undirected graph is connected. Similarly, a connected component of a digraph G is a subgraph of G induced by a vertex set of a connected component of the underlying undirected graph.
For a graph G (directed or undirected) and a subset X ⊆ V (G), by G[X] we denote the subgraph induced by X.
Solutions. The following observation follows easily from known properties of Eulerian digraphs.

Observation 2.1. Many Visits TSP has a tour of cost $c$ if and only if there is a multiplicity function $m : V^2 \to \mathbb{Z}_{\ge 0}$ with $d(m) = c$ such that (i) $\mathrm{indeg}_m(v) = \mathrm{outdeg}_m(v) = k(v)$ for every $v \in V$, and (ii) $m$ contains a spanning connected subgraph.

Thanks to Observation 2.1, in the remainder of this paper we refer to multiplicity functions as solutions of MVTSP (and some related problems which we are going to define). By standard arguments, the multiplicity function can be transformed into a tour in time $\mathcal{O}(\ell)$. Moreover, Grigoriev and Van de Klundert [14] describe an algorithm which transforms it into a compressed representation of the tour in time $\mathcal{O}(n^4 \log \ell)$.
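The following sketch illustrates the standard Eulerian tour extraction mentioned above via Hierholzer's algorithm (our illustration; it is not the compressed-representation routine of Grigoriev and Van de Klundert). It assumes the multiplicity function already satisfies the conditions of Observation 2.1.

from collections import defaultdict

def tour_from_multiplicities(m, start):
    # m: dict mapping (u, v) to the number of times the edge (u, v) is used.
    out_edges = defaultdict(list)
    for (u, v), cnt in m.items():
        out_edges[u].extend([v] * cnt)

    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if out_edges[u]:
            stack.append(out_edges[u].pop())   # follow an unused edge copy
        else:
            circuit.append(stack.pop())        # backtrack: u has no unused edges left
    return circuit[::-1]                       # a closed walk using every edge copy exactly once

# Example: a solution visiting vertex 1 twice and vertices 0, 2 once each.
m = {(0, 1): 1, (1, 2): 1, (2, 1): 1, (1, 0): 1}
print(tour_from_multiplicities(m, 0))          # e.g. [0, 1, 2, 1, 0]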

Out-trees.
An out-tree is the digraph obtained from a rooted tree by orienting all edges away from the root. If an out-tree $T$ is a subgraph of a directed graph $G$ and additionally $T$ spans the whole vertex set $V(G)$, we call $T$ an outbranching. The sequence $\{\mathrm{outdeg}_T(v)\}_{v\in V(T)}$ is called the outdegree sequence of $T$. Consider a set of vertices $X \subseteq V$, $|X| \ge 2$, and a vertex $r \in X$. A sequence $\{d_v\}_{v\in X}$ of nonnegative integers that satisfies (i) $d_r \ge 1$ and (ii) $\sum_{v\in X} d_v = |X| - 1$ will be called an out-tree sequence rooted at $r$, or an outbranching sequence rooted at $r$ when additionally $X = V$. A $\delta$-out-tree means any out-tree spanning $X$ with outdegree sequence $\delta$.
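As a quick illustration of conditions (i) and (ii) as stated above (a reconstruction; the helper is ours and not from the paper), checking whether a candidate sequence is an out-tree sequence is immediate:

def is_out_tree_sequence(delta, r):
    # delta: dict v -> requested outdegree, for v in X with |X| >= 2; r is the root, r in X.
    return delta.get(r, 0) >= 1 and sum(delta.values()) == len(delta) - 1

print(is_out_tree_sequence({0: 2, 1: 0, 2: 1, 3: 0}, 0))  # True
print(is_out_tree_sequence({0: 0, 1: 2, 2: 1}, 0))        # False: the root must have a child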

Reduction to small demands
Consider the following problem, defined for a family $\mathcal{F}$ of simple digraphs.

Fixed Degree $\mathcal{F}$-Subgraph: given a cost function $d : V^2 \to \mathbb{Z}_{\ge 0} \cup \{\infty\}$ and degree demands $\mathrm{in}, \mathrm{out} : V \to \mathbb{Z}_{\ge 0}$, find a multiplicity function $m : V^2 \to \mathbb{Z}_{\ge 0}$ of minimum cost $d(m)$ such that $\mathrm{indeg}_m(v) = \mathrm{in}(v)$ and $\mathrm{outdeg}_m(v) = \mathrm{out}(v)$ for every $v \in V$, and $m$ contains a member of $\mathcal{F}$ as a subgraph.
In this paper, we will consider two versions of the problem: when $\mathcal{F}$ is the family of all oriented trees, called Fixed Degree Connected Subgraph, and when $\mathcal{F}$ is the family of all out-trees with a fixed root $r$, called Fixed Degree Subgraph With Outbranching. The role of $\mathcal{F}$ is to force connectivity of the solution. Other choices for $\mathcal{F}$ can also be interesting, for example Cosmadakis and Papadimitriou [7] consider the family of minimal Eulerian digraphs.
The goal of this section is to show that, essentially, using a polynomial time preprocessing step we can reduce an instance of Fixed Degree $\mathcal{F}$-Subgraph to an equivalent one but with demands in, out bounded by $\mathcal{O}(n^2)$.
When considering an instance of Fixed Degree $\mathcal{F}$-Subgraph we will use the notation $n = |V|$ and $\ell = \sum_{v\in V} \mathrm{in}(v)$. (Clearly, we can assume that also $\ell = \sum_{v\in V} \mathrm{out}(v)$, for otherwise there is no solution.) Observe that if the image of $d$ is $\{0, +\infty\}$ we get the natural unweighted version, where we are given a graph with edge set $d^{-1}(0)$ and the goal is to decide if one can choose multiplicities of the edges so that the resulting digraph contains a member of $\mathcal{F}$ and its in- and outdegrees match the demands $\mathrm{in}$ and $\mathrm{out}$.
The following observation follows from standard properties of Eulerian cycles in digraphs and the fact that every strongly connected digraph contains an outbranching rooted at an arbitrary vertex.
In the following lemma, we consider the relaxed problem Fixed Degree Subgraph, defined exactly as Fixed Degree $\mathcal{F}$-Subgraph, but dropping the constraint that solutions must contain a member of $\mathcal{F}$. In what follows, $s_n(\mathcal{F}) = \max_{G\in\mathcal{F},\,|V(G)|=n} |E(G)|$. (Note that in the applications we consider in this work $\mathcal{F}$ is a family of oriented spanning trees, so $s_n(\mathcal{F}) = n - 1$.)

Lemma 3.2. Fix an input instance $(d, \mathrm{in}, \mathrm{out})$ of Fixed Degree $\mathcal{F}$-Subgraph. For every optimal solution $r$ of Fixed Degree Subgraph there is an optimal solution $c'$ of Fixed Degree $\mathcal{F}$-Subgraph such that for every $u, v \in V$ we have $|c'(u, v) - r(u, v)| \le s_n(\mathcal{F})$.

Before we proceed to a formal proof of Lemma 3.2, let us sketch the intuition behind it. We pick an optimal solution $c$ of Fixed Degree $\mathcal{F}$-Subgraph and let $B \in \mathcal{F}$ be a spanning subgraph of $G_c$. The symmetric difference between $E(G_r)$ and $E(G_c)$ can be decomposed into "alternating" cycles. It suffices to alternate $|E(B) \setminus E(G_r)| \le s_n(\mathcal{F})$ of them to enforce containing $B$. If we alternated all the cycles, we would get the cost of exactly $d(c)$, but because of the optimality of $r$, alternating any of the cycles does not decrease the cost of the solution. Hence alternating only a subset of them cannot make the cost bigger than $d(c)$.
Proof. Let $c$ be an arbitrary optimal solution of Fixed Degree $\mathcal{F}$-Subgraph and let $B$ be an arbitrary graph from $\mathcal{F}$ which is a spanning subgraph of $G_c$. Our plan is to build an optimal solution $c'$ of Fixed Degree $\mathcal{F}$-Subgraph which contains $B$ and does not differ too much from $r$.
Let $A_c = E(G_c) \setminus E(G_r)$ and $A_r = E(G_r) \setminus E(G_c)$ (multiset differences), and let $A = A_c \cup A_r$. In what follows, by an alternating cycle we mean an even cardinality set of edges where edges come alternately from $A_c$ and $A_r$. Note that an alternating cycle is not really a directed cycle; it is just an orientation of a simple undirected cycle.
Note that for every vertex $v \in V$, among the edges in $A$ that enter (resp. leave) $v$, the number of edges from $A_c$ is the same as the number of edges from $A_r$ (counted with corresponding multiplicities), since both $c$ and $r$ satisfy the degree constraints for the same instance. It follows that $A$ can be decomposed into a multiset $\mathcal{C}$ of alternating simple cycles, i.e., $m_A = \sum_{C \in \mathcal{C}} m_{\mathcal{C}}(C) \cdot m_C$. To clarify, we note that the sum above is over all cycles in $\mathcal{C}$, and not over all copies of cycles. Denote $B^+ = E(B) \setminus E(G_r)$. Since $B^+ \subseteq A_c$, for each $e \in B^+$ there is at least one cycle in $\mathcal{C}$ that contains $e$. We choose an arbitrary such cycle and denote it by $C_e$. (Note that it may happen that $C_e = C_{e'}$ for two different edges $e, e' \in B^+$.) Let $\mathcal{C}^+ = \{C_e \mid e \in B^+\}$ and define, for all $u, v \in V$,
$$c'(u, v) = r(u, v) + \sum_{C \in \mathcal{C}^+} \big([(u, v) \in C \cap A_c] - [(u, v) \in C \cap A_r]\big). \tag{1}$$
In other words, $c'$ is obtained from $r$ by iterating over all cycles $C \in \mathcal{C}^+$, and adding one copy of each edge of $C \cap A_c$ and removing one copy of each edge of $C \cap A_r$.
Let us show that $G_{c'}$ contains $B$. This is trivial for every $e \in B^+$. When $e \in E(B) \cap E(G_r)$, consider two cases.
To see that $c'$ satisfies the degree constraints, recall that $r$ does so, and note that if in (1) we consider only the summands corresponding to a single cycle $C \in \mathcal{C}^+$, then for every vertex we either add one outgoing edge and remove one outgoing edge, or add one ingoing edge and remove one ingoing edge, or we do not change the set of edges incident to it. For the cost, note that by the optimality of $r$, alternating any single cycle of $\mathcal{C}$ does not decrease the cost; since alternating all cycles of $\mathcal{C}$ transforms $r$ into $c$, alternating only the cycles of $\mathcal{C}^+$ yields $d(c') \le d(c)$. Hence, since $c$ is an optimal solution of Fixed Degree $\mathcal{F}$-Subgraph, we get that $c'$ is an optimal solution of Fixed Degree $\mathcal{F}$-Subgraph as well. Moreover, by (1), for every $u, v \in V$ we have $|c'(u, v) - r(u, v)| \le |\mathcal{C}^+| \le s_n(\mathcal{F})$. This ends the proof.
As noted in [7,2], Fixed Degree Subgraph can be solved by a reduction to minimum cost flow. By applying Orlin's algorithm [23] we get the following.

Observation 3.3. Fixed Degree Subgraph can be solved in polynomial time.
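As an illustration of the flow reduction behind Observation 3.3 (a sketch assuming the networkx library is available; this is not Orlin's algorithm itself), the relaxed problem is a transportation problem: every vertex $v$ supplies $\mathrm{out}(v)$ units at a copy $v^O$, demands $\mathrm{in}(v)$ units at a copy $v^I$, and shipping a unit along $(u^O, v^I)$ costs $d(u, v)$; the flow on that edge is the multiplicity $m(u, v)$.

import networkx as nx

def fixed_degree_subgraph(d, out_dem, in_dem):
    G = nx.DiGraph()
    n = len(out_dem)
    for v in range(n):
        G.add_node(("O", v), demand=-out_dem[v])   # supply out(v) units
        G.add_node(("I", v), demand=in_dem[v])     # demand in(v) units
    for u in range(n):
        for v in range(n):
            if d[u][v] != float("inf"):
                G.add_edge(("O", u), ("I", v), weight=d[u][v])
    flow = nx.min_cost_flow(G)                     # any minimum cost flow routine works here
    return {(u, v): flow[("O", u)][("I", v)]
            for u in range(n) for v in range(n)
            if ("I", v) in flow[("O", u)] and flow[("O", u)][("I", v)] > 0}

# Example with demands in = out = (1, 2, 1); loops are forbidden by infinite cost.
INF = float("inf")
d = [[INF, 2, 9], [1, INF, 6], [7, 3, INF]]
print(fixed_degree_subgraph(d, (1, 2, 1), (1, 2, 1)))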

Theorem 3.4 (Kernelization). There is a polynomial time algorithm which, given an instance $I = (d, \mathrm{in}, \mathrm{out})$ of Fixed Degree $\mathcal{F}$-Subgraph, outputs an instance $I' = (d, \mathrm{in}', \mathrm{out}')$ and a function $f : V^2 \to \mathbb{Z}_{\ge 0}$ such that (i) $\mathrm{in}'(v), \mathrm{out}'(v) \le n \cdot (s_n(\mathcal{F}) + 1)$ for every $v \in V$, and (ii) if $m^*$ is an optimal solution for $I'$, then $f + m^*$ is an optimal solution for $I$. The algorithm does not need to know $\mathcal{F}$, just the value of $s_n(\mathcal{F})$.
Proof. Our algorithm begins by finding an optimal solution $r$ of Fixed Degree Subgraph using Observation 3.3. Define $f(u, v) = \max(0, r(u, v) - s_n(\mathcal{F}) - 1)$ for all $u, v \in V$, and let $\mathrm{in}' = \mathrm{in} - \mathrm{indeg}_{G_f}$ and $\mathrm{out}' = \mathrm{out} - \mathrm{outdeg}_{G_f}$. The algorithm outputs $I' = (d, \mathrm{in}', \mathrm{out}')$ and $f$. In what follows, we show that the output has the desired properties.
For property (i), consider any vertex $v \in V$ and observe that $\mathrm{in}'(v) = \sum_{u\in V} \big(r(u, v) - f(u, v)\big) \le n \cdot (s_n(\mathcal{F}) + 1)$, and symmetrically for $\mathrm{out}'(v)$. Now we focus on (ii). Let $m^*$ be an optimal solution for $I'$. It is easy to check that $f + m^*$ satisfies the degree constraints for the instance $I$. Also, since $m^*$ contains a subgraph from $\mathcal{F}$, then $f + m^*$ contains the same subgraph. It follows that $f + m^*$ is a feasible solution of $I$. It suffices to show that $f + m^*$ is an optimal solution for $I$.
Let $c'$ be the optimal solution for $I$ given by Lemma 3.2 and consider $r' = c' - f$. By the choice of $f$, we have $r' \ge 0$ and every edge of $G_{c'}$ is also an edge of $G_{r'}$. In particular, since $c'$ contains a subgraph from $\mathcal{F}$, then also $r'$ contains the same subgraph. It follows that $r'$ is a feasible solution for $I'$ (the degree constraints are easy to check). Hence $d(m^*) \le d(r')$, and consequently $d(f + m^*) = d(f) + d(m^*) \le d(f) + d(r') = d(c')$, so $f + m^*$ is an optimal solution for $I$.

4

The small costs case in time $\mathcal{O}^*(2^n D)$

In this section we establish Theorem 1.1. We do it in a bottom-up fashion, starting with a simplified core problem, and next generalizing the solution in a few steps.

Unweighted decision version with small degree demands
Consider the following problem.

Decision Unweighted Fixed Degree Connected Subgraph: given a simple digraph $G = (V, E)$ and degree demands $\mathrm{in}, \mathrm{out} : V \to \mathbb{Z}_{\ge 0}$, decide whether there is a multiplicity function $m : V^2 \to \mathbb{Z}_{\ge 0}$ with $m(u, v) = 0$ for every $(u, v) \notin E$, whose in- and outdegrees match the demands $\mathrm{in}$ and $\mathrm{out}$, and which contains a spanning connected subgraph.
Note that Decision Unweighted Fixed Degree Connected Subgraph generalizes the directed Hamiltonian cycle problem, which is known to be solvable in $\mathcal{O}^*(2^n)$ time and polynomial space. In this section we show that this running time can be obtained for the more general problem as well, though we need to allow some randomization.
Lemma 4.1. There is a randomized polynomial space algorithm that solves Decision Unweighted Fixed Degree Connected Subgraph in time $2^n \cdot \mathrm{poly}(n, M)$, where $M = \max_{v\in V} \max(\mathrm{in}(v), \mathrm{out}(v))$; the positive answer is always correct and the negative answer is correct with probability at least $p$, for any constant $p < 1$.
Our strategy will be to reduce our problem to detecting a perfect matching in a bipartite graph with an additional connectivity constraint.
We define a bipartite graph $B_G$ as follows. Its vertex set consists of the classes $O = \{v^O_i \mid v \in V,\ i = 1, \ldots, \mathrm{out}(v)\}$ and $I = \{v^I_j \mid v \in V,\ j = 1, \ldots, \mathrm{in}(v)\}$, and $B_G$ contains an edge $u^O_i v^I_j$ whenever $(u, v) \in E(G)$. For an undirected graph $H$, by $\mathrm{PM}(H)$ we denote the set of perfect matchings in $H$. We say that a matching $M$ in $B_G$ is connected when for every cut $(X, V \setminus X)$ with $\emptyset \ne X \subsetneq V$, the matching $M$ contains an edge $u^O_i v^I_j$ with exactly one of $u, v$ in $X$. For a matching $M$ in $B_G$ we define the contraction of $M$ as the function $m : V^2 \to \mathbb{Z}_{\ge 0}$ where $m(u, v)$ is the number of edges of $M$ of the form $u^O_i v^I_j$. In other words, $G_m$ is obtained from $M$ by (1) orienting every edge from $O$ to $I$ and (2) identifying, for every $v \in V$, all vertices of the form $v^O_i$ or $v^I_j$ into the single vertex $v$. Note that the contraction of a connected perfect matching in $B_G$ is a solution of our instance, and conversely every solution is the contraction of some connected perfect matching in $B_G$.

We introduce a variable $x_e$ for every edge $e$ of $B_G$, and over the field $GF(2^t)$ (for a value of $t$ fixed later) we consider the polynomial
$$R = \sum_{\substack{M \in \mathrm{PM}(B_G) \\ M \text{ connected}}} \ \prod_{e \in M} x_e.$$

Lemma 4.4. $B_G$ contains a connected perfect matching if and only if $R$ is a non-zero polynomial over $GF(2^t)$.

Proof. It is clear that if $R$ is non-zero then $B_G$ contains a connected perfect matching. For the reverse implication it suffices to notice that every summand in $R$ has a different set of variables, so it does not cancel out with other summands over $GF(2^t)$.
Our strategy is to test whether $R$ is non-zero by means of the DeMillo-Lipton-Schwartz-Zippel Lemma, which we recall below.

Lemma 4.5 (DeMillo and Lipton [11], Schwartz [25], Zippel [28]). Let $P(x_1, x_2, \ldots, x_m)$ be a nonzero polynomial of degree at most $d$ over a field $\mathbb{F}$ and let $S$ be a finite subset of $\mathbb{F}$. Then, the probability that $P$ evaluates to zero on a random element $(a_1, a_2, \ldots, a_m) \in S^m$ is bounded by $d/|S|$.

By Lemmas 4.4 and 4.5, the task reduces to evaluating $R$ fast. To this end, we will define a different polynomial $P$ which is easier to evaluate and turns out to be equal to $R$ over $GF(2^t)$.
Consider a subset $X \subseteq V$. Let $O_X = \{v^O_i \mid v \in X,\ i = 1, \ldots, \mathrm{out}(v)\}$ and $I_X = \{v^I_j \mid v \in X,\ j = 1, \ldots, \mathrm{in}(v)\}$. Abusing the notation slightly, we will denote $B[X] = B[I_X \cup O_X]$. Define the following polynomial:
$$P_X = \sum_{M \in \mathrm{PM}(B[X])} \ \prod_{e \in M} x_e.$$
In what follows, $v^*$ is an arbitrary but fixed vertex of $V$. Define yet another polynomial:
$$P = \sum_{\substack{X \subseteq V \\ v^* \in X}} P_X \cdot P_{V \setminus X}.$$

Lemma 4.6. Over $GF(2^t)$, the polynomials $P$ and $R$ are equal.
Proof. For a matching $M$ in $B$ we say that a set $X \subseteq V$ is consistent with $M$ when $M$ does not contain an edge $u^O_i v^I_j$ such that $u \in X$ and $v \in V \setminus X$, or $v \in X$ and $u \in V \setminus X$. The family of all subsets of $V$ that are consistent with $M$ will be denoted by $\mathcal{C}(M)$. Then we can rewrite $P$ as follows:
$$P = \sum_{M \in \mathrm{PM}(B)} \Big(\prod_{e \in M} x_e\Big) \cdot \big|\{X \in \mathcal{C}(M) \mid v^* \in X\}\big|.$$
The number of sets $X \in \mathcal{C}(M)$ with $v^* \in X$ equals $2^{c-1}$, where $c$ is the number of connected components of the contraction of $M$; this number is odd exactly when the contraction is connected, i.e., when $M$ is connected. Hence, over $GF(2^t)$, we have $P = R$.

Lemma 4.7 ([26,22]). For an arbitrary set $X \subseteq V$, the polynomial $P_X$ can be evaluated using $\mathrm{poly}(n + M)$ field operations.
Proof. Compute the determinant of the corresponding Tutte matrix of dimension $|O_X| \times |I_X|$.
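The sketch below illustrates, in the simplest setting, the two ingredients used here: evaluating a bipartite matching polynomial as a determinant and testing it via random substitution. For brevity it works over a prime field and ignores the connectivity constraint, so it only detects a perfect matching; the construction of this section works over $GF(2^t)$.

import random

P = (1 << 61) - 1  # a large prime modulus, used here instead of GF(2^t) for brevity

def det_mod_p(a):
    # Determinant over GF(P) by Gaussian elimination.
    a = [row[:] for row in a]
    n, det = len(a), 1
    for col in range(n):
        piv = next((r for r in range(col, n) if a[r][col]), None)
        if piv is None:
            return 0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det % P
        det = det * a[col][col] % P
        inv = pow(a[col][col], P - 2, P)
        for r in range(col + 1, n):
            f = a[r][col] * inv % P
            for c in range(col, n):
                a[r][c] = (a[r][c] - f * a[col][c]) % P
    return det

def has_perfect_matching(adj):
    # adj[i][j] == True iff left vertex i is adjacent to right vertex j.
    n = len(adj)
    m = [[random.randrange(1, P) if adj[i][j] else 0 for j in range(n)] for i in range(n)]
    # A nonzero determinant certifies a matching; a false negative happens with
    # probability at most n / P by the DeMillo-Lipton-Schwartz-Zippel lemma.
    return det_mod_p(m) != 0

print(has_perfect_matching([[True, True], [False, True]]))   # True
print(has_perfect_matching([[True, False], [True, False]]))  # False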
Let us now fix our field, namely $t = 1 + \log n + \log M$. Since arithmetic operations in $GF(2^t)$ can be performed in time $\mathcal{O}(t \log^2 t) = \mathcal{O}(\log(n + M) \log^2 \log(n + M))$, by the definition of $P$ and Lemma 4.7 we get the following corollary.

Corollary 4.8. Given values of all the variables in $GF(2^t)$, the polynomial $P$ can be evaluated in time $2^n \cdot \mathrm{poly}(n + M)$ and polynomial space.

We are now ready to prove Lemma 4.1.

Proof. The algorithm evaluates the polynomial $P$ using Corollary 4.8, substituting a random element of $GF(2^t)$ for each variable, and reports "yes" when the evaluation is nonzero and "no" otherwise. If it reported "yes", then $P$ was a non-zero polynomial and by Lemma 4.4 the answer is correct. Assume it reported "no" for a yes-instance. By Lemma 4.4, $P$ is non-zero. Since $\deg P = |I| \le nM$, by Lemma 4.5 the probability that $P$ evaluated to 0 is bounded by $\deg P / 2^t \le 1/2$, and we can make this probability arbitrarily small by repeating the whole algorithm a number of times and reporting "yes" if at least one evaluation was nonzero. The claim follows.

In order to prove Lemma 4.10 we cast the problem in the setting of inclusion oracles from the work of Björklund et al. [5]. Consider a universe $U$ and an (unknown) family of witnesses $\mathcal{F} \subseteq 2^U$. An inclusion oracle is a procedure which, given a query set $Y \subseteq U$, answers (either YES or NO) whether there exists at least one witness $W \in \mathcal{F}$ such that $W \subseteq Y$. Björklund et al. prove the following.

Proof of Theorem 1.1
In the lemma below we will adapt the construction from Section 4.1 to the weighted case in a standard way, by introducing a new variable tracking the weight. Our algorithm reports the minimum $w$ such that $R_w$ evaluated to a non-zero element of $GF(2^t)$, or $+\infty$ if no such $w$ exists. The solution of weight $w$ is then found using Lemma 4.10. The event that the optimum value $w^*$ is not reported means that $R_{w^*}$ is a non-zero polynomial that evaluated to 0 at the randomly chosen values. By Lemma 4.5 this happens with probability at most $\deg P / 2^t \le 1/2$, and one can make this probability arbitrarily small by standard methods.
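The toy sketch below isolates the weight-tracking idea only: every edge variable is additionally multiplied by $y^{d(e)}$, so the exponent of $y$ in a surviving monomial equals the weight of the corresponding matching, and the smallest exponent with a nonzero coefficient is reported. The brute-force expansion and the prime field are simplifications for illustration; they are not the algorithm of this section.

import random
from itertools import permutations

P = (1 << 31) - 1

def min_weight_matching_exponent(d, adj):
    n = len(adj)
    x = [[random.randrange(1, P) for _ in range(n)] for _ in range(n)]
    coeff = {}                                   # coefficient of y**w, evaluated at random x
    for perm in permutations(range(n)):          # brute-force expansion, toy sizes only
        if all(adj[i][perm[i]] for i in range(n)):
            w = sum(d[i][perm[i]] for i in range(n))
            term = 1
            for i in range(n):
                term = term * x[i][perm[i]] % P
            coeff[w] = (coeff.get(w, 0) + term) % P
    nonzero = [w for w, c in coeff.items() if c != 0]
    return min(nonzero) if nonzero else None

adj = [[True, True], [True, True]]
d = [[4, 1], [2, 3]]
print(min_weight_matching_exponent(d, adj))      # 3, i.e. 1 + 2, the cheaper perfect matching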

5
The general case

In this section we prove Theorem 1.3, i.e., we show an algorithm solving Many Visits TSP in time $\mathcal{O}^*(4^n)$. In fact, we do not introduce a new algorithm, but we consider an algorithm by Berger et al. (Algorithm 5 in [3]) and we provide a refined analysis, resulting in an improved running time bound $\mathcal{O}^*(4^n)$, which is tight up to a polynomial factor.
Let us recall the algorithm of Berger et al., in a slightly changed notation. In fact, they solve a slightly more general problem, namely Fixed Degree Subgraph With Outbranching. Let $I = (d, \mathrm{in}, \mathrm{out}, r)$ be an instance of this problem, i.e., we want to find a solution $m : V^2 \to \mathbb{Z}_{\ge 0}$ that satisfies the degree constraints specified by in and out and contains an outbranching rooted at $r$. In what follows we assume $V = \{1, \ldots, n\}$ and $r = 1$.
Consider an outbranching sequence $\{\delta_v\}_{v\in V}$ rooted at $r = 1$. In what follows, all outbranching sequences will be rooted at 1, so we skip specifying the root. Let $T_\delta$ be a minimum cost outbranching among all outbranchings with outdegree sequence $\delta$, and let $r_\delta$ be an optimum solution of Fixed Degree Subgraph for the instance $(d, \mathrm{in}', \mathrm{out}')$ where $\mathrm{out}' = \mathrm{out} - \mathrm{outdeg}_{T_\delta}$ and $\mathrm{in}' = \mathrm{in} - \mathrm{indeg}_{T_\delta}$. Berger et al. note that then $m_\delta = m_{T_\delta} + r_\delta$ is a feasible solution for the instance $I$ of Fixed Degree Subgraph With Outbranching, and moreover it has minimum cost among all solutions that contain an outbranching with outdegree sequence $\delta$. Since $r_\delta$ can be found in polynomial time by Observation 3.3, in order to solve the instance $I$ it suffices to find the outbranchings $T_\delta$ for all outbranching sequences $\delta$ and return the solution $m_\delta$ of minimum cost. Hence, Theorem 1.3 boils down to proving the following lemma.

Lemma 5.1. Minimum cost outbranchings $T_\delta$, for all outbranching sequences $\delta$, can be computed in total time and space $\mathcal{O}^*(4^n)$.

We prove Lemma 5.1 by using dynamic programming (DP). However, it will be convenient to present the DP as a recursive function BestOutbranching with two parameters, $S \subseteq V$ and $\{\delta_v\}_{v\in S}$ (see Algorithm 1). It is assumed that $1 \in S$. We will show that BestOutbranching($S, \delta$) returns a minimum cost out-tree among all out-trees with outdegree sequence $\delta$ that are rooted at 1 and span $S$. Our algorithm runs BestOutbranching for $S = V$ and all outbranching sequences $\delta : V \to \mathbb{Z}_{\ge 0}$. Whenever BestOutbranching returns a solution for an input $(S, \delta)$, it is memoized (say, in an efficient dictionary), so that when BestOutbranching is called with parameters $(S, \delta)$ again, the output can be retrieved in polynomial time.
Algorithm 1 A pseudocode of the algorithm from Lemma 5.1.
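The following simplified sketch (ours, not Algorithm 1 itself) mirrors the recursion just described: it detaches a leaf $v$ with $\delta_v = 0$ and tries every admissible parent $w$, memoizing on $(S, \delta)$. It deliberately omits the $v_{\mathrm{first}}$, bad and lastRmvd bookkeeping, so it illustrates the correctness of the recursion rather than the $\mathcal{O}^*(4^n)$ bound.

from functools import lru_cache

def best_out_tree(d, S, delta, root=0):
    @lru_cache(maxsize=None)
    def rec(S, delta):
        dd = dict(delta)
        if len(S) == 2:                                    # base case: a single edge from the root
            v = next(u for u in S if u != root)
            return d[root][v], ((root, v),)
        v = min(u for u in S if u != root and dd[u] == 0)  # a leaf to detach
        best_cost, best_edges = float("inf"), ()
        for w in S:
            if w == v or dd[w] < (2 if w == root else 1):  # the root must keep outdegree >= 1
                continue
            nd = dict(dd)
            del nd[v]
            nd[w] -= 1
            cost, edges = rec(S - {v}, tuple(sorted(nd.items())))
            if cost + d[w][v] < best_cost:
                best_cost, best_edges = cost + d[w][v], edges + ((w, v),)
        return best_cost, best_edges

    return rec(frozenset(S), tuple(sorted(delta.items())))

d = [[0, 5, 2, 4], [9, 0, 1, 7], [3, 8, 0, 1], [6, 2, 4, 0]]
print(best_out_tree(d, {0, 1, 2, 3}, {0: 2, 1: 0, 2: 1, 3: 0}))  # (8, ((0, 2), (2, 3), (0, 1)))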

Lemma 5.2. If function BestOutbranching is given a reachable state as input then all recursively called BestOutbranching will also be given only reachable states.
Proof. Let us fix a reachable state $(S, \delta)$ with $|S| > 2$ and consider the associated vertex $v_{\mathrm{first}}$ from the algorithm. Denote $S' = S \setminus \{v_{\mathrm{first}}\}$. Clearly, it suffices to show that all pairs $(S', \delta')$ created in the for loop are reachable states. First, let us argue that $\mathrm{bad}(S', \delta) = \emptyset$. There are two cases. Assume first that $|\mathrm{bad}(S, \delta)| = 0$. In this case $v_{\mathrm{first}} > \mathrm{lastRmvd}(S)$, so $\mathrm{lastRmvd}(S') = v_{\mathrm{first}}$ and then $\mathrm{bad}(S', \delta) = \emptyset$. Let us consider the recursive call of BestOutbranching for a particular $w$. The sequence $\delta'$ differs from $\delta|_{S'}$ only at $w$, so $\mathrm{bad}(S', \delta') \subseteq \{w\} \cup \mathrm{bad}(S', \delta) = \{w\}$. This means that condition (iii) from the definition of a reachable state holds for $(S', \delta')$. Since $(S, \delta)$ is reachable, $\delta_1 \ge 1$. Then either $w \ne 1$ and $\delta'_1 = \delta_1 \ge 1$, or $w = 1$ and $\delta'_1 = \delta_1 - 1 \ge 1$, where the last inequality holds thanks to the condition in the if statement in Algorithm 1. In both cases, (i) holds for $(S', \delta')$. Finally, (ii) is immediate by the definition of $\delta'$. It follows that $(S', \delta')$ is a reachable state, as required.

Lemma 5.3. If the function BestOutbranching is given a reachable state $(S, \delta)$ as input, then it returns a cheapest out-tree $T$ rooted at vertex 1, spanning $S$ and with outdegree sequence $\delta$.
Proof. We will use induction on |S|.
In the base case $|S| = 2$, there is only one out-tree spanning $S$ rooted at 1, namely $\{(1, v_{\mathrm{first}})\}$, and it is indeed returned by the algorithm.
In the inductive step assume |S| > 2. By conditions (i) and (ii) in the definition of a reachable state and Lemma 2.2, there is at least one out-tree rooted at 1, spanning S, and with outdegree sequence δ. Let T be a cheapest out-tree among all such out-trees.
Vertex $v_{\mathrm{first}}$ is a leaf of $T$, since $\delta_{v_{\mathrm{first}}} = 0$. At some point, the variable $w$ in the for loop in Algorithm 1 is equal to the parent $w^*$ of $v_{\mathrm{first}}$ in $T$. Then, $T \setminus \{(w^*, v_{\mathrm{first}})\}$ is an out-tree rooted at 1, spanning $S'$, and with outdegree sequence $\delta'$. Since $(S', \delta')$ is a reachable state by Lemma 5.2, by the inductive hypothesis we know that a cheapest such out-tree $T'$ will be returned by BestOutbranching($S', \delta'$). In particular, it means that $d(T') \le d(T) - d(w^*, v_{\mathrm{first}})$. It follows that BestOutbranching returns a set of edges best of cost at most $d(T)$. However, best $= R_w$ for some vertex $w$, and by applying the induction hypothesis it is easy to see that $R_w$ is an out-tree rooted at 1, spanning $S$, with outdegree sequence $\delta$. The claim follows.

Lemma 5.4. The number of reachable states is $\mathcal{O}^*(4^n)$.

Proof. Any sequence of $n$ nonnegative integers that sums up to at most $n - 1$ will be called an extended sequence. It is well known that there are exactly $\binom{2n-1}{n} < 2^{2n-1} = \mathcal{O}(4^n)$ such sequences. To see this, consider sequences of $n - 1$ balls and $n$ barriers and bijectively map them to the sequences of $n$ numbers by counting balls between barriers and discarding the balls after the last barrier.
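A quick brute-force sanity check of the stars-and-bars count used above (illustration only):

from math import comb
from itertools import product

def count_extended_sequences(n):
    # Sequences of n nonnegative integers summing to at most n - 1.
    return sum(1 for seq in product(range(n), repeat=n) if sum(seq) <= n - 1)

for n in range(1, 7):
    assert count_extended_sequences(n) == comb(2 * n - 1, n)
print("stars-and-bars count verified for n = 1..6")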
Recall that our algorithm runs BestOutbranching($V, \delta$) for all outbranching sequences $\delta$ and uses memoization to avoid repeated computation. We claim that for any outbranching sequence $\delta$, the pair $(V, \delta)$ is a reachable state. Indeed, conditions (i) and (ii) hold since $\delta$ is an outbranching sequence. By definition, $\mathrm{lastRmvd}(V) = 0$, so $\mathrm{bad}(V, \delta) = \emptyset$, which implies (iii). Hence by Lemma 5.3 the algorithm is correct. By Lemma 5.2 the running time can be bounded by the number of reachable states times a polynomial, which is $\mathcal{O}^*(4^n)$ by Lemma 5.4. This ends the proof of Lemma 5.1 and hence also Theorem 1.3, as discussed in the beginning of this section.

6
Polynomial space

In this section we show Theorem 1.4, that is, we solve Many Visits TSP in $\mathcal{O}^*(7.88^n)$ time and polynomial space. Berger et al. [2] solved this problem in $\mathcal{O}(16^{n+o(n)})$ time and polynomial space, with the key ingredient being the following.

Lemma 6.1 ([2]). Given a set $X \subseteq V$, a root $r \in X$ and an out-tree sequence $\delta$ on $X$, a minimum cost $\delta$-out-tree rooted at $r$ can be found in time $4^{|X|+o(|X|)}$ and polynomial space.

Recall that the algorithm of Berger et al. (i) iterates over all outdegree sequences of an outbranching, (ii) finds a cheapest outbranching with the given outdegree sequence using Lemma 6.1, and (iii) completes it to a solution satisfying the degree constraints by a polynomial time minimum cost flow computation. The intuition behind our approach is as follows. We iterate over all subsets of vertices $R$. Here, $R$ represents our guess of the set of inner vertices of an outbranching in an optimal solution. Then we perform (i) and (ii) in the smaller subgraph induced by $R$. Finally, we replace (iii) by a more powerful flow-based algorithm which connects the vertices in $V \setminus R$ to $R$, and at the same time computes a feasible solution of Fixed Degree Subgraph on the residual degree sequences, so that the total cost is minimized. Let $r = |R|$. Clearly, when $r$ is a small fraction of $n$, we get significant savings in the running time. The closer $r/n$ is to 1, the smaller the savings, but also the smaller the number $\binom{n}{r}$ of sets $R$ to examine.
In fact, the real algorithm is slightly more complicated. Namely, we fix an integer parameter $K$, and then $R$ corresponds to the set of vertices left from an outbranching in an optimal solution after $K$ iterations of removing all leaves. The running time of our algorithm depends on $K$, because the algorithm actually guesses the layers of leaves in each iteration. The space complexity is polynomial and does not depend on $K$. At the end of this section, we show that our running time bound is minimized when $K = 4$.

Our algorithm
Similarly as in Section 5, we solve the more general Fixed Degree Subgraph With Outbranching: for a given instance $I = (d, \mathrm{in}, \mathrm{out}, \mathrm{root})$ we want to find a solution $m : V^2 \to \mathbb{Z}_{\ge 0}$ that satisfies the degree constraints specified by in and out and contains an outbranching rooted at root.
Let $T$ be an arbitrary outbranching. We define a sequence $L_1(T), L_2(T), \ldots$ of subsets of $V(T)$ as follows. For $i \ge 1$, let $L_i(T)$ be the set of leaves of the out-tree $T \setminus (L_1(T) \cup \cdots \cup L_{i-1}(T))$ if this out-tree has more than one vertex, and $L_i(T) = \emptyset$ otherwise. The sets $L_i(T)$ will be called leaf layers. Denote $R_i(T) = V \setminus (L_1(T) \cup \cdots \cup L_i(T))$ for any $i \ge 1$.
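A small illustration (the helper and its representation are ours) of the leaf layers: repeatedly remove the current leaves of the out-tree, recording each removed layer.

def leaf_layers(parent, root):
    # parent: dict child -> parent describing an out-tree rooted at `root`.
    remaining = set(parent) | {root}
    layers = []
    while len(remaining) > 1:
        has_child = {parent[v] for v in remaining if v != root}
        leaves = {v for v in remaining if v not in has_child}   # vertices with no remaining child
        layers.append(leaves)
        remaining -= leaves
    return layers, remaining        # `remaining` is R_i after the last nonempty layer

# Example: the path 0 -> 1 -> 2 -> 3 peels off one leaf per layer.
print(leaf_layers({1: 0, 2: 1, 3: 2}, 0))   # ([{3}, {2}, {1}], {0})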

Lemma 6.2. For every outbranching $T$ rooted at root and every $i \ge 1$, we have root $\notin L_i(T)$, $|L_i(T)| \ge |L_{i+1}(T)|$ and $i \cdot |L_i(T)| \le n - |R_i(T)|$.
Proof. In this proof we skip the "(T)" in $L_i$ and $R_i$ because there is no ambiguity. Assume root $\in L_i$ for some $i \ge 1$. It means that root is a leaf in the out-tree $T \setminus (L_1 \cup \cdots \cup L_{i-1})$. However, this out-tree is rooted at root and has at least two vertices, so root has a child in it and is not a leaf, a contradiction. If $|V \setminus (L_1 \cup \ldots \cup L_i)| > 1$, then $L_{i+1}$ is the set of leaves of the out-tree $T \setminus (L_1 \cup \ldots \cup L_i)$, which is contained in the set of parents of vertices in $L_i$. Since every vertex in $L_i$ has exactly one parent, $|L_i| \ge |L_{i+1}|$. If $|V \setminus (L_1 \cup \ldots \cup L_i)| \le 1$ then $L_{i+1} = \emptyset$ and clearly $|L_i| \ge |L_{i+1}| = 0$.
Finally, since for every $j < i$ we have $|L_j| \ge |L_i|$, we get $n - |R_i| = |L_1| + \cdots + |L_i| \ge i \cdot |L_i|$, which proves the last claim.
Pseudocode of our algorithm is presented as Algorithm 2.
Algorithm 2 A pseudocode of the algorithm from Section 6.1.
4: $T_R \leftarrow$ cheapest $\delta$-out-tree spanning $R$ rooted at root (Lemma 6.1)
5: for $L_1, \ldots, L_K$ do
    best $\leftarrow \mathrm{cost}(f) + d(T_R)$
    return best

For clarity, in the pseudocode we skipped some constraints that we enforce on the sets $L_i$ and the sequence $\delta$. We state them below.
It is clear that all these possibilities can be enumerated in time proportional to their total number times O(n).
Let us provide some further intuition about Algorithm 2. Consider an optimum solution m of I and any outbranching B in m rooted at root. In Algorithm 2, for any i = 1, . . . K + 1, the set L i is a guess of the leaf layer L i (B), while R is a guess of V \ (L 1 (B) ∪ · · · ∪ L K (B)). Finally, δ is a guess of the outdegree sequence of the out-tree B [R].
In Line 8 we create a flow network, and in line 9 a minimum cost maximum flow is found in polynomial time. In the next section we discuss the flow network and properties of the flow.

The flow
In this section we consider a run of Algorithm 2, and in particular we assume that the variables $R, \delta, L_1, \ldots, L_{K+1}$ have been assigned accordingly. The function CreateNetwork in our algorithm builds a flow network $F = (V(F), E(F), \mathrm{cap}, \mathrm{cost})$, where $E(F)$ is a set of directed edges and cap and cost are functions from edges to integers denoting capacities and costs of the corresponding edges. As usual, the function cost extends to flow functions in a natural way, i.e., $\mathrm{cost}(f) = \sum_{e\in E(F)} f(e)\,\mathrm{cost}(e)$. We let $V(F) = \{s, t\} \cup \{v^O, v^I \mid v \in V\} \cup \{v^C \mid v \in V \setminus R\}$, where $s$ and $t$ denote the source and the sink of $F$. Denoting by $\mathrm{out}' = \mathrm{out} - \mathrm{outdeg}_{T_R}$ and $\mathrm{in}' = \mathrm{in} - \mathrm{indeg}_{T_R}$ the residual degree demands, we put the following edges into $E(F)$:
- an edge $(s, v^O)$ with capacity $\mathrm{out}'(v)$ and cost 0, for every $v \in V$;
- an edge $(u^O, v^I)$ with unbounded capacity and cost $d(u, v)$, for every $u, v \in V$;
- an edge $(u^O, v^C)$ with capacity 1 and cost $d(u, v)$, for every $i = 1, \ldots, K$, every $v \in L_i$ and every $u \in R \cup L_{i+1} \cup \cdots \cup L_K$;
- an edge $(v^I, t)$ with capacity $\mathrm{in}'(v) - [v \notin R]$ and cost 0, for every $v \in V$;
- an edge $(v^C, t)$ with capacity 1 and cost 0, for every $v \in V \setminus R$.
We will say that $F$ has a full flow if it has a flow $f$ with value $|f| = \sum_{v\in V} \mathrm{out}'(v)$. By the construction of $F$, all edges leaving the source are then saturated, i.e., carry flow equal to their capacity. Since $\sum_{v\in V} \mathrm{out}'(v) = \sum_{v\in V} \mathrm{in}'(v)$, also all edges that enter the sink are saturated.
Essentially, the network above results from extending the standard network used to get Observation 3.3 by the vertices $v^C$. The flow between $\{v^O \mid v \in V\}$ and $\{v^I \mid v \in V\} \cup \{v^C \mid v \in V \setminus R\}$ represents the resulting solution. In a full flow the edges leaving $v^C$ are saturated, so a unit of flow enters every vertex $v^C$, which results in connecting $v$ in the solution to a higher layer or to $R$. Thanks to that, the solution resulting from adding the out-tree $T_R$ to the solution extracted from $f$ contains an outbranching.

Lemma 6.3. If $F$ has a full flow, and $f$ is a minimum cost full flow in $F$, then $I$ has a feasible solution $m$ with $d(m) = \mathrm{cost}(f) + d(T_R)$.

Proof. By standard arguments, since all capacities in $F$ are integer, we infer that there is an integral flow of minimum cost (and it can be found in polynomial time), so we assume w.l.o.g. that $f$ is integral.
Let $b : V^2 \to \{0, 1\}$ denote the function such that $b(u, v) = [(u, v) \in T_R]$. Now we construct a solution $m : V^2 \to \mathbb{Z}_{\ge 0}$ by setting $m(u, v) = b(u, v) + f(u^O, v^I) + f(u^O, v^C)$, where the last term is treated as 0 when the edge $(u^O, v^C)$ is not present in $F$. In other words, $m$ describes how many times the edge $(u, v)$ was used by the out-tree $T_R$ and the flow $f$ in total. Let us verify that $m$ is a feasible solution for $I$. The degree constraints are easy to verify, so we are left with showing that $m$ contains an outbranching rooted at root. To this end it suffices to show that every vertex $v$ is reachable from root in $G_m$. Clearly, this holds for vertices in $R$, thanks to the out-tree $T_R$. Pick an arbitrary vertex $v \notin R$.
Then $v \in L_i$ for some $i = 1, \ldots, K$. We know that $f(v^C, t) = 1$, so there exists $u$ such that $f(u^O, v^C) = 1$. Therefore, $v$ is connected in $G_m$ to a vertex from $R \cup L_{i+1} \cup \cdots \cup L_K$. Since every vertex outside $R$ has in $G_m$ an in-neighbor either in $R$ or in a layer with a higher index, we can conclude that there is a path in $G_m$ from $R$ to $v$. Hence $m$ indeed contains the required outbranching.
Finally, it can be easily checked that $d(m) = \mathrm{cost}(f) + d(T_R)$, which concludes the proof.
Let $m$ be a feasible solution for $I$. Let $R$ and $L_i$ for $i = 1, \ldots, K + 1$ be sets of vertices and $\delta$ an out-tree sequence on $R$, as in Algorithm 2. We say that $m$ is compliant with $R, L_1, \ldots, L_{K+1}$ and $\delta$ when $m$ contains an outbranching $T$ rooted at root such that $R_K(T) = R$, $L_i(T) = L_i$ for $i = 1, \ldots, K + 1$, and $\delta$ is equal to the outdegree sequence of $T[R]$. Consider a minimum cost full flow $f$ in $F$ that is found by Algorithm 2 for a choice of $R, L_1, \ldots, L_{K+1}, \delta$. The claim above implies that $\mathrm{cost}(f) + d(T_R) \le d(m)$. However, notice that we do not claim that $\mathrm{cost}(f)$ is the cost of an optimal completion of $T_R$ consistent with all guesses, as the intuitions we described earlier might suggest. It could be the case that in the solution resulting from $f$, a vertex which was guessed to belong to $L_i$ does not have any out-neighbor that was guessed to belong to $L_{i-1}$, which would mean that this vertex should be in an earlier layer. However, that is not an issue for the extraction of the global optimum solution of $I$, because for that particular guess we can only obtain solutions that are at least as good as the optimum compliant completion.