Maximum flow problem in the distribution network

In this paper, we are concerned with the maximum flow problem in the 
distribution network, a new kind of network recently introduced by Fang and 
Qi. It differs from the traditional network by the presence of the $D$-node 
through which the commodities are to be distributed proportionally. Adding $ 
D $-nodes complicates the network structure. Particularly, flows in the 
distribution network are frequently increased through multiple cycles. To 
this end, we develop a type of depth-first-search algorithm which counts and 
finds all unsaturated subgraphs. The unsaturated subgraphs, however, 
could be invalid either topologically or numerically. 
The validity are then judged by 
computing the flow increment with a method we call the 
multi-labeling method. Finally, we also provide a phase-one procedure for 
finding an initial flow.

In a manufacturing case, D-nodes could be a distilling-like operation such as refining crude oil. See Figure 1 for an example. The D-shaped nodes here represent the distillation or cracking process, while the C-shaped nodes stand for the combination or synthesis processes. In this example, there are three S-nodes: S 1 is the oil field outputs; S 2 is the LPG or natural gas used to fuel the highly energy-intensive refining processes; S 3 provides hydrogen for hydrocracking. Among thousands of resulting products, we name only some representative ones which include butane, asphalt, gasoline, kerosene, lubricants, etc. Notice that petroleum products are usually distributed proportionally from the refinery. The proportionality depends on the geographical location, customer demand, and seasonal needs. Therefore, the whole manufacturing process can be modeled as a multi-commodity network problem with D/C-nodes and there is a recent result on the minimum cost flow problem by Mo et al. [10].
On the other hand, D-nodes could be used to describe the proportional allocation of a particular commodity. This might happen when a non-proportional distribution is not fair and a regulated proportional distribution is more desirable. Examples include water resources in a drought area, power supply in a highly energy-intensive industrial park, or the investment breakdown for a bunch of projects. Fang and Qi's original work in [5] deals with the minimum cost flow problem of this kind. For this reason, it is called the distribution network.
In this paper, we are concerned with the maximum flow problem of the distribution network. A feasible flow x in G is a rational-valued function x : A → Q + satisfying l∈E(i) x li l k ij x li Figure 2. An example of a D-node 3) where E(i) = {l ∈ N : (l, i) ∈ A}, L(i) = {j ∈ N : (i, j) ∈ A}. At an S-node s, we use an inward pseudo-arc ( , s) ∈ (N P S , N S ) to express the sources supplied at s. The flow x s on ( , s) has the upper limit u s . Similarly, we use an outward pseudo-arc (t, ) ∈ (N T , N P S ) to express the demand at a T -node t. The flow x t on (t, ) must satisfy the minimum demand d t . On any D-node i, we suppose E(i) has one single node l, and the outgoing arcs from i satisfy the proportional constraint (1.4). See Figure 2 for an example of a D-node. Notice that equations (1.1)-(1.3) are conservation constraints, while (1.5)-(1.7) are upper/lower-bound restrictions on arcs. Given a flow x, the value of x is defined to be ν(x) = t∈N T x t . Therefore, the maximum flow problem in the distribution network takes the following LP form: where F is the set of all flows satisfying constraints (1.1)-(1.7). We assume that problem (P ) is always feasible throughout the entire discussion. See Figure 3 for an example of the distribution network. It has one S-node, five T -nodes, as well as a couple of O-nodes and D-nodes in between. For each arc (i, j), there associates a triplet (u ij , d ij , x ij ) indicating the upper bound, the lower bound, and the current flow value, respectively. The ratios for the proportional constraints can be read from the network. For instance, through the node D 1 one has to distribute 80% of the inflow to O 5 and 20% of which to O 6 . In literature, there are some special flow problems related to (P ). The generalized flow problem (e.g., see [1]) has gain factors (k e > 1, e ∈ A) or loss factors (0 < k e < 1) for arcs in A. If one unit flow from node i to node j is sent through the arc (i, j), k ij units will arrive at node j. In particular, if the arc (i, j) is lossy, namely, 0 < k ij < 1, one may introduce a dummy sink t and set k it = 1 − k ij to make the node i a "D-like"-node of two outgoing arcs. This formulation is not the same   Figure 3. A distribution network as our D-node since the dummy sink t, having no further outlets, violates the flow conservation required by our model. On the other hand, Cohen and Megiddo [4] discussed a class of parametric flow problems in which the fixed ratios flow problem is similar to (P ). This problem imposes an equivalence relation Q on A such that for every pair (e i , e j ) ∈ Q, x ei = k ij x ej . In the distribution network, let arcs which are incident with a common D-node be in the same equivalence class and also every single arc not adjacent to any D-node be itself an equivalence class. Then, (P ) becomes a fixed ratios flow problem. Nevertheless, the result in [4] does not help much since (1) the algorithm is strongly polynomial only when the number of equivalence classes is a constant. This is surely too restrictive; (2) the strongly polynomial algorithm was proved through a sequence of reduction from other problems. A practical and implementable version of it was not studied.
It turns out that a network algorithm of (P ) may not be easy to get. The Dnodes introduce a great degree of dependency among a cluster of arcs so that search algorithms on the network frequently generate cycles. Figure 3 demonstrates the complication where the flow can (indeed must) be increased through a cycle. It requires to reduce the amount to T 5 by 4 units so as to obtain the net gain of 12.
Our goal is to generate every possible "unsaturated subgraph" G M of this kind as in Figure 3 so that the flow can be increased from s to t iteratively until there are no more such subgraphs. We implement a depth-first-search-based algorithm to search on the associated residual network. The strategies of searching for G M will be discussed in Section 2. In Section 3, we discuss cycles, some of which are valid but some are not. In Section 4, a system of linearly homogeneous equations is used to either solve the flow increment on G M or conclude that it is invalid indeed. In Section 5, we illustrate the importance of searching orders with which every possible G M will not be missed. Section 6 describes our algorithm step by step. In Section Figure 4. Four types of decisions at i 7, a phase-one procedure based on the "reverse-run" of the algorithm is given to find an initial flow. Finally, in Section 8, we conclude the paper.
2. Basic strategies of searching. Given a feasible flow x in G, construct the residual network G x = (N, A x , u x ) as follows. Let where u x (a) is the residual capacity of the arc a with respect to the flow x.
With the flow x, the algorithm begins with some s ∈ N s which is incident with a pseudo arc ( , s) ∈ A + x . That is, s still has some available resources. Define M = {s}; A M = {( , s)} so that G M = (M, A M ) = (s, ( , s)) is the initial subgraph of the residual network G x . The G M is maintained as a stack. At each stage in searching, the algorithm examines arcs connecting with the topmost node in the stack and includes into G M only one arc that keeps two major network properties, the flow conservation and the proportionality at D-nodes. For example, if the topmost node is i and the algorithm is about to include the arc (i, j), we push (j, (i, j)) into G M . Similarly, we push (j, (j, i)) into G M for including the reverse arc (j, i). For convenience, denote the topmost entry in G M as topM ost(G M ) = (j, (i, j)) (or (j, (j, i))); the top most node as topM ostN ode(G M ) = j; the top most arc as topM ostArc(G M ) = (i, j) (or (j, i)); and finally denote the second to the topmost node to be preT opN ode(G M ) = i. Also for the convenience in describing the searching technique, we define arcs emanating from M to M to be forward (denoted shortly by F ) whereas a backward arc (B) is one that comes from M to M . In combinations, there are at most four options at the topmost node i: . See Figure 4. The selection is to make any two consecutive pairs in G M meet the flow conservation. Suppose topM ost(G M ) = (i, (l, i)) is of the forward type and i ∈ N S N O . To keep the flow conservation on the residual network G x , one must also choose a forward arc (i, j), , then at i it should be followed by any backward arc (j, i). In other words, the Sand O-node have to be connected by both forward arcs or both backward arcs. See Figure 5.

Figure 6. Four possible ways to enter a D-node
Suppose now i is a D-node. Listed below in Figure 6 are four possible ways to enter a D-node. Cases (3) and (4) do it from the top, while Cases (5) and (6) from the bottom. Due to the special network configuration of a D-node, if topM ost(G M ) = (i, (l, i)) as in Case (3), all the adjacent arcs (i, j k ), k = 1, 2, . . . , n must be of type (F, A + x ). In case (4), all arcs following i must be of type (B, A − x ). In Case (5) . This is very different from the generic augmenting path algorithm on the classical network for connecting only O-nodes [1].
In the following advance phase, we shall apply repeatedly the cases (1)-(6) to grow the unsaturated subgraph G M . At topM ostN ode(G M ), if there are no unmarked (un-visited) arcs of the right type (meaning Cases (1) -(6) above) in G x , topM ost(G M ) is said to be saturated. Otherwise, we include into G M any one of the desired arcs and leave others for the next G M to exploit. This process continues until it hits a pseudo node or forms a cycle (Step 2). In either case, the algorithm pauses temporarily and switches to a "D-cut". The D-cut is an arc (l, k) ∈ G x such that either l or k is a D-node already included in G M but (l, k) itself is not processed yet. If there is no such D-cut, the advance phase comes to a complete stop and the G M is called "complete".
Push any un-marked arc of the right type and its associated node into G M . If there is none, un-mark (release) the arcs adjacent to topM ostN ode(G M ) for later accesses via different paths. Go to the retreat phase (below).  node into G M . Go to Step 1. If there is none, a complete G M is obtained and the advance phase is stopped.
Step 3.: Go to Step 1. In Figure 7, the advance phase first constructs the path, in this order: , in which we used to choose the nearest one, (D 2 , T 2 ), to continue. In Figure 8, we start from S 1 but have to stop temporarily when topM ost(G M ) = ( , (S 2 , )). The search resumes from the D-cut (D, T ).
When there is no un-marked arc of the desired type adjacent to topM ostN ode(G M ), the following retreat phase is applied. The strategy is to pop out the saturated topM ost(G M ) depending on whether preT opN ode(G M ) is a D-node. If it is indeed a D-node, all the arcs in G M generated after that D-node should be released due to the proportional constraint.

The retreat phase
Step 1.: If preT opN ode(G M ) / ∈ N D , mark and pop out topM ost(G M ). Go to the advance phase. If G M = ∅, stop and report the current flow is maximal.
Step 2.: If preT opN ode(G M ) = l ∈ N D , keep popping out the entries in G M one by one until we find topM ostN ode(G M ) = l. Go to Step 1. Note that, marking the removed topM ostArc(G M ) at Step 1 of the retreat phase is a tracer for arcs that have been visited and discarded from topM ostN ode(G M ). The tracer is reset when this topM ostN ode(G M ) is to be released ("un-mark" at Step 1 of the advance phase). The reset is necessary since this topM ostN ode(G M ) could be later included into a different G M .
When a D-node is to be released, we can replace the capacities The updated capacities have now the same proportions as k ij . By this, if any arc of {(i, j)| i ∈ N D , j ∈ L(i)} is saturated, so are the others. The same D-node hence will not be processed repeatedly by way of different entries to it.
3. Cycles. In searching G M , we could generate cycles by adding an arc of the form (M, M ). Unlike the classical maximum flow problem where one has choices to generate (e.g. the pre-flow push method [1]) or not to generate (e.g., the labeling Nevertheless, cycles can have an obvious defect in the configuration, which we call topologically invalid. In Figure 10, the cycle (l, i, j, l) consists of only O-nodes. Due to the conservation law, the cycle must have the 0-flow and then becomes useless. On the other hand, the directed cycle (D 1 , D 2 , O 1 , D 3 , D 1 ) is topologically invalid since the arc (D 3 , D 1 ) is of the right type for D 3 but is of the wrong type for D 1 . See Figure 11 for more examples of invalid cycles. On the left, the cycle (O 1 , O 2 , D, O 1 ) can be thought as an integrated node with only two incoming arcs but no outgoing arcs. The conservation law will certainly make the flow in the cycle be 0-increment. On the right of Figure 11, the cycle is topologically legitimate but In general, we are not able to identify all patterns of invalid cycles. However, we observe that the invalid cycles, topologically or numerically, result in 0-increment. Our strategy is to always form the cycles and continue the search from the (nearest) D-cuts until a complete G M is generated. The validity of G M can be determined after the flow increment on a complete G M is computed. (next section) In Figure  More generally, we shall call two arcs (i 1 , j 1 ) and (i 2 , j 2 ) compatible, denoted by (i 1 , j 1 ) ∼ (i 2 , j 2 ), if they are on the same compatible path. It is easily seen that the compatibility "∼" defines an equivalence relation on A M , which is thus partitioned into several compatible components. See the right picture in Figure 13 where there are four compatible components. In each component, if the flow value is determined for one arc, so are the others. For this reason, the flows on each component can be expressed in terms of a single independent variable (label), so we call it the multiple labeling method.
In Figure 14, the four components join with each other at three dividing nodes. By the conservation law, at S we have c + b = a. At O 1 , we have 0.4a = 0.5b + 0.3b and at T 1 , we have 0.6a = d. They can be further simplified to become b = c = 0.5a and d = 0.6a with which the four components are represented by just one variable a multiplied by a constant, say r ij a. Determine the value a by solving r ij a u ij with the residual capacity u ij . Taking the minimum of uij rij yields the largest possible flow that can be increased on G M . In this example, a = 20.  Figure 14. All arcs in G M can be expressed by r ij a (left) and take the minimum of uij rij to get x ij (right) In general, if there are n compatible components and m dividing nodes, the conservation law at the dividing nodes gives a system of m linear homogeneous equations with n non-negative unknowns. This system has at least one trivial solution (0, 0, . . . , 0). But in order for G M to be valid and useful, we need the system to have a positive solution. To this end, it is necessary that m n − 1, providing that the system is of full rank.
Suppose that m n. Then, choosing n linear independent equations out of m and solving the n unknowns will give a unique zero solution. Figure 15 is such an example in which m = n = 2. The two dividing nodes are S and O 1 and the two equations are a + b = a and 0.4a = 0.6a. The unique solution is a = b = 0. Figure 15 also provides an interesting example for the dividing node which we call degenerate. From the definition of compatibility, two compatible components must be connected at dividing nodes. This example shows the converse is not necessarily true. The dividing node O 1 in Figure 15 is not the junction node of different components. The conservation constraint at O 1 forces the flow to be 0.
If m = n − 1, the system has one degree of freedom and the flow on G M can be expressed by one single variable. Unfortunately, the non-negativity restrictions could still be violated. Figure 16 provides such an example. On the left is the original network, whereas on the right is the G M . Node S is the only dividing node and there are two components. The equation at S is b = −5a. Only (a, b) = (0, 0) satisfies the non-negative restrictions, so the G M is not valid.
Finally, we briefly discuss the case m n − 2 which gives us at least two degrees of freedom to determine the flow increment. Using our strategy of searching, we can not find a particular G M having this phenomenon, nor can we prove that it is not possible. We can only conjecture that it is very unlikely since we form the cycle at the dividing node without penetrating to go on. Besides, if we do have two degrees of freedom from the equations, it might indicate that there are two chunks of components that are independent of each other and can be treated separately.  T1  T2  T3  T4  T1 T2 T4 Figure 16. An example of the negative total flow increment 5. Searching orders among D-cuts. In the advance phase when an O-node is encountered, every adjacent arc in G x has to be checked and marked to guarantee a unique visit. However, when meeting a D-node, we arbitrarily choose one adjacent arc to continue and leave the others as D-cuts which must be all checked in whatever order. It is surprising to see that some valid G M can be obtained only if a particular order among the D-cuts is followed. To illustrate the importance of searching orders, let us work out an eccentric example as shown in Figures 17 -19. The numbers circled nearby the arcs indicate the searching order in constructing the G M . Therefore, when retreating a D-node from a complete G M , it is necessary to enumerate all possible searching orders among the D-cuts. For example, the D-node in Figure 20 has three D-cuts. The six combinations have to be totally examined before we can safely discard it in the retreat phase. In general, if a D-node l has ε adjacent arcs, there are ε − 1 D-cuts and (ε − 1)! searching orders to explore.
6. The algorithm. Our search method is a depth-first-search-based algorithm. It claims the maximum flow x after exploiting every possible G M on G x . The data structure of G M is a stack. Each stack, when finished, is like a path from the root to a leaf in a depth-first-search tree. Starting from G M = (s, ( , s)) (the root), the advance phase repeatedly searches for an unsaturated arc until it hits either a T -node, or a S-node, or forms a cycle. If there is a D-cut, the search has not reached a leaf yet and has to be continued from any one of the D-cuts. Otherwise, the search is complete and a path from the root to a leaf is found.
The retreat phase is equivalent to the backtracking mechanism of the depth-firstsearch algorithm. It applies either when the top most entry in G M is saturated or when it is already completed but the resulting G M is invalid. To distinguish, we set the former to F lag = U C, while the latter F lag = C. In the retreat phase, if the preT opN ode(G M ) is not a D-node, it pops out the topM ost(G M ) and starts off the advance phase from the earliest possible un-marked arcs of the right type (Step 3 of the following algorithm). If preT opN ode(G M ) is a D-node, it pops out every entry in G M all the way to the place where the D-node was first visited. Then, if it is of F lag = C, we have to check all other orders among the D-cuts and resume the advance phase (Step 5). Otherwise, for F lag = U C, we continue the retreat phase by going back to Step 3. The reason is that, if one searching order could not successfully lead to a leaf, it must be saturated at some Below is the step-by-step procedure of our algorithm.

The Algorithm
Step  If F lag(l) = U C, Go to Step 3. Otherwise, go to Step 5.
Step 5.: (Backtracking from D-cuts using different orders) If there is any unused order for the current D-node, pick one and go to Step 1. Otherwise, go to Step 3.
Step 6.: (Compute the flow increment) Use the multi-labeling method in Section 4 to determine the flow increment on G M . If the value is positive, update the current flow to come out with a new residual network. Go to Step 0. If the value is 0, set every D-node in G M with F lag = C and go to Step 3.
The backtracking strategies in Steps 3 -5 carefully check backward for a new search off the nearest available node in the stack under different situations. The maximum flow x is found when the search eventually ends with G M being an empty set. This guarantees a complete search for all the subgraphs on the residual network and thus provides the correctness for the algorithm.
In general, we are unable to guarantee that the same G M will never be produced again since the same cycle could be formed through different orientations (G M 's). The depth-first-search is thus in a sense of the data structure, not in a sense of the configuration.
7. Initial flow. In this section, we address the phase-one procedure that finds an initial flow on a distribution network. It is not a maximum flow problem. The difficulty occurs especially in the positive lower bounds d t > 0. Take Figure 21 as an example. The maximum flow is a = 50 but T 2 receives nothing from this allocation whereas T 1 and T 3 obtain something more than enough (d T1 = d T3 = 20). If they were otherwise supplied by an amount between 20 a 25, T 2 would be satisfied as well. This shows that the flow balance among T -nodes, rather than a larger flow value, is more critical in the phase-one.
Let us begin with the 0-flow and suppose some positive lower bounds on the T -nodes are violated. Add dotted arcs to those T -nodes and associate each T -node with a negative number −n, n > 0 to indicate that the current flow on the T -node is short of n. See nodes T 1 , T 2 and T 3 in Figure 21 for an example.
The strategy is to run the searching algorithm "upside-down" starting from any T -node having a dotted arc. It means to use (T, (T, )) as the first entry and search for a complete G M (compared with (S, ( , S)) when starting from S). See Figure 21 for an example in which the flow increment on G M is a = 50.
After updating the flow by a = 50, we found T 2 is the only T -node with insufficient supply. See the dotted arc on the leftmost graph of Figure 22. Starting from T 2 , we apply the advance phase to obtain the path ← T 2 ← O 1 ← T 1 ← with a = 30, as shown in the middle graph of Figure 22. In fact, this path ← T 2 ← O 1 ← T 1 ← reallocates 30 from T 1 to T 2 so as to achieve a feasible flow (the right most graph). It could not be accomplished if the search had started from the S-node, since the bottle neck arc (O 1 , D) is not of the right type. This example shows the necessity of searching from the bottom for finding the initial flow. 8. Conclusion. In this paper, we present a search algorithm to solve the maximum flow problem in the distribution network. It is of the depth-first-search type which explores all searching orders so as to identify every valid G M . Although the search strategy may not be very efficient, the examples we provide are the most representative ones showing the necessity to enforce this kind of complete enumeration on graphs. The algorithm also comes with various graph techniques such as computing the flow increment and finding an initial flow, making it ready for implementation. We believe that the study conducted here is very original and illustrates important graph properties with the appearance of the D-nodes and the cycles. There are several interesting research directions afterward. Finding a more intelligent algorithm to avoid the complete enumeration is definitely at the top of the list. Establishing the fundamental graph relation of the maximum flow -minimum "cut" (with a new definition for the "cut") is also primary among many others.