A regularized Interior Point Method for sparse Optimal Transport on Graphs

In this work, the authors address the Optimal Transport (OT) problem on graphs using a proximal stabilized Interior Point Method (IPM). In particular, they propose to solve large-scale OT problems on sparse graphs using a bespoke IPM algorithm that exploits the induced primal-dual regularization to enforce scalability. Indeed, the authors prove that the introduction of the regularization allows the use of sparsified versions of the normal Newton equations to inexpensively generate IPM search directions. A detailed theoretical analysis establishes the polynomial convergence of the inner algorithm in the proposed computational framework. Moreover, the presented numerical results showcase the efficiency and robustness of the proposed approach when compared to network simplex solvers.


Introduction
The Optimal Transport (OT) problem requires moving a certain distribution of mass from one configuration into another, minimizing the total cost of the operation. It has been studied extensively, from the early work of Kantorovich [22], to the development of ever faster algorithms for various OT formulations, e.g. [8,19,27,31,36,41]. Recently, there has been a growing interest in using Interior Point Methods (IPMs) [17] in applications that involve optimal transport, in particular for very large scale instances of such problems, see e.g. [33,46,48].
A particularly interesting problem is the optimal transport over sparse graphs: in this case, the transport of mass is only possible along a specific subset of connections, which is noticeably smaller than the full list of edges of a fully connected bipartite graph, as would be the case in a standard discrete OT formulation. The use of OT and the Wasserstein distance (i.e. the optimal objective value of the OT problem) is becoming more and more common in many practical applications, e.g. neural networks [20], image processing [21], inverse problems [32] and the analysis of large complex networks [34].
The specific formulation of the problem is the following: suppose that G = (V, E) is a connected graph with directed edges E ⊂ V × V and weights c ∈ R^{|E|}_+. Define the incidence matrix A ∈ {−1, 0, 1}^{|V|×|E|} as

A_{v,e} = −1 if e = (v, w) for some w ∈ V,  A_{v,e} = 1 if e = (w, v) for some w ∈ V,  A_{v,e} = 0 otherwise.
We consider the optimal transport problem in the Beckmann form [12], where ρ_0, ρ_1 ∈ {ρ ∈ R^{|V|} : 1^T ρ = 1 and ρ ≥ 0} =: Prob(V). In the following we define |E| := n and |V| := m. OT on graphs has been recently studied in [12,13] and, in this formulation, it is similar to the more general minimum cost flow problem on networks [1], which has also seen extensive use of IPMs, e.g. [5,14,37,38]. Sparse graphs have on average very few edges per node, which can lead to nearly disconnected regions and seriously limit the possible paths along which mass can be moved. As a result, finding a solution to the optimal transport problem on a sparse graph requires more sophisticated algorithms and may be more computationally challenging than solving the same problem on a denser graph. In particular, combinatorial methods like the network simplex may struggle and move slowly towards optimality, due to the limited number of edges available, while an interior point method manages to identify quickly the subset of basic variables (i.e. the subset of edges with non-zero flow) and converges faster.
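To make the notation concrete, the following Python sketch (with an invented toy graph; the paper itself provides no code) builds the node-arc incidence matrix of a small directed graph and checks the mass-conservation constraint A x = ρ_1 − ρ_0 that a feasible Beckmann-form flow must satisfy.

```python
import numpy as np

# Toy directed graph: V = {0, 1, 2, 3}, four directed edges (invented example)
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
m, n = 4, len(edges)  # m = |V|, n = |E|

# Node-arc incidence matrix: column e has -1 at the tail and +1 at the head
A = np.zeros((m, n))
for e, (v, w) in enumerate(edges):
    A[v, e] = -1.0
    A[w, e] = 1.0

# Every column sums to zero, so A has rank at most m - 1
assert np.allclose(A.sum(axis=0), 0.0)

# A feasible flow x must satisfy the conservation law A x = rho1 - rho0
rho0 = np.array([1.0, 0.0, 0.0, 0.0])  # all mass starts at node 0
rho1 = np.array([0.0, 0.0, 0.0, 1.0])  # and must end at node 3
x = np.array([0.0, 0.0, 0.0, 1.0])     # route it along the edge (0, 3)
assert np.allclose(A @ x, rho1 - rho0)
```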
In this work, the authors address the efficient solution of the optimal transport problem (1) considering the Proximal-Stabilized Interior Point framework (PS-IPM), recently introduced and analysed in [6].
As originally observed in [37], when IPMs are used to solve the minimum cost flow problem on networks, the normal form of the related Newton systems is structured as a Laplacian matrix of the graph (defined as the diagonal matrix of the vertex degrees minus the adjacency matrix), and the iterates of the IPM determine the associated weights of this matrix, see also eq. (38). In [9], this observation was exploited to solve such Laplacian linear systems (which are, in turn, particular instances of symmetric M-matrices) through the fast specialized solution of O(ln m) linear systems involving symmetric diagonally dominant matrices [42]. We refer the interested reader to [45] for a survey on fast Laplacian solvers and to [15] for information concerning the distribution of the Laplacian's singular values.

The organization of the work and its main contributions can be summarized as follows:

• In Section 2, the authors briefly recall the proximal stabilized framework responsible for the primal-dual regularization of the IPMs here considered.
• In Section 3, the authors provide a detailed convergence analysis of the inexact infeasible primal-dual regularized IPM, when a proximal stabilization procedure is used. Moreover, they prove its polynomial complexity.
• In Section 4, the authors prove that the normal form of the related Newton system is naturally structured as a shifted Laplacian matrix characterized by a strict diagonal dominance. Such a feature considerably simplifies the factorization of the normal equations and allows the use of standard libraries for the solution of the corresponding linear systems. On the other hand, such factorizations could incur a significant fill-in even when the original graph is sparse, hence limiting the applicability of the proposed approach for the solution of large scale problems.
• In Section 5, to overcome potential scalability issues related to the fill-in mentioned above, the authors propose to generate IPM search directions using sparsified versions of the IPM normal equations. In particular, the original normal matrix takes the form A(Θ^{-1} + ρI)^{-1}A^T + δI, where ρ, δ are regularization parameters and Θ is a diagonal matrix related to the IPM barrier parameter µ; the authors propose to use a perturbed normal matrix, where the entries of (Θ^{-1} + ρI)^{-1} that are sufficiently small (when compared to µ) are set to zero (completely ignoring the corresponding columns of matrix A). This strategy reduces the time required to assemble and solve the normal equations systems, providing a fundamental advantage to the algorithm.
The resulting sparsified linear systems are solved either using a Cholesky factorization (if that displays only negligible fill-in) or using the conjugate gradient method with a simple and inexpensive incomplete Cholesky preconditioner. In both cases, the complete or incomplete Cholesky factor remains very sparse, and this translates into an outstanding efficiency of the proposed method. Moreover, the authors are able to interpret the Newton directions generated using sparsified Newton matrices as inexact Newton directions. Relying on the convergence theory developed in Section 3, the authors are able to prove that, under a suitable choice of the sparsification parameters, the above described approach gives rise to a polynomially convergent algorithm.
• In Section 6, the authors present experimental results which demonstrate the efficiency and robustness of the proposed approach. Extensive numerical experiments, involving very large and sparse graphs coming from public domain random generators as well as from real world applications, show that, for sufficiently large problems, the approach presented in this work consistently outperforms, in terms of computational time, the Lemon network simplex implementation [26], one of the state-of-the-art solvers available for network problems.

Notation
In the paper, vectors are indicated with bold letters. ‖·‖ indicates the Euclidean norm. I represents the identity matrix and e the vector of all ones. Given a vertex v of a graph G, we denote by deg(v) its degree, i.e. the number of edges that are incident to v. Concerning the variables inside the algorithm, we use a subscript k to indicate the external proximal iteration and a superscript j to indicate the internal IPM iteration. Given a sequence {µ^j}_{j∈N} and a continuous function f, the big-O notation O(·) is used in the standard asymptotic sense.

Computational Framework

Proximal-Stabilized Interior Point Method
Let us consider the primal-dual formulation of a Linear Program (LP), where A ∈ R^{m×n} with m ≤ n is not required to have full rank. Notice that problem (1) is indeed formulated in this way. We solve this problem using PS-IPM [6], which is a proximal-stabilized version of the classic Interior Point Method. Broadly speaking, PS-IPM resorts to the Proximal Point Method (PPM) [39] to produce primal-dual regularized forms of problem (2). Indeed, given an approximation (x_k, y_k) of the solution of such a problem, PS-IPM uses interior point methods to produce the next PPM step (x_{k+1}, y_{k+1}), which, in turn, represents a better approximation of the solution of problem (2).
In this regard, the problem that needs to be solved at every PPM step takes a primal-dual regularized form; using standard duality theory, its primal-dual solution is denoted by (x*_k, y*_k). More specifically, the PS-IPM considered here uses two nested cycles to solve problem (2). The outer loop uses an inexact proximal point method [28], as shown in Algorithm 1: the current approximate solution (x_k, y_k) is used to regularize the LP problem, which is then solved using an IPM to find the next approximate solution (x_{k+1}, y_{k+1}) ≈ (x*_k, y*_k). At the inner loop level, an inexact infeasible interior point method is used to solve the PPM sub-problems, see Algorithm 2. Notice that both methods are inexact: the outer cycle is inexact because the sub-problems are solved approximately by an IPM; the IPM is inexact because the Newton systems are also solved inexactly (see Section 2.2 for more details). Notice also that the IPM is referred to as infeasible because the intermediate iterates are not required to lie inside the feasible region. We also call the inner loop regularized, because it solves a primal-dual regularized version of the original LP (2).
Regularization in interior point methods was originally introduced in [40] and extensively used in [2], as a tool to stabilize and improve the linear algebra routines needed for their efficient implementation. In this work and in [6], the regularization is introduced as a result of the application of the PPM at the outer cycle level. To summarize, in the following we use three acronyms: PPM refers to the outer cycle; IPM refers to the inner cycle; PS-IPM refers to the overall procedure, combining PPM and IPM.

Algorithm 1: PPM, outer loop of PS-IPM (its final step updates the iteration counter: k := k + 1)
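The listing of Algorithm 1 is not fully reproduced above; the following schematic Python sketch shows the general shape of such an inexact proximal point outer loop. The function `solve_subproblem` stands in for the inner IPM of Algorithm 2, and the tolerance schedule and stopping rule are illustrative assumptions, not the paper's exact criteria.

```python
import numpy as np

def ppm_outer(solve_subproblem, x0, y0, sigma_r=0.7, max_outer=50, tol=1e-8):
    """Inexact proximal point outer loop (schematic sketch, not the paper's
    exact Algorithm 1). `solve_subproblem(xk, yk, inner_tol)` is assumed to
    return an approximate solution of the regularized sub-problem PPM(k)."""
    xk, yk = x0, y0
    for k in range(max_outer):
        inner_tol = sigma_r ** k          # progressively tighter inner accuracy
        x_new, y_new = solve_subproblem(xk, yk, inner_tol)
        step = np.linalg.norm(np.concatenate([x_new - xk, y_new - yk]))
        xk, yk = x_new, y_new
        if step < tol:                    # stand-in for stopping criterion (42)
            break
    return xk, yk

# Toy usage: a "sub-problem solver" that simply contracts toward a known point
xstar, ystar = np.array([1.0, 2.0]), np.array([0.5])
def toy_subproblem(xk, yk, inner_tol):
    return xk + 0.5 * (xstar - xk), yk + 0.5 * (ystar - yk)

x_fin, y_fin = ppm_outer(toy_subproblem, np.zeros(2), np.zeros(1))
assert np.linalg.norm(x_fin - xstar) < 1e-6
```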
Concerning the stopping criteria, we finally highlight that Algorithm 1 is stopped based on the criterion (42). Algorithm 2, instead, is stopped according to the accuracy that is required for the solution of the current sub-problem, based on the natural residual, see [28], of problem (PPM(k)).

Interior point method
We now focus on the inner cycle and give a brief description of the IPM used to solve problem (PPM(k)). To this aim, we introduce the Lagrangian function (5), which uses a logarithmic barrier to take into account the inequality constraints. The KKT conditions arise from the gradients of the Lagrangian (5).
Setting s_i = µ/x_i for i ∈ {1, . . ., n}, we consider the function F_{µ,σ_k}(x, y, s), where σ ∈ (0, 1) is the barrier reduction parameter, S = diag(s) and X = diag(x). A primal-dual interior point method applied to problem (PPM(k)) relies on the use of Newton iterations to solve a nonlinear problem of the form F_{µ,σ_k}(x, y, s) = 0, x, s > 0. A Newton step for (6) from the current iterate (x, y, s) is obtained by solving the system (7), i.e., the following relations hold:

A∆x + δ∆y = ξ_p,  (9)

where (∆x, ∆y, ∆s) is the Newton direction to be taken at each iteration (with an appropriate stepsize). The solution of (7) is delivered by a computational procedure in which Θ := XS^{-1}. Before continuing, let us give basic definitions used in the remainder of this work.

Definition 3. Normal Matrix:
Neighbourhood of the infeasible central path: the neighbourhood here considered is standard in the analysis of infeasible IPMs [47]: it requires the iterates to be close enough to the central path (according to parameters γ and γ̄), and the primal-dual constraint violations to be reduced at the same rate as the complementarity product x^T s. Within this neighbourhood, x^T s → 0 guarantees convergence to a primal-dual optimal solution.
Moreover, we consider an inexact solution of the linear system (11), with inexactness measured by a constant C_inexact ∈ (0, 1). It is important to note that the above Assumption 1 is a non-standard requirement in inexact Newton methods [23,11]. Its particular form is motivated by the use of IPM and the needs of the complexity analysis in Section 3. It is chosen in agreement with the definition of the infeasible neighbourhood (15) of the central path of the sub-problem considered. Using (12) and (16) in (9), we obtain (17), whereas equations (8) and (10) are satisfied exactly. Therefore the inexact Newton directions computed according to (16) satisfy (18), where x^j_k(α) is the point reached from x^j_k after a step of length α along the Newton direction. Notice that, after selecting the correct stepsize α^j_k, we define the next iterate accordingly. We report in Algorithm 2 a prototype IPM scheme for the solution of problem (PPM(k)). The fundamental steps involved in the algorithm are: computing the Newton direction by solving (17) with a level of inexactness that satisfies (16), see Line 2; finding the largest stepsize that guarantees to remain inside the neighbourhood and to sufficiently reduce the complementarity products, see Line 7; preparing the quantities to be used in the next iteration, see Lines 8-9.
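The computational procedure described above can be sketched numerically: form the regularized normal matrix A(Θ^{-1} + ρI)^{-1}A^T + δI, solve for ∆y, and back-substitute for ∆x and ∆s. In this sketch the residuals ξ_p, ξ_d, ξ_µ are random placeholders and the three block equations follow the standard primal-dual regularized convention, which may differ in signs from the paper's exact equations (8)–(10); the final assertions verify the block equations directly.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, rho, delta = 5, 12, 1e-4, 1e-6

A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 2.0, n)
s = rng.uniform(0.5, 2.0, n)
xi_p = rng.standard_normal(m)   # placeholder primal residual
xi_d = rng.standard_normal(n)   # placeholder dual residual
xi_mu = rng.standard_normal(n)  # placeholder complementarity residual

# Theta = X S^{-1}; regularized normal (Schur complement) matrix
theta = x / s
W = 1.0 / (1.0 / theta + rho)          # diagonal of (Theta^{-1} + rho I)^{-1}
S_norm = A @ np.diag(W) @ A.T + delta * np.eye(m)

# Solve for dy, then back-substitute for dx and ds
rhs = xi_p + A @ (W * (xi_d - xi_mu / x))
dy = np.linalg.solve(S_norm, rhs)
dx = W * (A.T @ dy - xi_d + xi_mu / x)
ds = (xi_mu - s * dx) / x

# Check the three block equations of the regularized Newton system
assert np.allclose(A @ dx + delta * dy, xi_p)           # primal block
assert np.allclose(A.T @ dy + ds - rho * dx, xi_d)      # dual block
assert np.allclose(s * dx + x * ds, xi_mu)              # complementarity block
```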
We study the convergence of Algorithm 2 in Section 3. Concerning the notation, recall that the subscript k is related to the iteration count of the outer Algorithm 1 (PPM), whereas the superscript j is related to the iteration of the inner Algorithm 2 (IPM). To avoid over-complicating the notation, in the following we use ξ^j_{p,k} and ξ^j_{d,k} instead of (ξ_p)^j_k and (ξ_d)^j_k.

Convergence and complexity
In this section, we show that the particular inexact IPM in Algorithm 2, used as inner solver in Algorithm 1, is convergent. Moreover, at the end of the present section, we show that such IPM converges to an ε-accurate solution in a polynomial number of iterations. The structure of our proof is inspired by the works [3,4,7,16,24,25], but differs considerably from the hypotheses and techniques used there.
The PPM iteration counter k is fixed throughout this section and, for the sake of readability, is used only when writing the fixed PPM iterate (x_k, y_k, s_k) and not in the context of the IPM iterations (x^j_k, y^j_k, s^j_k). We start by analysing the progress made in a single Newton iteration, using (6), (7) and (8), and then (6) and (17). The last block equation in (17) yields

(s^j)^T ∆x^j + (x^j)^T ∆s^j = −(x^j)^T s^j + σ n µ^j = (σ − 1)(x^j)^T s^j,  (22)

together with (23). Finally, using (22), we state the identity (24). With the next Theorem 1 we prove that Algorithm 2 is well-defined: at each iteration, there exists a non-empty interval of values for the stepsize α such that the next iterate still lies in the neighbourhood N_k(γ, γ̄, γ_p, γ_d) and such that the complementarity product (x^j_k)^T s^j_k is reduced by a sufficient amount, as required in (19).
Theorem 1. If the stopping conditions at Line 4 of Algorithm 2 are not satisfied, then there exists 0 < ᾱ_j < α^{*,j} such that conditions (19) are satisfied for all α ∈ [0, ᾱ_j].
Proof. In this proof we also omit the IPM iterate counter j, i.e. (x^j, y^j, s^j) ≡ (x, y, s). Let us define the functions f_i(α), h(α), g_d(α) and g_p(α), for all i = 1, . . ., n. Using (20) in the expression of g_d(α) and (21) in the expression of g_p(α), we obtain their explicit forms. We start by proving that there exists ᾱ_j > 0 such that f_i(α) ≥ 0 for all i = 1, . . ., n and for all α ∈ [0, ᾱ_j]. In the following we will use extensively the identity (24). Since x^T s > 0, a simple continuity argument shows the existence of a small enough ε̂_{f_i} > 0 s.t. f_i(α) ≥ 0 for all α ∈ [0, ε̂_{f_i}]. Concerning h(α), since x^T s(σ − σ̄) > 0, there exists ε̂_h > 0 small enough s.t. h(α) ≥ 0 for all α ∈ [0, ε̂_h].
Concerning g_d(α), there exists ε̂_{g_d} > 0 small enough s.t. g_d(α) ≥ 0 for all α ∈ [0, ε̂_{g_d}].
Finally, concerning g_p(α), there exists ε̂_{g_p} > 0 small enough s.t. g_p(α) ≥ 0 for all α ∈ [0, ε̂_{g_p}].
From the above implication, using g_d(α^{*,j}) and g_p(α^{*,j}), we obtain that (x(α^{*,j}), y(α^{*,j}), s(α^{*,j})) is a solution of problem (PPM(k)). We have hence reached a contradiction, since we assumed that Algorithm 2 did not stop at Line 4.
Before proving the convergence of Algorithm 2, we would like to emphasize that the above proof complements and expands [24, Remark 3.1].
The next two results establish that Algorithm 2 converges to a solution of problem (PPM(k)). This is done by establishing that the right-hand sides of the Newton systems are uniformly bounded and by showing that the complementarity product (x^j)^T s^j cannot be bounded away from zero.
Corollary 1. The right-hand sides of the Newton systems are uniformly bounded.
Proof. As a consequence of Theorem 1, we can suppose the existence of a sequence of iterates {(x^j, y^j, s^j)}_{j∈N} produced by Algorithm 2 satisfying conditions (19).
Since by construction (x^j)^T s^j ≤ (x^0)^T s^0, the bounds follow from (15). Moreover, we have ‖S^j X^j e − σµ^j e‖ ≤ ‖S^j X^j e‖ + σµ^j ‖e‖.

Theorem 2. Algorithm 2 produces a sequence of iterates in N_k(γ, γ̄, γ_p, γ_d) s.t. lim_j (x^j)^T s^j = 0, i.e. (x^j, y^j, s^j) converges to a solution of problem (PPM(k)).
Proof. Let us argue by contradiction, supposing that there exists ε* > 0 s.t. (x^j)^T s^j > ε* for all j ∈ N.
Claim 1. There exists a constant C_1, dependent only on n, s.t. ‖[∆x^j, ∆y^j, ∆s^j]^T‖ ≤ C_1 for all j ∈ N.
The proof of this fact follows by observing that the Newton matrices in (17) satisfy all the hypotheses of [3, Th. 1], i.e. they have a uniformly bounded inverse, and that the right-hand sides are uniformly bounded, see Corollary 1. As a consequence, there exists another constant C_2 bounding the corresponding quantities.

Polynomial Complexity
In this section we show that the number of iterations needed to reduce µ^j below a certain tolerance ε grows polynomially with the size of the problem. Moreover, it is important to note that, from the definition of the neighbourhood N_k(γ, γ̄, γ_p, γ_d), the same number of iterations is sufficient to reduce also the primal and dual infeasibility below the tolerance ε.

In the following we will omit the index j when this does not lead to ambiguities. It is important to note that the linear system in (17) can be written in an alternative form. Using [3, Remark 1], we obtain the expression (32). To prove polynomial complexity, we start by bounding the terms that appear in (32). The next two technical results are useful in this sense.
Therefore, we know that the relevant terms are bounded by some positive constants C_3 and C_4. Let us define C_5 := max{C_3, C_4}. Now that we have bounded the terms in (32), we look for an upper bound on the norms of the Newton directions that depends polynomially on the size of the problem n. This is a crucial step towards a polynomial lower bound on the minimum stepsize (31), which leads to the polynomial complexity result mentioned at the beginning of this section. Theorem 3. There exists a positive constant C_6 s.t.
‖[∆x^j, ∆y^j, ∆s^j]^T‖ ≤ C_6 n √(µ^j) for all j ∈ N.
The next straightforward technical result specializes the polynomial bound of Theorem 3 to the terms that appear in equations (25)–(29). Corollary 3. There exists a positive constant C_7 such that the corresponding bounds hold for all i. We can now apply the previous results and obtain a bound similar to (31), but one that depends polynomially on the size of the problem n. This is the last fundamental step before the polynomial complexity result can be stated. Theorem 4. There exists a constant ᾱ s.t. α^j ≥ ᾱ for all j ∈ N, where ᾱ involves a positive constant C_8.
Finally, we are ready to show that the number of iterations required to reduce µ below a certain tolerance ε is proportional to n². Theorem 5. Algorithm 2 has polynomial complexity, i.e. given ε > 0 there exists K ∈ O(n² ln(1/ε)) s.t. µ^j ≤ ε for all j ≥ K. Proof. The thesis follows by observing that, thanks to Theorem 4, the barrier parameter µ^j is reduced by at least a constant factor of the form (1 − c/n²) at every iteration.

Properties of the regularized normal equations system
In this section we show some properties of the matrix S_{ρ,δ} that are useful for the analysis performed in the next section.
For the original graph G(V, E), we define the adjacency matrix A ∈ R^{|V|×|V|} such that A_ij = 1 if there exists an edge between nodes i and j, and A_ij = 0 otherwise. Notice that for an undirected graph A is symmetric; we assume that there are no self-loops, so that the diagonal of A is made of zeros. Let us define the degree matrix of a graph D ∈ R^{|V|×|V|} such that D is diagonal and D_jj is the degree of node j. Notice that D_jj = (Ae)_j.
Let us also define the Laplacian matrix of a graph as L ∈ R^{|V|×|V|} such that L = D − A. An important relation between the Laplacian L and the node-arc incidence matrix A is that L = AA^T.
Given a diagonal matrix Θ and a parameter ρ, we define the re-weighted graph G_Θ as the graph with the same connectivity as G, in which the weight of every edge j is scaled by a factor Θ_jj/(1 + ρΘ_jj). The adjacency matrix of the new graph, A_Θ, has the same sparsity pattern as A, but takes into account the new weights of the edges. The same happens for the degree matrix D_Θ. The new Laplacian matrix thus reads L_Θ = D_Θ − A_Θ. We have just shown that the normal equations matrix can be interpreted as the Laplacian matrix of the original graph, where the edges are weighted according to the diagonal entries of the matrix Θ. This result is important because solving linear systems that involve Laplacian matrices is much easier than solving general linear systems. We summarize this result in the following Lemma.

Lemma 2. The matrix A(Θ^{-1} + ρI)^{-1}A^T is the Laplacian of a weighted undirected graph and hence, for every Θ with positive diagonal, A(Θ^{-1} + ρI)^{-1}A^T = D_Θ − A_Θ, where D_Θ and A_Θ are the degree and adjacency matrices of the weighted graph.
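Lemma 2 can be checked numerically on a toy graph: with edge weights w_j = Θ_jj/(1 + ρΘ_jj), the matrix A diag(w) A^T coincides with D_Θ − A_Θ. The graph and the entries of Θ below are invented for illustration.

```python
import numpy as np

# Toy undirected graph (arbitrary edge orientation for the incidence matrix)
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
m, rho = 4, 1e-4
theta = np.array([0.5, 2.0, 1.0, 4.0])  # diagonal of Theta, one entry per edge
w = theta / (1.0 + rho * theta)          # re-weighted edge weights

A = np.zeros((m, len(edges)))
for j, (u, v) in enumerate(edges):
    A[u, j], A[v, j] = -1.0, 1.0

# Weighted adjacency and degree matrices of the re-weighted graph
Adj = np.zeros((m, m))
for j, (u, v) in enumerate(edges):
    Adj[u, v] = Adj[v, u] = w[j]
Deg = np.diag(Adj.sum(axis=1))

# L_Theta = D_Theta - A_Theta coincides with A (Theta^{-1} + rho I)^{-1} A^T
L = A @ np.diag(w) @ A.T
assert np.allclose(L, Deg - Adj)
```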
The next result shows that the normal matrix is strictly diagonally dominant, due to the presence of the dual regularization. This property is significant because it ensures that the incomplete Cholesky factorization of the normal equations matrix S_{ρ,δ} can always be computed without the algorithm breaking down (in exact arithmetic), see e.g. [29].
The next two technical results are related to the distribution of eigenvalues of the normal matrix.They are used in the next section to show that the inexactness introduced when sparsifying the normal matrix remains bounded.
Proof. The proof is straightforward using Gershgorin's circle theorem.

Lemma 5. The eigenvalues λ of the matrix S_{ρ,δ} satisfy:
Proof. Using the Rayleigh quotient, for some vector u and v = A^T u, the eigenvalues can be written accordingly. The lower bound λ ≥ δ is trivial; the upper bound follows from Lemma 4 and (34).
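The bounds of Lemmas 4 and 5 can be sanity-checked numerically: every eigenvalue of S_{ρ,δ} is at least δ, and Gershgorin's theorem caps the largest eigenvalue, since each re-weighted edge weight Θ_jj/(1 + ρΘ_jj) is at most 1/ρ. The toy graph and the ranges of Θ below are invented for the check.

```python
import numpy as np

rng = np.random.default_rng(3)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
m, rho, delta = 4, 1e-4, 1e-6
theta = rng.uniform(1e-6, 1e6, size=len(edges))  # IPM iterates span many scales
w = theta / (1.0 + rho * theta)                   # each w_j <= 1/rho

A = np.zeros((m, len(edges)))
for j, (u, v) in enumerate(edges):
    A[u, j], A[v, j] = -1.0, 1.0

S = A @ np.diag(w) @ A.T + delta * np.eye(m)
lam = np.linalg.eigvalsh(S)

# Lower bound: the dual regularization shifts every eigenvalue up by delta
assert lam.min() >= 0.99 * delta
# Upper bound via Gershgorin: weighted degrees are at most (max degree)/rho
max_deg = 3  # nodes 0 and 2 have degree 3 in this toy graph
assert lam.max() <= delta + 2 * max_deg / rho + 1e-9
```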

Sparsification of the reduced matrix
We now propose a technique to reduce the number of nonzeros in the normal equations matrix S_{ρ,δ}, based on the weights of the edges in the re-weighted graph (according to Lemma 2). We then show that this sparsification strategy is sound and produces a polynomially convergent interior point algorithm.
In this section we omit the IPM iteration counter j and we consider all the IPM-related quantities as functions of µ → 0. As IPMs progress towards optimality, we expect the diagonal matrix Θ contributed by the barrier term to induce a partition (B, N) of the edge set, where the optimal solution (x*, y*, s*) was defined in (3). Notice that (B, N) is the partition corresponding to the optimal solution.
Assumption 2. We suppose that the asymptotic estimates (39) hold; moreover, since an IPM iterate sufficiently close to the central path satisfies analogous bounds, using (39) we suppose that the same estimates hold along the iterations. This assumption makes sense given the neighbourhood that is considered, see e.g. [18]. Due to Assumption 2, we consider the corresponding asymptotic estimates of (Θ^{-1} + ρI)^{-1}. The diagonal entries of Θ give a specific weight to each column of the matrix A (or, equivalently, give a weight to each edge of the original sparse graph, as shown in Lemma 2). The columns for which the corresponding Θ_ii is O(µ) have a very small impact on the normal matrix, but still contribute to its sparsity pattern. In order to save time and memory when forming (complete or incomplete) Cholesky factorizations, we propose the following sparsification strategy: we introduce a suitable threshold C_t ∈ R_+ and define the µ-sparsified version S^{C_t µ}_{ρ,δ} of S_{ρ,δ} by setting to zero the entries of (Θ^{-1} + ρI)^{-1} smaller than C_t µ. Notice that this matrix completely ignores some of the columns of A (and some of the edges of the graph). The dual regularization δI guarantees that the resulting matrix is non-singular, irrespective of the level of sparsification chosen. In this paper, we consider using inexact Newton directions produced by solving linear systems with the matrix S^{C_t µ}_{ρ,δ}, rather than S_{ρ,δ}.

Remark 1. It is important to note that, in general, the sparsity pattern of the matrix S^{C_t µ}_{ρ,δ} depends on the choice of the parameter C_t and on the partitioning (B, N); when µ is sufficiently small, we expect this pattern to stabilize.

Let us now show how the algorithm is affected by the use of the proposed sparsified normal matrix. Notice that the results presented below depend strongly on two facts: the optimization problem evolves on a graph, and thus the normal matrix is a Laplacian, with very desirable properties; and the interior point method employs primal-dual regularization.
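A minimal sketch of the sparsification strategy, with invented data (a random ±1/0 matrix stands in for the incidence matrix, and the split of Θ between O(1/µ) and O(µ) entries is imposed by hand): columns of A whose weight in (Θ^{-1} + ρI)^{-1} falls below C_t µ are dropped, and the perturbation S_{ρ,δ} − S^{C_t µ}_{ρ,δ} = A E_µ A^T is verified to be small in norm, in the spirit of Theorem 6.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, rho, delta = 6, 15, 1e-4, 1e-6
mu, C_t = 1e-8, 1.0

A = rng.choice([-1.0, 0.0, 1.0], size=(m, n))  # stand-in for the incidence matrix
# Near optimality, Theta splits: O(1/mu) on basic edges, O(mu) on the rest
theta = np.where(rng.random(n) < 0.5, 1.0 / mu, mu) * rng.uniform(0.5, 2.0, n)
W = 1.0 / (1.0 / theta + rho)                   # diagonal of (Theta^{-1}+rho I)^{-1}

# Sparsification: drop every column whose weight falls below C_t * mu
keep = W >= C_t * mu
S_full = A @ np.diag(W) @ A.T + delta * np.eye(m)
S_sp = A[:, keep] @ np.diag(W[keep]) @ A[:, keep].T + delta * np.eye(m)

# The perturbation is A E_mu A^T with ||E_mu|| < C_t * mu, hence small in norm
gap = np.linalg.norm(S_full - S_sp, 2)
assert gap <= C_t * mu * np.linalg.norm(A @ A.T, 2) + 1e-15
```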
We start by showing how much the normal matrix deviates from its sparsified counterpart.

Theorem 6. The sparsification strategy in (40) produces a matrix which is close to the original S_{ρ,δ}, in the sense of the bound below.

Proof. We have that S_{ρ,δ} − S^{C_t µ}_{ρ,δ} = A E_µ A^T, where E_µ is the diagonal matrix collecting the dropped entries of (Θ^{-1} + ρI)^{-1}. Hence we have λ_max(E_µ) ≤ C_t µ/(1 + ρµ). The thesis follows using Lemma 4.

We now show that the condition number of both matrices is uniformly bounded. This is an important property when using iterative Krylov solvers to find the Newton directions.

Lemma 6. When µ is sufficiently small, the condition numbers of S_{ρ,δ} and S^{C_t µ}_{ρ,δ} are uniformly bounded.

Proof. The thesis follows from Lemma 4 and the eigenvalue bounds of the previous section.

We now show that the solution of the sparsified linear system is "close" to the solution of the original one, and the bound depends on µ. This result depends on the spectral distribution that was shown in the previous section.
Theorem 7. For all v ∈ R^m, the bound below holds.

Proof. The thesis follows using Lemma 5 and Theorem 6.

The next technical result is useful for the proof of Corollary 4.
Lemma 7. ξ̄^j_p is uniformly bounded, i.e. there exists a constant C_9 > 0 such that ‖ξ̄^j_p‖ ≤ C_9 for all j ∈ N.

Proof. For the sake of clarity of notation, we do not include the index j in the proof.
To bound ‖ξ̄_p‖, we use the estimates already established above. To estimate ‖(Θ^{-1} + ρI)^{-1} X^{-1} ξ_{µ,σ}‖, we proceed as in (35). The remaining terms can be bounded using the properties of the neighbourhood. Since µ ≤ µ^0, we deduce that ‖ξ̄_p‖ ≤ C_9, for some positive constant C_9.
Finally, we show that, for a small enough constant C_t, the inexactness introduced by the sparsification strategy satisfies assumption (16). Therefore, an algorithm that includes such a sparsification strategy retains the polynomial complexity of the inexact IPM shown in Section 3.

Corollary 4. If in Algorithm 2 we generate the search directions using (S^{C_t µ}_{ρ,δ})^{-1} with C_t sufficiently small, i.e. if we compute the search directions using (11), (12) and (13) with S^{C_t µ}_{ρ,δ} substituted for S_{ρ,δ}, then Algorithm 2 is convergent.

Proof. Using Theorem 7 together with (16) and Lemma 5, the thesis follows by observing that there exists a constant bounding the resulting inexactness, where the last inequality holds if C_t is chosen sufficiently small.

Numerical Results
The proposed method is compared with Lemon (Library for Efficient Modelling and Optimization on Networks) [26], an extremely efficient C++ implementation of the network simplex method, which has been shown to significantly outperform other popular implementations, like Cplex, see e.g. [5,48]. The network simplex method has been shown [41] to be very competitive against other algorithms specifically developed for discrete OT, while remaining very robust and adaptable to many types of problems. Let us highlight that the other algorithms available in Lemon (cost scaling, capacity scaling, cycle cancelling) produced worse results than the network simplex.
All the computational tests discussed in this section are performed on a Dell PowerEdge R740 running Scientific Linux 7, with 4× Intel Gold 6234 3.3GHz CPUs (8C/16T, 10.4GT/s, 24.75M cache, Turbo, HT, 130W, DDR4-2933) and 500GB of memory. The PS-IPM implementation closely follows the one from [6] and is written in Matlab®. The software versions used for the numerical experiments are as follows: Matlab R2022a, Lemon 1.3.1 and GCC 4.8.5 as the C++ compiler.
We stop Algorithm 1 according to the criterion (42). Concerning the choice of the parameters in Algorithm 1, we set σ_r = 0.7. Moreover, to prevent wasting time on finding excessively accurate solutions in the early PPM sub-problems, we set τ_1 = 10^{-4}, i.e. we use as inexactness criterion for the PPM method

r_k(x_{k+1}, y_{k+1}) < 10^{-4} σ_r^k min{1, ‖(x_{k+1}, y_{k+1}) − (x_k, y_k)‖}.
Indeed, in our computational experience, we have found that driving the IPM solver to a high accuracy in the initial PPM iterations is unnecessary and usually leads to a significant deterioration of the overall performance.
Concerning Algorithm 2, we set the regularization parameters to ρ = 10^{-4} and δ = 10^{-6}. Moreover, in order to find the search direction, we employ a widely used predictor-corrector method [30]. This is the main point where the practical implementation deviates from the theory in order to gain computational efficiency.
Finally, concerning the test problems, in all the following experiments we generate the load vector ρ_1 − ρ_0 in (1) randomly and such that the sum of its entries is zero (to guarantee feasibility of the optimization problem), with only 10% of its entries being nonzero. Moreover, we fix the weight of each edge at 1.
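One possible way to generate such a balanced load vector is sketched below; the distribution of the nonzero entries is an assumption of this sketch, not specified in the text.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 20  # number of nodes

# Pick ~10% of the nodes to carry nonzero load, then shift them to sum to zero
k = max(2, m // 10)
idx = rng.choice(m, size=k, replace=False)
load = np.zeros(m)
load[idx] = rng.standard_normal(k)       # assumed distribution, for illustration
load[idx] -= load[idx].mean()            # enforce 1^T (rho1 - rho0) = 0

assert abs(load.sum()) < 1e-12           # feasibility of the OT problem
assert np.count_nonzero(load) <= k       # only ~10% of entries are nonzero
```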
We test the above mentioned solution strategies on various instances generated with the CONTEST generator [43]; in particular, we consider the graphs pref, kleinberg, smallw and erdrey, with a fixed number of 100,000 nodes and different densities (i.e. average number of edges per node). Therefore, for these instances, m = |V| = 100,000 and n = |E| = m · density.
In the upper panels of Figures 1 and 2 we report the computational time of the three approaches for various values of density (chosen in relation to the properties of the graph), whereas in the lower panels we report the total number of IPM iterations. From the presented numerical results, it is clear that the sparsification strategy, in conjunction with the iterative solution of the linear systems, provides a clear advantage over the use of a direct factorization. As can be expected, the iterative method and the sparsification strategy become more advantageous when the size of the problem (number of edges) increases. On the other hand, it is important to note that the use of the sparsified Newton equations in conjunction with the full Cholesky factorization presents only limited advantages in terms of computational time when compared to the Cholesky factorization of the full Newton normal equations. This is the case because the resulting inexact IPM generally requires more iterations to converge (see the lower panels of Figures 1 and 2). The advantages of the proposed approach become clearer when the graphs are denser.

Results on randomly generated graphs
In this section, we compare the PS-IPM algorithm, using the sparsified normal equations matrix and the PCG, with the network simplex solver of Lemon. For PS-IPM we use the same parameters as proposed in Section 6.1. The graphs used in this section come from the generator developed in [44] and already used for OT on graphs in [12]. This generator produces random connected graphs with a number of nodes varying from 1,000 to 10,000,000 and degrees of each node in the range [1,10], with an average of 5 edges per node. For each size, 10 graphs and load vectors are generated and tested. These parameters closely resemble the ones used in [12]. Figure 3 shows the comparison of the computational time between PS-IPM and Lemon: for each size of the problem (indicated by the total number of edges), we report the summary statistics of the execution times using Matlab's boxplot.
For small size problems, Lemon is the clear winner, by two orders of magnitude; however, as the size increases, the performance difference between the two methods reduces and for the largest instance considered, Lemon becomes one order of magnitude slower than PS-IPM.
Figure 4 shows the average computational time against the number of edges (from 5,000 to 50M) in logarithmic scale, together with the corresponding regression lines and their slopes. Both the proposed method and the network simplex (see [35]) are known to have polynomial complexity in terms of the number of iterations (although the estimates are usually very pessimistic for IPMs); from the computational results presented, we can estimate the practical time complexity of both methods. Recall that, in a log-log plot, polynomials of the type x^m appear as straight lines with slope m. Using linear regression, we estimate that the time taken by Lemon grows with exponent approximately 2.06, while the time taken by PS-IPM grows with exponent approximately 1.28, providing a considerable advantage for large sizes.
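The slope estimation described above is standard: fit a straight line to the (log m, log t) pairs. A small self-contained sketch, using synthetic exact power-law timings in place of the measured data (constants and exponents chosen only to mimic the reported values):

```python
import numpy as np

# Hypothetical timing data following exact power laws t = c * m**p,
# over the edge-count range of the experiments (5e3 to 5e7).
edges = np.logspace(np.log10(5e3), np.log10(5e7), 10)
t_lemon = 1e-7 * edges**2.06   # stand-in for the Lemon timings
t_psipm = 1e-4 * edges**1.28   # stand-in for the PS-IPM timings

# A power law is a straight line in log-log space, so a least-squares
# fit of log t against log m recovers the exponent p as the slope.
slope_lemon = np.polyfit(np.log(edges), np.log(t_lemon), 1)[0]
slope_psipm = np.polyfit(np.log(edges), np.log(t_psipm), 1)[0]
print(slope_lemon, slope_psipm)
```

On real measurements the fitted slope is only an estimate of the practical complexity exponent, and its reliability depends on the range of sizes covered.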
Finally, looking at the full set of results reported in Figure 3, let us mention that the variance of the computational times over the 10 runs for a given problem size is smaller when using PS-IPM, especially for large sizes, indicating that the method is more robust and less dependent on the specific problem being solved. This is a very desirable property.

Results on SuiteSparse graphs
Results on randomly generated problems do not necessarily represent the ability of an optimization method to tackle problems coming from real-world applications. Therefore, in this section, we show the results of applying PS-IPM and Lemon to some sparse graphs from the SuiteSparse matrix collection [10]. The characteristics of the graphs considered are shown in Table 1: the number of nodes, the number of edges and the number of edges per node. All the graphs are undirected and connected. Since the considered graphs are particularly sparse, in the numerical results presented in this section we solve the sparsified normal equations using the full Cholesky factorization.

Figure 4: Logarithmic plot of the computational time for randomly generated graphs. The time taken by Lemon (red circles) grows as (number of edges)^2.06; the time taken by PS-IPM (blue triangles) grows as (number of edges)^1.28.

Figure 5 shows the computational times, using PS-IPM and Lemon, for the eight problems considered. Apart from the problem nc2010, which represents a relatively small instance in our dataset, on all the other problems PS-IPM consistently outperforms Lemon in terms of required computational time. In particular, for the problems hugetric-00010, hugetric-00020, hugetrace-00010 and hugetrace-00020, which reach up to 16 million nodes and 48 million edges, PS-IPM is one order of magnitude faster than Lemon. Notice that graphs of these sizes (and larger) appear in many modern practical applications, e.g. social networks, PageRank, analysis of rail/road networks and energy models, to mention a few.
Looking at the regression lines and their slopes, we notice that the time taken by Lemon grows with exponent approximately 2.07, while the time taken by PS-IPM grows with exponent approximately 1.40. These values are very close to those found previously for randomly generated graphs. The data of Figure 5, however, show a more erratic behaviour than the times shown in Figure 4, because the properties of each graph considered are different and because we are not averaging over 10 different instances of each problem.
Let us also highlight that the time taken by Lemon seems to be more problem-dependent, while PS-IPM appears more consistent and robust.
Figure 5: Logarithmic plot of the computational time for the SuiteSparse problems. The time taken by Lemon (red circles) grows as (number of edges)^2.07; the time taken by PS-IPM (blue triangles) grows as (number of edges)^1.40. The instances are ordered as in Table 1.

Conclusion
An efficient computational framework for the solution of Optimal Transport problems on graphs has been presented in this paper. This framework relies on a Proximal-Stabilized Interior Point Method and on clever sparsifications of the normal Newton equations to compute the inexact search directions. The proposed technique is sound, and a polynomial convergence guarantee has been established for the inner inexact IPM. Extensive numerical experiments show that, for large scale problems, a simple prototype Matlab implementation is able to consistently outperform a highly specialized and very efficient C++ implementation of the network simplex method. We highlight also that Interior Point Methods are more easily parallelizable than simplex-like methods; for huge scale problems, for which high performance computing resources need to be used, IPMs with proper parallelization may be the only viable solution strategy.

Figure 1: Comparison of the sparsified and full normal equations approaches, using a full Cholesky factorization or an incomplete Cholesky factorization as preconditioner for PCG, in terms of IPM iterations and computational time, for problems pref and kleinberg.

Figure 2: Comparison of the sparsified and full normal equations approaches, using a full Cholesky factorization or an incomplete Cholesky factorization as preconditioner for PCG, in terms of IPM iterations and computational time, for problems erdrey and smallw.

Figure 3: Box plots of the computational times of PS-IPM and Lemon, for randomly generated graphs. The red and black intervals show the spread of the measured computational times (red, on the left, is PS-IPM; black, on the right, is Lemon), while the (blue) crosses indicate the outliers.

Table 1: Details of the graphs from the SuiteSparse matrix collection, ordered by increasing number of edges.