Computational Semi-Discrete Optimal Transport with General Storage Fees

We propose and analyze a modified damped Newton algorithm to solve the semi-discrete optimal transport with storage fees. We prove global linear convergence for a wide range of storage fee functions, the main assumption being that each warehouse's storage costs are independent. We show that if $F$ is an arbitrary storage fee function that satisfies this independence condition then $F$ can be perturbed into a new storage fee function so that our algorithm converges. We also show that the optimizers are stable under these perturbations. Furthermore, our results come with quantitative rates.

1. Introduction

1.1. Semi-discrete optimal transport with storage fees. In this paper we present an algorithm to compute numerical solutions to the semi-discrete optimal transport problem with storage fees. This problem can be described as follows. Let $X \subset \mathbb{R}^n$, $n \geq 2$, be compact and let $Y := \{y_i\}_{i=1}^N \subset \mathbb{R}^n$ be a fixed finite collection of points, along with a cost function $c : X \times Y \to \mathbb{R}$ and a storage fee function $F : \mathbb{R}^N \to \mathbb{R}$. We also fix a Borel probability measure $\mu$ with $\operatorname{spt} \mu \subset X$, and assume $\mu$ is absolutely continuous with respect to Lebesgue measure.
We want to find a pair $(T, \lambda)$ with $\lambda = (\lambda^1, \ldots, \lambda^N) \in \mathbb{R}^N$ and $T : X \to Y$ measurable satisfying $T_\#\mu = \sum_{i=1}^N \lambda^i \delta_{y_i}$ and minimizing
$$\int_X c(x, T(x))\, d\mu + F(\lambda). \qquad (1.1)$$
We remark that by taking $F$ to be the indicator function of a point, we cover the classical semi-discrete optimal transport problem where we are given fixed probability measures $\mu, \nu$ (with $\mu$ absolutely continuous and $\nu$ discrete) and we want to find a measurable map $T$ so that $T_\#\mu = \nu$ and
$$\int_X c(x, T(x))\, d\mu = \min_{\tilde{T}_\#\mu = \nu} \int_X c(x, \tilde{T}(x))\, d\mu. \qquad (1.2)$$
This problem was first studied in [CJP09] in the case where $F(\lambda) = \sum_i h_i(\lambda^i)$ for some $h_i$, a condition we refer to as "the storage fee function splitting". This condition can be thought of as the storage costs between different warehouses being independent. In that setting the authors showed existence and uniqueness under some regularity and convexity conditions and gave a characterization of the optimizer. The problem with non-splitting storage fees is analyzed in [BK19], where the authors found a dual problem with strong duality. Furthermore, it is shown there that given a dual maximizer $\psi$, the minimizing transport map $T$ can be constructed by sending each point of $X$ to the warehouse of its corresponding Laguerre cell in the Laguerre partition generated by $\psi$. The minimizing $\lambda$ is then given by $\lambda^i = T_\#\mu(\{y_i\})$.
1.2. Informal Overview of Results. In this paper we propose a modified damped Newton method to solve the dual problem and hence construct approximate solutions to the primal problem. We will show global linear convergence and local superlinear convergence, along with a quantitative rate. We will require two assumptions on the storage fee function $F$. The first major assumption is a technical condition that will be satisfied whenever the storage fee function splits, i.e. $F(\lambda) = \sum_i f_i(\lambda^i)$ for some $f_i : \mathbb{R} \to \mathbb{R}$. The second major assumption is that $F^*$ needs to satisfy a regularity, strong convexity, and non-degeneracy assumption. In order to show that this second assumption is not too constraining, we give a method to perturb any convex $F$ that splits into one that satisfies the assumption. We are then concerned with how this perturbation will affect the associated optimal transport map. To address this, we prove a stability result on the optimizers under perturbations of $F$ which may be interesting on its own. With this, our algorithm can find approximate solutions for the semi-discrete optimal transport problem with storage fees for any convex $F$ that splits, in particular for all of the storage fee functions analyzed in [CJP09]. The major difficulty in obtaining convergence is that the functional in the dual problem is only well-conditioned when all of the Laguerre cells have positive mass. Unfortunately, if the initial guess is not very close to the actual solution, a Newton step might end up collapsing a Laguerre cell. In other works on computational semi-discrete optimal transport (such as [MMT18, KMT19]) this was addressed by damping the Newton steps. In the classical case, it turns out that if the cells of an approximate solution are already too small, then any sufficiently small step will not make them smaller.
Unfortunately, in the setting with storage fees this is not true, and so if we only use damping to keep the cells from collapsing we will not get global linear convergence. To remedy this we introduce a sub-routine that we call parameter shuffling. Given any approximate solution in which some Laguerre cells are too small, this routine finds another approximate solution in which the sizes of all of the Laguerre cells are bounded from below and for which the error is not larger than the initial error. Our algorithm functions by alternating between Newton steps, which reduce the error but might make some Laguerre cells too small, and parameter shuffling steps, which fix the Laguerre cells and do not increase the error. We remark that our parameter shuffling routine might be of independent interest in that it provides a means to find good initial guesses for any semi-discrete optimal transport algorithm that requires non-empty Laguerre cells (for example the algorithms of [KMT19, MMT18, BK20a]).
1.3. Literature Review. The semi-discrete optimal transport problem with storage fees was first posed in [CJP09]. In that paper the authors consider the case where the storage function splits, and under some regularity and convexity assumptions they prove that there is a unique optimizer and give a characterization of it. The problem is analyzed in greater generality in [BK19], where the authors also provide a dual problem with strong duality. The use of Newton-type algorithms in the semi-discrete setting seems to first appear in [OP88], where the authors prove local convergence of a Newton algorithm to solve a semi-discrete Monge–Ampère equation with Dirichlet boundary conditions. Global convergence is established in [Mir15], although without any quantitative rates. Concerning the classical semi-discrete optimal transport problem, experimentally fast algorithms are presented in [Mér11, Lév15]; however, they do not come with a convergence guarantee. A damped Newton algorithm with a proven quantitative rate of convergence was developed in [KMT19]. This idea is extended in [MMT18] to solve the optimal transport problem in the case where the source is supported on a union of simplices and the target is discrete. An overview of numerics for the semi-discrete optimal transport problem is given in [San15, Section 6.4.2]. Concerning numerics for the semi-discrete optimal transport problem with storage fees, to the best of the author's knowledge the only previous algorithm is that of [BK20a]. That paper treats only a very specific case of storage fee functions, and not only the proof of convergence but also the Newton algorithm itself depends heavily on the specific choice of storage fee function.
In particular, that algorithm does not directly maximize the dual problem by searching for the zero of its gradient (indeed, the mapping for which a zero is found is not the gradient of any scalar function; see [BK20a, Remark 2.7]), and so our algorithm is fundamentally different from the one presented there. Concerning convergence, that algorithm does have global linear and local superlinear convergence of the same order as ours.

2.1. Notations and Conventions.
In this subsection we collect some notations and conventions that will be used throughout the entire paper. We fix positive integers $N$ and $n$ and a collection $Y := \{y_i\}_{i=1}^N \subset \mathbb{R}^n$. For any vector $V \in \mathbb{R}^k$, we will write its components as superscripts, so $V^i$ is the $i$-th component of $V$. We reserve the notation $\mathbf{1}$ to refer to the vector in $\mathbb{R}^N$ whose components are all 1. We use $\|V\|_p$ for the $l^p$ Euclidean norm, i.e. $\|V\|_p := \big( \sum_i |V^i|^p \big)^{1/p}$.
We will use $\|V\|$ to refer to the standard Euclidean norm $\|V\|_2$. We reserve the notation $\Lambda$ for the set of admissible weight vectors, i.e. the probability simplex $\Lambda := \{\lambda \in \mathbb{R}^N : \lambda^i \geq 0 \text{ for all } i,\ \sum_{i=1}^N \lambda^i = 1\}$. For any subsets $A, B \subset \mathbb{R}^k$ we shall use $d_H(A, B)$ to denote the Hausdorff distance between $A$ and $B$. We adopt the notation $\delta_A$ to denote the indicator function of $A$ in the convex-analysis sense, i.e. $\delta_A(x) = 0$ if $x \in A$ and $\delta_A(x) = +\infty$ otherwise.
We shall assume that the cost function c satisfies the following standard conditions: We also assume the following condition, originally studied by Loeper in [Loe09].
Definition 2.1. The cost function $c$ is said to satisfy Loeper's condition if for each $i \in \{1, \ldots, N\}$ there exist a convex set $Y_i \subset \mathbb{R}^n$ and a $C^2$ diffeomorphism $\exp_i^c(\cdot)$. See Remark 2.3 below for further discussion of these conditions. We also say that a set $X \subset \mathbb{R}^n$ is $c$-convex with respect to $Y$ if $(\exp_i^c)^{-1}(X)$ is a convex set for every $i \in \{1, \ldots, N\}$.
Definition 2.2. For any $\psi \in \mathbb{R}^N$ and $i \in \{1, \ldots, N\}$, we define the $i$th Laguerre cell associated to $\psi$ as the set
$$\mathrm{Lag}_i(\psi) := \{x \in X : c(x, y_i) + \psi^i \leq c(x, y_j) + \psi^j \text{ for all } j \in \{1, \ldots, N\}\}.$$
We also define the function $G : \mathbb{R}^N \to \Lambda$ by
$$G(\psi) := (G^1(\psi), \ldots, G^N(\psi)) = (\mu(\mathrm{Lag}_1(\psi)), \ldots, \mu(\mathrm{Lag}_N(\psi))),$$
and denote for any $\epsilon \geq 0$,
$$K_\epsilon := \{\psi \in \mathbb{R}^N : G^i(\psi) \geq \epsilon \text{ for all } i \in \{1, \ldots, N\}\}.$$
Remark 2.3. Of the above conditions on the cost, (Reg) and (Twist) are standard conditions in the existence theory for optimal transport. Furthermore, (QC) holds if $Y$ is a finite set sampled from a continuous space, and $c$ is a $C^4$ cost function satisfying what is known as the Ma–Trudinger–Wang condition (first introduced in a strong form in [MTW05], and in [TW09] in a weaker form).
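As a purely illustrative numerical sketch, the Laguerre cells and the cell-mass map $G$ can be approximated by sampling $\mu$: the setup below (uniform $\mu$ on $[0,1]^2$, quadratic cost, and the sign convention $c(x, y_i) + \psi^i$) is our own choice and not taken from the paper.

```python
# Illustrative sketch (our own toy setup): approximate the Laguerre cells of
# Definition 2.2 and the map G(psi) by sampling mu on a uniform grid.
# Assumptions: X = [0,1]^2, mu uniform, c(x,y) = |x - y|^2.

def laguerre_index(x, sites, psi):
    """Index i minimizing c(x, y_i) + psi^i, i.e. the cell containing x."""
    costs = [(x[0] - y[0])**2 + (x[1] - y[1])**2 + p
             for y, p in zip(sites, psi)]
    return costs.index(min(costs))

def cell_masses(sites, psi, m=200):
    """Approximate G(psi) by the fraction of grid points in each cell."""
    counts = [0] * len(sites)
    for a in range(m):
        for b in range(m):
            x = ((a + 0.5) / m, (b + 0.5) / m)
            counts[laguerre_index(x, sites, psi)] += 1
    return [cnt / (m * m) for cnt in counts]

sites = [(0.25, 0.25), (0.75, 0.25), (0.5, 0.75)]
G0 = cell_masses(sites, [0.0, 0.0, 0.0])
G1 = cell_masses(sites, [0.1, 0.0, 0.0])  # raising psi^1 shrinks cell 1
```

Raising a single weight $\psi^1$ shrinks the corresponding cell, which is the monotone behavior exploited throughout the paper.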
If $\mu$ is absolutely continuous with respect to Lebesgue measure, then the condition (Twist) implies that the Laguerre cells are pairwise $\mu$-almost disjoint. In this case the generalized Brenier theorem [Vil09, Theorem 10.28] tells us that for any $\psi \in \mathbb{R}^N$, the map $T_\psi : X \to Y$ defined by $T_\psi(x) = y_i$ whenever $x \in \mathrm{Lag}_i(\psi)$ is a minimizer in the optimal transport problem (1.2), where the source measure is $\mu$ and the target measure is defined by $\nu(\{y_i\}) = G^i(\psi)$.
For the remainder of this paper we will also assume that $X$ is compact and $c$-convex with respect to $Y$. Furthermore, we denote the density of the (absolutely continuous) measure $\mu$ by $\rho$ and we assume that $\rho$ is $\alpha$-Hölder continuous for some $\alpha \in (0, 1]$. Next we set some definitions concerning the function $F$, representing the cost of warehouse storage.
Definition 2.4. A storage fee function $F$ is a proper closed convex function from $\mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ that is $+\infty$ outside of $\Lambda$ (in other words $\operatorname{dom} F \subset \Lambda$, where $\operatorname{dom} F$ means the effective domain of a convex function).
These are the sufficient conditions for strong duality given in [BK19]. Throughout this note F will always be a storage fee function unless otherwise noted.
Definition 2.5. We say that a storage fee function $F$ splits (into the $f_i$) if
$$F(\lambda) = \sum_{i=1}^N f_i(\lambda^i) + \delta_\Lambda(\lambda),$$
where each $f_i$ is convex, closed, and proper. Furthermore we require that $\operatorname{dom} f_i \subset [0, 1]$ for each $i$.
We remark that the condition that $\operatorname{dom} F \subset \Lambda$ (and so the addition of the $\delta_\Lambda(\lambda)$ term) does not add any additional assumption. Since in the optimal transport problem with storage fees we minimize over pairs $(T, \lambda)$ satisfying $T_\#\mu = \sum_{i=1}^N \lambda^i \delta_{y_i}$, we must have $\lambda \in \Lambda$ since $\mu$ is a probability measure. In particular, solving the optimal transport problem with storage fee function $F + \delta_\Lambda$ will yield the exact same solution as solving it with storage fee function $F$. For our algorithm to converge we will need some kind of strong convexity of the objective functional. There are two ways to obtain this: either we can assume that $F^*$ has some kind of strong convexity, which corresponds to some kind of regularity of $F$, or we can require that the support of $\mu$ satisfies a quantitative connectedness assumption. It turns out that for our purposes it suffices to assume that $\mu$ satisfies a Poincaré–Wirtinger inequality.
Definition 2.6. A probability measure $\mu$ on $X$ satisfies a Poincaré–Wirtinger inequality if there is a constant $C_{pw} > 0$ such that for any $f \in C^1(X)$,
$$\left\| f - \int_X f \, d\mu \right\|_{L^1(\mu)} \leq C_{pw} \|\nabla f\|_{L^1(\mu)}.$$
If this holds we will say that "$\mu$ satisfies a PW inequality".
We remark that if the support of $\mu$ is connected and $\rho$ is bounded away from 0 on the support of $\mu$, then it is classical that $\mu$ satisfies a PW inequality. Finally, we call the objective functional to be maximized $\Phi$, defined as
$$\Phi(\psi) := \int_X \min_i \big( c(x, y_i) + \psi^i \big) \, d\mu(x) - F^*(\psi).$$
We recall that $\nabla\Phi = G - \nabla F^*$; see for example [San15, Section 6.4.2]. The conditions (Reg), (Twist), (QC) are sufficient to obtain the $C^{1,\alpha}$ regularity of $G$, and a PW inequality is sufficient to obtain strong monotonicity of $G$ outside of the direction $\mathbf{1}$ as long as $\psi \in K_\epsilon$ for some $\epsilon > 0$ (see [KMT19, Theorems 4.1, 5.1]).
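The identity $\nabla\Phi = G - \nabla F^*$ can be checked numerically in the classical case $F = \delta_{\{\nu\}}$, where $F^*(\psi) = \langle \psi, \nu \rangle$. The one-dimensional discretization, the sign convention $\Phi(\psi) = \int_X \min_i (c(x,y_i) + \psi^i)\,d\mu - F^*(\psi)$, and all numerical parameters below are our own illustrative assumptions.

```python
# Finite-difference sanity check of grad Phi = G - grad F* in the classical
# case F = indicator of a point nu, so F*(psi) = <psi, nu>.  Our own toy
# setup: mu uniform on [0,1] (discretized), c(x,y) = (x - y)^2.

M = 4000
xs = [(k + 0.5) / M for k in range(M)]
ys = [0.2, 0.5, 0.8]
nu = [0.3, 0.3, 0.4]

def phi(psi):
    transport = sum(min((x - y)**2 + p for y, p in zip(ys, psi))
                    for x in xs) / M
    return transport - sum(p * n for p, n in zip(psi, nu))

def G(psi):
    counts = [0] * len(ys)
    for x in xs:
        vals = [(x - y)**2 + p for y, p in zip(ys, psi)]
        counts[vals.index(min(vals))] += 1
    return [c / M for c in counts]

psi = [0.07, -0.04, 0.02]
h = 1e-6
fd = [(phi([p + h * (i == j) for j, p in enumerate(psi)])
       - phi([p - h * (i == j) for j, p in enumerate(psi)])) / (2 * h)
      for i in range(3)]
grad = [g - n for g, n in zip(G(psi), nu)]  # should match fd entrywise
```

The agreement is limited only by the discretization of $\mu$ near cell boundaries.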
2.2. Statement of Main Results. We are now ready to state our algorithm and main results. We start by describing our parameter shuffling routine.
We remark that since $G$ is a monotone function (it is the gradient of a concave function), we can always use a binary search to find the $r$ needed in line 5. Next we describe our modified damped Newton algorithm.

Algorithm 2: Damped Newton's algorithm
Input: A tolerance $\zeta > 0$, an initial $\psi_0 \in \mathbb{R}^N$, and an $\epsilon > 0$ such that $(\nabla F^*)^i \geq \epsilon$ on $K_0$ for all $i$.
Step 1: Run the Parameter Shuffling Routine on $\psi_k$ with tolerance parameter $2\epsilon_0$.
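The binary search mentioned above is a plain bisection on a monotone scalar function; a minimal sketch (with a sigmoid standing in for a monotone cell-mass curve, a name and setup of our own choosing, not the paper's $G$):

```python
import math

# Minimal sketch of the binary search mentioned above: for a nondecreasing
# scalar function g, find r with g(r) = target by bisection.  The sigmoid
# below is only a stand-in for a monotone cell-mass curve.

def bisect_monotone(g, target, lo, hi, tol=1e-12):
    """Return r in [lo, hi] with g(r) close to target, for g nondecreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_mass(t):
    return 1.0 / (1.0 + math.exp(-t))

r = bisect_monotone(toy_mass, 0.25, -10.0, 10.0)
```

Monotonicity guarantees the bracket is preserved at every step, so the search converges unconditionally.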
We remark that $D^2\Phi(\psi_k) = DG - D^2F^*$. The matrix $DG(\psi)$ can be explicitly computed in terms of the Laguerre diagram associated to $\psi$ (see the discussion before Lemma 6.5 in [San15]), and under certain assumptions we will explicitly compute $D^2F^*$ in terms of $F$ (see the proof of Theorem 5.7). Also note that Algorithm 2 will always keep the sizes of the cells bigger than $\epsilon/4$. The condition that $\nabla F^* \geq \epsilon$ coordinate-wise on $K_0$ ensures that the cells in the true optimal solution are bigger than $\epsilon$, so that the algorithm can converge. We show in Theorem 2.8 that this assumption on $\nabla F^*$ is not too restrictive.
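The damping mechanism can be illustrated on a two-warehouse toy problem where everything is scalar. All choices below are our own (quadratic cost on $X = [0,1]$ with $\mu$ uniform, $y_1 = 0.25$, $y_2 = 0.75$, fees $f_1(t) = t^2$, $f_2(t) = t^2/2$), and the damping rule is a schematic reading of the idea, not the paper's Algorithm 2 verbatim. Writing $s = \psi^1 - \psi^2$, one computes $G^1(s) = \operatorname{clip}(1/2 - s, 0, 1)$ and, from the splitting system of Section 5, $(\nabla F^*)^1 = (s+1)/3$.

```python
# Schematic damped Newton step on a two-warehouse toy problem (our own
# setup).  We seek the root of g(s) = G1(s) - (s + 1)/3, damping any step
# that would push a cell's mass below eps or fail to reduce the error.

def G1(s):
    return min(1.0, max(0.0, 0.5 - s))

def g(s):
    return G1(s) - (s + 1.0) / 3.0

eps = 0.05
s = 0.3            # initial guess; both cell masses are >= eps here
for _ in range(60):
    if abs(g(s)) < 1e-9:
        break
    h = 1e-7
    dg = (g(s + h) - g(s - h)) / (2 * h)   # derivative of the residual
    step = g(s) / dg
    tau, s_new = 1.0, s - step
    # damp: keep both cells fatter than eps and insist on error reduction
    while tau > 1e-12 and not (min(G1(s_new), 1.0 - G1(s_new)) >= eps
                               and abs(g(s_new)) <= (1 - tau / 2) * abs(g(s))):
        tau /= 2
        s_new = s - tau * step
    s = s_new
```

For this piecewise-linear toy the root is $s = 1/8$ (where $G^1 = \lambda^1 = 3/8$), and the undamped step already lands on it; in general the backtracking on $\tau$ is what prevents a cell from collapsing.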
Our first main theorem is that under certain conditions on $F$, Algorithm 2 has global linear convergence and local superlinear convergence. In particular, under these assumptions we have that $\nabla F^* \geq \epsilon$ coordinate-wise.
Theorem 2.7. Suppose $F$ is a storage fee function that splits into $f_i$ so that the $f_i$ are essentially smooth and twice continuously differentiable on their domains, the $f_i''$ are locally Lipschitz on their domains, and each $f_i$ is strongly convex with $\operatorname{dom} f_i \subset [\epsilon, 1]$ for some $\epsilon > 0$. Then Algorithm 2 has global linear convergence and local superlinear convergence of order $1 + \alpha$, where $\alpha$ is the Hölder exponent of $\rho$.
For our next theorem we show that the assumptions imposed in our main convergence theorem are not too restrictive in the sense that for any storage fee function that splits, there is an approximating storage fee function that satisfies the assumptions.
Theorem 2.8. Let $F$ be a storage fee function that splits into $f_i$. Then for every $\eta > 0$ there is a storage fee function $\tilde{F}$ (explicitly constructed in the proof) so that $\tilde{F}$ satisfies the assumptions of Theorem 2.7 and, if $\lambda, \tilde{\lambda}$ are the minimizers of the problems associated to $F, \tilde{F}$ respectively, then $\|\lambda - \tilde{\lambda}\| \leq \eta$.
We remark here that the results of [BK20b] tell us that the optimal transport maps that solve the problems associated to $F$ and $\tilde{F}$ are also close in the sense of $L^1(\mu)$ distance.
2.3. Outline of Paper. In section 3 we show that Algorithm 1 terminates and does not increase error. In section 4 we prove global linear and local superlinear convergence of Algorithm 2 under assumptions given in terms of F * . In section 5 we translate the assumptions on F * back to assumptions on F and use this to prove Theorem 2.7, our main convergence theorem. In section 6 we obtain some results on the stability of the optimizing weight vector under perturbations in the storage fee function. In section 7 we show how to regularize any splitting storage fee function into one that satisfies the assumptions of Theorem 2.7. This then gives a proof of Theorem 2.8. We remark that in this section we also obtain an explicit formula for D 2 F * that may be useful for implementing Algorithm 2. Finally in appendix A we prove a quick lemma that bounds the difference between different coordinates of any ψ ∈ K 0 .

3. Parameter Shuffling Analysis
In this section we analyze Algorithm 1. Our first proposition shows that it always terminates and gives a bound on how many iterations it can take.
Proof. First of all we claim that the "if" statement in line 4 cannot evaluate as true for every index. More rigorously, let $A \subset \{1, \ldots, N\}$ be the collection of indices for which the "if" statement in line 4 evaluates as true in some iteration. We claim that $A \neq \{1, \ldots, N\}$. To see this, note that if $k \in A$ then $G^k(\psi) \leq 3\epsilon$ throughout the entire algorithm after line 4 first evaluates as true for the index $k$. Since $\sum_i G^i(\psi) = 1 > 3N\epsilon$, it is not possible that $A = \{1, \ldots, N\}$. Hence we can fix some $j \notin A$ and some $k \in A$. We claim that line 6 can execute at most $\frac{4L\|c\|_\infty}{\epsilon} + 1$ times for the index $k$. After just one iteration $G^k(\psi) > 0$, and so by Lemma A.1 we see that $\psi^k < \psi^j + 2\|c\|_\infty$. Now note that in each iteration of line 6, $G^k$ increases by at least $\epsilon$ and so $\psi^k$ decreases by at least $\frac{\epsilon}{L}$. Hence after iterating line 6 another $\frac{4L\|c\|_\infty}{\epsilon}$ times we will have decreased $\psi^k$ by at least $4\|c\|_\infty$, and so we will have $\psi^k < \psi^j - 2\|c\|_\infty$, which would give us $G^j(\psi) = 0$ by Lemma A.1; this is a contradiction since $j \notin A$. Since line 6 can be executed at most $\frac{4L\|c\|_\infty}{\epsilon} + 1$ times for each index in $A$, we conclude that in total it can be executed at most $\left( \frac{4L\|c\|_\infty}{\epsilon} + 1 \right)|A| \leq (N - 1)\left( \frac{4L\|c\|_\infty}{\epsilon} + 1 \right)$ times, and so the result follows.
Next we prove that Algorithm 1 cannot increase the error.
Proposition 3.2. Let $\psi_{in}$ be the input and $\psi_{out}$ be the output of Algorithm 1. Suppose that $F^*$ is differentiable, that for all $k \in \{1, \ldots, N\}$ we have $(\nabla F^*)^k \geq 3\epsilon$, and that for every $\psi \in \mathbb{R}^N$, every $k$, and every $\gamma \geq 0$,
$$(\nabla F^*)^k(\psi) - (\nabla F^*)^k(\psi - \gamma e_k) \geq \sum_{j \neq k} \left| (\nabla F^*)^j(\psi) - (\nabla F^*)^j(\psi - \gamma e_k) \right|.$$
Then $\|\nabla\Phi(\psi_{out})\|_1 \leq \|\nabla\Phi(\psi_{in})\|_1$.
Proof. We analyze what happens in a single execution of line 6 in Algorithm 1. Fix some index $i$ so that $G^i(\psi) \leq \epsilon$ and so the "if" statement of line 4 evaluates to true. Let $\psi_1$ be the value of $\psi$ before an execution of line 6 and $\psi_2$ be the value of $\psi$ after. Comparing the coordinates of $\nabla\Phi(\psi_1)$ and $\nabla\Phi(\psi_2)$ using the displayed condition, we get $\|\nabla\Phi(\psi_2)\|_1 \leq \|\nabla\Phi(\psi_1)\|_1$, and so the error does not increase in any execution of line 6. Since line 6 is the only line of Algorithm 1 that changes $\psi$, we conclude that $\|\nabla\Phi(\psi_{out})\|_1 \leq \|\nabla\Phi(\psi_{in})\|_1$ as desired.
We remark that when $F^*$ is twice differentiable, the condition assumed in the above proposition is equivalent to
$$\partial^2_{kk} F^*(\psi) \geq \sum_{j \neq k} \left| \partial^2_{jk} F^*(\psi) \right| \quad \text{for all } \psi \text{ and } k,$$
which says that the Hessian of $F^*$ is diagonally dominant. To see this, note that the forward direction follows from dividing by $\gamma$ and taking the limit as $\psi_2$ approaches $\psi_1$. The backward direction follows from taking the line integral from $\psi_2$ to $\psi_1$.

4. Convergence of Newton Algorithm
In this section we will prove convergence of Algorithm 2. We will show convergence under weaker assumptions than those of Theorem 2.7; however, in this section the assumptions are stated in terms of $F^*$ instead of $F$. We shall wait until Section 5 to find conditions on $F$ itself that give convergence of our algorithm. We recall [KMT19, Theorem 4.1], which says that under our assumptions $G$ is $C^{1,\alpha}$ on $K_\epsilon$ for every $\epsilon > 0$.
Proposition 4.1. Let $F$ be a strictly convex storage fee function. Suppose that there is some $\epsilon > 0$ so that $(\nabla F^*(\psi))^i \geq \epsilon$ for every $\psi \in K_0$. Set $\epsilon_0 = \epsilon/4$. Furthermore assume that $\nabla F^*$ is $C^{1,\alpha}$ on $K_{\epsilon_0}$. We let $L$ be the sum of the $C^{1,\alpha}$ constants of $\nabla F^*$ and $G$ on $K_{\epsilon_0}$. Next assume that $\Phi$ is strongly concave (except in the direction $\mathbf{1}$) for all $\psi \in K_{\epsilon_0}$; more formally, there is some $\kappa > 0$ so that for all $\psi \in K_{\epsilon_0}$ we have $D^2\Phi(\psi) \leq -\kappa H$, where $H$ is the orthogonal projection onto the hyperplane perpendicular to $\mathbf{1}$. Finally suppose that $\psi_0 \in K_{2\epsilon_0}$. Then the iterates of Algorithm 2 converge globally at a linear rate; furthermore, once we have $\tau_k = 1$, we get superlinear convergence. Proof. We analyze a single iteration of Algorithm 2. Define $\psi := \psi_k \in K_{2\epsilon_0}$. Let $v$ denote the Newton direction, which is well defined as $\Phi$ was $\kappa$-concave except in the direction of $\mathbf{1}$ and $\langle \nabla\Phi, \mathbf{1} \rangle = 0$. Also define $\psi_\tau = \psi - \tau v$. Let $\tau_1$ be the first exit time from $K_{\epsilon_0}$, i.e. $\min_i G^i(\psi_{\tau_1}) = \epsilon_0$ and $\tau_1$ is the smallest value of $\tau$ for which this holds.
Applying Taylor's formula to $\nabla\Phi$, we obtain, for $\tau \leq 1$, the estimate (4.1). Now we establish the error reduction estimates. Estimate (4.1) gives the desired decrease provided $\tau$ lies below an explicit threshold $\tau_2$; again using (4.2), this is indeed the case. Hence we see that if we set $\tau_k := \tau_2$, then the claim is true. Furthermore, as the error goes to zero, eventually we must have $\tau_k = 1$. When this happens, (4.1) and (4.2) give the superlinear convergence.
Remark 4.2. There are two ways to satisfy the strong concavity assumption on Φ. Either one can assume a PW-inequality in which case we get the strong concavity from the G term. Alternatively one can put some kind of strong concavity assumption on F * . We will see that in the case where F splits into f i this strong concavity assumption on F * will be satisfied assuming some regularity and convexity conditions on the f i .

5. Relationship between $F$ and $F^*$
This section is mainly about the convex analysis of storage fee functions that split. First we show that the assumption that $F$ splits yields the technical condition on $F^*$ that we needed in Proposition 3.2 in order to obtain that Algorithm 1 does not increase error. Next we analyze how the regularity and convexity assumptions on the $f_i$ affect the regularity of $F^*$. We then use this to prove our main convergence theorem, Theorem 2.7. We recall some notation and definitions from convex analysis, starting with two definitions from [Roc70, Section 26].
Definition 5.1. Given any proper convex function $G$, we say that $G$ is essentially smooth if the following holds. Let $C = \operatorname{int}(\operatorname{dom} G)$. We require that $C \neq \emptyset$ and that $G$ is differentiable on $C$. Also, if $x_i \in C$ is a sequence that converges to a point on the boundary of $C$, then we require that $|\nabla G(x_i)|$ diverges to $+\infty$.
Definition 5.2. Given any proper convex function $G$, we say that $G$ is essentially strictly convex if $G$ is strictly convex on every convex subset of $\{x : \partial G(x) \neq \emptyset\}$.
We also recall that $\operatorname{range} \partial G = \bigcup_{x \in \mathbb{R}^n} \partial G(x)$. Next we obtain a characterization of the subdifferential of $F^*$ purely in terms of $F$, when $F$ is a storage fee function that splits. This characterization will form the basis of our program to translate conditions on $F$ into conditions on $F^*$.
Lemma 5.4. Suppose that the storage fee function $F$ splits and each $f_i$ is strictly convex. Furthermore, assume that $F$ is not the indicator function of a point. Then the system
$$\psi^i - r \in \partial f_i(\lambda^i) \text{ for all } i \in \{1, \ldots, N\}, \qquad (5.1)$$
$$\sum_{i=1}^N \lambda^i = 1, \qquad (5.2)$$
characterizes the subdifferential of $F^*$ in the sense that, for any pair $(\psi, \lambda)$, there exists an $r \in \mathbb{R}$ so that the above system is satisfied if and only if $\lambda = \nabla F^*(\psi)$.
Proof. Let $\pi_i : \mathbb{R}^N \to \mathbb{R}$ denote the projection onto the $i$-th coordinate. For this proof we shall use the notation $F_i := f_i \circ \pi_i$. We also remark that since each $f_i$ is strictly convex, $F$ is strictly convex and so $F^*$ is differentiable; hence it makes sense to refer to $\nabla F^*$. Note that since $f_i(v) = +\infty$ if $v \notin [0, 1]$, we can write $F = F_1 + \cdots + F_N + \delta_P$, where $P$ is the hyperplane that extends $\Lambda$ (formally $P = \{\lambda \in \mathbb{R}^N : \sum_i \lambda^i = 1\}$). Our next objective is to show that the intersection of the relative interiors of the domains of the $F_i$ and $P$ is non-empty (or the $f_i$ can be modified to make this so). Note that since the $f_i$ are proper and convex, their domains are non-empty intervals, say $\operatorname{dom} f_i = [a_i, b_i]$. If we had $\sum_i a_i = 1$, then the only admissible weight vector would be $\lambda = (a_1, \ldots, a_N)$, and so $F$ would be the indicator function of a point (a similar argument holds if $\sum_i b_i = 1$). Now we can choose $\lambda \in P$ so that $\lambda^i \in (a_i, b_i)$ whenever $a_i < b_i$ and $\lambda^i = a_i$ if $a_i = b_i$. In this case $\lambda^i \in \operatorname{ri} \operatorname{dom} f_i$, which implies that $\lambda \in \operatorname{ri} \operatorname{dom} F_i$. Furthermore $\lambda \in P = \operatorname{ri} \operatorname{dom} \delta_P$. To recap, we have proven that $\bigcap_i \operatorname{ri} \operatorname{dom} F_i \cap \operatorname{ri} \operatorname{dom} \delta_P \neq \emptyset$. Hence by [Roc70, Theorem 23.8] we have that $\partial F = \partial F_1 + \cdots + \partial F_N + \partial \delta_P$. We can now proceed to the proof of the lemma: for any pair $(\psi, \lambda)$ there exists an $r \in \mathbb{R}$ so that the above system is satisfied if and only if $\lambda = \nabla F^*(\psi)$. For the backward direction, by Lemma 5.3 we have $\lambda \in P$, and so $\lambda$ satisfies (5.2). Moreover, since $\lambda \in P$ we have that $\partial \delta_P(\lambda) = \{r\mathbf{1} : r \in \mathbb{R}\}$, as $P$ is a plane orthogonal to $\mathbf{1}$. Hence (5.1) simply says that $\psi \in \partial F_1(\lambda) + \cdots + \partial F_N(\lambda) + \partial \delta_P(\lambda) = \partial F(\lambda)$, but this is given, since we assumed $\lambda = \nabla F^*(\psi)$. For the forward direction, say that $\psi, \lambda, r$ is a solution. Because of (5.2) we have that $\lambda \in P$. Then since $\partial F = \partial F_1 + \cdots + \partial F_N + \partial \delta_P$, we see that (5.1) says that $\psi \in \partial F(\lambda)$. Since $F^*$ is everywhere differentiable, this means that $\lambda = \nabla F^*(\psi)$.
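The system above also suggests a practical evaluator for $\nabla F^*(\psi)$ when $F$ splits: invert each $f_i'$ at $\psi^i - r$ and bisect on the scalar $r$ until the coordinates sum to 1. The sketch below uses our own choice $f_i(t) = a_i t^2$ (so $(f_i')^{-1}(u) = u/(2a_i)$), with the inverse projected onto $[0,1]$ to respect $\operatorname{dom} f_i$, and cross-checks the result against a brute-force grid maximization of $\langle \psi, \lambda \rangle - F(\lambda)$ over the simplex.

```python
# Evaluate grad F*(psi) for a splitting F via the system of Lemma 5.4:
# bisect on r so that lambda^i = (f_i')^{-1}(psi^i - r) sums to 1.
# Our own toy choice: f_i(t) = a_i t^2; the clamp keeps lambda in [0, 1].

a = [1.0, 2.0, 3.0]
psi = [0.9, 0.4, 0.7]

def lam(r):
    return [min(1.0, max(0.0, (p - r) / (2 * ai))) for p, ai in zip(psi, a)]

lo, hi = min(psi) - 2.0, max(psi)      # sum(lam(r)) is nonincreasing in r
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if sum(lam(mid)) > 1.0:
        lo = mid
    else:
        hi = mid
grad = lam(0.5 * (lo + hi))            # candidate for grad F*(psi)

def obj(l):
    """The conjugate objective <psi, l> - sum_i f_i(l^i)."""
    return (sum(p * x for p, x in zip(psi, l))
            - sum(ai * x * x for ai, x in zip(a, l)))

best = max(obj([i / 200, j / 200, (200 - i - j) / 200])
           for i in range(201) for j in range(201 - i))
```

The scalar $r$ plays exactly the role of the multiplier from $\partial\delta_P$ in the proof above.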
Next we use our characterization in order to show that F splitting implies a monotonicity condition on the partial derivatives of F * .
We now use the monotonicity condition on the partial derivatives of F * to obtain the technical condition used in section 3.
Corollary 5.6. Assume $F$ is a storage fee function that splits with strictly convex $f_i$. Let $\psi_1 \in \mathbb{R}^N$ and set $\psi_2 = \psi_1 - \gamma e_k$, for some $\gamma \geq 0$. Then
$$(\nabla F^*)^k(\psi_1) - (\nabla F^*)^k(\psi_2) \geq \sum_{j \neq k} \left| (\nabla F^*)^j(\psi_1) - (\nabla F^*)^j(\psi_2) \right|.$$
In particular, if $F^*$ is twice differentiable then
$$\partial^2_{kk} F^*(\psi) \geq \sum_{j \neq k} \left| \partial^2_{jk} F^*(\psi) \right|.$$
Proof. By a direct application of Proposition 5.5, we see that for $j \neq k$ the differences $(\nabla F^*)^j(\psi_1) - (\nabla F^*)^j(\psi_2)$ all have the same sign. In particular,
$$\sum_{j \neq k} \left| (\nabla F^*)^j(\psi_1) - (\nabla F^*)^j(\psi_2) \right| = \left| \sum_{j \neq k} \big( (\nabla F^*)^j(\psi_1) - (\nabla F^*)^j(\psi_2) \big) \right| = (\nabla F^*)^k(\psi_1) - (\nabla F^*)^k(\psi_2),$$
where the last equality follows from Lemma 5.3, as $\operatorname{range} \nabla F^* \subset \Lambda$ implies that $\langle \nabla F^*(\psi_1), \mathbf{1} \rangle = \langle \nabla F^*(\psi_2), \mathbf{1} \rangle = 1$. This proves the first claim. The second follows easily by taking limits.
In the next theorem we show that convexity and regularity assumptions on F give higher order regularity on F * when F is a storage fee function that splits. In the process we also obtain an explicit formula for D 2 F * in terms of the f i .
Theorem 5.7. Let $F$ be a storage fee function that splits, where each $f_i$ is an essentially smooth function that is twice differentiable on the interior of its domain and so that each $f_i''$ is locally Lipschitz (on the interior of its domain). Furthermore we assume each $f_i$ is strongly convex, say $f_i'' \geq \eta > 0$ on the interior of its domain. Then $F^* \in C^{2,1}(K_0)$, with bounds depending only on $\eta$, $\|c\|_\infty$, $N$, and the Lipschitz constants of the $f_i''$ on the sets $S_i$, where each $S_i$ is a compact subset of the interior of $\operatorname{dom} f_i$ that is constructed in the proof.
Proof. Since the $f_i$ are proper and convex, their domains are non-empty intervals, say $\operatorname{dom} f_i = [a_i, b_i]$. Since the $f_i$ are essentially smooth, they aren't indicator functions of points, and so $b_i > a_i$. Since the $f_i$ are essentially smooth, $\partial f_i(a_i) = \partial f_i(b_i) = \emptyset$ (the derivative there is "$\infty$"). Since by assumption the $f_i$ are differentiable on $(a_i, b_i)$, Lemma 5.4 tells us that $\lambda \in \partial F^*(\psi)$ if and only if $\lambda^i \in (a_i, b_i)$ and there exists $r \in \mathbb{R}$ so that
$$f_i'(\lambda^i) + r = \psi^i \quad (i = 1, \ldots, N), \qquad \sum_{i=1}^N \lambda^i = 1. \qquad (5.3)$$
Since $F$ was strictly convex, we have that $F^*$ is continuously differentiable. Hence for any fixed $\psi$ there is precisely one $\lambda$ that satisfies the above system (it is $\nabla F^*(\psi)$). Hence (looking at the first equation) there is also precisely one value of $r$ that satisfies the system (5.3). We denote this by $r(\psi)$. Our next step is to apply the implicit function theorem to deduce the differentiability of $\nabla F^*(\psi)$. Let
$$H(\psi, \lambda, r) = \big( f_1'(\lambda^1) + r - \psi^1, \ldots, f_N'(\lambda^N) + r - \psi^N, \textstyle\sum_i \lambda^i - 1 \big)$$
encode the system in the sense that $H(\psi, \lambda, r) = 0$ if and only if $(\psi, \lambda, r)$ satisfies (5.3). Let $J$ be the Jacobian matrix of $H$ with respect to $(\lambda, r)$. If we can show that $J$ is invertible, then by the implicit function theorem $\nabla F^*(\psi)$ and $r(\psi)$ will be continuously differentiable. Furthermore, by the implicit function theorem we will have
$$\frac{\partial}{\partial \psi^j}(\nabla F^*) = -J^{-1} \frac{\partial H}{\partial \psi^j} = -J^{-1}(-e_j) = J^{-1} e_j,$$
and so $D^2 F^*$ is just $J^{-1}$ with the last row and column removed (these correspond to the $r$ terms). We proceed to compute $J$ and its inverse. Direct computation shows that
$$J = \begin{pmatrix} f_1''(\lambda^1) & & & 1 \\ & \ddots & & \vdots \\ & & f_N''(\lambda^N) & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix},$$
where empty entries are interpreted to be zero. Let $l_i := \frac{1}{f_i''(\lambda^i)}$ and $Q := \big( \sum_k l_k \big)^{-1}$. It is now easy to see that $J^{-1} = L + R$, where $L$ is the diagonal matrix with diagonal entries $l_1, l_2, \ldots, l_N, 0$ and $R$ is given by $R_{ij} = -Q\, l_i l_j$ for $i, j \leq N$, $R_{i, N+1} = R_{N+1, i} = Q\, l_i$, and $R_{N+1, N+1} = -Q$. Hence $J$ was invertible after all. Note that we needed $f_i''(\lambda^i) > 0$. At this point we have proved that $F^*$ is $C^2$ and $r(\psi)$ is $C^1$. The only thing left to do is to obtain the Lipschitz estimate on $D^2 F^*$.
However (as seen in the expression for $J^{-1}$), this strongly relies on the local Lipschitz constant for $f_i''$. Since $f_i''$ is only locally Lipschitz, we need to obtain control of the possible values of $\nabla F^*(\psi)$ and ensure that they never approach the boundary of the domain of any $f_i$. Thankfully, as we will see in Lemma A.1, the assumption that no cell collapses ($\psi \in K_0$) gives us enough compactness in the $\psi$'s. Now fix some $\psi \in K_0$ and let $\lambda = \nabla F^*(\psi)$ and $r = r(\psi)$ be the solutions to our system. Now since $F$ is not the indicator function of a point, we have that $\sum_i b_i > 1$ and so $B > 0$.
Define $\tilde\psi := \psi - r\mathbf{1}$; by (5.3), $\tilde\psi^k = f_k'(\lambda^k)$ for each $k$, and the $\tilde\psi^k$ can be compared using the monotonicity of the $f_k'$ (recall the $f_k$ are convex). Since $\tilde\psi$ and $\psi$ differ by a multiple of $\mathbf{1}$, we have $G(\tilde\psi) = G(\psi)$ and so $\tilde\psi \in K_0$. Hence by Lemma A.1 we get an upper bound on $\tilde\psi^k$; a symmetric argument gives a lower bound, and so $|\tilde\psi^k| \leq C_1$, where $C_1$ is a constant that depends only on $c$ and the $f_k$ (it includes the additive term $2\|c\|_\infty$ from Lemma A.1). Now, recall from the system (5.3) that $\tilde\psi^i = f_i'(\lambda^i)$. Hence we have obtained that for any $\psi \in K_0$, $|f_i'((\nabla F^*(\psi))^i)| \leq C_1$. Since $f_i''$ is locally Lipschitz and $f_i$ is essentially smooth, there is a constant $C_{f_i}$, depending only on $f_i$ and $C_1$, so that $f_i''$ is Lipschitz with constant $C_{f_i}$ on the compact set $S_i := \{t : |f_i'(t)| \leq C_1\} \subset \operatorname{int} \operatorname{dom} f_i$. With this we finally get, for any $\psi_1, \psi_2 \in K_0$, Lipschitz control of the $f_i''(\lambda^i)$ along $\nabla F^*$. We may now return to the problem of controlling $D^2 F^*$.
Since $D^2 F^*$ is just $J^{-1}$ with the last row and column removed, we get $D^2 F^* = S + T$, where $S, T$ arise from $L, R$ respectively by removing the last row and column. We are now in a position to obtain the $C^{0,1}$ bound on $D^2 F^*$. Fix some $\psi_1, \psi_2 \in K_0$. Let $\lambda_1, \lambda_2$ equal $\nabla F^*(\psi_1), \nabla F^*(\psi_2)$ respectively. For $i \in \{1, 2\}$ we write $S_i, T_i, Q_i, l^j_i$ for the $S, T, Q, l_j$ corresponding to $\lambda_i$. First note that since the $f_i$ are strongly convex with parameter $\eta$, $F$ is strongly convex with parameter $\eta$; hence $\nabla F^*$ is Lipschitz with constant $\frac{1}{\eta}$. The $S$'s are then easy to bound. For the $T$'s, consider the functions $g_{ij} : \mathbb{R}^N \to \mathbb{R}$, $g_{ij}(l) = Q(l)\, l_i l_j = l_i l_j \big( \sum_k l_k \big)^{-1}$, so that $T_{ij} = -g_{ij}(l)$. We see that $g_{ij}$ is continuously differentiable outside the origin, and since $l_i \big( \sum_k l_k \big)^{-1} \leq 1$ and $l_j \big( \sum_k l_k \big)^{-1} \leq 1$ (the $l_k$ being positive), we have that $\big| \frac{\partial g_{ij}}{\partial l_m} \big| \leq 3$, and so $g_{ij}$ is Lipschitz with constant $3\sqrt{N}$. Hence we get a Lipschitz bound for the $T$'s, and putting it together we obtain the $C^{0,1}$ bound on $D^2 F^*$. Using our explicit expression for $D^2 F^*$, we prove a quick corollary which gives invertibility of $D^2 F^*$ except in the direction of $\mathbf{1}$.
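The resulting expression $D^2 F^* = \operatorname{diag}(l) - (l\, l^{\mathsf T})/\sum_k l_k$ with $l_i = 1/f_i''(\lambda^i)$ can be sanity-checked numerically. The check below uses our own toy choice $f_i(t) = a_i t^2$, for which $\nabla F^*$ has a closed form in the interior, and compares against central differences.

```python
# Check D^2 F* = diag(l) - (l l^T) / sum(l), with l_i = 1 / f_i''(lambda^i),
# against central differences of grad F*.  Our own toy: f_i(t) = a_i t^2,
# for which the interior solution of f_i'(lam^i) + r = psi^i, sum lam^i = 1
# is linear in psi.

a = [0.5, 1.0, 1.5]
l = [1.0 / (2.0 * ai) for ai in a]     # l_i = 1 / f_i''

def gradFstar(psi):
    """Interior closed form: lam^i = (psi^i - r) l_i with r fixed by the sum."""
    r = (sum(p * li for p, li in zip(psi, l)) - 1.0) / sum(l)
    return [(p - r) * li for p, li in zip(psi, l)]

Ssum = sum(l)
hess = [[(l[i] if i == j else 0.0) - l[i] * l[j] / Ssum for j in range(3)]
        for i in range(3)]

psi = [0.9, 1.1, 1.3]
h = 1e-5
fd = [[(gradFstar([p + h * (j == k) for k, p in enumerate(psi)])[i]
        - gradFstar([p - h * (j == k) for k, p in enumerate(psi)])[i]) / (2 * h)
       for j in range(3)] for i in range(3)]
```

Note that the rank-one term enters with a minus sign; this is what makes every row of the matrix sum to zero, i.e. $\mathbf{1}$ lies in the kernel, as the next corollary states.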
Corollary 5.8. Let F satisfy the assumptions of Theorem 5.7. Then for all ψ ∈ R N we have that ker D 2 F * (ψ) = span 1.
Proof. Fix some $\psi \in \mathbb{R}^N$ and let $v = \sum_i v^i e_i \in \mathbb{R}^N$ be some vector. We recall from the proof of Theorem 5.7 that we can split $D^2 F^*(\psi)$ into the sum of two matrices $S, T$, where $S$ is the diagonal matrix with elements $l_1, l_2, \ldots, l_N$ and
$$T = -\Big( \sum_k l_k \Big)^{-1} \begin{pmatrix} (l_1)^2 & l_1 l_2 & \cdots & l_1 l_N \\ l_1 l_2 & (l_2)^2 & \cdots & l_2 l_N \\ \vdots & \vdots & \ddots & \vdots \\ l_1 l_N & l_2 l_N & \cdots & (l_N)^2 \end{pmatrix}.$$
Since each $l_j \neq 0$, we see that $v \in \ker D^2 F^*(\psi)$ if and only if for all $j \in \{1, \ldots, N\}$ we have that $v^j = \frac{\sum_k l_k v^k}{\sum_i l_i}$, which occurs if and only if all of the $v^j$ are equal, i.e. $v \in \operatorname{span} \mathbf{1}$. Finally, we apply Theorem 5.7 to prove our main convergence result.
Proof of Theorem 2.7. We need to verify that all of the conditions of Proposition 4.1 are satisfied. First note that by Theorem 5.7 we have that $F^* \in C^{2,1}(K_0)$. Next, since $\operatorname{dom} f_i \subset [\epsilon, 1]$, we see that $\operatorname{dom} F \subset [\epsilon, 1]^N$. Hence by [Roc70, Corollary 23.5.1] we have that $\operatorname{range} \partial F^* \subset [\epsilon, 1]^N$. In particular $(\nabla F^*(\psi))^i \geq \epsilon$ for $\psi \in K_0$. Next, by Corollary 5.8, for every $\psi \in \mathbb{R}^N$ there is some $\kappa(\psi) > 0$ so that $D^2 F^*(\psi) \geq \kappa(\psi) H$, where $H$ is the orthogonal projection onto the hyperplane perpendicular to $\mathbf{1}$ (denoted $\tilde{P}$). By Lemma A.1 we have that $K_0 \cap \tilde{P}$ is a compact set, so we can choose a uniform $\kappa > 0$ so that $D^2 F^*(\psi) \geq \kappa H$ for all $\psi \in K_0 \cap \tilde{P}$. Since $D^2 F^*(\psi) = D^2 F^*(H(\psi))$, we get $D^2 F^*(\psi) \geq \kappa H$ for all $\psi \in K_0$. Finally, since $F$ splits, Corollary 5.6 gives the monotonicity condition needed in Proposition 3.2. Hence all of the assumptions of Proposition 4.1 are indeed satisfied.

6. Stability
We now begin working towards the proof of our second main theorem, Theorem 2.8. In this section we will prove that the optimizing weight vector is stable under perturbations in the storage fee functions. We obtain results both in terms of $L^\infty$ perturbations and in terms of perturbations of the domain. To start we set some notation. Let
$$C(\lambda) := \min_{T_\#\mu = \sum_i \lambda^i \delta_{y_i}} \int_X c(x, T(x)) \, d\mu = \max_{\psi \in \mathbb{R}^N} \left\{ \int_X \min_i \big( c(x, y_i) + \psi^i \big) \, d\mu - \langle \psi, \lambda \rangle \right\}.$$
The equality between the minimization and maximization in the above definition is just the classical Kantorovich duality, and so the supremum is actually attained (see [Vil09, Theorem 5.10]). Furthermore, for any storage fee function $F$, let $O_F(\lambda) = C(\lambda) + F(\lambda)$.
Lemma 6.1. Let $F_1$ be a storage fee function and $\lambda_1$ be a minimizer in the associated problem. Then for any $\lambda \in \Lambda$,
$$\|\lambda - \lambda_1\|^2 \leq 8 C_L N \big( O_{F_1}(\lambda) - O_{F_1}(\lambda_1) \big),$$
where $C_L$ is the constant described in [BK20a, Lemma A.1].
Proof. Recall from [BK20a, Lemma A.1] that $C$ is strongly convex with constant $\frac{1}{4 C_L N}$. Since $F_1$ is convex, $O_{F_1}$ is also strongly convex with the same constant as $C$. The result now follows from [Nes18, Corollary 3.2.3].
We now use the strong convexity of $C$ to obtain stability under $L^\infty$ perturbations of $F$.

Proposition 6.2. Let $F_1, F_2$ be storage fee functions such that $\operatorname{dom}(F_1) = \operatorname{dom}(F_2)$. Let $\lambda_1, \lambda_2$ be the minimizers of the problems associated to $F_1, F_2$ respectively. Then
$$\|\lambda_1 - \lambda_2\|_2^2 \leq 16 C_L N\, \|F_1 - F_2\|_{L^\infty(\operatorname{dom} F_1)}.$$

Proof. Since $\lambda_2$ minimizes $O_{F_2}$ we have $O_{F_1}(\lambda_2) - O_{F_1}(\lambda_1) \leq 2\|F_1 - F_2\|_{L^\infty(\operatorname{dom} F_1)}$, and so the result follows from Lemma 6.1.
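The mechanism behind this kind of $L^\infty$ stability can be illustrated in one dimension: if $f_1$ is $\mu$-strongly convex and $\|f_1 - f_2\|_\infty \leq \delta$, then $f_1(x_2) - f_1(x_1) \leq 2\delta$ for the respective minimizers, so $\|x_1 - x_2\|^2 \leq 4\delta/\mu$. The functions below are toy stand-ins, not the paper's objectives.

```python
import numpy as np

# Toy 1-D check: a bounded perturbation of a strongly convex objective moves
# the minimizer by at most sqrt(4*delta/mu).
mu, delta = 2.0, 0.05
xs = np.linspace(-3, 3, 20001)
f1 = 0.5 * mu * (xs - 1.0) ** 2
f2 = f1 + delta * np.sin(7 * xs)          # perturbation with |f2 - f1| <= delta

x1, x2 = xs[f1.argmin()], xs[f2.argmin()]
print((x1 - x2) ** 2 <= 4 * delta / mu)   # True
```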
Finally, we show stability under changing the domain of $F$. In order to quantitatively measure perturbations of the domain of $F$, we use the Hausdorff distance, which we denote by $d_H$.
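For concreteness, here is the Hausdorff distance computed for finite point clouds, the discretized analogue of the $d_H$ used below for domains; the sampled intervals are arbitrary examples.

```python
import numpy as np

# Hausdorff distance between finite point sets:
# d_H(A,B) = max( sup_{a in A} dist(a,B), sup_{b in B} dist(b,A) ).
def hausdorff(A, B):
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # pairwise distances
    return max(D.min(axis=1).max(), D.min(axis=0).max())

# Example: the intervals [0,1] and [0.3, 1.2] sampled on matching grids.
A = np.linspace(0.0, 1.0, 1001)[:, None]
B = np.linspace(0.3, 1.2, 901)[:, None]
print(abs(hausdorff(A, B) - 0.3) < 1e-9)  # True: d_H([0,1],[0.3,1.2]) = 0.3
```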
Proposition 6.3. Let $F_1, F_2$ be storage fee functions such that $\operatorname{dom}(F_1) \subset \operatorname{dom}(F_2)$ and $F_1 = F_2$ on $\operatorname{dom}(F_1)$. Furthermore, assume that $F_2$ is uniformly continuous on its domain with modulus of continuity $\omega$. Let $\lambda_1, \lambda_2$ be the minimizers of the problems associated to $F_1, F_2$ respectively. Then
$$\|\lambda_1 - \lambda_2\|_2^2 \leq 8 C_L N \Big( 2\sqrt{N}\, \|c\|_\infty\, d_H(\operatorname{dom}(F_2), \operatorname{dom}(F_1)) + \omega\big(d_H(\operatorname{dom}(F_2), \operatorname{dom}(F_1))\big) \Big).$$

Proof. Let $\tilde\psi_2$ be a maximizer in the dual problem for $C(\lambda_2)$, so that $\tilde\psi_2 \in K_0$ and $\sum_j \tilde\psi_2^j = 0$. Estimating $O_{F_1}$ at a nearest point of $\operatorname{dom}(F_1)$ to $\lambda_2$, with the final inequality supplied by Lemma A.1, and then applying Lemma 6.1 yields
$$\|\lambda_1 - \lambda_2\|_2^2 \leq 8 C_L N \Big( 2\sqrt{N}\, \|c\|_\infty\, d_H(\operatorname{dom}(F_2), \operatorname{dom}(F_1)) + \omega\big(d_H(\operatorname{dom}(F_2), \operatorname{dom}(F_1))\big) \Big).$$
Remark 6.4. Note that the assumption that $F_2$ is uniformly continuous on its domain imposes no real added restriction; see Proposition 7.1.

7. Regularizations
In this section we prove our second main theorem, Theorem 2.8, which shows that for any storage fee function $F$ that splits, there is a storage fee function that satisfies the assumptions of our main convergence theorem and yields an optimizer close to that of $F$.
Proposition 7.1. Let $F_1$ be a storage fee function. Define the storage fee function $F_2$ by
$$F_2(\lambda) := \begin{cases} F_1(\lambda) & \text{if } \lambda \in \Lambda \text{ and } F_1(\lambda) \leq 2\|c\|_\infty + \min_{\tilde\lambda \in \Lambda} F_1(\tilde\lambda), \\ +\infty & \text{otherwise.} \end{cases}$$
Then $F_2$ is uniformly continuous on its domain, and if $\lambda_1, \lambda_2$ are the minimizers of the problems associated to $F_1, F_2$ respectively, then $\lambda_1 = \lambda_2$.
Proof. Note that $\operatorname{dom} F_2 \subset \Lambda$, so $\operatorname{dom} F_2$ is bounded. Furthermore, $F_2 \leq 2\|c\|_\infty + \min_{\tilde\lambda \in \Lambda} F_1(\tilde\lambda)$ on its domain, so $F_2$ is bounded on its domain. Since $F_2$ is a closed, convex function that is bounded on its domain, we see that $\operatorname{dom} F_2$ is closed; hence $\operatorname{dom} F_2$ is compact. Since convex functions are continuous on their domain, this shows that $F_2$ is uniformly continuous on its domain.

Next we need to show $\lambda_1 = \lambda_2$. First we show that $F_1(\lambda_1) \leq 2\|c\|_\infty + \min_{\tilde\lambda \in \Lambda} F_1(\tilde\lambda)$. Note that for any $\tilde\lambda \in \Lambda$ we have $|C(\tilde\lambda)| \leq \|c\|_\infty$. Hence for any $\tilde\lambda \in \Lambda$,
$$F_1(\lambda_1) \leq C(\tilde\lambda) - C(\lambda_1) + F_1(\tilde\lambda) \leq 2\|c\|_\infty + F_1(\tilde\lambda),$$
and minimizing over $\tilde\lambda \in \Lambda$ gives $F_1(\lambda_1) \leq 2\|c\|_\infty + \min_{\tilde\lambda \in \Lambda} F_1(\tilde\lambda)$. Hence $F_2(\lambda_1) = F_1(\lambda_1)$ and $O_{F_2}(\lambda_1) = O_{F_1}(\lambda_1)$. Since $F_2 \geq F_1$ pointwise, we get $O_{F_2}(\tilde\lambda) \geq O_{F_1}(\tilde\lambda) \geq O_{F_1}(\lambda_1) = O_{F_2}(\lambda_1)$ for any $\tilde\lambda \in \Lambda$, and so $\lambda_1$ is indeed the minimizer of the problem associated to $F_2$, i.e. $\lambda_1 = \lambda_2$.
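A grid illustration of this truncation argument: replacing $F_1$ by its restriction to the sublevel set $\{F_1 \leq 2\|c\|_\infty + \min F_1\}$ cannot move the minimizer of $C + F$, because the minimizer's storage cost already obeys that bound. The functions $C$ and $F_1$ below are arbitrary stand-ins, with $|C| \leq \|c\|_\infty$ by construction.

```python
import numpy as np

# Toy 1-D illustration of Proposition 7.1's truncation on a grid over [0,1].
xs = np.linspace(0.0, 1.0, 1001)
C = 0.3 * np.cos(3 * xs)                   # transport-cost stand-in, |C| <= 0.3
F1 = (xs - 0.7) ** 2 + 0.2 * np.exp(2 * xs)  # an arbitrary convex storage fee

cap = 2 * 0.3 + F1.min()                   # truncation level 2*||c||_inf + min F1
F2 = np.where(F1 <= cap, F1, np.inf)       # truncated storage fee

# The minimizer of C + F1 is feasible for F2, so the minimizers agree.
print((C + F1).argmin() == (C + F2).argmin())  # True
```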
Corollary 7.2. Suppose $F_1$ is a storage fee function that splits into functions $f_{i,1}$. Define $f_{i,2}$ by the coordinatewise analogue of the truncation in Proposition 7.1. Then the $f_{i,2}$ are proper, convex functions that are uniformly continuous on their domains. Furthermore, if we define the storage fee function $F_2(\lambda) = \sum_i f_{i,2}(\lambda_i) + \delta_\Lambda(\lambda)$, then the minimizers of the problems associated to $F_1, F_2$ are equal.
Proof. The proof is similar to that of Proposition 7.1. If $\lambda_1$ is the minimizer of the problem associated to $F_1$, then all we need to show is that $f_{i,2}(\lambda_1^i) < +\infty$, which, as in Proposition 7.1, follows from the bound on the storage cost of the minimizer.

In the remainder of this section we discuss how to take an arbitrary storage fee function $F$ that splits and "regularize" it into a new storage fee function that satisfies the assumptions of Theorem 2.7. Recall that we use $d_H$ to denote the Hausdorff distance. We start with a technical lemma.
Once this is done, a symmetric argument will give the same bound for $d_H(B \cap \Lambda, C \cap \Lambda)$. Our lemma will then follow from the triangle inequality for the Hausdorff distance.
First we handle the case where $\sum_i a_i = 1$. Fix some $\lambda \in C \cap \Lambda$ and define $\tilde\lambda$ by $\tilde\lambda_i = a_i$, so that $\tilde\lambda \in A \cap \Lambda$. Let $S = \{i : \lambda_i > a_i\}$, and note that since $\sum_i \lambda_i = 1$ we obtain the claimed bound. A similar result holds when $\sum_i b_i = 1$. Hence we may assume that $\sum_i a_i < 1 < \sum_i b_i$. Again choose some $\lambda \in C \cap \Lambda$ and define $\tilde\lambda$ coordinatewise. Assuming without loss of generality that $\sum_i \tilde\lambda_i \geq 1$, we define $\bar\lambda$ by $\bar\lambda_i = (1-t)\tilde\lambda_i + \cdots$ for a suitable $t$, and so the proof follows.
We are now ready to proceed to the proof of the second main theorem.
Proof of Theorem 2.8. We modify $F$ one piece at a time in order to get all of the assumptions satisfied. First, set $F_1 = F$ and $f_{i,1} = f_i$. We define $f_{i,2}$ as in Corollary 7.2. Our first task is to deal with the possibility that some $\operatorname{dom} f_{i,2}$ might be a single point. The only way this is possible is if $f_{i,2}$ is the indicator function of a point plus a constant, i.e. if $f_{i,2} = \delta_{\{y\}} + K$ for some $K \in \mathbb{R}$. In this case we set $f_{i,3} = \delta_{[y-\eta, y+\eta] \cap [0,1]} + K$; otherwise we define $f_{i,3} = f_{i,2}$. Now $f_{i,3}$ is a convex function with bounded domain, and by construction it is bounded on its domain. Hence its domain is a closed interval, say $\operatorname{dom} f_{i,3} = [a_i, b_i]$; since $\operatorname{dom} f_{i,3}$ is not a single point we have $a_i < b_i$.

Now, in order to define $f_{i,4}$ we split into two cases. First we consider the case where $\sum_i a_i < 1 < \sum_i b_i$. Choose some $\epsilon > 0$ so that $\epsilon < \min(\cdots, \eta)$ and $\epsilon < \min_i b_i$, and define $f_{i,4}$, $f_{i,5}$, and $f_{i,6}$ accordingly. The AM-GM inequality shows that $\|f_{i,5} - f_{i,6}\|_{L^\infty([c_i, d_i])} \leq \frac{\eta}{2}(d_i - c_i)$, and we see that $f_{i,6}$ is strongly convex with parameter $\frac{\eta}{2(d_i - c_i)}$. Furthermore, it is clear that $f_{i,6}$ is essentially smooth, and smooth on the interior of its domain; in particular $f'''_{i,6}$ is locally bounded on the interior of $\operatorname{dom} f_{i,6}$. Furthermore, $\operatorname{dom} f_{i,6} = [c_i, d_i]$, and we saw above that $\min_i c_i > 0$ and $\sum_i c_i < 1 < \sum_i d_i$.

Now we set $F_j(\lambda) = \sum_i f_{i,j}(\lambda_i) + \delta_\Lambda(\lambda)$ for $j \in \{2, 3, 4, 5, 6\}$. $F_6$ is the promised $\tilde F$, i.e. we see from the above paragraph that $F_6$ satisfies the assumptions of Theorem 2.7. All that is left is to prove that the minimizer of the problem associated to $F_6$ is close to that of $F_1$. Let $\lambda_j$ be the minimizer associated to $F_j$. We have $\lambda_1 = \lambda_2$ by Corollary 7.2. Note that by Lemma 7.3 we have $d_H(\operatorname{dom} F_2, \operatorname{dom} F_3) \leq 4N\eta$, and so by Proposition 6.3 we get
$$\|\lambda_2 - \lambda_3\|_2^2 \leq 64 C_L N^{5/2} \|c\|_\infty \eta + 8 C_L N \omega(4N\eta),$$
where $\omega$ is the modulus of continuity of $F_2$. For controlling $\lambda_3 - \lambda_4$, recall that we split into two separate cases.
First we look at the case where $\sum_i a_i < 1 < \sum_i b_i$. In this case, by Lemma 7.3 we have $d_H(\operatorname{dom} F_3, \operatorname{dom} F_4) \leq 4N\epsilon \leq 4N\eta$, and so we get again that
$$\|\lambda_3 - \lambda_4\|_2^2 \leq 64 C_L N^{5/2} \|c\|_\infty \eta + 8 C_L N \omega(4N\eta).$$
Note that we have used that $\omega$ is also a modulus of continuity for $F_3$. In the other case, i.e. $\sum_i a_i = 1$, note that $\operatorname{dom} F_3 = \{a\}$ where $a = (a_1, \ldots, a_N) \in \mathbb{R}^N$. Hence we can apply Lemma 7.3 with $A = \{a\}$ and $B = \prod_i [c_i, d_i]$ to get $d_H(\operatorname{dom} F_3, \operatorname{dom} F_4) \leq 4\eta$, and so
$$\|\lambda_3 - \lambda_4\|_2^2 \leq 64 C_L N^{3/2} \|c\|_\infty \eta + 8 C_L N \omega(4\eta).$$
Finally, by Proposition 6.2 we have $\|\lambda_4 - \lambda_6\|_2^2 \leq 4 C_L N \sum_i \big(\eta + \frac{\eta(d_i - c_i)}{2}\big)$. Since $\omega$ depends only on the initial $F$ (and not on $\eta$), and $|d_i - c_i| \leq 1$ (and so is also independent of $\eta$), we get the desired result.
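The quadratic regularization step used above can be illustrated numerically. One choice consistent with the stated bounds adds $\frac{\eta}{4(d-c)}(t-c)^2$ to a convex $f$ on $[c,d]$, which makes it strongly convex with parameter $\frac{\eta}{2(d-c)}$ while changing it by at most $\frac{\eta}{4}(d-c) \leq \frac{\eta}{2}(d-c)$ in sup norm; the exact regularizer in the proof may differ, and $f$ below is an arbitrary convex example.

```python
import numpy as np

# Illustrative quadratic regularization on [c, d]: f6 = f5 + (eta/(4(d-c)))(t-c)^2.
# This is strongly convex with parameter eta/(2(d-c)) and satisfies
# ||f6 - f5||_inf <= (eta/2)(d-c), matching the bound used in the proof.
eta, c, d = 0.1, 0.2, 0.9
ts = np.linspace(c, d, 2001)
f5 = np.abs(ts - 0.5)                          # convex but not strongly convex
f6 = f5 + (eta / (4 * (d - c))) * (ts - c) ** 2

sup_diff = np.max(np.abs(f6 - f5))
print(sup_diff <= eta / 2 * (d - c) + 1e-12)   # True: sup-norm perturbation bound
print(bool(np.all(np.diff(f6, 2) >= -1e-12)))  # True: f6 remains convex on the grid
```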

Acknowledgments
I would like to thank Kitagawa for helpful comments and suggestions on a previous version of this manuscript.
Appendix A. Bounds on $\psi$

In this section we prove a lemma bounding the difference between coordinates of a $\psi$ that generates a Laguerre diagram in which each cell has positive mass.
Lemma A.1. If $G(\psi)_j > 0$ then for all $k$ we have $\psi_j - \psi_k \leq 2\|c\|_\infty$. In particular, if $\psi \in K_0$ then for all $j, k$ we have $\psi_j - \psi_k \leq 2\|c\|_\infty$.