Integer Optimal Control with Fractional Perimeter Regularization

Motivated by many applications, optimal control problems with integer controls have recently received significant attention. Some state-of-the-art work uses perimeter regularization to derive stationarity conditions and trust-region algorithms. However, the discretization is difficult in this case because the perimeter is concentrated on a set of dimension $d - 1$ for a domain of dimension $d$. This article proposes a potential way to overcome this challenge by using the fractional nonlocal perimeter with fractional exponent $0<\alpha<1$. In this way, the boundary integrals in the perimeter regularization are replaced by volume integrals. Besides establishing some non-trivial properties associated with this perimeter, a $\Gamma$-convergence result is derived. This result establishes convergence of minimizers of the fractional perimeter-regularized problem to minimizers of the standard one as the exponent $\alpha$ tends to 1. In addition, stationarity results are derived and an algorithmic convergence analysis is carried out for $\alpha \in (0.5,1)$ under an additional assumption on the gradient of the reduced objective. The theoretical results are supplemented by a preliminary computational experiment. We observe that the isotropy of the total variation may be approximated by means of the fractional perimeter functional.

$(P_\alpha)$ Here, $F : L^1(\Omega) \to \mathbb{R}$, which we assume to be bounded below, is the principal part of the objective that is due to the application, and $R_\alpha$ is a regularizer that will provide desirable features of the solution to $(P_\alpha)$. The scalar $\alpha \in (0, 1)$ parameterizes the regularizer $R_\alpha$ and in turn $(P_\alpha)$. The specific role of $\alpha$ will become clear soon. To avoid possible misunderstandings, since the term polyhedron may or may not imply convexity in the literature, the term polyhedron is used in this paper for sets whose boundaries are unions of convex polytopes.
Recent work [18,23,25,29] motivates and analyzes the use of a total variation regularization of $w$, which corresponds to a penalization of the perimeters of the level sets because if $\mathrm{TV}(w) < \infty$, where $P(A; B)$ denotes the perimeter of $A$ in $B$ and $\mathrm{TV}(w)$ denotes the total variation of the function $w$. Here, we assume that $w$ is extended by zero outside of $\Omega$ for feasible $w$ in $(P_\alpha)$, see [23, Lemma 2.1].
We refer the reader to [19] for extensive information on sets of finite perimeter and their properties. The key property of this regularization that is exploited in the aforementioned publications is the compactness it induces on the control space. Specifically, bounded sequences of feasible points of $(P_\alpha)$, such as sequences produced by descent algorithms, have subsequences that converge in $L^1(\Omega)$.
In the multi-dimensional case, $d \ge 2$, the discretization of the subproblems that are proposed in [18,23] is challenging. The reason is that the arguments in the analysis of finite difference or finite element discretizations, as carried out, for example, in [4,10,8], use that $w$ can take values in $\mathbb{R}$ (or at least in $\operatorname{conv} W$, the convex hull of $W$) and that the superordinate minimization problem is convex. Neither of these features is available in our setting. Moreover, any piecewise constant ansatz for $w$ on a fixed decomposition of the domain into, say, polytopes restricts the geometry of the level sets and therefore introduces a potential gap between $(P_\alpha)$ and its discretization. This gap is due to the local structure of the perimeter regularization. Specifically, the information on the perimeter is concentrated on the (reduced) boundary of the level sets, and hence of the discretization, which has finite $(d-1)$-dimensional Hausdorff measure. The very recent work [26] provides a two-level discretization that can overcome this issue but is computationally expensive, and efficient implementations that make use of the underlying structure as in [24] are not available so far. This motivates us to study regularization terms $R_\alpha$ that are close to the perimeter regularization and provide compactness in $L^1(\Omega)$ but also have nonlocal properties, so that they might give a fruitful computational vantage point because they allow replacing the difficult localized boundary integrals. Specifically, $R_\alpha$ is given by means of a double volume integral so that its numerical approximation can be improved by improving the quadrature of the volume integrals.
Let $w$ be feasible for $(P_\alpha)$ and let $E_i := w^{-1}(\{w_i\})$ for $i \in \{1, \ldots, M\}$. Specifically, we consider for $\alpha \in (0, 1)$, where $P_\alpha(E)$ is the so-called fractional perimeter introduced and analyzed in [7,31] and $\chi_E$ denotes the $\{0, 1\}$-valued indicator function of $E \subset \mathbb{R}^d$, see also [12,13,3]. The limit problem that we approximate with $(P_\alpha)$ is the perimeter-regularized integer optimal control problem minimize In particular, this means that our optimization variable $w$ can be modified only on the domain $\Omega$ but the regularizer in $(P_\alpha)$, (P) takes into account the boundary of $\Omega$. In other words, $w$ is implicitly extended by zero outside of $\Omega$ and the jumps across $\partial\Omega$ are counted. We note that we are not the first to take steps in the direction of computationally exploiting these properties of the fractional perimeter and particularly point to the works [15,5,3].

Contributions
We make the following contributions. The existence of solutions to $(P_\alpha)$ is, in our opinion, not immediate since we are not aware of a Banach-Alaoglu theorem for the Sobolev space $W^{\alpha,1}(\mathbb{R}^d)$ that induces the fractional perimeter as defined in (1). Thus we prove it by means of an argument that exploits the specific structure of our feasible set, which consists only of $W$-valued functions with $|W| < \infty$.
We show compactness in $L^1(\Omega)$ and $\Gamma$-convergence for $\alpha \nearrow 1$. For all $\alpha \in (0.5, 1)$, we prove stationarity conditions and asymptotics of a trust-region algorithm parallel to [23]. In order to achieve this, we currently require the regularity assumption $\nabla F(w^n) \in C^2(\bar\Omega)$ for the iterates $w^n$ produced by the algorithm, which is quite strong compared to the regularity $\nabla F(w^n) \in C(\bar\Omega)$ that is required for the perimeter-regularized case, see [23].
We also provide a preliminary and qualitative computational experiment, in which we apply the trust-region algorithm for the choices $\alpha = 0.5$ and $\alpha = 0.9$ as well as for the limiting regularizer $R$. The geometric restriction induced by the piecewise constant ansatz for our control functions is visible in the limiting case, but this behavior is alleviated for $\alpha = 0.5$ and $\alpha = 0.9$. Unfortunately, the subproblem solves in the trust-region algorithm are extremely expensive even for a relatively coarse discretization. Moreover, since computationally tractable discretizations of the subproblems for the limit case are not available so far, it is difficult to interpret and compare the results. Therefore, we emphasize that much more work is needed from a computational point of view in order to provide more efficient discretizations and solution algorithms.

Structure of the remainder
After introducing some notation and the necessary concepts regarding modes of convergence and local variations of the elements of the feasible set in Section 2, we prove the existence of solutions in Section 3. Compactness and $\Gamma$-convergence are analyzed in Section 4. Local minimizers, the trust-region algorithm, and its asymptotics are analyzed in Section 5. We provide a computational experiment and discuss its implications in Section 6.

Notation and auxiliary results
If not indicated otherwise, we assume that $\alpha \in (0, 1)$ is fixed but arbitrary in the whole article without further mention. We denote the complement of a set $A \subset \mathbb{R}^d$ by $A^c := \mathbb{R}^d \setminus A$. We denote the symmetric difference between $A$ and a further set $B \subset \mathbb{R}^d$ by $A \triangle B$. We will frequently use the following reformulation of (1). Let $E \subset \mathbb{R}^d$, then $P_\alpha(E)$ satisfies We immediately obtain that the function $P_\alpha$ is submodular, see also (2.1) in [9].
Lemma 2.1. Let $\alpha \in (0, 1)$. Let $E$, $F$ be measurable subsets of $\Omega$. Then and
Proof. We consider the formulation of $P_\alpha(E)$ from (2) and define $g(x, y) := |x - y|^{-(d+\alpha)}$ for $x, y \in \mathbb{R}^d$. Then inserting the definitions, elementary computations, and the positivity of $g$ yield which proves (3). In order to see (4), we consider The proof is complete.
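To make the double-integral structure and the submodularity noted above concrete, the following minimal Python sketch evaluates a discrete analogue of $\int_E \int_{E^c} g(x, y)\,dy\,dx$ with the kernel $g$ from the proof. Several choices are assumptions made for illustration and are not taken from this article: the sets are unions of cells of a uniform grid on the unit square, the kernel is evaluated at cell centers (midpoint rule), the complement is truncated to the grid, and no normalization constant is applied. The function name `discrete_fractional_perimeter` is hypothetical.

```python
import itertools
import math


def discrete_fractional_perimeter(cells_in_set, n, alpha):
    """Midpoint-rule analogue of int_E int_{E^c} |x - y|^-(d + alpha) dy dx.

    Assumptions (illustration only): d = 2, E is a union of cells of a
    uniform n x n grid on the unit square, the complement is truncated to
    that grid, and no normalization constant is applied.
    """
    d = 2
    h = 1.0 / n
    all_cells = set(itertools.product(range(n), repeat=2))
    inside = set(cells_in_set)
    outside = all_cells - inside
    total = 0.0
    for (i, j) in inside:
        for (k, l) in outside:
            dist = h * math.hypot(i - k, j - l)  # distance of the cell centers
            # (cell area)^2 times the kernel value at the pair of centers
            total += h ** (2 * d) * dist ** (-(d + alpha))
    return total


# Two overlapping half-squares on a 6 x 6 grid.
n = 6
A = {(i, j) for i in range(n) for j in range(n) if i < 3}
B = {(i, j) for i in range(n) for j in range(n) if j < 3}

# Submodularity: P(A u B) + P(A n B) <= P(A) + P(B).
lhs = (discrete_fractional_perimeter(A | B, n, 0.5)
       + discrete_fractional_perimeter(A & B, n, 0.5))
rhs = (discrete_fractional_perimeter(A, n, 0.5)
       + discrete_fractional_perimeter(B, n, 0.5))
```

The discrete functional is a cut function with nonnegative weights, so the inequality `lhs <= rhs` holds for any pair of cell sets, mirroring the submodularity of $P_\alpha$.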
We will sometimes switch between the view of a function $w$ that is feasible for $(P_\alpha)$ and the partition of $\Omega$ that is given by its level sets. If there is no ambiguity, we will denote the level sets by $E_i := w^{-1}(\{w_i\})$, $i \in \{1, \ldots, M\}$, without further mention. Let $\lambda$ denote the Lebesgue measure on $\mathbb{R}^d$.
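Let $\alpha \in (0, 1)$. Then $\nabla^\alpha f$, defined as $\nabla^\alpha f(x) := \mu_{d,\alpha} \int_{\mathbb{R}^d} \frac{(y - x)(f(y) - f(x))}{|y - x|^{d+1+\alpha}}\,dy$ for $x \in \mathbb{R}^d$, is the so-called fractional gradient [12,30,3] for all $f$ for which the integrand of this integral is an integrable function. We denote the feasible set by $\mathcal{F} := \{ w \in L^1(\Omega) : w(x) \in W \text{ for a.e. } x \in \Omega \}$.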

Modes of convergence
The Gagliardo seminorm of the Sobolev space $W^{\alpha,1}(\mathbb{R}^d)$, see [14], with fractional order of differentiability $\alpha \in (0, 1)$ corresponds to the fractional perimeter $P_\alpha$ as defined above. To the best of the authors' knowledge, this space does not admit a predual so that, in contrast to $\mathrm{BV}(\mathbb{R}^d)$, there is no weak-* topology that gives existence of limits for bounded subsequences. However, this property can be recovered when restricting to our feasible set $\mathcal{F}$ of a.e. $W$-valued integrable functions. We therefore refer to this property as pseudo-weak-* convergence in this article.
In Banach spaces that are not uniformly convex, that is, whose norm does not satisfy a uniform midpoint convexity property, weak-* or weak convergence together with convergence of the values of the norm does not necessarily imply convergence in norm. Consequently, a further mode of convergence is sometimes important in this situation, such as the so-called strict convergence in $\mathrm{BV}(\Omega)$. In analogy, we define strict convergence for our setting as convergence in $L^1(\Omega)$ in combination with convergence of the regularizer, which of course implies pseudo-weak-* convergence.
Definition 2.2.Let α ∈ (0, 1) be fixed.We say that {w n } n ⊂ F converges to w ∈ F pseudo-weakly- * in F and write We say that We obtain lower semicontinuity with respect to convergence in L 1 (Ω) and in turn also for our regularizer, which we briefly show below.
In particular, for $w^n \to w$ in $L^1(\Omega)$ and $w^n, w \in \mathcal{F}$ we obtain
Proof. Clearly, the first claim holds if $\liminf_{n\to\infty} P_\alpha(E^n; \Omega) = \infty$. If this is not the case, we observe that for $s = 0.5\alpha$ and by means of |χ where we have extended $\chi_E$ with the value zero outside of $\Omega$ and where the right hand side is the squared Gagliardo seminorm of the Hilbert space $W^{s,2}(\mathbb{R}^d)$, see, for example, [14]. Moreover, the boundedness of the characteristic functions gives $\chi_{E^n} \to \chi_E$ in $L^2(\Omega)$ too. We infer that all subsequences $\chi_{E^{n_k}}$ such that $P_\alpha(E^{n_k})$ is bounded converge weakly to $\chi_E$. Then the weak lower semicontinuity of the seminorm of $W^{s,2}(\mathbb{R}^d)$ yields the first claim. The second claim follows directly from the first and the definition of $R_\alpha$.
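The passage to the Hilbert space $W^{s,2}(\mathbb{R}^d)$ with $s = 0.5\alpha$ in the proof above rests on an elementary identity for indicator functions, which is spelled out here as a sketch. The double integrals are taken over $\mathbb{R}^d \times \mathbb{R}^d$, and the constant that relates the right hand side to $P_\alpha(E)$ depends on the normalization chosen in (1), which is an assumption left open here.

```latex
\begin{align*}
|\chi_E(x) - \chi_E(y)|^2 &= |\chi_E(x) - \chi_E(y)| \in \{0, 1\},
\qquad d + 2s = d + \alpha \ \text{ for } s = \tfrac{\alpha}{2},\\
\iint \frac{|\chi_E(x) - \chi_E(y)|^2}{|x - y|^{d + 2s}}\,dx\,dy
&= \iint \frac{|\chi_E(x) - \chi_E(y)|}{|x - y|^{d + \alpha}}\,dx\,dy
 = 2 \int_E \int_{E^c} \frac{dy\,dx}{|x - y|^{d + \alpha}}.
\end{align*}
```

The last equality holds because the integrand equals one exactly on $(E \times E^c) \cup (E^c \times E)$ and the kernel is symmetric in $x$ and $y$.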

Local variations
We follow the ideas presented in [23] in order to derive stationarity conditions for $(P_\alpha)$ and to obtain a corresponding sufficient decrease condition, which in turn allows us to prove convergence of the trust-region algorithm. Both rely on the analysis of a perturbation of the partition $E_1, \ldots, E_M$. We introduce such perturbations by means of so-called local variations, where we follow [23], which in turn is based on [19].
Definition 2.4 (Definition 3.1 in [23]).(a) A one-parameter family of diffeo- (c) For a local variation, we define its initial velocity: Proof.We refer the reader to Proposition 3.2 in [23].
Let $w = \sum_{i=1}^M w_i \chi_{E_i}$ and $(f_t)_{t \in (-\varepsilon, \varepsilon)}$ be a local variation in $\Omega$. Then we define the functions for all $t \in (-\varepsilon, \varepsilon)$. For the results below, we consider an arbitrary but fixed local variation
Proof. This follows because the $f_t$ are diffeomorphisms, see also [23, §3].
We mention that local variations also allow for a notion of stability, see Definition 1.6 in [11].
holds with the definitions Proof.By following the arguments in the proof of Lemma 4.1.1 with the choice s = α/2 on p. 182 in [15], we obtain the desired identity Before proving that |C(x, y)| is uniformly bounded, which concludes the proof, we briefly extend an argument in the proof of Lemma 4.1.1 in [15].We first note that by means of Fubini's theorem and the substitution formula, see for example Theorem 263D (v) in [16], we obtain that P α (f t (E)) is finite if and only if is finite, where Df t = I + t∇ϕ.
We do this because we did not immediately see why the $o(\varepsilon)$-term ($o(t)$ in our notation) in its proof, which is due to the remainder term of the Taylor expansion of $|x - y|^{-d-\alpha}$, remains an $o(t)$-term after integration. Loosely speaking, why were the authors of [15] able to deduce $\int_E \int_{E^c} o(t)\,dx\,dy = o(t)$? Since we did not see the argument directly ourselves, we provide it in more detail below for convenience. The details show that the arguments in [15] are correct and merely require some additional steps and, moreover, that they even allow proving the claim in the assumed generality $P_\alpha(E) < \infty$.
Applying Lemma 17.4 from [19] and inspecting its proof (note that $\varphi$ has compact support and is Lipschitz), we obtain that there exist $t_0 > 0$ and functions $c_1(t)$, $c_2(t)$ that are uniformly bounded and whose bounds depend only on $t_0$ and $\varphi$ such that for all $t$ with $|t| < t_0$ we obtain for all $x, y \in \mathbb{R}^d$ Moreover, we deduce for $x \neq y$ where the remainder term $r(x, y)$ of the Taylor expansion of $z \mapsto |z|^{-(d+\alpha)}$ for some $\xi = \xi(x, y)$ in the line segment between $x - y$ and $x - y + t(\varphi(x) - \varphi(y))$. Because $\varphi$ is compactly supported and Lipschitz continuous with Lipschitz constant $L$, there exists $0 < t_1 \le t_0$ so that for all $|t| \le t_1$ it holds that $|\xi(x, y)| \ge \frac{1}{2}|x - y|$ uniformly for all $x, y \in \mathbb{R}^d$. Combining this with the Lipschitz continuity of $\varphi$, we deduce Computing the double integral over the remainder term of the Taylor expansion implies for $r(t) := \int_E \int_{E^c} r(x, y)\,dx\,dy$ that for some $c_3 > 0$ and all $|t| \le t_1$, which gives that $r(t) = o(t)$. With a similar argument, one can deduce that there exists $c_4 > 0$ such that Combining these considerations with (6) and using that $\varphi$ is compactly supported therein, we obtain that (5) is finite and in turn also $P_\alpha(f_t(E))$ is finite for all $t$ with $|t| \le t_1$.
Due to these arguments, we can combine all o(t) terms and obtain the claimed formula for P α (f t (E))−P α (E).It remains to show that |C(x, y)| is uni- ) is globally Lipschitz with Lipschitz constant L ≥ 0, we obtain that the absolute value of the third term is bounded by L(d + α) in combination with the Cauchy-Schwarz inequality.
We note that an alternative proof of this claim follows from the arguments in Section 3 of [20].We also note that the strategy above uses the same basic steps as the arguments in Section 17 of [19] for the limit case α = 1.
be the local variation defined by f t := I + tϕ for t ∈ (−ε, ε).Then there exist 0 < ε 0 < ε and L > 0 such that for all s, t ∈ (−ε 0 , ε 0 ), and measurable sets E, F ⊂ Ω with and in particular, Proof.We follow the proof strategy of Lemma 17.9 in [19].To this end, let , which exists by Theorem A.1 in [12].Then we obtain where we note that the right hand side of the inequality ( 7) is identical to λ(f t (E)∆E) for the choice F = Ω so that the second claim follows from the first in this case.
Because the function u δ is smooth, we can follow the proof of Proposition 3.14 in [12] (which in turn relies on the fractional fundamental theorem of calculus as it is given in Theorem 3.12 in [12]) in order to obtain where y in the proof of Proposition 3.14 in [12] is replaced by ϕ t (x) − x, Hölder's inequality is applied to obtain the term sup x∈R d ∥ϕ t (x) − x∥ α and the constant γ d,α is from Proposition 3.14 in [12] too.Then, the identity sup for some c > 0 and the constant µ d,α from (1.2) in [12].
Proposition 2.9 (Variation of the linearized objective).
Proof. The (first half of the) arguments in the proof of Proposition 17.8 in [19] imply where $\int_E \operatorname{div}(g(x)\varphi(x))\,dx$ exists and is finite because of the assumed regularity of $g$.
Remark 2.10. We note that the regularity assumed on $g$ for the Taylor expansion of the Lebesgue measure in Proposition 2.5 may be deemed unrealistically high. Thus improving the required regularity is an important question for further research, in particular because the much weaker requirement $g \in C(\bar\Omega)$ is sufficient for proving a Taylor expansion in the local case, see [19, Proposition 17.8]. We have, however, not found a means to do so until this point.

Standing assumptions and existence of solutions
We provide two standing assumptions on our problem that allow us to deduce the existence of solutions as well as first-order optimality conditions for $(P_\alpha)$ and (P).
A.2 Let $F : L^2(\Omega) \to \mathbb{R}$ be twice continuously Fréchet differentiable. For some $C > 0$ and all $\xi \in L^2(\Omega)$, let the bilinear form induced by the Hessian
The second part of Assumption 3.1 is rather restrictive. The estimate on the Hessian with respect to the $L^1$-norms is, for example, not satisfied if $F(w) = \frac{1}{2}\|w\|_{L^2}^2$, and it implies that $F$ involves some Lipschitz operation that maps $L^1(\Omega)$ to a smaller space. This may be a convolution operator or a solution operator of a PDE, see also the discussions in [18,23,21,22].
The existence of minimizers for the limit problem (P) under Assumption 3.1 follows as in [18,23]. The existence of minimizers for the problems $(P_\alpha)$ follows from the compactness and lower semicontinuity properties of nonlocal perimeters, see, for example, [13, Section 3.7]. We briefly recap how the existence is achieved below.
Proposition 3.2. Let $\alpha \in (0, 1)$. Let Assumption 3.1 hold. Then $(P_\alpha)$ admits a minimizer $w = \sum_{i=1}^M w_i \chi_{E_i}$.
Proof. We apply the direct method of the calculus of variations. There is a real infimum of $(P_\alpha)$ because all terms of the objective are bounded below. We thus consider a minimizing sequence Because all terms of the objective are bounded below, we obtain that the sequences $(P_\alpha(E_i^n))_n$ are bounded. Consequently, the sequences $(\chi_{E_i^n})_n$ are bounded in a space that, similar to $\mathrm{BV}(\Omega)$, admits sequential weak-* compactness properties, see [12, Corollary 4.6 and Proposition 4.8], which yields a limit function $w = \sum_{i=1}^M w_i \chi_{E_i}$ such that $\chi_{E_i^n} \to \chi_{E_i}$ in $L^1(\Omega)$ for all $i \in \{1, \ldots, M\}$. Because convergence in $L^1(\Omega)$ implies pointwise a.e. convergence for a subsequence, we obtain that the limit sets $E_i$ are a partition of $\Omega$ except for a set of Lebesgue measure zero. In other words, $w \in \mathcal{F}$. Moreover, Lebesgue's dominated convergence theorem gives $w^n \to w$ in $L^2(\Omega)$.
Because R α is lower semicontinuous with respect to convergence in L 1 (Ω) on our feasible set, see Lemma 2.3, and F is continuous, see Assumption 3.1, we obtain that w minimizes (P α ).
Proof. Because $F$ and the summands in $R_\alpha$ are bounded below, it follows that $\sup_{\alpha \nearrow 1} (1 - \alpha) P_\alpha(E_i^\alpha) < \infty$ for all $i \in \{1, \ldots, M\}$, where $E_i^\alpha = (w^\alpha)^{-1}(\{w_i\})$. Consequently, we can apply Theorem 1 in [2] in order to obtain that for all $i \in \{1, \ldots, M\}$, the sequence $(\chi_{E_i^\alpha})_\alpha$ is relatively compact in $L^1_{\mathrm{loc}}(\mathbb{R}^d)$ and in turn in $L^1(\Omega)$. Recalling $w^\alpha = \sum_{i=1}^M w_i \chi_{E_i^\alpha}$, we obtain that the sequence $(w^\alpha)_\alpha$ is relatively compact. Because convergence in $L^1(\Omega)$ implies pointwise convergence a.e. for a subsequence, we obtain that the limit is $W$-valued, see also [18].
Proposition 4.2 (Corollary of Theorem 2 in [2]). Let $w = \sum_{i=1}^M w_i \chi_{E_i}$ for a partition $\{E_1, \ldots, E_M\}$ of $\Omega$ into measurable sets $E_i$. Then: 2. there is a sequence
Proof. The first claim (lim inf-inequality) follows directly from Theorem 2 in [2]. The second claim (lim sup-inequality) is immediate for Then $\{E_1, \ldots, E_M\}$ is a Caccioppoli partition of $\Omega$ and [6] asserts that polyhedral partitions are dense in the Caccioppoli partitions, that is, there exist sets $\{T_1^k, \ldots, T_M^k\}_k \subset \mathbb{R}^d$ whose boundaries are composed of finitely many convex polytopes such that where the second convergence follows from Corollary 2.5 in [6] in combination with Because of (9) and (10), we can assume that all $T_i^k$ are contained in a bounded hold-all domain, e.g., a large ball. Because the sets $T_i^k \cap \Omega$ are polyhedral (recall that we assumed that $\Omega$ is polyhedral at the beginning of the article), we can apply Lemmas 8 and 9 in [2] to them and obtain $\lim_{\alpha \nearrow 1}$ Combining (10) and (11), we choose a suitable diagonal sequence indexed by which implies the assertion.
We note that the proof of Proposition 4.2 is the only point in this article where we use the assumption that the domain $\Omega$ is polyhedral. As a corollary of the compactness established in Proposition 4.1 and the lim inf and lim sup inequalities established in Proposition 4.2, we obtain the convergence of global minimizers to global minimizers below.
Corollary 4.3. Let Assumption 3.1 hold. Let $\alpha \nearrow 1$. Let $w^\alpha$ be a global minimizer of $(P_\alpha)$ for $\alpha$. Then there exists a $W$-valued accumulation point $w \in L^1(\Omega)$ with $\tilde w \in \mathrm{BV}(\mathbb{R}^d)$, where $\tilde w$ is the extension of $w$ by zero outside of $\Omega$, $w(x) \in W$ a.e., such that $w^\alpha \to w$ in $L^1(\Omega)$ and $J_\alpha(w^\alpha) \to J(w)$. Moreover, all accumulation points of $(w^\alpha)_\alpha$ are $W$-valued global minimizers of (P), and their extensions by zero outside of $\Omega$ are in $\mathrm{BV}(\mathbb{R}^d)$.
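As a plausibility check of the scaling by $(1 - \alpha)$ that appears in the compactness argument, consider the one-dimensional example $E = (0, 1) \subset \mathbb{R}$ with the unnormalized kernel $|x - y|^{-(1+\alpha)}$. Both the example and the omission of any normalization constant are assumptions made for illustration; the constant in the limit depends on the normalization used in (1).

```latex
\begin{align*}
\int_E \int_{E^c} \frac{dy\,dx}{|x - y|^{1+\alpha}}
&= 2 \int_0^1 \int_1^\infty (y - x)^{-1-\alpha}\,dy\,dx
 = 2 \int_0^1 \frac{(1 - x)^{-\alpha}}{\alpha}\,dx
 = \frac{2}{\alpha(1 - \alpha)},\\
(1 - \alpha) \int_E \int_{E^c} \frac{dy\,dx}{|x - y|^{1+\alpha}}
&= \frac{2}{\alpha} \xrightarrow{\ \alpha \nearrow 1\ } 2.
\end{align*}
```

The double integral itself blows up as $\alpha \nearrow 1$, whereas the rescaled quantity converges to 2, which is the number of boundary points of the interval.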

Local minimizers and trust-region algorithm
As is noted in [18,23], it makes sense to consider local minimizers and stationary points in the settings of (P α ) and (P) because L 1 -neighborhoods of feasible points contain further feasible points.We briefly translate these concepts from the perimeter-regularized case, see [23], to our setting in Section 5.1.Then we introduce and analyze non-local variants of the trust-region subproblem from [23] in Section 5.2.We introduce and describe a trust-region algorithm that builds on these subproblems in Section 5.3 and prove its asymptotics in Section 5.4.

Local minimizers and stationary points
We start by defining local minimizers and stationary points and then verify that local minimizers are stationary.
Definition 5.1. Let $\alpha \in (0, 1)$ and let $w = \sum_{i=1}^M w_i \chi_{E_i}$ be feasible for $(P_\alpha)$.
• We say that w is locally optimal for (P α ) if there is r > 0 such that J α (w) ≤ J α (v) for all v that are feasible for (P α ) and satisfy ∥v−w∥ L 1 ≤ r.
for some ε > 0 be the local variation defined by ) is differentiable at t = 0, which can be seen as follows.Assumption 3.1 implies that for some ξ t in the line segment between w and f # t w.Assumption 3.1 and Lemma 2.8 imply for some large enough C > 0 and all t ∈ (−ε, ε).Because α > 0.5, we obtain that r t is differentiable at t = 0 with value zero so that .

Trust-region subproblems
We introduce and analyze trust-region subproblems by following the ideas from [18,23], that is, the principal part of the objective enters the trust-region subproblem by means of a linear model and the regularization term is considered exactly. We analyze $\Gamma$-convergence of the trust-region subproblems with respect to convergence of the linearization point and provide optimality conditions for the trust-region subproblem. Let $\Delta > 0$ and $\bar w$ be feasible for $(P_\alpha)$ with $R_\alpha(\bar w) < \infty$. The trust-region subproblem reads where we recover the linearized principal part of the objective of $(P_\alpha)$ with the choice $g = \nabla F(\bar w)$. The trust-region subproblem $\mathrm{TR}_\alpha(\bar w, g, \Delta)$ admits a minimizer, which we briefly show below.
Proof. Because $g \in L^2(\Omega)$ and $w, \bar w \in L^\infty(\Omega)$, the first term of the objective of $\mathrm{TR}_\alpha(\bar w, g, \Delta)$ is bounded below. Because of the $L^\infty(\Omega)$-bounds ($W$ is a finite set), convergence in $L^1(\Omega)$ of feasible points implies convergence in $L^2(\Omega)$ and we obtain continuity of the first term of the objective if a sequence of feasible points converges in $L^1(\Omega)$. Moreover, the term $\eta R_\alpha(\bar w)$ is constant. Consequently, the assumptions of Proposition 3.2 are satisfied on the non-empty feasible set of $\mathrm{TR}_\alpha(\bar w, g, \Delta)$. Thus the existence of solutions to $\mathrm{TR}_\alpha(\bar w, g, \Delta)$ follows as a corollary from (the proof of) Proposition 3.2.
Proof. Let $g := \nabla F(w)$. We choose $\tilde F(v) := (g, v)_{L^2}$ for $v \in L^2(\Omega)$ and obtain $\nabla \tilde F(v) = g$ and $\nabla^2 \tilde F(v) = 0$ so that $\tilde F$ satisfies Assumption 3.1 on $F$. We apply Proposition 5.2 with $\tilde F$ in place of $F$ and obtain that $w$ is stationary and satisfies (12) with $\nabla \tilde F(w) = g = \nabla F(w)$, which means that $w$ is also stationary for $(P_\alpha)$.
Next, we analyze Γ-convergence of the trust-region subproblems with respect to strict and pseudo-weak- * convergence of feasible points of (P α ).As in [23], this will be a key ingredient of our convergence analysis of our trust-region algorithm.
Proof.We follow the proof strategy of [23,Theorem 5.2].
Part 1: Lower bound inequality.T (w) ≤ lim inf n→∞ T n (w n ) for w n p * ⇀ w in F. Because of the uniform L ∞ (Ω)-bounds on F, we obtain w n → w in L 2 (Ω) and v n → v in L 2 (Ω).In combination with g n ⇀ g in L 2 (Ω), we obtain Because w n p * ⇀ w in F and v n → v strictly in F, we obtain with the help of Lemma 2.
∆ holds for an infinite subsequence, then the convergence of {w n } n and {v n } n in L 1 (Ω) and the triangle inequality imply ∥w − v∥ L 1 ≤ ∆ so that the last term of T is zero and the lower bound inequality is satisfied.If there is no such subsequence, then T n ≡ ∞ and the lower bound inequality holds trivially.
Part 2: Upper bound inequality.For each w ∈ F with R α (w) < ∞, there exists a sequence w n p * ⇀ w in F such that T (w) ≥ lim sup n→∞ T n (w n ).We make a case distinction on the possible values of the norm difference ∥w − v∥ L 1 .
Case 2a ∥w − v∥ L 1 > ∆: Then T (w) = ∞ and we can choose w n := w for all n ∈ N.
Case 2b ∥w − v∥ L 1 < ∆: We choose again w n := w for all n ∈ N. We obtain The convergence of {w n } n and {v n } n in L 1 (Ω) and the triangle inequality imply that ∥w n − v n ∥ L 1 ≤ ∆ holds for all large enough n ∈ N. Consequently, T n (w n ) → T (w).
Case 2c ∥w − v∥ L 1 = ∆: Because ∆ > 0, there exists a set with λ(D) > 0. We note that the specific values w 1 and w 2 are without loss of generality because we may reorder the indices of the elements of W as necessary.
The set D satisfies where the first inequality follows from (3).The second inequality follows from the fact that at most one of the w i can be zero.Because D has strictly positive Lebesgue measure λ(D) > 0, it has a point of density 1, that is there exists We define κ n := ∥v n − v∥ L 1 for n ∈ N. Because of ( 13), there exist a sequence (r n ) n and n 0 ∈ N such that r n ↘ 0 and for all n ≥ n 0 : and B r n (x) ⊂ Ω.We now restrict to n ≥ n 0 and define w n by w(x) else for a.e.x ∈ Ω.This gives The construction of the w n implies w n p * ⇀ w in F. In order to obtain the lim supinequality, we show R α (w n ) → R α (w).To this end, let E i := w −1 ({w i }), For {E n 1 } n , we deduce by means of (3) and, analogously, by means of (4) for all i ∈ {1, . . ., M }.Summing the terms, we obtain R α (w n ) → R α (w).

Trust-region algorithm
We propose to solve $(P_\alpha)$ for locally optimal or stationary points with a variant of the trust-region algorithm that is proposed and analyzed in [18,23]. It is stated as Algorithm 1 below and consists of two loops. The outer loop is indexed by $n$, and in each iteration of the outer loop a new feasible iterate $w^n \in \mathcal{F}$ with $R_\alpha(w^n) < \infty$ is computed that improves acceptably over the previous iterate $w^{n-1}$. An acceptable improvement is achieved if the new iterate $w^n$ satisfies
$\operatorname{ared}(w^{n-1}, w^n) \ge \sigma \operatorname{pred}(w^{n-1}, \Delta^{n,k})$ (14)
for a fixed $\sigma \in (0, 1)$ and the trust-region radius $\Delta^{n,k}$ that is determined by the inner loop (see below). In (14), the left hand side is defined by
$\operatorname{ared}(w^{n-1}, w) := F(w^{n-1}) + \eta R_\alpha(w^{n-1}) - F(w) - \eta R_\alpha(w)$
for $w \in \mathcal{F}$ and is the actual reduction of the objective that is achieved by $w$. The right hand side,
$\operatorname{pred}(w^{n-1}, \Delta^{n,k}) := (\nabla F(w^{n-1}), w^{n-1} - \tilde w^{n,k})_{L^2} + \eta R_\alpha(w^{n-1}) - \eta R_\alpha(\tilde w^{n,k})$,
is the predicted reduction, that is, the negative objective value of the trust-region subproblem $\mathrm{TR}_\alpha(w^{n-1}, \nabla F(w^{n-1}), \Delta^{n,k})$ at its solution $\tilde w^{n,k}$ for the current trust-region radius.

Algorithm 1 Trust-region algorithm leaning on SLIP from [18,23]
Input: $\alpha \in (0, 1)$, $F$ sufficiently regular, $\Delta^0 > 0$, $w^0 \in \mathcal{F}$ with $R_\alpha(w^0) < \infty$, $\sigma \in (0, 1)$.
for $n = 0, \ldots$ do
    $k \leftarrow 0$, $\Delta^{n,0} \leftarrow \Delta^0$
    while not sufficient decrease according to (14) do
        $\tilde w^{n,k} \leftarrow$ minimizer of $\mathrm{TR}_\alpha(w^{n-1}, \nabla F(w^{n-1}), \Delta^{n,k})$.
        if $\operatorname{pred}(w^{n-1}, \Delta^{n,k}) = 0$ then
            Terminate. The predicted reduction for $w^{n-1}$ is zero.
        else if (14) does not hold for $\tilde w^{n,k}$ then
            $k \leftarrow k + 1$, $\Delta^{n,k} \leftarrow \Delta^{n,k-1} / 2$
        else
            $w^n \leftarrow \tilde w^{n,k}$
        end if
    end while
end for
To this end, the inner loop, indexed by $k$, starts from the reset trust-region radius $\Delta^{n,0} = \Delta^0$ and solves the trust-region subproblems $\mathrm{TR}_\alpha(w^{n-1}, \nabla F(w^{n-1}), \Delta^{n,k})$ with linearization (model) point $w^{n-1}$. If the solution $\tilde w^{n,k}$ of the trust-region subproblem satisfies (14), the new iterate $w^n$ is set to $\tilde w^{n,k}$ and the inner loop terminates. Otherwise, the trust-region radius is halved and the next iteration of the inner loop begins. If $\operatorname{pred}(w^{n-1}, \Delta^{n,k}) = 0$, then $w^{n-1}$ is a minimizer of the trust-region subproblem for a positive trust-region radius and thus stationary for $(P_\alpha)$ by virtue of Proposition 5.4 if $\alpha \in (0.5, 1)$. In this case, Algorithm 1 terminates.
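The two-loop structure described above can be sketched in Python as follows. This is a minimal illustration under strong simplifying assumptions that are not taken from this article: a one-dimensional grid, binary controls $W = \{0, 1\}$, a discrete stand-in `R` for $R_\alpha$, a toy tracking objective with a hypothetical averaging operator `K`, and brute-force enumeration in place of an integer programming solver. All function and variable names are hypothetical.

```python
import itertools


def trust_region(F, grad_F, R, w0, h, eta, sigma=0.1, delta0=1.0, max_outer=50):
    """Two-loop trust-region sketch for 0/1 controls on a 1D grid (width h)."""
    w = tuple(w0)
    history = [F(w) + eta * R(w)]
    for _ in range(max_outer):
        g = grad_F(w)
        delta = delta0                      # reset the radius for the inner loop
        accepted = None
        while accepted is None:
            best, best_val = w, 0.0         # w itself is feasible with value 0
            for v in itertools.product((0, 1), repeat=len(w)):
                l1 = h * sum(abs(a - b) for a, b in zip(v, w))
                if l1 > delta:
                    continue                # outside of the trust region
                val = (h * sum(gi * (a - b) for gi, a, b in zip(g, v, w))
                       + eta * (R(v) - R(w)))
                if val < best_val:
                    best, best_val = v, val
            pred = -best_val
            if pred <= 1e-14:
                return w, history           # predicted reduction is zero
            ared = F(w) + eta * R(w) - F(best) - eta * R(best)
            if ared >= sigma * pred:
                accepted = best             # sufficient decrease holds
            else:
                delta *= 0.5                # halve the trust-region radius
        w = accepted
        history.append(F(w) + eta * R(w))
    return w, history


# Toy problem (assumption): track y through a symmetric averaging operator K.
n, alpha, eta = 8, 0.9, 1e-3
h = 1.0 / n
y = [1.0 if 2 <= i < 6 else 0.0 for i in range(n)]


def K(w):
    ext = (0,) + tuple(w) + (0,)
    return [(ext[i] + 2 * ext[i + 1] + ext[i + 2]) / 4 for i in range(len(w))]


def F(w):
    return 0.5 * h * sum((a - b) ** 2 for a, b in zip(K(w), y))


def grad_F(w):
    return K([a - b for a, b in zip(K(w), y)])  # K is symmetric


def R(w):
    # Discrete stand-in for the fractional regularizer in 1D (assumption).
    return sum(h * h * abs(w[i] - w[j]) * (h * (j - i)) ** (-(1 + alpha))
               for i in range(n) for j in range(i + 1, n))


w_final, history = trust_region(F, grad_F, R, [0] * n, h, eta)
```

In this discrete setting the inner loop always terminates: once the radius drops below the cell width `h`, the only feasible point is the current iterate, so the predicted reduction is zero. Accepted steps satisfy the sufficient decrease test, so the recorded objective values in `history` are non-increasing.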

Asymptotics of the trust-region algorithm
With the results that have been established in the previous sections, the asymptotics of Algorithm 1 can be analyzed by following the strategy from [23], which in turn is an extension of the analysis and ideas in [18]. We therefore only indicate where the proofs of the corresponding results in [23] require modification to match the situation of this work. We begin with the proof of the asymptotics of the inner loop and continue with the asymptotics of the outer loop.
1. The inner loop terminates after finitely many iterations and (a) the sufficient decrease condition (14) is satisfied or (b) the predicted reduction is zero (and the iterate $w^{n-1}$ is stationary for $(P_\alpha)$).
2. The inner loop does not terminate and the iterate w n−1 is stationary.
Proof.The proof follows as in Corollary 6.3 with the major steps of the proof being Lemma 6.1 and Lemma 6.2 in [23], where the violation of L-stationarity in Lemma 6.2 is replaced by a violation of ( 12) and the TV-term is replaced by R α .The roles of Lemma 3.3, Lemma 3.5, and Proposition 5.5 in [23] are taken by Propositions 2.7, 2.9 and 5.2.The role of Lemma 3.8 in [23] is taken by Lemma 2.8.It leads to the term (ε k ) 2α instead of (ε k ) 2 in the proof of Lemma 6.2, which is still dominated by ε k η for ε k ↘ 0 if α ∈ (0.5, 1) as assumed.
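The role of the restriction $\alpha \in (0.5, 1)$ in this argument can be made explicit with a one-line computation. It is a sketch that assumes the remainder is bounded by $C \varepsilon_k^{2\alpha}$ with a hypothetical constant $C > 0$ that is not specified here.

```latex
\[
\frac{C\,\varepsilon_k^{2\alpha}}{\eta\,\varepsilon_k}
= \frac{C}{\eta}\,\varepsilon_k^{2\alpha - 1} \longrightarrow 0
\quad \text{as } \varepsilon_k \searrow 0
\qquad \Longleftrightarrow \qquad
2\alpha - 1 > 0
\quad \Longleftrightarrow \quad
\alpha > \tfrac{1}{2}.
\]
```

For $\alpha = \frac{1}{2}$ the ratio stays at the constant $C / \eta$, and for $\alpha < \frac{1}{2}$ it diverges, so the term of order $\varepsilon_k^{2\alpha}$ is no longer dominated by $\varepsilon_k \eta$.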
Theorem 5.7 (Theorem 6.4 in [23], Theorem 4.23 in [18]). Let $\alpha \in (0.5, 1)$. Let Assumption 3.1 hold. Let the iterates $(w^n)_n$ be produced by Algorithm 1. Let $\nabla F(w^n) \in C^2(\bar\Omega)$ for all $n \in \mathbb{N}$. Then all iterates are feasible for $(P_\alpha)$ and the sequence of objective values $(J_\alpha(w^n))_n$ is monotonically decreasing. Moreover, one of the following mutually exclusive outcomes holds:
1. The sequence $(w^n)_n$ is finite. The final element $w^N$ of $(w^n)_n$ solves the trust-region subproblem $\mathrm{TR}_\alpha(w^N, \nabla F(w^N), \Delta)$ for some $\Delta > 0$ and is stationary for $(P_\alpha)$.
2. The sequence $(w^n)_n$ is finite and the inner loop does not terminate for the final element $w^N$, which is stationary for $(P_\alpha)$.
3. The sequence (w n ) n has a pseudo-weak- * accumulation point in F. Every pseudo-weak- * accumulation point of (w n ) n is feasible, and strict.If w is a pseudo-weak- * accumulation point of (w n ) n that satisfies ∇F (w) ∈ C 2 ( Ω), then it is stationary for (P α ).
Proof. As in the proofs of Theorem 6.4 in [23] and Theorem 4.23 in [18], it follows that Algorithm 1 produces a sequence of feasible iterates $(w^n)_n$ with a corresponding monotonically decreasing sequence of objective function values $(J_\alpha(w^n))_n$. Again, as in the proofs of Theorem 6.4 in [23] and Theorem 4.23 in [18], it suffices to prove Outcome 3 in case that Outcomes 1 and 2 do not hold. In this argument, Proposition 5.6 takes the role of Lemma 6.2 in [23] and Lemma 4.19 in [18]. As in [23], we split the proof that Outcome 3 holds into four parts.
Outcome 3 (1) existence and feasibility of pseudo-weak-* accumulation points: This follows with the same arguments as are carried out for the existence of minimizers in Proposition 3.2.
Outcome 3 (2) pseudo-weak-$^*$ accumulation points are strict: This follows with the same arguments as are carried out in the corresponding paragraph in the proof of Theorem 6.4 in [23]. The only change is that the TV-term is replaced by $R_\alpha$.
Outcome 3 (3) strict accumulation points are optimal for (TR$_\alpha$) if the trust-region radius is bounded away from zero: This follows with the same arguments as are carried out in the corresponding paragraph in the proof of Theorem 6.4 in [23] when the role of Theorem 5.2 in [23] is taken by Theorem 5.5 and the role of Proposition 5.5 in [23] is taken by Proposition 5.2.
Outcome 3 (4) strict accumulation points are stationary if the trust-region radius vanishes: This follows with the same arguments as are carried out in the corresponding paragraph in the proof of Theorem 6.4 in [23] with the following adaptations. The assumed violation of L-stationarity in Theorem 6.4 is replaced by a violation of (12) and the TV-term is replaced by $R_\alpha$. The roles of Lemmas 3.3 and 3.5 in [23] are taken by Propositions 2.7 and 2.9. For the conditions (a), (b), and (c) on the choice of $\Delta^*$ in Theorem 6.4, we replace $\varepsilon_1$ by $\varepsilon_1^{\alpha}$ in (a) and the two occurrences of $\Delta^* \kappa^{-1}$ by $(\Delta^* \kappa^{-1})^{\frac{1}{\alpha}}$ in order to account for the exponent $\alpha$ in the estimate of Lemma 2.8. Note that the non-negativity in (b) can be achieved because of the assumption $\alpha > \frac{1}{2}$, which implies that (∆ * ) 2κ . Then the remaining steps can be carried out as in the proof of Theorem 6.4 in [23].

Computational experiment
We provide a computational example to give a qualitative impression of the behavior of the resulting discretization. We consider the binary control of an elliptic boundary value problem by means of a source term that enters the right-hand side of the PDE. The main objective is a tracking-type functional so that the problems (P$_\alpha$) become minimize u,w in Ω.
In order to discretize the problem, we choose a piecewise constant ansatz for the control input $w$ on a uniform grid of $n \times n$ squares with $n = 48$. For the limit case, an isotropic discretization of the total variation seminorm in integer optimization is computationally difficult and recent approaches [26] are not yet computationally mature. Therefore, we compute the total variation as the length of the interfaces along the boundaries of the grid cells, which implies an anisotropic behavior in the limit. We then obtain integer linear programs for the trust-region subproblems as derived in Appendix B of [23]. We expect that rectangular shapes are preferred for this anisotropic discretization. For the choices $\alpha = 0.5$ and $\alpha = 0.9$, we tabulate the possible contributions of pairs of different cells to the double integral (2) and formulate the resulting trust-region subproblems as integer linear programs. Due to the double integral, this amounts to a number of variables on the order of $n^4$, which is too many for standard integer programming solvers to handle easily. To alleviate this issue, we truncate the inner integral: only cells whose center is within an $\ell_2$-distance of at most $7 \cdot \frac{1}{n}$ of the center of the cell in the outer integral are taken into account. This is justified by the decay of the integral kernel of the Gagliardo seminorm with the distance to the current point. For the limit case, we compute the limiting regularizer as the sum of the interface lengths multiplied by their jump heights as in [23].
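The effect of the truncation on the problem size can be illustrated with a small sketch. It is not the tabulation used for the experiments: the kernel $|x - y|^{-(d + \alpha)}$ with $d = 2$, the midpoint-rule weight per cell pair, and the helper names `truncated_offsets`, `kernel_table`, and `count_pairs` are assumptions made for illustration only. In particular, a midpoint rule is crude for neighboring cells, where the kernel is nearly singular.

```python
import math

def truncated_offsets(radius_cells):
    """Nonzero integer offsets (di, dj) between cell centers whose Euclidean
    distance is at most radius_cells cell widths."""
    r = int(math.floor(radius_cells))
    return [(di, dj)
            for di in range(-r, r + 1)
            for dj in range(-r, r + 1)
            if (di, dj) != (0, 0) and di * di + dj * dj <= radius_cells ** 2]

def kernel_table(n, alpha, radius_cells=7, d=2):
    """Assumed midpoint-rule weight per offset: h^(2d) / |x - y|^(d + alpha),
    where h = 1/n is the cell width and x, y are the two cell centers."""
    h = 1.0 / n
    return {(di, dj): h ** (2 * d) / (h * math.hypot(di, dj)) ** (d + alpha)
            for (di, dj) in truncated_offsets(radius_cells)}

def count_pairs(n, offsets):
    """Number of ordered cell pairs on an n x n grid realized by the offsets."""
    return sum(max(0, n - abs(di)) * max(0, n - abs(dj)) for di, dj in offsets)

n = 48
offsets = truncated_offsets(7)
table = kernel_table(n, alpha=0.5)
all_pairs = (n * n) * (n * n - 1)  # ordered pairs of distinct cells, order n^4
print(len(offsets), count_pairs(n, offsets), all_pairs)
```

Because the weights depend only on the offset between two cells, one table per value of $\alpha$ suffices, and the number of coupled pairs grows like $n^2$ times the number of offsets rather than like $n^4$.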
We discretize the PDE using the open source library FEniCSx [28, 27, 1], where we use a much finer mesh than 48 × 48 cells to solve the PDE, and solve the subproblems using the integer programming solver Gurobi [17]. We run the problems in single CPU mode on one node of TU Dortmund's Linux HPC cluster LiDO3 (node: 2x AMD EPYC 7542 32-Core CPUs and 1024 GB RAM). Even with our relatively coarse discretization and the approximation of the inner integral, many of the subproblems for $\alpha = 0.5$ and $\alpha = 0.9$ are very expensive to solve, and the computations take almost three weeks (with several subproblems being solved inexactly because they did not solve to global optimality within 30 hours). The limit case is solved in a couple of hours. This high computational burden shows that more work is necessary to solve these structured integer linear programs efficiently and approximate the involved integrals sensibly. For $\alpha = 0.5$, the trust-region algorithm accepts 31 steps until the trust-region radius contracts, that is, it drops below the volume of one cell in the control grid. For $\alpha = 0.9$, the trust-region algorithm accepts 28 steps until the trust-region radius contracts. For the limiting case, the trust-region algorithm accepts 36 steps until the trust-region radius contracts.
The limiting case shows an anisotropic behavior. This is expected because the geometric restriction induced by the control function ansatz implies that all interface lengths (jump height multiplied by length) are computed along the boundaries of the discretization into squares, see also the comments at the end of section 2 in [29]. This effect is alleviated for $\alpha = 0.5$ and $\alpha = 0.9$, where a less anisotropic behavior can be observed. The three resulting controls at the respective final iterations are shown in Fig. 1. We also note that when running the experiments again (where parallelization in Gurobi is turned on), slightly different results are returned. The integer problems are numerically challenging and at the limits of what Gurobi can handle, so that several similar integer configurations are within the tolerances and the outcome is not deterministic. We have also observed that this leads our trust-region algorithm to follow slightly different paths and contract at different stationary points. This underlines that more work is necessary to solve the subproblems efficiently to global optimality (or to a constant factor approximation).
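The anisotropy of the grid-based total variation can be made concrete with a small sketch. The functions `anisotropic_tv` and `digitized_disc` are hypothetical helpers and not the code used for the experiments; they assume the unit square as domain and count only interior cell edges, which suffices here because the shape does not touch the boundary.

```python
import math

def anisotropic_tv(w, h):
    """Sum over all interior cell edges of edge length h times jump height."""
    rows, cols = len(w), len(w[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                total += h * abs(w[i + 1][j] - w[i][j])
            if j + 1 < cols:
                total += h * abs(w[i][j + 1] - w[i][j])
    return total

def digitized_disc(n, radius):
    """Indicator of a disc centered at (0.5, 0.5), sampled at the cell centers
    of an n x n grid on the unit square."""
    h = 1.0 / n
    return [[1 if ((i + 0.5) * h - 0.5) ** 2 + ((j + 0.5) * h - 0.5) ** 2
                  <= radius ** 2 else 0
             for j in range(n)]
            for i in range(n)]

n, radius = 48, 0.25
grid_perimeter = anisotropic_tv(digitized_disc(n, radius), 1.0 / n)
true_perimeter = 2.0 * math.pi * radius
print(grid_perimeter, true_perimeter)
```

The staircase boundary of a digitized disc has the same length as the boundary of its bounding square, so the grid-based value overestimates the circumference $2\pi r$ and does not improve under refinement. A square aligned with the grid incurs no such penalty, which matches the preference for rectangular shapes in the limiting case.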

Conclusion
Our theoretical analysis opens a sensible way of approaching the computationally difficult approximation of the isotropic total variation in contexts with discreteness restrictions on the variables and discretizations with fixed geometries. Specifically, we have approximated the boundary integral by a double volume integral, where the approximation properties are established by means of the fractional nonlocal perimeter. Like other recent steps in this direction, our computational experiments show that we end up with a computationally very challenging problem, which needs to be understood and scaled to meaningful problem sizes in the future.


Figure 1: Resulting shapes for the different values of α at the final iteration of the trust-region algorithm executed using discretized subproblems.