Optimal Control of ensembles of dynamical systems

In this paper we consider the problem of the optimal control of an ensemble of affine-control systems. After proving the well-posedness of the minimization problem under examination, we establish a $\Gamma$-convergence result that allows us to substitute the original (and usually infinite) ensemble with a sequence of finite, increasing-in-size sub-ensembles. The solutions of the optimal control problems involving these sub-ensembles provide approximations, in the $L^2$-strong topology, of the minimizers of the original problem. Using again a $\Gamma$-convergence argument, we derive a Maximum Principle for ensemble optimal control problems with end-point cost. Moreover, in the case of finite sub-ensembles, we can address the minimization of the related cost through numerical schemes. In particular, we propose an algorithm that consists of a subspace projection of the gradient field induced on the space of admissible controls by the approximating cost functional. In addition, we consider an iterative method based on the Pontryagin Maximum Principle. Finally, we test the algorithms on an ensemble of linear systems in $\mathbb{R}^2$.


Introduction
An ensemble of control systems is a parametrized family of controlled ODEs of the form
$$\dot{x}^\theta(t) = G^\theta(x^\theta(t), u(t)) \quad \text{a.e. in } [0,T], \tag{0.1}$$
where $\theta \in \Theta \subset \mathbb{R}^d$ is the parameter of the ensemble, $u : [0,T] \to \mathbb{R}^k$ is the control, and, for every $\theta \in \Theta$, $G^\theta : \mathbb{R}^n \times \mathbb{R}^k \to \mathbb{R}^n$ is the function that prescribes the dynamics of the corresponding system. The peculiarity of this kind of problem is that the elements of the ensemble are simultaneously driven by the same control $u$. This framework is particularly suitable for modeling real-world control systems affected by data uncertainty (see, e.g., [28]), or the problem of controlling a large number of particles through a single signal (see [9]). From the theoretical viewpoint as well, there is currently active research interest in this topic. For instance, the controllability of ensembles of linear equations has been recently investigated in [13]. In [2] a generalization of the Chow–Rashevskii theorem was proved for ensembles of linear-control systems. In [19,20] ensembles were studied in the framework of nuclear magnetic resonance spectroscopy. Moreover, as regards ensembles in quantum control, we report the contributions [4,5], and we recall the recent works [3,11]. Finally, we mention that the interplay between Reinforcement Learning and optimal control of systems affected by partially unknown dynamics has been investigated in [21,25,26,24].
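To make the shared-control feature concrete, here is a minimal numerical sketch (not from the paper: the explicit Euler integrator, the toy dynamics $G^\theta(x,u) = -\theta x + u$, and all names below are illustrative assumptions). Every system in the ensemble is integrated with the same control signal:

```python
import numpy as np

def simulate_ensemble(G, thetas, x0, u, T=1.0, steps=100):
    """Integrate dx/dt = G(theta, x, u(t)) for each theta in `thetas`,
    all driven by the *same* control u, via explicit Euler."""
    dt = T / steps
    xs = {th: float(x0(th)) for th in thetas}
    for i in range(steps):
        ut = u(i * dt)                 # one control value shared by all systems
        for th in thetas:
            xs[th] += dt * G(th, xs[th], ut)
    return xs

# toy scalar ensemble: dx/dt = -theta * x + u(t), x(0) = 1, zero control
G = lambda th, x, u: -th * x + u
final = simulate_ensemble(G, thetas=[0.5, 1.0, 2.0],
                          x0=lambda th: 1.0, u=lambda t: 0.0)
# with u = 0, x(1) is close to exp(-theta): larger theta decays faster
```

The dictionary `final` then contains one end-point per parameter, which is exactly the kind of data that an averaged ensemble cost aggregates.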
In the present paper, we focus on a particular instance of (0.1), corresponding to the case in which the dynamics has an affine dependence on the controls. More precisely, we consider ensembles with the following expression:
$$\dot{x}^\theta(t) = F_0^\theta(x^\theta(t)) + F^\theta(x^\theta(t))\,u(t) \quad \text{a.e. in } [0,1], \tag{0.2}$$
where $\theta \in \Theta \subset \mathbb{R}^d$ varies in a compact set, and, for every $\theta \in \Theta$, the vector field $F_0^\theta : \mathbb{R}^n \to \mathbb{R}^n$ represents the drift, while the matrix-valued application $F^\theta = (F_1^\theta, \ldots, F_k^\theta) : \mathbb{R}^n \to \mathbb{R}^{n \times k}$ collects the controlled fields. We set $\mathcal{U} := L^2([0,1], \mathbb{R}^k)$ as the space of admissible controls, and, for every $\theta \in \Theta$, the curve $x^\theta_u : [0,1] \to \mathbb{R}^n$ denotes the trajectory of (0.2) corresponding to the parameter $\theta$ and to the control $u \in \mathcal{U}$. We are interested in the optimal control problem related to the minimization of a functional $\mathcal{F} : \mathcal{U} \to \mathbb{R}_+$ of the form
$$\mathcal{F}(u) := \int_\Theta \int_0^1 a(t, x^\theta_u(t), \theta)\, d\nu(t)\, d\mu(\theta) + \frac{\beta}{2}\,\|u\|^2_{L^2} \tag{0.3}$$
for every $u \in \mathcal{U}$, where $a : [0,1] \times \mathbb{R}^n \times \Theta \to \mathbb{R}_+$ is a non-negative continuous function, $\nu, \mu$ are Borel probability measures on $[0,1]$ and $\Theta$, respectively, and $\beta > 0$ is a constant that tunes the $L^2$-squared regularization. When the support of the probability measure $\mu$ is not reduced to a finite set of points, the minimization of the functional $\mathcal{F}$ is often intractable in practical situations, since a single evaluation of $\mathcal{F}$ potentially requires the resolution of an infinite number of Cauchy problems (0.2). Therefore, it is natural to try to replace $\mu$ with a sequence of probability measures $(\mu_N)_{N \in \mathbb{N}}$ such that each of them charges a finite subset of $\Theta$, and such that $\mu_N \rightharpoonup^* \mu$ as $N \to \infty$. Then, we can consider the sequence of functionals $(\mathcal{F}_N)_{N \in \mathbb{N}}$ defined as
$$\mathcal{F}_N(u) := \int_\Theta \int_0^1 a(t, x^\theta_u(t), \theta)\, d\nu(t)\, d\mu_N(\theta) + \frac{\beta}{2}\,\|u\|^2_{L^2} \tag{0.4}$$
for every $u \in \mathcal{U}$ and for every $N \in \mathbb{N}$. One of the goals of the present work is to study in which sense the functionals defined in (0.4) approximate the cost $\mathcal{F}$. It turns out that, when considering the restrictions to bounded subsets of $\mathcal{U}$, the sequence $(\mathcal{F}_N)_{N \in \mathbb{N}}$ is $\Gamma$-convergent to $\mathcal{F}$ with respect to the weak topology of $L^2$.
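For intuition, a discretized evaluation of the finite-ensemble cost might look as follows. This is an illustrative sketch under assumed data, not the paper's setting: toy affine dynamics $\dot{x} = -\theta x + u$, end-point cost $a = x^2$ (i.e., $\nu = \delta_{t=1}$), and an explicit Euler solver:

```python
import numpy as np

def cost_FN(u_vals, thetas, alphas, beta=0.1):
    """Surrogate for F_N(u): sum_j alpha_j * a(x^{theta_j}_u(1))
    + (beta/2)*||u||_{L^2}^2, with explicit Euler on [0, 1]."""
    dt = 1.0 / len(u_vals)
    endpoint = 0.0
    for th, al in zip(thetas, alphas):
        x = 1.0                        # x^theta(0) = 1 for every theta
        for ut in u_vals:
            x += dt * (-th * x + ut)   # affine dynamics F_0 + F*u
        endpoint += al * x**2          # a(x, theta) = x^2
    reg = 0.5 * beta * np.sum(u_vals**2) * dt
    return endpoint + reg

J0 = cost_FN(np.zeros(100), thetas=[0.5, 1.0], alphas=[0.5, 0.5])
```

Each evaluation solves one Cauchy problem per charged parameter, which is why a finitely supported $\mu_N$ makes the cost computable in practice.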
We report that a similar approach was undertaken in [27], where the authors considered ensembles of control systems in the general form (0.1), and proved that the averaged approximations of the cost functional under examination are $\Gamma$-convergent to the original objective with respect to the strong topology of $L^2$. We stress that our result does not reduce to a particular case of the one studied in [27]. Indeed, on the one hand, using the strong topology, in [27] it was possible to establish $\Gamma$-convergence for more general ensembles of control systems, and not only for the affine-control dynamics (0.2). On the other hand, in the general situation considered in [27] the functionals of the approximating sequence are not equi-coercive (often not even coercive) in the $L^2$-strong topology, and proving that the minimizers of the approximating functionals are (up to subsequences) convergent could be a challenging task. However, in the case of affine-control systems we manage to prove $\Gamma$-convergence even when the space of admissible controls $\mathcal{U}$ is equipped with the weak topology. Moreover, if for every $N \in \mathbb{N}$ we choose $u_N \in \arg\min_{\mathcal{U}} \mathcal{F}_N$, standard facts in the theory of $\Gamma$-convergence ensure that the sequence $(u_N)_{N \in \mathbb{N}}$ is weakly pre-compact and that each of its limiting points is a minimizer of the original functional $\mathcal{F}$ defined in (0.3). What is more surprising is that, owing to the peculiar form of the cost (0.3), the sequence $(u_N)_{N \in \mathbb{N}}$ is also pre-compact in the $L^2$-strong topology. Similar phenomena have been recently observed in [30] and [31], respectively in the frameworks of sub-Riemannian geodesic approximation and of data-driven diffeomorphism reconstruction.
In the second part of the paper, we restrict our focus to the case of the averaged end-point cost, i.e., when $\nu = \delta_{t=1}$ in the integral at the right-hand side of (0.3) and (0.4). In this framework, by a direct application of the classical theory, we first derive the Pontryagin Maximum Principle for the problem of minimizing the functional $\mathcal{F}_N$ for $N \in \mathbb{N}$. Then, using again an argument based on $\Gamma$-convergence, we formulate the Pontryagin necessary conditions for local minimizers of the functional $\mathcal{F}$. Our analysis has been inspired by the results in [6], where the authors establish the Maximum Principle for a large class of ensemble optimal control problems with averaged end-point cost. Even though our strategy is analogous to the path described in [6] (i.e., first considering auxiliary problems involving discrete measures, and then recovering the Maximum Principle for the ensemble optimal control problem), our case is not covered by the results presented in [6]. Namely, in [6] it is required that, for every point in a neighborhood of an optimal trajectory, the set of admissible velocities is bounded, and this fact is crucial to prove the continuity of the trajectories when the controls are equipped with the Ekeland metric (see [6, Lemma 5.1]). Moreover, we observe that the limiting process in [6] invokes Ekeland's variational principle, while we employ $\Gamma$-convergence. Finally, we recall that in [33] the Maximum Principle for minimax optimal control was derived.
In the last part, we propose two numerical schemes for finite-ensemble optimal control problems with averaged end-point cost. More precisely, recalling that $\mathcal{U}$ is endowed with the usual Hilbert space structure, we first consider the gradient field induced by the functional $\mathcal{F}_N : \mathcal{U} \to \mathbb{R}_+$ on its domain. This is done by adapting to the affine-control case a result obtained in [30] for linear-control systems. Then, we construct Algorithm 1 as the orthogonal projection of this gradient field onto a subspace $\mathcal{U}_M \subset \mathcal{U}$ such that $\dim(\mathcal{U}_M) < \infty$. On the other hand, Algorithm 2 is an adaptation to our problem of an iterative scheme originally proposed in [29], based on the Maximum Principle. Variants of Algorithm 1 and Algorithm 2 have been recently introduced in [31] as training procedures for a control-theoretic Deep Learning architecture. We recall that a multi-shooting technique for ensemble optimal control has been recently investigated in [18].
We briefly outline the structure of this work. In Section 1 we establish some preliminary results. In particular, we show that the trajectories of the ensemble (0.2) are uniformly $C^0$-stable along $L^2$-weakly convergent sequences of admissible controls. This property is peculiar to affine-control dynamics and plays a crucial role in the other sections. In Section 2 we formulate the ensemble optimal control problem related to the minimization of the functional $\mathcal{F} : \mathcal{U} \to \mathbb{R}_+$ defined in (0.3), and we prove the existence of a solution using the direct method of the calculus of variations. In Section 3 we establish the approximation results by showing that the sequence of functionals $(\mathcal{F}_N)_{N \in \mathbb{N}}$ defined as in (0.4) is $\Gamma$-convergent to $\mathcal{F}$ with respect to the weak topology of $L^2$. In Section 4, for every $N \in \mathbb{N}$, we compute the gradient field induced by the functional $\mathcal{F}_N$ on the space of admissible controls, and we derive the Pontryagin Maximum Principle for the optimal control problem related to the minimization of $\mathcal{F}_N$. Starting from Section 4 we restrict our attention to the end-point integral cost, which corresponds to the choice $\nu = \delta_{t=1}$ in (0.3). In Section 5 we prove the Maximum Principle for local minimizers of the functional $\mathcal{F}$, using a strategy based on $\Gamma$-convergence and on the construction of auxiliary problems involving finite ensembles of control systems. In Section 6 we construct two numerical schemes for the minimization of $\mathcal{F}_N$ in the case of end-point cost. The first method is based on the gradient field derived in Section 4, while for the second we make use of the Maximum Principle for finite ensembles. Finally, in Section 7 we test the algorithms on an approximately controllable ensemble of systems in $\mathbb{R}^2$.
General Notations. We introduce below some basic notations. For every $d \ge 1$, we consider the space $\mathbb{R}^d$ endowed with the usual Euclidean norm $|z|^2 := \langle z, z \rangle_{\mathbb{R}^d}$ for every $z \in \mathbb{R}^d$, induced by the scalar product $\langle z_1, z_2 \rangle_{\mathbb{R}^d} := \sum_{i=1}^d z_1^i z_2^i$. We sometimes make use of the equivalent norm $|\cdot|_1$ defined as $|z|_1 := \sum_{i=1}^d |z^i|$, for which the inequality $|z| \le |z|_1 \le \sqrt{d}\,|z|$ holds for every $z \in \mathbb{R}^d$.

Framework and Preliminary results
In this paper, we study ensembles of control systems in $\mathbb{R}^n$ with affine dependence on the control variable $u \in \mathbb{R}^k$. More precisely, given a compact set $\Theta$ embedded into a finite-dimensional Euclidean space, for every $\theta \in \Theta$ we are assigned an affine-control system of the form
$$\dot{x}^\theta(t) = F_0^\theta(x^\theta(t)) + F^\theta(x^\theta(t))\,u(t) \quad \text{a.e. in } [0,1], \qquad x^\theta(0) = x_0^\theta, \tag{1.1}$$
where for every $\theta \in \Theta$ we require that $F_0^\theta : \mathbb{R}^n \to \mathbb{R}^n$ and $F^\theta : \mathbb{R}^n \to \mathbb{R}^{n \times k}$ are Lipschitz-continuous applications. We stress the fact that the control $u : [0,1] \to \mathbb{R}^k$ does not depend on $\theta$, so it is the same for every control system of the ensemble. Let us introduce $F_0 : \mathbb{R}^n \times \Theta \to \mathbb{R}^n$ and $F : \mathbb{R}^n \times \Theta \to \mathbb{R}^{n \times k}$ defined respectively as
$$F_0(x, \theta) := F_0^\theta(x), \qquad F(x, \theta) := F^\theta(x) \tag{1.2}$$
for every $(x, \theta) \in \mathbb{R}^n \times \Theta$. We assume that $F_0$ and $F$ are Lipschitz-continuous mappings, i.e., that there exists a constant $L > 0$ such that
$$|F_0(x_1, \theta) - F_0(x_2, \theta)| \le L\,|x_1 - x_2|, \tag{1.3}$$
$$|F(x_1, \theta) - F(x_2, \theta)| \le L\,|x_1 - x_2| \tag{1.4}$$
for every $x_1, x_2 \in \mathbb{R}^n$ and every $\theta \in \Theta$. We use $F_i(x, \theta) \in \mathbb{R}^n$ to denote the vector obtained by taking the $i$-th column of the matrix $F(x, \theta)$, for every $i = 1, \ldots, k$. Similarly, for every $\theta \in \Theta$ we shall use $F_i^\theta : \mathbb{R}^n \to \mathbb{R}^n$ to denote the vector field corresponding to the $i$-th column of the matrix-valued application $F^\theta : \mathbb{R}^n \to \mathbb{R}^{n \times k}$. We observe that (1.3)-(1.4) imply that the vector fields $F_0^\theta, F_1^\theta, \ldots, F_k^\theta$ are uniformly Lipschitz-continuous as $\theta$ varies in $\Theta$. Another consequence of the Lipschitz-continuity conditions (1.3)-(1.4) is that the vector fields constituting the affine-control system (1.1) have sub-linear growth, uniformly with respect to the dependence on $\theta$. Namely, there exists a constant $C > 0$ such that
$$\sup_{\theta \in \Theta} \big( |F_0(x, \theta)| + |F(x, \theta)| \big) \le C\,(1 + |x|)$$
for every $x \in \mathbb{R}^n$. Finally, let us consider the application $x_0 : \Theta \to \mathbb{R}^n$ that prescribes the initial state of (1.1), i.e.,
$$x_0(\theta) := x_0^\theta \tag{1.7}$$
for every $\theta \in \Theta$. We assume that $x_0$ is continuous. As a matter of fact, since $\Theta$ is compact, there exists a constant $C' > 0$ such that
$$\sup_{\theta \in \Theta} |x_0(\theta)| \le C'.$$
We set $\mathcal{U} := L^2([0,1], \mathbb{R}^k)$ as the space of admissible controls, and we equip it with the usual Hilbert space structure given by the scalar product
$$\langle u, v \rangle_{L^2} := \int_0^1 \langle u(t), v(t) \rangle_{\mathbb{R}^k}\, dt$$
for every $u, v \in \mathcal{U}$.
For every $u \in \mathcal{U}$ and $\theta \in \Theta$, the curve $x^\theta_u : [0,1] \to \mathbb{R}^n$ denotes the solution of the Cauchy problem (1.1) corresponding to the system identified by $\theta$ and to the admissible control $u$. We recall that, for every $u \in \mathcal{U}$ and $\theta \in \Theta$, the existence and uniqueness of the solution of (1.1) are guaranteed by the Carathéodory Theorem (see, e.g., [17, Theorem 5.3]). Given $u \in \mathcal{U}$, we describe the evolution of the ensemble of control systems (1.1) through the mapping $X_u : [0,1] \times \Theta \to \mathbb{R}^n$ defined as follows:
$$X_u(t, \theta) := x^\theta_u(t)$$
for every $(t, \theta) \in [0,1] \times \Theta$. In other words, for every $u \in \mathcal{U}$ the application $X_u$ collects the trajectories of the ensemble of control systems (1.1). We study the properties of the mapping $X_u$ in Subsection 1.2 below. Before proceeding, we recall some elementary facts of functional analysis.
1.1. General results in functional analysis. We begin by recalling some basic facts about the space of admissible controls $\mathcal{U} := L^2([0,1], \mathbb{R}^k)$. First of all, the linear inclusion $\mathcal{U} \hookrightarrow L^1([0,1], \mathbb{R}^k)$ is continuous, and from (0.5) and the Jensen inequality it follows that
$$\|u\|_{L^1} \le \|u\|_{L^2}$$
for every $u \in \mathcal{U}$. We shall often make use of $L^2$-weakly convergent sequences. Given a sequence $(u_m)_{m \in \mathbb{N}} \subset \mathcal{U}$, we say that $(u_m)_{m \in \mathbb{N}}$ converges to $u \in \mathcal{U}$ with respect to the weak topology of $L^2$ if $\lim_{m \to \infty} \langle v, u_m \rangle_{L^2} = \langle v, u \rangle_{L^2}$ for every $v \in \mathcal{U}$, and we write $u_m \rightharpoonup_{L^2} u$ as $m \to \infty$. If $u_m \rightharpoonup_{L^2} u$ as $m \to \infty$, then we have
$$\|u\|_{L^2} \le \liminf_{m \to \infty} \|u_m\|_{L^2}. \tag{1.12}$$
Finally, we recall that any bounded sequence $(u_m)_{m \in \mathbb{N}}$ is pre-compact with respect to the $L^2$-weak topology. For further details on weak topologies of Banach spaces, the reader is referred to [8, Chapter 3]. We conclude this part with the following fact concerning the one-dimensional Sobolev space $H^1([0,1], \mathbb{R}^k)$: owing to the compact inclusion $H^1 \hookrightarrow C^0$, bounded sequences in $H^1$ admit subsequences converging in the $C^0$-norm. For a complete survey on the topic, we recommend [8, Chapter 8].
Remark 1. Proposition 1.2 is the cornerstone of the theoretical results presented in this paper. Indeed, the fact that the trajectories of the ensemble (1.1) are uniformly convergent when the corresponding controls are $L^2$-weakly convergent is used both to prove the existence of optimal controls (see Theorem 2.2) and to establish the $\Gamma$-convergence result (see Theorem 3.3). We stress that the fact that the systems in the ensemble (1.1) have affine dependence on the controls is crucial for the proof of Proposition 1.2.
In view of the next auxiliary result, we introduce some notations. For every $\theta \in \Theta$, we define $\tilde{F}^\theta : \mathbb{R}^n \to \mathbb{R}^{n \times (k+1)}$ as follows:
$$\tilde{F}^\theta(x) := \big( F_0^\theta(x) \mid F^\theta(x) \big) \tag{1.14}$$
for every $x \in \mathbb{R}^n$, i.e., we add the column $F_0^\theta(x)$ to the $n \times k$ matrix $F^\theta(x)$. Similarly, for every $u \in \mathcal{U} = L^2([0,1], \mathbb{R}^k)$, we consider the extended control $\tilde{u} \in \tilde{\mathcal{U}} := L^2([0,1], \mathbb{R}^{k+1})$ defined as
$$\tilde{u}(t) = (1, u(t))^T \tag{1.15}$$
for every $t \in [0,1]$, i.e., we add the component $u_0 \equiv 1$ to the column-vector $u(t)$.
Lemma 1.3. Let us consider a sequence of admissible controls $(u_m)_{m \in \mathbb{N}} \subset \mathcal{U}$ such that $u_m \rightharpoonup_{L^2} u_\infty$ as $m \to \infty$. For every $m \in \mathbb{N} \cup \{\infty\}$ and for every $\theta \in \Theta$, let $x^\theta_m : [0,1] \to \mathbb{R}^n$ be the solution of (1.1) corresponding to the ensemble parameter $\theta$ and to the admissible control $u_m$. Then, for every $\theta \in \Theta$ we have
$$\lim_{m \to \infty} \| x^\theta_m - x^\theta_\infty \|_{C^0} = 0. \tag{1.16}$$
Proof. Let us fix $\theta \in \Theta$. By means of the matrix-valued function $\tilde{F}^\theta : \mathbb{R}^n \to \mathbb{R}^{n \times (k+1)}$ and the extended control $\tilde{u} : [0,1] \to \mathbb{R}^{k+1}$ defined in (1.14) and (1.15), respectively, we can equivalently rewrite the affine-control system (1.1) corresponding to $\theta$ as follows:
$$\dot{x}(t) = \tilde{F}^\theta(x(t))\,\tilde{u}(t) \quad \text{a.e. in } [0,1], \qquad x(0) = x_0^\theta, \tag{1.17}$$
for every $u \in \mathcal{U}$. In other words, any solution $x^\theta_u : [0,1] \to \mathbb{R}^n$ of (1.1) corresponding to the admissible control $u \in \mathcal{U}$ is in turn a solution of the linear-control system (1.17) corresponding to the extended control $\tilde{u} \in \tilde{\mathcal{U}}$. On the other hand, the convergence $u_m \rightharpoonup_{L^2} u_\infty$ as $m \to \infty$ implies the convergence of the respective extended controls, i.e., $\tilde{u}_m \rightharpoonup_{L^2} \tilde{u}_\infty$ as $m \to \infty$. Therefore, $(x^\theta_m)_{m \in \mathbb{N}}$ is the sequence of solutions of the linear-control system (1.17) corresponding to the $L^2$-weakly convergent sequence of controls $(\tilde{u}_m)_{m \in \mathbb{N}}$. Moreover, $x^\theta_\infty$ is the solution of (1.17) associated with the weak-limiting control $\tilde{u}_\infty$. Using [30, Lemma 7.1], we deduce (1.16).
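Written out in coordinates, the embedding underlying this construction reads:

```latex
\dot{x}^\theta(t)
  = F_0^\theta(x^\theta(t)) + F^\theta(x^\theta(t))\,u(t)
  = \underbrace{\big(F_0^\theta(x^\theta(t)) \,\big|\, F^\theta(x^\theta(t))\big)}_{\tilde{F}^\theta(x^\theta(t))}
    \underbrace{\begin{pmatrix} 1 \\ u(t) \end{pmatrix}}_{\tilde{u}(t)}
  = \tilde{F}^\theta(x^\theta(t))\,\tilde{u}(t).
```

In other words, the drift becomes one more controlled field, coupled with the frozen control component $u_0 \equiv 1$; this is what allows results for linear-control systems, such as those of [30], to be applied to the extended system.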
We are now in position to prove Proposition 1.2.
Proof of Proposition 1.2. Let us consider an $L^2$-weakly convergent sequence $(u_m)_{m \in \mathbb{N}} \subset \mathcal{U}$ such that $u_m \rightharpoonup_{L^2} u_\infty$ as $m \to \infty$. We immediately deduce that there exists $R > 0$ such that $\|u_m\|^2_{L^2} \le R$ for every $m \in \mathbb{N} \cup \{\infty\}$. Thus, in virtue of Lemma A.5, the sequence of mappings $\{X_m : [0,1] \times \Theta \to \mathbb{R}^n\}_{m \in \mathbb{N}}$ is uniformly equi-continuous, while Lemma A.2 guarantees that it is uniformly equi-bounded. Therefore, applying the Ascoli-Arzelà Theorem (see, e.g., [8, Theorem 4.25]), we deduce that the family $(X_m)_{m \in \mathbb{N}}$ is pre-compact with respect to the strong topology of the Banach space $C^0([0,1] \times \Theta, \mathbb{R}^n)$. On the other hand, Lemma 1.3 yields the pointwise convergence $X_m(t, \theta) \to X_\infty(t, \theta)$ as $m \to \infty$ for every $(t, \theta) \in [0,1] \times \Theta$. In particular, we deduce that the set of limiting points of the pre-compact sequence $(X_m)_{m \in \mathbb{N}}$ reduces to the single-element set $\{X_\infty\}$. This proves (1.13).
1.3. Adjoint variables of the controlled ensemble. In this subsection we introduce a function $\Lambda_u$, which will play a crucial role in Section 5. Here we consider an assigned function $a : \mathbb{R}^n \times \Theta \to \mathbb{R}$ such that $(x, \theta) \mapsto \nabla_x a(x, \theta)$ is continuous. Moreover, we further require that $(x, \theta) \mapsto \frac{\partial}{\partial x} F_i(x, \theta)$ is continuous for every $i = 0, \ldots, k$. For every $u \in \mathcal{U}$ and every $\theta \in \Theta$, we define the function $\lambda^\theta_u : [0,1] \to (\mathbb{R}^n)^*$ as the solution of the following differential equation:
$$\dot{\lambda}^\theta_u(t) = -\lambda^\theta_u(t) \Big( \frac{\partial F_0}{\partial x}(x^\theta_u(t), \theta) + \sum_{i=1}^k u_i(t)\, \frac{\partial F_i}{\partial x}(x^\theta_u(t), \theta) \Big), \qquad \lambda^\theta_u(1) = \nabla_x a(x^\theta_u(1), \theta), \tag{1.18}$$
where the curve $x^\theta_u : [0,1] \to \mathbb{R}^n$ is the solution of the Cauchy problem (1.1) corresponding to the system identified by $\theta$ and to the admissible control $u$. We insist on the fact that in this paper $\lambda^\theta_u$ is always understood as a row-vector, as well as any other element of $(\mathbb{R}^n)^*$. The existence and the uniqueness of the solution of (1.18) follow from a standard application of the Carathéodory Theorem (see, e.g., [17, Theorem 5.3]). Similarly as done in the previous subsection, for every $u \in \mathcal{U}$ we introduce the function $\Lambda_u : [0,1] \times \Theta \to (\mathbb{R}^n)^*$ defined as
$$\Lambda_u(t, \theta) := \lambda^\theta_u(t). \tag{1.20}$$
Proposition 1.4. Let us assume that the mappings $(x, \theta) \mapsto \frac{\partial}{\partial x} F_i(x, \theta)$, $i = 0, \ldots, k$, and $(x, \theta) \mapsto \nabla_x a(x, \theta)$ are continuous, and let $(u_m)_{m \in \mathbb{N}} \subset \mathcal{U}$ be such that $u_m \rightharpoonup_{L^2} u_\infty$ as $m \to \infty$. Then $\Lambda_{u_m} \to \Lambda_{u_\infty}$ in the $C^0$-norm as $m \to \infty$.
Before detailing the proof of Proposition 1.4, we establish an auxiliary result with a similar flavor as Lemma 1.3.
Lemma 1.5. Let us assume that the mappings $(x, \theta) \mapsto \frac{\partial}{\partial x} F_i(x, \theta)$ are continuous for every $i = 0, \ldots, k$, as well as the gradient $(x, \theta) \mapsto \nabla_x a(x, \theta)$. Let us consider a sequence of admissible controls $(u_m)_{m \in \mathbb{N}} \subset \mathcal{U}$ such that $u_m \rightharpoonup_{L^2} u_\infty$ as $m \to \infty$. For every $m \in \mathbb{N} \cup \{\infty\}$ and for every $\theta \in \Theta$, let $\lambda^\theta_m : [0,1] \to (\mathbb{R}^n)^*$ be the solution of (1.18) corresponding to the ensemble parameter $\theta$ and to the admissible control $u_m$. Then, for every $t \in [0,1]$ and for every $\theta \in \Theta$, we have
$$\lim_{m \to \infty} \lambda^\theta_m(t) = \lambda^\theta_\infty(t). \tag{1.21}$$
Proof. The weak convergence $u_m \rightharpoonup_{L^2} u_\infty$ as $m \to \infty$ implies that there exists $R > 0$ such that $\|u_m\|_{L^2} \le R$ for every $m \in \mathbb{N} \cup \{\infty\}$. Let us fix $\theta \in \Theta$. With the same argument as in the proof of Lemma B.2, we deduce that the sequence $(\lambda^\theta_m)_{m \in \mathbb{N}} \subset H^1([0,1], (\mathbb{R}^n)^*)$ is equi-bounded.
Therefore, there exists a weakly convergent subsequence $(\lambda^\theta_{m_\ell})_{\ell \in \mathbb{N}}$ such that $\lambda^\theta_{m_\ell} \rightharpoonup_{H^1} \tilde{\lambda}^\theta$ as $\ell \to \infty$. Moreover, this implies that $\dot{\lambda}^\theta_{m_\ell} \rightharpoonup_{L^2} \dot{\tilde{\lambda}}^\theta$ as $\ell \to \infty$, while from the compact inclusion $H^1 \hookrightarrow C^0$ we deduce that $\lambda^\theta_{m_\ell} \to \tilde{\lambda}^\theta$ in the $C^0$-norm as $\ell \to \infty$. In particular, this last convergence and Lemma 1.3 imply that
$$\tilde{\lambda}^\theta(1) = \lim_{\ell \to \infty} \nabla_x a(x^\theta_{m_\ell}(1), \theta) = \nabla_x a(x^\theta_\infty(1), \theta), \tag{1.22}$$
where for every $m \in \mathbb{N} \cup \{\infty\}$ the curve $x^\theta_m : [0,1] \to \mathbb{R}^n$ denotes the solution of (1.1) corresponding to the control $u_m$ and to the parameter $\theta$. We want to prove that $\tilde{\lambda}^\theta : [0,1] \to (\mathbb{R}^n)^*$ is the solution of (1.18) corresponding to the control $u_\infty$. We recall that
$$\dot{\lambda}^\theta_{m_\ell}(t) = -\lambda^\theta_{m_\ell}(t) \Big( \frac{\partial F_0}{\partial x}(x^\theta_{m_\ell}(t), \theta) + \sum_{i=1}^k u_{m_\ell, i}(t)\, \frac{\partial F_i}{\partial x}(x^\theta_{m_\ell}(t), \theta) \Big) \tag{1.23}$$
for every $\ell \in \mathbb{N}$. We observe that, in virtue of Lemma A.2, there exists a compact set $K_R \subset \mathbb{R}^n$ such that $x^\theta_m(t) \in K_R$ for every $m \in \mathbb{N} \cup \{\infty\}$ and for every $(t, \theta) \in [0,1] \times \Theta$. Then, owing to the continuity of the mappings $(x, \theta) \mapsto \frac{\partial}{\partial x} F_i(x, \theta)$ on the compact set $K_R \times \Theta$, we obtain that
$$\frac{\partial F_i}{\partial x}(x^\theta_{m_\ell}(\cdot), \theta) \to \frac{\partial F_i}{\partial x}(x^\theta_\infty(\cdot), \theta) \quad \text{in the } C^0\text{-norm, for every } i = 0, \ldots, k, \tag{1.24}$$
as $\ell \to \infty$. Combining (1.23) and (1.24) with the weak convergence $u_{m_\ell} \rightharpoonup_{L^2} u_\infty$, we derive that
$$\dot{\tilde{\lambda}}^\theta(t) = -\tilde{\lambda}^\theta(t) \Big( \frac{\partial F_0}{\partial x}(x^\theta_\infty(t), \theta) + \sum_{i=1}^k u_{\infty, i}(t)\, \frac{\partial F_i}{\partial x}(x^\theta_\infty(t), \theta) \Big). \tag{1.25}$$
The identities (1.22) and (1.25) show that $\tilde{\lambda}^\theta \equiv \lambda^\theta_\infty$, where $\lambda^\theta_\infty : [0,1] \to (\mathbb{R}^n)^*$ is the unique solution of (1.18) corresponding to the control $u_\infty$. Hence, since any $H^1$-weakly convergent subsequence of $(\lambda^\theta_m)_{m \in \mathbb{N}}$ must converge to $\lambda^\theta_\infty$, we get (1.21). Since this argument holds for every choice of $\theta \in \Theta$, we deduce the thesis.
We are now able to prove Proposition 1.4.
Proof of Proposition 1.4. The argument is the same as in the proof of Proposition 1.2. Namely, Lemma 1.5 guarantees the pointwise convergence of the mappings (Λ m ) m∈N to Λ ∞ , while Lemma B.1 and Lemma B.4 ensure, respectively, that the elements of the sequence are uniformly equi-bounded and uniformly equi-continuous.
1.4. Gradient field for affine-control systems with end-point cost. In this subsection we generalize to the case of affine-control systems some of the results obtained in [30] in the framework of linear-control systems with end-point cost. As we shall see, the strategy that we pursue consists in embedding the affine-control system into a larger linear-control system, similarly as done in the proof of Lemma 1.3. Therefore, we can exploit a consistent part of the machinery developed in [30] to cover the present case. Let us consider a single affine-control system on $\mathbb{R}^n$ of the form
$$\dot{x}(t) = F_0(x(t)) + F(x(t))\,u(t) \quad \text{a.e. in } [0,1], \qquad x(0) = x_0, \tag{1.26}$$
where $F_0 : \mathbb{R}^n \to \mathbb{R}^n$ and $F : \mathbb{R}^n \to \mathbb{R}^{n \times k}$ are $C^2$-regular applications that design the affine-control system, and $u \in \mathcal{U} = L^2([0,1], \mathbb{R}^k)$ is the control. We introduce the functional $J : \mathcal{U} \to \mathbb{R}$ defined on the space of admissible controls as follows:
$$J(u) := a(x_u(1)) + \frac{\beta}{2}\,\|u\|^2_{L^2} \tag{1.27}$$
for every $u \in \mathcal{U}$, where $a : \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-regular function, $\beta > 0$ is a positive parameter, and $x_u : [0,1] \to \mathbb{R}^n$ is the solution of (1.26) corresponding to the control $u$. After proving that the functional $J$ is differentiable, we provide the Riesz representation of the differential $d_u J : \mathcal{U} \to \mathbb{R}$. Before proceeding, it is convenient to introduce the linear-control system in which we embed (1.26). Similarly to (1.14), let $\tilde{F} : \mathbb{R}^n \to \mathbb{R}^{n \times (k+1)}$ be the function defined as
$$\tilde{F}(x) := \big( F_0(x) \mid F(x) \big) \tag{1.28}$$
for every $x \in \mathbb{R}^n$. If we define the extended space of admissible controls as $\tilde{\mathcal{U}} := L^2([0,1], \mathbb{R}^{k+1})$, we may consider the following linear-control system:
$$\dot{x}(t) = \tilde{F}(x(t))\,\tilde{u}(t) \quad \text{a.e. in } [0,1], \qquad x(0) = x_0, \tag{1.29}$$
where $\tilde{u} \in \tilde{\mathcal{U}}$. We observe that we can recover the affine system (1.26) by restricting the set of admissible controls in (1.29) to the image of the affine embedding $\iota : \mathcal{U} \to \tilde{\mathcal{U}}$ defined as
$$\iota(u)(t) := (1, u(t))^T. \tag{1.30}$$
We introduce the extended cost functional $\tilde{J} : \tilde{\mathcal{U}} \to \mathbb{R}$ as
$$\tilde{J}(\tilde{u}) := a(x_{\tilde{u}}(1)) + \frac{\beta}{2}\,\|\tilde{u}\|^2_{L^2} \tag{1.31}$$
for every $\tilde{u} \in \tilde{\mathcal{U}}$, where $x_{\tilde{u}} : [0,1] \to \mathbb{R}^n$ is the absolutely continuous solution of (1.29) corresponding to the control $\tilde{u}$. To avoid confusion, in the present subsection we denote by $\langle \cdot, \cdot \rangle_{\mathcal{U}}$ and $\langle \cdot, \cdot \rangle_{\tilde{\mathcal{U}}}$ the scalar products in $\mathcal{U}$ and $\tilde{\mathcal{U}}$, respectively. In the next result we prove that the functionals $J$ and $\tilde{J}$ are differentiable.
Proposition 1.6.
Let us assume that F 0 : R n → R n and F : R n → R n×k are C 1 -regular, as well as the function a : R n → R designing the end-point cost. Then, the functionals J : U → R andJ :Ũ → R defined, respectively, in (1.27) and in (1.31) are Gateaux differentiable at every point of their respective domains.
Proof. We observe that the functional $J : \mathcal{U} \to \mathbb{R}$ satisfies the identity
$$J(u) = \tilde{J}(\iota(u)) - \frac{\beta}{2} \tag{1.32}$$
for every $u \in \mathcal{U}$, where $\iota : \mathcal{U} \to \tilde{\mathcal{U}}$ is the affine embedding reported in (1.30), and where the constant $\frac{\beta}{2}$ accounts for the frozen component $u_0 \equiv 1$, since $\|\iota(u)\|^2_{L^2} = 1 + \|u\|^2_{L^2}$. Since $\iota : \mathcal{U} \to \tilde{\mathcal{U}}$ is analytic, the proof reduces to showing that the functional $\tilde{J} : \tilde{\mathcal{U}} \to \mathbb{R}$ is Gateaux differentiable. This is actually the case, since $\tilde{u} \mapsto \frac{\beta}{2}\,\|\tilde{u}\|^2_{L^2}$ is smooth, while the first term at the right-hand side of (1.31) (i.e., the end-point cost) is Gateaux differentiable owing to [30, Lemma 3.1].
By differentiation of the identity (1.32), we deduce that
$$d_u J(v) = d_{\iota(u)} \tilde{J}(\iota_\# v) \tag{1.33}$$
for every $u, v \in \mathcal{U}$, where we have introduced the linear inclusion $\iota_\# : \mathcal{U} \to \tilde{\mathcal{U}}$ defined as
$$\iota_\#(v)(t) := (0, v(t))^T \tag{1.34}$$
for every $v \in \mathcal{U}$. In virtue of Proposition 1.6, we can consider the vector field $\mathcal{G} : \mathcal{U} \to \mathcal{U}$ that represents the differential of the functional $J : \mathcal{U} \to \mathbb{R}$. Namely, for every $u \in \mathcal{U}$, let $\mathcal{G}[u]$ be the unique element of $\mathcal{U}$ such that
$$\langle \mathcal{G}[u], v \rangle_{\mathcal{U}} = d_u J(v) \quad \text{for every } v \in \mathcal{U}. \tag{1.35}$$
Similarly, let us denote by $\tilde{\mathcal{G}} : \tilde{\mathcal{U}} \to \tilde{\mathcal{U}}$ the vector field such that
$$\langle \tilde{\mathcal{G}}[\tilde{u}], \tilde{v} \rangle_{\tilde{\mathcal{U}}} = d_{\tilde{u}} \tilde{J}(\tilde{v}) \tag{1.36}$$
for every $\tilde{u}, \tilde{v} \in \tilde{\mathcal{U}}$. In [30] the expression of the vector field $\tilde{\mathcal{G}}$ associated with the linear-control system (1.29) and with the cost (1.31) was derived. In the next result we use it in order to obtain the expression of $\mathcal{G}$. We use the notation $F(x)^T$ to denote the matrix in $\mathbb{R}^{k \times n}$ obtained by transposing the matrix $F(x) \in \mathbb{R}^{n \times k}$, for every $x \in \mathbb{R}^n$. The analogous convention holds for $\tilde{F}(x)^T$, for every $x \in \mathbb{R}^n$.
Theorem 1.7. Let us assume that $F_0 : \mathbb{R}^n \to \mathbb{R}^n$ and $F : \mathbb{R}^n \to \mathbb{R}^{n \times k}$ are $C^1$-regular, as well as the function $a : \mathbb{R}^n \to \mathbb{R}$ designing the end-point cost. Let $\mathcal{G} : \mathcal{U} \to \mathcal{U}$ be the gradient vector field on $\mathcal{U}$ that satisfies (1.35). Then, for every $u \in \mathcal{U}$ we have
$$\mathcal{G}[u](t) = \beta u(t) + \big( \lambda_u(t)\, F(x_u(t)) \big)^T \quad \text{for a.e. } t \in [0,1], \tag{1.37}$$
where $x_u : [0,1] \to \mathbb{R}^n$ is the solution of (1.26) corresponding to the control $u$, and $\lambda_u : [0,1] \to (\mathbb{R}^n)^*$ is the absolutely continuous curve of covectors that solves
$$\dot{\lambda}_u(t) = -\lambda_u(t) \Big( \frac{\partial F_0}{\partial x}(x_u(t)) + \sum_{i=1}^k u_i(t)\, \frac{\partial F_i}{\partial x}(x_u(t)) \Big), \qquad \lambda_u(1) = \nabla a(x_u(1)). \tag{1.38}$$
Remark 2. In this paper, we understand the elements of $(\mathbb{R}^n)^*$ as row-vectors. Therefore, for every $t \in [0,1]$, $\lambda_u(t)$ should be read as a row-vector. This should be kept in mind to give meaning to (1.38).
Proof of Theorem 1.7. In virtue of (1.33), from the definitions (1.35) and (1.36) we deduce that
$$\langle \mathcal{G}[u], v \rangle_{\mathcal{U}} = \langle \tilde{\mathcal{G}}[\iota(u)], \iota_\# v \rangle_{\tilde{\mathcal{U}}} = \langle \pi\big( \tilde{\mathcal{G}}[\iota(u)] \big), v \rangle_{\mathcal{U}} \tag{1.39}$$
for every $u, v \in \mathcal{U}$, where $\tilde{\mathcal{G}} : \tilde{\mathcal{U}} \to \tilde{\mathcal{U}}$ is the gradient vector field corresponding to the functional $\tilde{J} : \tilde{\mathcal{U}} \to \mathbb{R}$, and $\pi : \tilde{\mathcal{U}} \to \mathcal{U}$ is the linear application
$$\pi(\tilde{v})(t) := (\tilde{v}_1(t), \ldots, \tilde{v}_k(t))^T \tag{1.40}$$
for every $\tilde{v} \in \tilde{\mathcal{U}}$, i.e., the projection that discards the $0$-th component. Therefore, we can rewrite (1.39) as
$$\mathcal{G}[u] = \pi\big( \tilde{\mathcal{G}}[\iota(u)] \big), \tag{1.41}$$
where $\iota$ and $\pi$ are defined, respectively, in (1.30) and in (1.40). This implies that we can deduce the expression of $\mathcal{G}$ from the one of $\tilde{\mathcal{G}}$. In particular, from [30, Remark 8] it follows that for every $\tilde{u} \in \tilde{\mathcal{U}}$ we have
$$\tilde{\mathcal{G}}[\tilde{u}](t) = \beta \tilde{u}(t) + \big( \lambda_{\tilde{u}}(t)\, \tilde{F}(x_{\tilde{u}}(t)) \big)^T \quad \text{for a.e. } t \in [0,1], \tag{1.42}$$
where $x_{\tilde{u}} : [0,1] \to \mathbb{R}^n$ is the solution of (1.29) corresponding to the control $\tilde{u}$, and $\lambda_{\tilde{u}} : [0,1] \to (\mathbb{R}^n)^*$ is the absolutely continuous curve of covectors that solves
$$\dot{\lambda}_{\tilde{u}}(t) = -\lambda_{\tilde{u}}(t) \sum_{i=0}^k \tilde{u}_i(t)\, \frac{\partial \tilde{F}_i}{\partial x}(x_{\tilde{u}}(t)), \qquad \lambda_{\tilde{u}}(1) = \nabla a(x_{\tilde{u}}(1)). \tag{1.43}$$
We stress the fact that the summation index in (1.43) starts from $0$. Then, the thesis follows immediately from (1.41)-(1.43).
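The gradient representation of Theorem 1.7 can be checked numerically on a one-dimensional toy problem. The sketch below is illustrative and not from the paper: we take $\dot{x} = -x + u$, $x(0) = 1$, $a(x) = x^2$, so that $F \equiv 1$ and the adjoint solves $\dot{\lambda} = \lambda$ backward from $\lambda(1) = 2x(1)$. The adjoint-based gradient then agrees with a finite-difference quotient up to discretization error:

```python
import numpy as np

BETA, N = 0.1, 50
DT = 1.0 / N

def J(u):
    """J(u) = a(x(1)) + (beta/2)||u||^2 for dx/dt = -x + u, a(x) = x^2."""
    x = 1.0
    for ut in u:
        x += DT * (-x + ut)            # forward explicit Euler
    return x**2 + 0.5 * BETA * np.sum(u**2) * DT

def gradient_J(u):
    """Adjoint-based gradient: beta*u(t) + lambda(t)*F(x(t)) with F = 1
    (cf. Theorem 1.7), discretized with forward/backward Euler."""
    xs = np.empty(N + 1); xs[0] = 1.0
    for i in range(N):                 # forward pass, store the trajectory
        xs[i + 1] = xs[i] + DT * (-xs[i] + u[i])
    lam = np.empty(N + 1); lam[N] = 2.0 * xs[N]   # lambda(1) = grad a(x(1))
    for i in range(N, 0, -1):          # backward pass: lambda' = lambda
        lam[i - 1] = lam[i] - DT * lam[i]
    return BETA * u + lam[:N]

u = np.zeros(N); i = 10
e = np.zeros(N); e[i] = 1.0; eps = 1e-6
# dividing the directional derivative by DT recovers the pointwise
# gradient under the L^2 pairing
fd = (J(u + eps * e) - J(u - eps * e)) / (2 * eps * DT)
g = gradient_J(u)
# fd and g[i] agree up to O(DT) discretization error
```

The same forward/backward structure is what the gradient-based numerical scheme of Section 6 iterates, there with one adjoint per ensemble parameter.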
Remark 3. The identity (1.41) implies that the gradient field $\mathcal{G} : \mathcal{U} \to \mathcal{U}$ is at least as regular as $\tilde{\mathcal{G}} : \tilde{\mathcal{U}} \to \tilde{\mathcal{U}}$. In particular, under the further assumption that $F_0 : \mathbb{R}^n \to \mathbb{R}^n$, $F : \mathbb{R}^n \to \mathbb{R}^{n \times k}$ and $a : \mathbb{R}^n \to \mathbb{R}$ are $C^2$-regular, from [30, Lemma 3.2] it follows that $\tilde{\mathcal{G}} : \tilde{\mathcal{U}} \to \tilde{\mathcal{U}}$ is Lipschitz-continuous on the bounded sets of $\tilde{\mathcal{U}}$. Hence, under the same regularity hypotheses, $\mathcal{G} : \mathcal{U} \to \mathcal{U}$ is Lipschitz-continuous on the bounded sets of $\mathcal{U}$.

Optimal control of ensembles
In this section we formulate a minimization problem for the ensemble of affine-control systems (1.1). Namely, let us consider a non-negative continuous mapping $a : [0,1] \times \mathbb{R}^n \times \Theta \to \mathbb{R}_+$, a positive real number $\beta > 0$ and a Borel probability measure $\nu$ on the time interval $[0,1]$. Then, for every $\theta \in \Theta$ we can study the following optimal control problem:
$$\min_{u \in \mathcal{U}} \int_0^1 a(t, x^\theta_u(t), \theta)\, d\nu(t) + \frac{\beta}{2}\,\|u\|^2_{L^2}, \tag{2.1}$$
where the curve $x^\theta_u : [0,1] \to \mathbb{R}^n$ is the solution of (1.1) corresponding to the parameter $\theta \in \Theta$ and to the admissible control $u \in \mathcal{U}$. We recall that the ensemble of control systems (1.1) is aimed at modeling our partial knowledge of the data of the controlled dynamical system. Therefore, it is natural to assume that the space of parameters $\Theta$ is endowed with a Borel probability measure $\mu$ that quantifies this uncertainty. In view of this fact, we can formulate an optimal control problem for the ensemble of control systems (1.1) as follows:
$$\min_{u \in \mathcal{U}} \int_\Theta \int_0^1 a(t, x^\theta_u(t), \theta)\, d\nu(t)\, d\mu(\theta) + \frac{\beta}{2}\,\|u\|^2_{L^2}. \tag{2.2}$$
The minimization problem (2.2) is obtained by averaging out the parameters $\theta \in \Theta$ in the optimal control problem (2.1) through the probability measure $\mu$.
In this section we study the variational problem (2.2), and we prove that it admits a solution. Before proceeding, we introduce the functional $\mathcal{F} : \mathcal{U} \to \mathbb{R}_+$ associated with the minimization problem (2.2). For every admissible control $u \in \mathcal{U}$, we set
$$\mathcal{F}(u) := \int_\Theta \int_0^1 a(t, x^\theta_u(t), \theta)\, d\nu(t)\, d\mu(\theta) + \frac{\beta}{2}\,\|u\|^2_{L^2}. \tag{2.3}$$
We first prove an auxiliary lemma regarding the integral cost in (2.2).
We are now in position to prove that (2.2) admits a solution.
Proof. We establish the thesis by means of the direct method of the calculus of variations (see, e.g., [15, Theorem 1.15]). Namely, we show that the functional $\mathcal{F}$ is coercive and lower semi-continuous with respect to the weak topology of $L^2$. We first address the coercivity, i.e., we prove that the sub-level sets of the functional $\mathcal{F}$ are $L^2$-weakly pre-compact. To see that, it is sufficient to observe that for every $M \ge 0$ we have
$$\{ u \in \mathcal{U} : \mathcal{F}(u) \le M \} \subseteq \Big\{ u \in \mathcal{U} : \|u\|^2_{L^2} \le \frac{2M}{\beta} \Big\}, \tag{2.7}$$
where we used the fact that the first term at the right-hand side of (2.3) is non-negative. To study the lower semi-continuity, let us consider a sequence of admissible controls $(u_m)_{m \in \mathbb{N}} \subset \mathcal{U}$ such that $u_m \rightharpoonup_{L^2} u_\infty$ as $m \to \infty$. Using the family of applications $(Y_m)_{m \in \mathbb{N} \cup \{\infty\}}$ defined as in (2.4), we observe that the integral term at the right-hand side of (2.3) can be rewritten as
$$\int_\Theta \int_0^1 a(t, x^\theta_{u_m}(t), \theta)\, d\nu(t)\, d\mu(\theta) = \int_\Theta \int_0^1 Y_m(t, \theta)\, d\nu(t)\, d\mu(\theta),$$
and the uniform convergence $Y_m \to Y_\infty$ provided by Lemma 2.1 implies in particular the convergence of the integral term at the right-hand side of (2.3):
$$\lim_{m \to \infty} \int_\Theta \int_0^1 Y_m(t, \theta)\, d\nu(t)\, d\mu(\theta) = \int_\Theta \int_0^1 Y_\infty(t, \theta)\, d\nu(t)\, d\mu(\theta). \tag{2.8}$$
Finally, combining (1.12) with (2.8), we deduce that
$$\mathcal{F}(u_\infty) \le \liminf_{m \to \infty} \mathcal{F}(u_m).$$
This proves that the functional $\mathcal{F}$ is lower semi-continuous, and therefore we obtain the thesis.
Remark 4. The constant $\beta > 0$ in (2.3) is aimed at balancing the effect of the squared $L^2$-norm regularization and of the integral term. This fact can be crucial in some cases relevant for applications. Indeed, let us assume that, for every $\varepsilon > 0$, there exists $u_\varepsilon \in \mathcal{U}$ such that
$$\int_\Theta \int_0^1 a(t, x^\theta_{u_\varepsilon}(t), \theta)\, d\nu(t)\, d\mu(\theta) \le \varepsilon.$$
Then, let us set $\beta := \varepsilon / \max\{1, \|u_\varepsilon\|^2_{L^2}\}$, and let $\hat{u} \in \mathcal{U}$ be a minimizer of the functional $\mathcal{F} : \mathcal{U} \to \mathbb{R}_+$ defined as in (2.3). Therefore, we have that
$$\int_\Theta \int_0^1 a(t, x^\theta_{\hat{u}}(t), \theta)\, d\nu(t)\, d\mu(\theta) \le \mathcal{F}(\hat{u}) \le \mathcal{F}(u_\varepsilon) \le \varepsilon + \frac{\varepsilon}{2} = \frac{3}{2}\,\varepsilon.$$
In particular, this means that, when the constant $\beta > 0$ is chosen small enough, the integral cost achieved by the minimizers of $\mathcal{F}$ can be made arbitrarily small.
Remark 5. The non-negativity assumption on the cost function $a : [0,1] \times \mathbb{R}^n \times \Theta \to \mathbb{R}_+$ is used to deduce the inclusion (2.7). This hypothesis can be relaxed by requiring, for example, that $a$ is bounded from below. More generally, our analysis remains valid for any continuous function $a : [0,1] \times \mathbb{R}^n \times \Theta \to \mathbb{R}$ such that the sub-level sets $\{u \in \mathcal{U} : \mathcal{F}(u) \le M\} \subset \mathcal{U}$ are bounded in $L^2$ for every $M \in \mathbb{R}$. For simplicity, we assume throughout the paper that $a$ is non-negative.

Reduction to finite ensembles via Γ-convergence
In this section we deal with the task of approximating infinite ensembles with growing-in-size finite ensembles, in such a way that the minimizers of the corresponding ensemble optimal control problems converge. In this framework, a natural attempt consists in approximating the assigned probability measure $\mu$ on the space of parameters $\Theta$ with a probability measure $\hat{\mu}$ that charges a finite number of elements of $\Theta$. Therefore, if $\mu$ and $\hat{\mu}$ are close in some appropriate sense, we may expect that the solutions of the minimization problem involving $\hat{\mu}$ provide approximations of the minimizers of the original ensemble optimal control problem (2.2). This argument can be made rigorous using the tools of $\Gamma$-convergence.
We briefly recall below this notion. For a thorough introduction to the topic, we refer the reader to the textbook [15].
Definition 1. Let $(\mathcal{X}, d)$ be a metric space, and for every $N \ge 1$ let $F_N : \mathcal{X} \to \mathbb{R} \cup \{+\infty\}$ be a functional defined over $\mathcal{X}$. The sequence $(F_N)_{N \ge 1}$ is said to $\Gamma$-converge to a functional $F : \mathcal{X} \to \mathbb{R} \cup \{+\infty\}$ if the following conditions hold:
• liminf condition: for every $u \in \mathcal{X}$ and every sequence $(u_N)_{N \ge 1} \subset \mathcal{X}$ such that $u_N \to u$ in $\mathcal{X}$ as $N \to \infty$, the following inequality holds:
$$F(u) \le \liminf_{N \to \infty} F_N(u_N);$$
• limsup condition: for every $u \in \mathcal{X}$ there exists a sequence $(u_N)_{N \ge 1} \subset \mathcal{X}$ such that $u_N \to u$ in $\mathcal{X}$ as $N \to \infty$ and such that the following inequality holds:
$$\limsup_{N \to \infty} F_N(u_N) \le F(u).$$
If the conditions listed above are satisfied, then we write $F_N \xrightarrow{\Gamma} F$ as $N \to \infty$.
The importance of $\Gamma$-convergence is due to the fact that it relates the minimizers of the functionals $(F_N)_{N \ge 1}$ to the minimizers of the limiting functional $F$. Namely, under the hypothesis that the functionals of the sequence $(F_N)_{N \ge 1}$ are equi-coercive, if $\hat{u}_N \in \arg\min F_N$ for every $N \ge 1$, then the sequence $(\hat{u}_N)_{N \ge 1}$ is pre-compact in $\mathcal{X}$, and any of its limiting points is a minimizer of $F$ (see [15, Corollary 7.20]). In other words, the problem of minimizing $F$ can be approximated by the minimization of $F_N$, when $N$ is sufficiently large.
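The minimizer-convergence statement can be illustrated on a toy quadratic instance where $\arg\min F_N$ is available in closed form. Everything below is an illustrative assumption, not the paper's example: systems $\dot{x} = -\theta x + u$ with a constant-in-time scalar control $u$, $x(0) = 1$, $\theta$ uniform on $[0.5, 1.5]$, and cost $\mathbb{E}_\theta[x(1)^2] + \frac{\beta}{2} u^2$:

```python
import numpy as np

BETA = 0.1
rng = np.random.default_rng(1)

def argmin_FN(thetas):
    """Closed-form minimizer of F_N over constant controls u in R.
    For dx/dt = -theta*x + u, x(0) = 1:  x(1) = a + u*c, where
    a = exp(-theta) and c = (1 - exp(-theta))/theta, so F_N is quadratic."""
    a = np.exp(-thetas)
    c = (1.0 - np.exp(-thetas)) / thetas
    # F_N(u) = mean((a + u*c)^2) + (BETA/2)*u^2  ->  minimize the quadratic
    return -np.mean(a * c) / (np.mean(c * c) + BETA / 2.0)

# reference minimizer: dense deterministic grid on Theta = [0.5, 1.5]
u_star = argmin_FN(np.linspace(0.5, 1.5, 100_001))
# minimizers of sampled sub-ensemble problems (alpha_j = 1/N)
u_10 = argmin_FN(rng.uniform(0.5, 1.5, size=10))
u_100k = argmin_FN(rng.uniform(0.5, 1.5, size=100_000))
```

As the number of charged parameters grows, the minimizers of the sampled problems approach the minimizer of the full-measure problem, in line with the $\Gamma$-convergence picture above.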
We now focus on the ensemble optimal control problem (2.2) studied in Section 2 and on the functional F : U → R + defined in (2.3). As done in the proof of Theorem 2.2, it is convenient to equip the space of admissible controls U := L 2 ([0, 1], R k ) with the weak topology. However, Definition 1 requires the domain X where the limiting and the approximating functionals are defined to be a metric space. Unfortunately, the weak topology of L 2 is metrizable only when restricted to bounded sets (see, e.g., [8,Remark 3.3 and Theorem 3.29]). In the next lemma we see how we should choose the restriction without losing any of the minimizers of F .
The previous result implies that the inclusion arg min U F ⊂ X holds, where X := {u ∈ U : ∥u∥ L 2 ≤ ρ} and ρ > 0 is provided by Lemma 3.1. Since X is a closed ball of L 2 , the weak topology induced on X is metrizable. Hence, we can restrict the functional F : U → R + to X to construct an approximation in the sense of Γ-convergence. With a slight abuse of notation, we shall continue to denote by F the functional restricted to X . As anticipated at the beginning of the present section, the construction of the functionals (F N ) N ≥1 relies on the introduction of a proper sequence of probability measures (µ N ) N ≥1 on Θ that approximate the probability measure µ prescribing the integral cost in (2.2). We first recall the notion of weak convergence of probability measures. For further details, see, e.g., the textbook [12, Definition 3.5.1].
Definition 2. Let (µ N ) N ≥1 be a sequence of Borel probability measures on the compact set Θ. The sequence (µ N ) N ≥1 is weakly convergent to the probability measure µ as N → ∞ if the identity
lim N →∞ ∫ Θ f dµ N = ∫ Θ f dµ
holds for every function f ∈ C 0 (Θ, R). If the previous condition is satisfied, we write µ N ⇀ * µ as N → ∞.
For every N ≥ 1 we consider a subset {θ 1 , . . . , θ N } ⊂ Θ and a probability measure that charges these elements:
µ N := Σ N j=1 α j δ θ j , with α j ≥ 0 for every j = 1, . . . , N and Σ N j=1 α j = 1. (3.6)
We assume that the sequence (µ N ) N ≥1 approximates the probability measure µ in the weak sense, i.e., we require that µ N ⇀ * µ as N → ∞.
Remark 6. In applications, there are several feasible strategies to achieve the convergence µ N ⇀ * µ as N → ∞, and the crucial aspect is whether the probability measure µ is explicitly known or not. If it is, the discrete approximating measures can be defined, for example, by following the construction proposed in [6, Lemma 5.2]. We observe that the problem of the optimal approximation of a probability measure with a convex combination of a fixed number of Dirac deltas is an active research field; for further details, see, e.g., the recent paper [22]. On the other hand, in practice, it may happen that there is no direct access to the probability measure µ, and it is only possible to collect samples of random variables distributed as µ. In this case, the discrete approximating measures can be produced through a data-driven approach. Namely, if {θ 1 , . . . , θ N } are the empirically observed samples, a natural choice is to set α j = 1/N in (3.6) for every j = 1, . . . , N.
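The data-driven construction just described can be sketched in a few lines: we build the empirical measure with uniform weights α j = 1/N from samples of µ, and test weak convergence on a test function by comparing the discrete integral with the exact moment. The shifted Beta(4, 4) distribution below matches the one used in the experiments of Section 7; the helper names are ours, not the paper's.

```python
import random

def empirical_measure(samples):
    """mu_N = (1/N) sum_j delta_{theta_j}: the data-driven choice of Remark 6."""
    n = len(samples)
    return [(theta, 1.0 / n) for theta in samples]

def integrate(measure, f):
    """Integral of f against a discrete measure sum_j alpha_j delta_{theta_j}."""
    return sum(alpha * f(theta) for theta, alpha in measure)

random.seed(0)
# Samples from Beta(4, 4) shifted to Theta = [-1/2, 1/2], as in Section 7.
samples = [random.betavariate(4, 4) - 0.5 for _ in range(20000)]
mu_N = empirical_measure(samples)

# Weak convergence tested on f(theta) = theta^2: the exact integral is
# the variance of Beta(4, 4), namely 1/36.
approx = integrate(mu_N, lambda t: t * t)
print(abs(approx - 1.0 / 36.0) < 0.01)  # True
```

The same `integrate` helper works verbatim for the weighted measures (3.6) with non-uniform α j .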
We are now in position to introduce the family of functionals (F N ) N ≥1 . For every N ≥ 1, let F N : X → R + be defined by the formula (3.7), obtained from (2.3) by replacing µ with µ N , where x θ u : [0, 1] → R n denotes the solution of (1.1) corresponding to the parameter θ ∈ Θ and to the control u ∈ X . We observe that F N and F have essentially the same structure: the only difference is that the integral term of (2.3) involves the measure µ, while (3.7) features the measure µ N . Before proceeding to the main theorem of the section, we recall an auxiliary result.
Lemma 3.2. Let (µ N ) N ≥1 be a sequence of probability measures on Θ such that µ N ⇀ * µ as N → ∞, and let ν be a probability measure on [0, 1]. Then, the sequence of product measures (ν ⊗ µ N ) N ≥1 is weakly convergent to ν ⊗ µ as N → ∞.

Proof. The thesis follows directly from the Fubini Theorem and Definition 2.
We now show that the sequence of functionals (F N ) N ≥1 introduced in (3.7) is Γ-convergent to the functional that defines the ensemble optimal control problem (2.2).

Theorem 3.3. Let X ⊂ U be the set defined in (3.4), equipped with the weak topology of L 2 . For every N ≥ 1, let F N : X → R + be the functional introduced in (3.7), and let F : X → R + be the restriction to X of the application defined in (2.3). Then, F N → Γ F as N → ∞.

Proof. We first establish the liminf condition. Recalling the definition of F N in (3.7), we obtain the estimate (3.9), valid for every N ∈ N. Moreover, we observe that the trajectories of the ensemble converge uniformly, as expressed in (3.10). Therefore, using the triangular inequality and Lemma 3.2, from (3.10) we deduce (3.11). Combining (3.9) with (3.11) and with the weak lower semi-continuity of the L 2 -norm (1.12), we obtain the liminf inequality, which concludes the first part of the proof. We now establish the limsup condition. For every u ∈ X , let us consider the constant sequence u N = u for every N ≥ 1. By virtue of Lemma 3.2, we have that F N (u) → F (u) as N → ∞ for every u ∈ X , where the trajectories entering the costs are collected in the mapping X u : [0, 1] × Θ → R n defined in (1.10). This fact gives limsup N →∞ F N (u N ) = F (u) for every u ∈ X , and this shows that the limsup condition holds.
Remark 7. We observe that Theorem 2.2 holds also for F N : X → R + for every N ∈ N. Indeed, the domain X is itself sequentially weakly compact, and the convergence (2.8) occurs also with the probability measure µ N in place of µ. Therefore, since the functional F N is coercive and sequentially lower semi-continuous with respect to the weak topology of L 2 , it admits a minimizer.
The next result is a direct consequence of the Γ-convergence established in Theorem 3.3. Indeed, as anticipated before, the fact that the minimizers of the functionals (F N ) N ∈N provide approximations of the minimizers of the limiting functional F is a well-established fact, as is the convergence inf X F N → inf X F as N → ∞ (see [15, Corollary 7.20]). We stress that, usually, the approximation of the minimizers occurs in the topology that underlies the Γ-convergence result. However, we can actually prove that, in this case, the approximation holds with respect to the strong topology of L 2 , and not just in the weak sense. Similar phenomena have been recently described in [30, Theorem 7.4] and in [31, Remark 6].
Corollary 3.4. Let X ⊂ U be the set defined in (3.4). For every N ≥ 1, let F N : X → R + be the functional introduced in (3.7) and let û N ∈ X be any of its minimizers. Finally, let F : X → R + be the restriction to X of the application defined in (2.3). Then, the convergence (3.13) of the minimal values holds, namely min X F N → min X F as N → ∞. Moreover, the sequence (û N ) N ∈N is pre-compact with respect to the strong topology of L 2 , and any limiting point of this sequence is a minimizer of F .
Proof. Owing to Theorem 3.3, we have that F N → Γ F as N → ∞ with respect to the weak topology of L 2 . Therefore, from [15, Corollary 7.20] it follows that (3.13) holds, that the sequence of minimizers (û N ) N ∈N is pre-compact with respect to the weak topology of L 2 , and that its limiting points are minimizers of F . To conclude, we have to prove that it is pre-compact with respect to the strong topology, too. Let us consider a subsequence (û N j ) j∈N such that û N j ⇀ L 2 û ∞ as j → ∞. Using the fact that û ∞ is a minimizer for F , as û N j is for F N j for every j ∈ N, from (3.13) it follows the identity (3.14). Moreover, with the same argument used in the proof of Theorem 3.3 to deduce (3.11), we obtain (3.15). Combining (3.14) and (3.15), and recalling the definitions (3.7) and (2.3) of the functionals F N : X → R + and F : X → R + , we deduce that ∥û N j ∥ L 2 → ∥û ∞ ∥ L 2 as j → ∞, which implies that û N j → L 2 û ∞ as j → ∞. Since the argument holds for every L 2 -weakly convergent subsequence of the sequence of minimizers (û N ) N ∈N , this concludes the proof.
Remark 8. There are two possible interpretations of Theorem 3.3 and Corollary 3.4, depending on whether the probability measure µ that defines the limiting functional F is explicitly known or not. If it is, then the Γ-convergence result can be read as a theoretical guarantee for substituting an infinite-ensemble optimal control problem with a finite-ensemble one, as illustrated in the Introduction and at the beginning of this section. On the other hand, in real-world problems, the underlying measure µ may be unknown, but we can collect observations {θ 1 , . . . , θ N } of random variables distributed as µ, and we consider the empirical probability measure µ N = (1/N) Σ N j=1 δ θ j . In this framework, Theorem 3.3 and Corollary 3.4 can be interpreted as stability results with respect to the number of observations N. Indeed, since µ N ⇀ * µ as N → ∞, the Γ-convergence of the sequence (F N ) N ∈N implies that, when the number of collected observations is large enough, we should not expect dramatic changes in the solutions of the optimal control problems if we further increase the number of samples.

Gradient field and Maximum Principle for the approximating problems
In the present section we address the question of actually finding the minimizers of the approximating functionals (F N ) N ∈N introduced in Section 3. Namely, starting from the result stated in Theorem 1.7 for a single affine-control system with end-point cost, we obtain the expression of the gradient fields that the functionals (F N ) N ∈N induce on their domain. Moreover, we state the Pontryagin Maximum Principle for the optimal control problems corresponding to the minimization of the functionals (F N ) N ∈N . Both the gradient fields and the Maximum Principle will be used for the construction of the numerical algorithms presented in Section 6.
From now on, we specialize to the following particular form of the cost associated with the ensemble optimal control problem (2.2):
F (u) := ∫ Θ a(x θ u (1), θ) dµ(θ) + β/2 ∥u∥ 2 L 2 (4.1)
for every u ∈ U, where a : R n × Θ → R + is a C 1 -regular function, and β > 0 is a positive parameter that tunes the L 2 -regularization. We observe that (4.1) is a particular instance of (2.3). Indeed, it corresponds to the case ν = δ t=1 , where ν is the probability measure on the time interval [0, 1] that appears in the first term at the right-hand side of (2.2). In other words, we assume that the integral cost in (2.2) depends only on the final state of the trajectories of the ensemble. For every N ∈ N, let the probability measure µ N have the same expression as in (3.6), i.e., it is a finite convex combination of Dirac deltas centered at {θ 1 , . . . , θ N } ⊂ Θ. Therefore, for every N ∈ N, the functional F N : U → R + that we consider in place of (4.1) has the form
F N (u) := Σ N j=1 α j a(x θ j u (1), θ j ) + β/2 ∥u∥ 2 L 2 (4.2)
for every u ∈ U.
Remark 9. In Section 3 for technical reasons we defined the functionals (F N ) N ∈N on the domain X ⊂ U introduced in (3.4). However, the functionals (F N ) N ∈N and the corresponding gradient fields can be defined over the whole space of admissible controls U.
At this point, it is convenient to approach the minimization of the functional F N in the framework of classical optimal control problems in finite-dimensional Euclidean spaces. For this purpose, we introduce some notations. For every N ∈ N, let {θ 1 , . . . , θ N } ⊂ Θ be the set of parameters charged by the discrete probability measure µ N . Then, we study the finite sub-ensemble of (1.1) corresponding to the parameters {θ 1 , . . . , θ N }. Namely, we consider the following affine-control system on R nN :
ẋ(t) = F N 0 (x(t)) + F N (x(t)) u(t) a.e. in [0, 1], (4.3)
where x = (x 1 , . . . , x N ) T ∈ R nN , and F N 0 : R nN → R nN and F N : R nN → R nN ×k are the applications defined as
F N 0 (x) := (F 0 (x 1 , θ 1 ), . . . , F 0 (x N , θ N )) T , F N (x) := (F (x 1 , θ 1 ), . . . , F (x N , θ N )) T (4.5)
for every x ∈ R nN . Finally, the initial value is set as x 0 := (x 0 (θ 1 ), . . . , x 0 (θ N )) T , where x 0 : Θ → R n is the mapping defined in (1.7) that prescribes the initial data of the Cauchy problems of the ensemble (1.1). Moreover, we can introduce the function a N : R nN → R + defined as
a N (x) := Σ N j=1 α j a(x j , θ j ), (4.6)
where a : R n × Θ → R + is the function that defines the integral cost in (4.1), and for every j = 1, . . . , N the coefficient α j is the weight corresponding to δ θ j in the convex combination (3.6). In this framework, the functional F N : U → R + can be rewritten as
F N (u) = a N (x N u (1)) + β/2 ∥u∥ 2 L 2 (4.7)
for every u ∈ U, where x N u : [0, 1] → R nN is the solution of (4.3) corresponding to the admissible control u. In the next result we derive the expression of the vector field G N : U → U that represents the differential of the functional F N , i.e., that satisfies
dF N | u (v) = ⟨G N [u], v⟩ L 2 (4.8)
for every u, v ∈ U.
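The stacking just described can be sketched as follows: the block maps F N 0 and F N are assembled from a parametrized drift and controlled field, acting component-wise on x = (x 1 , . . . , x N ). The specific θ-dependent rotation used for the demonstration is a hypothetical example of ours, not the system of the paper.

```python
def stack(thetas, F0, F):
    """Builds the block maps of (4.3): the j-th block of the stacked drift is
    F_0(x_j, theta_j), and the j-th block of the controlled fields is F(x_j, theta_j)."""
    def F0N(x):
        return [F0(xj, th) for xj, th in zip(x, thetas)]
    def FN(x):
        return [F(xj, th) for xj, th in zip(x, thetas)]
    return F0N, FN

# Hypothetical ensemble in R^2: the drift rotates with angular speed theta,
# while the controlled fields are the two coordinate directions (an n x k matrix).
F0 = lambda x, th: [-th * x[1], th * x[0]]
F = lambda x, th: [[1.0, 0.0], [0.0, 1.0]]

thetas = [0.1, 0.5, 0.9]
F0N, FN = stack(thetas, F0, F)
x = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
print(F0N(x))  # one 2d drift block per parameter, with speeds 0.1, 0.5, 0.9
```

Every system of the sub-ensemble reads the same control, which is why the stacked system is again affine in u with block-diagonal fields.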
Theorem 4.1. Let us assume that for every θ ∈ Θ the functions x → F 0 (x, θ) and x → F (x, θ) are C 1 -regular, as well as the function x → a(x, θ) that defines the end-point cost in (4.1). Let {θ 1 , . . . , θ N } ⊂ Θ be the subset of parameters charged by the measure µ N that defines the integral cost in (4.2), and let F N : U → R + be the functional defined in (4.2). Then, F N is Gateaux differentiable at every u ∈ U, and, denoting by G N : U → U the gradient vector field on U that satisfies (4.8), for every u ∈ U we have
G N [u](t) = βu(t) + Σ N j=1 α j ( λ j u (t) F (x θ j u (t), θ j ) ) T (4.9)
for a.e. t ∈ [0, 1], where for every j = 1, . . . , N the curve x θ j u : [0, 1] → R n is the solution of (1.1) corresponding to the parameter θ j and to the admissible control u, and λ j u : [0, 1] → (R n ) * is the absolutely continuous curve of covectors that solves the backward Cauchy problem
λ̇ j u (t) = −λ j u (t) ( ∂F 0 /∂x (x θ j u (t), θ j ) + Σ k i=1 u i (t) ∂F i /∂x (x θ j u (t), θ j ) ), λ j u (1) = ∇ x a(x θ j u (1), θ j ). (4.10)

Remark 10. We use the convention that the elements of (R n ) * are row-vectors. Therefore, for every j = 1, . . . , N and t ∈ [0, 1], λ j u (t) should be read as a row-vector; this convention gives sense to (4.9) and (4.10). The same observation holds for Theorem 4.2.
Proof of Theorem 4.1. As done in (4.3), we can equivalently rewrite the sub-ensemble of control systems corresponding to the parameters {θ 1 , . . . , θ N } ⊂ Θ as a single affine-control system in R nN . Moreover, the regularity hypotheses guarantee that the functions F N 0 : R nN → R nN and F N : R nN → R nN ×k defined in (4.5) are C 1 -regular, as well as the function a N : R nN → R + introduced in (4.6). Therefore, owing to Theorem 1.7, we obtain the expression for the gradient field induced by the functional F N written in (4.7). Indeed, we deduce that
G N [u] = F N (x u (t)) T Λ u (t) + βu (4.11)
for every u ∈ U, where x u : [0, 1] → R nN is the solution of (4.3) corresponding to the control u, and Λ u : [0, 1] → (R nN ) * is the curve of covectors that solves (4.12), where F N 1 , . . . , F N k : R nN → R nN denote the vector fields obtained by taking the columns of the matrix-valued application F N : R nN → R nN ×k . Moreover, if we consider the curves of covectors λ 1 u , . . . , λ N u : [0, 1] → (R n ) * that solve (4.10) for j = 1, . . . , N, it turns out that the solution of (4.12) can be written as Λ u (t) = (α 1 λ 1 u (t), . . . , α N λ N u (t)) for every t ∈ [0, 1], where α 1 , . . . , α N are the coefficients of the convex combination involved in the definition (3.6) of µ N . Finally, owing to this decoupling of Λ u , the identity (4.9) can be deduced from (4.11) using the expressions of F N 0 , . . . , F N k .

In the previous result we obtained the Riesz representation of the differential of the functional F N : U → R + . We now establish a necessary condition for an admissible control û N ∈ U to be a minimizer of F N . This essentially descends as a standard application of the Pontryagin Maximum Principle. For a complete survey on the topic, the reader is referred to the textbook [1].
Remark 11. We can equivalently reformulate the maximum condition (4.14) of Theorem 4.2 in terms of extremals. We recall that the Pontryagin Maximum Principle provides necessary conditions for minimality: an admissible control ū ∈ U is a (normal) Pontryagin extremal for the optimal control problem related to the minimization of F N : U → R + if there exist λ 1 ū , . . . , λ N ū : [0, 1] → (R n ) * satisfying (4.13) and such that the relation (4.14) holds.

Remark 12. Let ū ∈ U be a critical point for the functional F N : U → R + , i.e., G N [ū] = 0. Then, from (4.9) it turns out that
ū(t) = −(1/β) Σ N j=1 α j ( λ j ū (t) F (x θ j ū (t), θ j ) ) T
for a.e. t ∈ [0, 1], where for every j = 1, . . . , N the curve x θ j ū : [0, 1] → R n is the trajectory of (1.1) corresponding to the parameter θ j and to the control ū, and λ j ū : [0, 1] → (R n ) * is the solution of (4.10). We observe that, for every j = 1, . . . , N, λ j ū : [0, 1] → (R n ) * solves (4.13) as well, and that ū(t) satisfies the maximum condition (4.14) for a.e. t ∈ [0, 1]. This shows that any critical point of F N : U → R + is a (normal) Pontryagin extremal for the corresponding optimal control problem. Conversely, an analogous argument shows that any Pontryagin extremal is a critical point for the functional F N .
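The forward-backward structure behind Theorem 4.1 can be sketched numerically: integrate the trajectories forward, integrate the covectors backward from the terminal datum ∇ x a, and assemble G N [u] = βu + Σ j α j (λ j F ) T pointwise. The sketch below uses scalar states and controls and an explicit Euler discretization of our own choosing; it is a toy illustration under those assumptions, not the paper's implementation.

```python
def gradient_field(u, thetas, alphas, x0, f0, df0, f, df, grad_a, beta, M):
    """Sketch of (4.9)-(4.10) for scalar states and controls on M Euler steps.
    The variable names mirror Theorem 4.1; the discretization is ours."""
    dt = 1.0 / M
    G = [beta * ul for ul in u]                      # regularization term beta*u
    for th, al in zip(thetas, alphas):
        # forward pass: x' = f0(x, th) + f(x, th) * u
        x = [x0]
        for l in range(M):
            x.append(x[-1] + dt * (f0(x[-1], th) + f(x[-1], th) * u[l]))
        # backward pass: lambda' = -lambda*(df0 + df*u), lambda(1) = grad a (cf. (4.10))
        lam = [0.0] * (M + 1)
        lam[M] = grad_a(x[M], th)
        for l in range(M, 0, -1):
            lam[l - 1] = lam[l] + dt * lam[l] * (df0(x[l - 1], th) + df(x[l - 1], th) * u[l - 1])
        # accumulate alpha_j * lambda_j(t) * f(x_j(t), theta_j)
        for l in range(M):
            G[l] += al * lam[l] * f(x[l], th)
    return G

# Sanity check on the integrator x' = u with end-point cost a(x) = x^2: the
# covector is constant in time, lambda = 2 x(1), hence G[u](t) = beta*u(t) + 2 x(1).
M = 100
u = [0.5] * M
G = gradient_field(u, thetas=[0.0], alphas=[1.0], x0=0.0,
                   f0=lambda x, th: 0.0, df0=lambda x, th: 0.0,
                   f=lambda x, th: 1.0, df=lambda x, th: 0.0,
                   grad_a=lambda x, th: 2.0 * x, beta=0.0, M=M)
print(G[0])  # approximately 1.0 (= 2 x(1) with x(1) = 0.5)
```

Note that the backward pass for the N covectors decouples across the parameters, exactly as in the proof of Theorem 4.1, so the loop over θ j can run in parallel.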

Maximum Principle for ensemble optimal control problems
In the present section we use a Γ-convergence argument to recover necessary optimality conditions for (local) minimizers of the functional F defined in (4.1). The result that we prove here is in the same flavor as the Maximum Principle derived in [6], even though the tools employed are rather different.
Let ū ∈ U be a local minimizer for the functional F . Then, for every ε > 0, we define the following perturbed functional F ε : U → R + :
F ε (u) := F (u) + ε ∥u − ū∥ 2 L 2 . (5.1)
We immediately observe the following property (Lemma 5.1): for every ε > 0, the control ū is the unique minimizer of the restriction of F ε to a suitable neighborhood Xū of ū.
For every local minimizer ū ∈ U of the functional F , we set Xū to be a closed ball of L 2 centered at ū on which ū minimizes F . Given a sequence of discrete probability measures (µ N ) N ≥1 as in (3.6) such that µ N ⇀ * µ as N → ∞, for every ε > 0 and for every N ≥ 1 we introduce the functional F N,ε : Xū → R + as follows:
F N,ε (u) := F N (u) + ε ∥u − ū∥ 2 L 2 . (5.3)
Similarly to Section 3, we can establish a Γ-convergence result.
Proposition 5.2. Let ū ∈ U be a local minimizer of the functional F : U → R + introduced in (4.1), and let Xū ⊂ U be the set defined in (3.4), equipped with the weak topology of L 2 . For every N ≥ 1 and for every ε > 0, let F N,ε : Xū → R + be the functional presented in (5.3), and let F ε : Xū → R + be the restriction to Xū of the application defined in (5.1). Then, we have that F N,ε → Γ F ε as N → ∞. Moreover, if for every N ≥ 1 we consider û N,ε ∈ arg min F N,ε , we obtain that û N,ε → L 2 ū as N → ∞.

Proof. The fact that F N,ε → Γ F ε as N → ∞ follows from a verbatim repetition of the arguments of the proof of Theorem 3.3. In addition, [15, Corollary 7.20] guarantees the convergence (5.5), and that any of the weak-limiting points of the sequence (û N,ε ) N ≥1 ⊂ Xū is itself a minimizer of the restriction of F ε to Xū. However, owing to Lemma 5.1, we know that ū is the unique minimizer of the restriction of F ε to Xū. Therefore, we deduce that û N,ε ⇀ L 2 ū as N → ∞. We are left to show that the latter convergence holds also with respect to the strong topology of L 2 . Using a similar reasoning as in the proof of Corollary 3.4, from (5.5) we obtain the convergence of the norms ∥û N,ε ∥ L 2 → ∥ū∥ L 2 as N → ∞. Finally, recalling the weak lower semi-continuity of the L 2 -norm (1.12), the previous convergence yields (5.4).
We are now in position to prove the Maximum Principle for the local minimizers of the ensemble optimal control problem related to the functional F : U → R + .

Theorem 5.3. Let us assume that the mappings (x, θ) → ∂/∂x F i (x, θ) are continuous for every i = 0, . . . , k, as well as the gradient (x, θ) → ∇ x a(x, θ). Let ū ∈ U be a local minimizer of the functional F : U → R + introduced in (4.1). Let Xū : [0, 1] × Θ → R n be the mapping defined in (1.10) that collects the trajectories of the ensemble corresponding to the control ū, and let us consider the application Λū : [0, 1] × Θ → (R n ) * that solves (5.6) for every θ ∈ Θ. Then, the maximum condition (5.7) holds for a.e. t ∈ [0, 1].
Proof. Let us fix ε > 0 and, for every N ≥ 1, let us consider the functional F N,ε : Xū → R + and let û N,ε ∈ arg min Xū F N,ε . As done in the proof of Theorem 4.2, the problem of minimizing F N,ε over Xū can be reduced to a classical optimal control problem with end-point cost. Therefore, using similar computations as in the proof of Theorem 4.2, we deduce that for every N ≥ 1 the control û N,ε is associated with a normal Pontryagin extremal of the cost functional F N,ε . Using the notations introduced in Remark 11, if we consider the application Λû N,ε : [0, 1] × Θ → (R n ) * defined in (1.19) and corresponding to the admissible control û N,ε ∈ U, we obtain that û N,ε satisfies the maximum condition (5.8) for a.e. t ∈ [0, 1]. Moreover, recalling that µ N ⇀ * µ as N → ∞ by assumption, if we set Z := Z ∞ ∪ ⋃ N ≥1 Z N , where Z N and Z ∞ denote the null sets where the corresponding maximum conditions may fail, then for every t ∈ [0, 1] \ Z we can take the pointwise limit of (5.8) as N → ∞. This yields that ū satisfies the condition (5.9) for a.e. t ∈ [0, 1]. From (5.9), which we observe does not depend on the choice of ε > 0, we finally obtain (5.7).
Remark 13. Theorem 5.3 shows that any local minimizer of the functional F is associated with a normal extremal of the ensemble optimal control problem. Moreover, we observe that there are no nontrivial abnormal extremals. Indeed, if we take ǫ ∈ R and we consider Λū(1, θ) = ǫ∇ x a(Xū(1, θ), θ) for every θ ∈ Θ as the final-time datum for (5.6), when ǫ = 0 we obtain (Λū, ǫ) ≡ 0. Finally, we observe that, by virtue of the concave quadratic term, the maximization problem (5.7) always admits a solution. Hence, there are no singular arcs.
Remark 14. For some global minimizers ū ∈ U of the functional F : U → R + defined as in (2.3), Theorem 5.3 can be directly deduced from the Γ-convergence result established in Section 3. Namely, this is the case for those global minimizers ū ∈ arg min U F that can be recovered as limiting points of the minimizers of the approximating functionals F N : U → R + introduced in (3.7). Indeed, if û N ∈ arg min U F N for every N ≥ 1 and ū ∈ U is an L 2 -strong accumulation point of the sequence (û N ) N ≥1 , then Corollary 3.4 guarantees that ū ∈ arg min U F , and we can obtain the condition (5.7) by repeating the proof of Theorem 5.3 with ε = 0. However, in general, given a family of functionals I N : X → R on a metric space (X , d) such that I N → Γ I as N → ∞, there could be elements in arg min X I that cannot be recovered as limiting points of minimizers of (I N ) N ≥1 . For instance, if we set X = [−1/2, 1/2] with the Euclidean distance, the functions I N : X → R defined as I N (x) := |x| N Γ-converge as N → ∞ to the function I ≡ 0. On the one hand, we have that arg min X I = X , while arg min X I N = {0} for every N ∈ N. As a matter of fact, the minimizers of I in X \ {0} cannot be recovered as limits of minimizers of (I N ) N ∈N . For this reason, in our case, the introduction of the auxiliary functionals F ε and (F N,ε ) N ≥1 in, respectively, (5.1) and (5.3) is precisely aimed at managing this situation, as well as at deducing the Maximum Principle for local minimizers, and not only for global ones.
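The counterexample of Remark 14 can be checked numerically: minimizing I N (x) = |x| N over a grid on [−1/2, 1/2] always returns 0, while the Γ-limit I ≡ 0 is minimized by every point. A short sketch (the grid resolution is our choice):

```python
# Remark 14's example: I_N(x) = |x|^N on X = [-1/2, 1/2] Gamma-converges to
# I = 0, yet arg min I_N = {0} for every N, so minimizers of I other than 0
# are not limits of minimizers of I_N.

def I_N(x, N):
    return abs(x) ** N

grid = [i / 1000.0 - 0.5 for i in range(1001)]   # uniform grid on [-1/2, 1/2]

# Minimizer of I_N over the grid for increasing N: always x = 0.
minimizers = [min(grid, key=lambda x: I_N(x, N)) for N in (1, 5, 50)]
print(minimizers)  # [0.0, 0.0, 0.0]

# Recovery sequence for any x: the constant sequence works, since |x|^N -> 0.
print(I_N(0.4, 200) < 1e-12)  # True
```

The second check illustrates the limsup condition of Definition 1 with the constant recovery sequence, precisely the construction used in the proof of Theorem 3.3.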
Remark 15. Results concerning the necessary optimality conditions for ensemble optimal control problems are of great interest from the theoretical viewpoint. A natural question is whether they could be successfully employed to derive numerical methods for the approximate resolution of such problems. Some efforts in this direction were made in [7], where the authors obtained a mean-field Maximum Principle for problems with uncertain initial datum and with the controlled dynamics unaffected by the unknown parameter. In that framework, a key ingredient of the Maximum Principle [7, Theorem 4.1] is a real-valued function ψ ∈ C 1 ([0, 1], C 2 c (R n )) that solves a backward-evolution PDE. We observe that the quantity ∇ x ψ is somehow related to the function Λ u that we introduced in our discussion (see [7, Proposition 4.9] for more details). In [7] the authors proposed a numerical scheme for their mean-field optimal control problem relying on an approximate computation of the solution of the backward-evolution PDE. Despite the encouraging results obtained in the experiments, the main drawback of this approach is that the resolution of the PDE is affordable only in low dimensions (e.g., in [7] examples in dimensions 1 and 2 were considered).

Numerical schemes for optimal control of ensembles
In the present section we introduce two numerical schemes for finite-ensemble optimal control problems with end-point cost. The starting points are the results of Section 4, and we follow an approach similar to [31]. The first method consists of the projection of the field G N : U → U induced by F N onto a finite-dimensional subspace U M ⊂ U. The second one is based on the Pontryagin Maximum Principle, and it was first proposed in [29].
Before proceeding, we introduce the notations and the framework shared by the two methods. Let us consider the interval [0, 1], i.e., the evolution time horizon of the ensemble of controlled dynamical systems (1.1), and for M ≥ 2 let us take the equispaced nodes {0, 1/M, . . . , (M − 1)/M, 1}. Recalling that U := L 2 ([0, 1], R k ), let us define the subspace U M ⊂ U of the piecewise-constant controls subordinated to these nodes:
U M := { u ∈ U : u(t) = u l for a.e. t ∈ [(l − 1)/M, l/M ), l = 1, . . . , M }, (6.1)
where u 1 , . . . , u M ∈ R k . For every l = 1, . . . , M, we shall write u l = (u 1,l , . . . , u k,l ) to denote the components of u l ∈ R k . Then, any element u ∈ U M will be represented by the array (u 1 , . . . , u M ). Moreover, for every j = 1, . . . , N, we collect in (6.3) the evaluations at the nodes of the trajectory corresponding to the parameter θ j . We observe that in (6.3) we dropped the reference to the control that generates the trajectories: this is done to avoid heavy notations, since the correspondence between trajectories and controls will be clear from the context. Similarly, for every j = 1, . . . , N, let λ j u : [0, 1] → (R n ) * be the solution of (4.10), and let us introduce the corresponding array (λ j l ) l=0,...,M of its evaluations at the nodes.

6.1. Projected gradient field. In this subsection we describe a method for the numerical minimization of the functional F N : U → R + defined as in (4.2). The algorithm consists of the projection of the gradient field G N : U → U derived in (4.9) onto the finite-dimensional subspace U M ⊂ U defined in (6.1). This approach was introduced in [31], where the problem of observations-based approximation of diffeomorphisms was studied. We observe that the l-th component of the orthogonal projection onto U M can be computed explicitly for every l = 1, . . . , M. We are now in position to describe the Projected Gradient Field method, which we report in Algorithm 1.
Remark 16. We observe that the for loops at lines 9-12 and 18-21 (corresponding, respectively, to the update of the curves of covectors and of the trajectories) can be carried out in parallel with respect to the index j = 1, . . . , N. This can be exploited when dealing with large sub-ensembles of parameters.

[Algorithm 1: Projected Gradient Field. Inputs: the drift fields F θ 1 0 , . . . , F θ N 0 : R n → R n ; the controlled fields F θ 1 , . . . , F θ N : R n → R n×k ; the initial states (x j 0 ) j=1,...,N = (x θ 1 0 , . . . , x θ N 0 ); the end-point costs a(·, θ 1 ), . . . , a(·, θ N ) : R n → R + ; the regularization parameter β > 0.]

Remark 17. If the update of the control at the r-th iteration is rejected, at the (r + 1)-th iteration it is not necessary to re-compute the array of covectors (λ j l ) j=1,...,N l=0,...,M . In this regard, the if clause at line 8 prevents this computation in the case of rejection at the previous passage.
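The overall structure of Algorithm 1 (gradient step, acceptance test, step-size halving on rejection, as in Remark 17) can be condensed on a toy scalar sub-ensemble ẋ = θ + u, for which the covectors are constant in time and both cost and gradient have closed forms. Everything in the sketch (dynamics, parameters, variable names) is a hypothetical example of ours, not the paper's test problem.

```python
def cost(u, thetas, alphas, y, beta, M):
    """Discrete cost: sum_j alpha_j (x_j(1) - y)^2 + (beta/2)||u||^2, with x' = theta + u."""
    dt = 1.0 / M
    c = 0.5 * beta * dt * sum(ul * ul for ul in u)
    for th, al in zip(thetas, alphas):
        x1 = th + dt * sum(u)              # closed form for x' = theta + u, x(0) = 0
        c += al * (x1 - y) ** 2
    return c

def gradient(u, thetas, alphas, y, beta, M):
    """L^2 gradient: here lambda_j = 2 (x_j(1) - y) is constant in time (cf. (4.10))."""
    dt = 1.0 / M
    lam = [2.0 * (th + dt * sum(u) - y) for th in thetas]
    s = sum(al * lj for al, lj in zip(alphas, lam))
    return [beta * ul + s for ul in u]

M, beta, y = 16, 1e-3, -1.0
thetas, alphas = [-0.2, 0.0, 0.3], [0.3, 0.4, 0.3]
u, step = [0.0] * M, 1.0
c = cost(u, thetas, alphas, y, beta, M)
for _ in range(200):
    g = gradient(u, thetas, alphas, y, beta, M)
    trial = [ul - step * gl for ul, gl in zip(u, g)]
    c_trial = cost(trial, thetas, alphas, y, beta, M)
    if c_trial < c:                        # accept the update
        u, c = trial, c_trial
    else:                                  # reject and halve the step (cf. Remark 17)
        step *= 0.5
print(c < 0.05)  # True
```

Note that the residual cost cannot vanish: the spread of the parameters puts a floor on the achievable end-point mismatch, since a single control drives all systems of the sub-ensemble.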
6.2. Iterative Maximum Principle. In this subsection we present a second numerical method for the minimization of the functional F N : U → R + , based on the Pontryagin Maximum Principle. The idea of using the Maximum Principle to design approximation schemes for optimal control problems was well established in the Russian literature (see [10] for a survey paper in English). Here we adapt to our problem the method proposed in [29], which is in turn a stabilization of one of the algorithms reported in [10]. Finally, this approach has been recently followed in [31] in the framework of diffeomorphisms approximation.
The key idea relies on iterative updates of the control through the resolution of a maximization problem related to the condition (4.14). However, the substantial difference from the previous method is that the control is updated node by node: the value u 1 is first recomputed by solving a maximization problem analogous to (6.8), and we then sequentially repeat the same procedure for every l = 2, . . . , M. We report the scheme in Algorithm 2.
Remark 18. The maximization at line 17 can be solved directly at a very low computational cost: the maximizer admits an explicit closed-form expression for every l = 1, . . . , M. This is essentially due to the fact that the systems of the ensemble (1.1) have an affine dependence on the control, so that the function to be maximized is a concave quadratic in the control variable.
Remark 21. Also in Algorithm 2 the step-size is adaptively adjusted, and it is reduced if, after the iteration, the value of the functional has not decreased. In case of rejection of the update, it is not necessary to recompute (λ j l ) j=1,...,N l=0,...,M . This is a common feature with Algorithm 1, as observed in Remark 17.

Numerical experiments
In this section we test the algorithms described in Section 6 on an optimal control problem involving an ensemble of linear dynamical systems in R 2 . Namely, given θ min < θ max ∈ R, let us set Θ := [θ min , θ max ] ⊂ R, and let us consider the ensemble of control systems where θ → x θ 0 is a continuous function that prescribes the initial states, u = (u 1 , u 2 ) T ∈ U := L 2 ([0, 1], R 2 ), and, for every θ ∈ Θ, we have For every N ≥ 1 and for every subset of parameters {θ 1 , . . . , θ N } ⊂ Θ, we represent the corresponding sub-ensemble of (7.1) as an affine-control system on R 2N , as done in Section 4. More precisely, we consider where A N ∈ R 2N ×2N and b 1 , b 2 ∈ R 2N are defined as follows: Moreover, we observe that (7.1) can be interpreted as a control system in the space C 0 (Θ, R 2 ). Indeed, we can consider the control system where A : C 0 (Θ, R 2 ) → C 0 (Θ, R 2 ) is the bounded linear operator defined as for every θ ∈ Θ and for every Y ∈ C 0 (Θ, R 2 ), and b 1 , b 2 : Θ → R 2 are defined as for every θ ∈ Θ, and finally X 0 : Θ → R 2 satisfies X 0 (θ) := x θ 0 for every θ ∈ Θ. The integrals in (7.5) should be understood in the Bochner sense, and, for every u ∈ U, the existence and uniqueness of a continuous curve t → X u,t in C 0 (Θ, R 2 ) solving (7.5) descends from classical results on linear inhomogeneous ODEs in Banach spaces (see, e.g., [14, Chapter 3]). In particular, from the uniqueness we deduce that (7.6) holds for every u ∈ U, t ∈ [0, 1] and θ ∈ Θ, where x θ u : [0, 1] → R 2 is the solution of (7.1) corresponding to the parameter θ and to the control u. We now prove some controllability results for the control systems (7.3) and (7.5).
Proposition 7.1. For every N ≥ 1 and for every subset {θ 1 , . . . , θ N } ⊂ Θ, let us consider y tar ∈ R 2N . Then, there exists a control ū ∈ U such that the corresponding solution xū : [0, 1] → R 2N of (7.3) satisfies xū(1) = y tar . Moreover, for every Y tar ∈ C 0 (Θ, R 2 ) and for every ε > 0, there exists a control u ε ∈ U such that the curve t → X uε,t that solves (7.5) satisfies ∥X uε,1 − Y tar ∥ C 0 ≤ ε.

Proof. We observe that the first part of the thesis follows if we prove the exact controllability of the system (7.3). An elementary result in control theory (see, e.g., [1, Theorem 3.3]) ensures that the last condition is implied by a full-rank condition on the controllability matrix of the pair defining (7.3). A direct computation shows that this is actually the case.
As regards the second part of the thesis, owing to [32, Theorem 3.1.1] it is sufficient to prove the density condition (7.7). The identity (7.7) then follows from the Weierstrass Theorem on polynomial approximation.
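The Kalman rank condition invoked in the first part of the proof can be tested mechanically. Since the matrices of (7.2) are not reproduced here, the sketch below checks the condition rank[B, A N B, . . . , (A N ) 2N −1 B] = 2N on a hypothetical stacked system whose blocks are rotation generators A(θ) = [[0, −θ], [θ, 0]], with controlled directions b 1 = e 1 , b 2 = e 2 replicated across blocks; with distinct parameters the rank is full, while duplicated parameters destroy controllability.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def rank(M, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    r, rows, cols = 0, len(M), len(M[0])
    for c in range(cols):
        piv = max(range(r, rows), key=lambda i: abs(M[i][c]), default=None)
        if piv is None or abs(M[piv][c]) < tol:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def kalman_rank(A, B):
    """Rank of the controllability matrix [B, AB, ..., A^{n-1}B]."""
    n = len(A)
    K, P = [], B
    for _ in range(n):
        K = [Ki + Pi for Ki, Pi in zip(K, P)] if K else [row[:] for row in P]
        P = matmul(A, P)
    return rank(K)

def stacked(thetas):
    """Hypothetical stacked pair (A^N, B): rotation blocks, coordinate controls."""
    N = len(thetas)
    A = [[0.0] * (2 * N) for _ in range(2 * N)]
    B = [[0.0, 0.0] for _ in range(2 * N)]
    for j, th in enumerate(thetas):
        A[2 * j][2 * j + 1] = -th
        A[2 * j + 1][2 * j] = th
        B[2 * j][0] = 1.0       # b1 acts on the first coordinate of every block
        B[2 * j + 1][1] = 1.0   # b2 acts on the second coordinate of every block
    return A, B

A, B = stacked([0.2, 0.5])
print(kalman_rank(A, B))  # 4: this 2-element sub-ensemble is exactly controllable
```

With duplicated parameters, e.g. `stacked([0.3, 0.3])`, all columns of the controllability matrix have identical blocks and the rank drops to 2, consistently with the fact that two identical systems driven by the same control can never be steered to distinct states from the same initial condition.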
We now introduce the problem that we studied in the numerical simulations. We set θ min = −1/2, θ max = 1/2, and we consider on Θ = [−1/2, 1/2] the probability measure µ, distributed as a Beta(4, 4) centered at 0. We observe that during the experiments we assumed no explicit knowledge of the probability measure µ; on the other hand, we assumed we could sample observations from this distribution, and we pursued the data-driven approach described in Remark 6. After the approximate optimal control had been computed, we validated the policy on a testing sub-ensemble of newly-sampled parameters. Let us assume that the initial datum in (7.1) is not affected by the parameter θ, i.e., there exists x 0 ∈ R 2 such that x θ 0 = x 0 for every θ ∈ Θ. We imagine that we want to steer the end-points of the trajectories of (7.1) as close as possible to a target point y tar ∈ R 2 . Therefore, we consider the functional F : U → R + defined as in (7.8) for every u ∈ U. We observe that the second part of Proposition 7.1 implies that we are in the situation described in Remark 4. Indeed, if we set Y tar (θ) := y tar for every θ ∈ Θ, we have that for every ε > 0 there exists u ε ∈ U such that the end-points of the corresponding trajectories lie within distance ε of y tar , where we used the identity (7.6). Therefore, for small values of β, we expect that the minimizers of (7.8) drive the end-points of the controlled trajectories very close to y tar . In the simulations we considered β = 10 −3 . Finally, we approximated the probability measure µ with the empirical distribution µ N , obtained from N = 300 independent samples of µ. Moreover, we chose x 0 = (0, 0) T and y tar = (−1, −1) T . We report below the results obtained with Algorithm 1 and Algorithm 2, where we set M = 64. We observed that the performances of the two numerical methods are very similar, as regards both the qualitative aspect of the controlled trajectories and the decay of the cost during the execution.
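The train/validate protocol described above can be sketched on a hypothetical scalar surrogate ẋ = θ + u, x(0) = 0, with the same shifted Beta(4, 4) parameter law and β = 10 −3 : we compute the optimal constant control for an empirical training measure and evaluate the resulting cost on newly-sampled test parameters. The surrogate dynamics and all names are ours; for a constant control u ≡ v the cost is J(v) = mean j (θ j + v − y) 2 + (β/2)v 2 , minimized at v* = 2(y − mean θ)/(2 + β).

```python
import random

def cost(v, thetas, y, beta):
    """J(v) for the constant control u = v on the surrogate ensemble x' = theta + u."""
    N = len(thetas)
    return sum((th + v - y) ** 2 for th in thetas) / N + 0.5 * beta * v * v

random.seed(1)
beta, y = 1e-3, -1.0
train = [random.betavariate(4, 4) - 0.5 for _ in range(300)]   # training samples of mu
test = [random.betavariate(4, 4) - 0.5 for _ in range(300)]    # fresh validation samples

mean_train = sum(train) / len(train)
v_star = 2.0 * (y - mean_train) / (2.0 + beta)   # optimal constant control on the training ensemble

c_train = cost(v_star, train, y, beta)
c_test = cost(v_star, test, y, beta)
print(abs(c_train - c_test) < 0.05)  # True: the control generalizes to fresh samples
```

This is exactly the stability phenomenon of Remark 8: since both empirical measures approximate the same µ, the control optimized on the training sub-ensemble performs comparably on the test sub-ensemble.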

Conclusions
In this paper we considered the problem of the optimal control of an ensemble of affine-control systems. We proved the well-posedness of the corresponding minimization problem, and we showed with a Γ-convergence argument how the original problem can be reduced to an approximated one, involving ensembles with a finite number of elements. For the latter, in the case of an end-point cost, we proposed two numerical schemes for the approximation of the optimal control. We finally tested the methods on an ensemble optimal control problem in dimension two.
For future development, we plan to study algorithms also for more general costs, and not only for terminal-state penalization. Moreover, we hope to extend the Γ-convergence results to some proper class of ensembles of nonlinear-control systems. As in the affine-control case, we expect that weak topologies on the space of controls are required to obtain the equi-coercivity of the functionals. On the other hand, the challenging aspect is that, for nonlinear-control systems, weakly convergent controls do not induce, in general, locally C 0 -strongly convergent flows.

Figure 2. Decay of the discrete cost achieved by Algorithm 1 (Projected Gradient) and Algorithm 2 (Iterative PMP). As we can see, the performances of the two methods on this problem are very similar.
Appendix A. Auxiliary results of Subsection 1.2

Here we prove some auxiliary properties of the mapping $X_u : [0, 1] \times \Theta \to \mathbb{R}^n$, which has been defined in (1.10) for every $u \in U$. Before proceeding, we recall a version of the Grönwall-Bellman inequality.
Lemma A.1 (Grönwall-Bellman Inequality). Let $f : [a, b] \to \mathbb{R}_+$ be a non-negative continuous function, and let us assume that there exist a constant $\alpha > 0$ and a non-negative integrable function $k : [a, b] \to \mathbb{R}_+$ such that
$$ f(s) \leq \alpha + \int_a^s k(\tau) f(\tau) \, d\tau $$
for every $s \in [a, b]$. Then, for every $s \in [a, b]$ the following inequality holds:
$$ f(s) \leq \alpha \exp\left( \int_a^s k(\tau) \, d\tau \right). $$
Proof. This result follows directly from [16, Theorem 5.1].
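As a quick sanity check, the bound can be verified numerically on the extremal case $f' = k f$, $f(a) = \alpha$, which saturates the integral inequality; the weight $k$ below is an arbitrary non-negative choice made for illustration.

```python
import numpy as np

# Gronwall-Bellman check on [0, 1]: if f(s) <= alpha + int_0^s k*f,
# then f(s) <= alpha * exp(int_0^s k).
alpha = 1.0
k = lambda s: 2.0 + np.sin(5.0 * s)        # non-negative weight (illustrative)

s = np.linspace(0.0, 1.0, 2001)
ds = s[1] - s[0]

# Explicit Euler for the extremal case f' = k f, f(0) = alpha.
f = np.empty_like(s)
f[0] = alpha
for i in range(1, len(s)):
    f[i] = f[i - 1] + ds * k(s[i - 1]) * f[i - 1]

# Discretized right-hand side of the Gronwall bound.
bound = alpha * np.exp(np.concatenate(([0.0], np.cumsum(k(s[:-1]) * ds))))
assert np.all(f <= bound + 1e-9)           # the exponential bound dominates f
```

Since $1 + x \leq e^x$, the Euler iterates stay below the discretized exponential bound at every grid point, so the assertion holds exactly up to round-off.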
We first prove that for every u ∈ U the mapping X u : [0, 1] × Θ → R n is bounded.
We shall prove that, when the control $u$ varies in a bounded subset of $U$, the corresponding functions $X_u : [0, 1] \times \Theta \to \mathbb{R}^n$ that capture the evolution of the ensemble of control systems (1.1) are uniformly equi-continuous on their domain. We first show separately the uniform equi-continuity with respect to the variables in the time domain $[0, 1]$ and in the parameter domain $\Theta$. In the next result we observe that the trajectories of the ensemble are Hölder-continuous in time, uniformly with respect to the parameter $\theta \in \Theta$.

Lemma A.3. For every $u \in U$, let $X_u : [0, 1] \times \Theta \to \mathbb{R}^n$ be the application defined in (1.10) collecting the trajectories of the ensemble of control systems (1.1). Then, for every $R > 0$ there exists a constant $C_R > 0$ such that, whenever $\|u\|_{L^2} \leq R$,
$$ |X_u(t_2, \theta) - X_u(t_1, \theta)|_2 \leq C_R |t_2 - t_1|^{\frac{1}{2}} $$
for every $t_1, t_2 \in [0, 1]$ and for every $\theta \in \Theta$.
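The $\frac{1}{2}$-Hölder exponent in time stems from the Cauchy-Schwarz inequality applied to the time integral of an $L^2$ control. A minimal numerical check on the scalar toy system $\dot{x} = u$, chosen here purely for illustration and not taken from the paper:

```python
import numpy as np

# For x' = u with u in L^2([0, 1]), Cauchy-Schwarz gives
# |x(t2) - x(t1)| <= ||u||_{L^2} * |t2 - t1|^{1/2}.
rng = np.random.default_rng(1)
M = 1000
dt = 1.0 / M
u = rng.standard_normal(M)                       # a rough L^2 control
t = np.linspace(0.0, 1.0, M + 1)
x = np.concatenate(([0.0], np.cumsum(u) * dt))   # Euler trajectory, x(0) = 0
L2 = np.sqrt(np.sum(u ** 2) * dt)                # discrete L^2 norm of u

# The 1/2-Holder estimate holds at every pair of grid times.
for _ in range(200):
    i, j = sorted(rng.integers(0, M + 1, size=2))
    assert abs(x[j] - x[i]) <= L2 * np.sqrt(t[j] - t[i]) + 1e-12
```

The discrete estimate is exact (no modeling error): it is just the Cauchy-Schwarz inequality for the finite sums defining `x` and `L2`.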
We are now in a position to state the uniform equi-continuity result.
Proof. The thesis (A.8) follows directly from the triangle inequality and from Lemma A.3 and Lemma A.4.
From the definition of $\Lambda_u : [0, 1] \times \Theta \to (\mathbb{R}^n)^*$ in (1.19), it follows that $\Lambda_u$ is bounded. In the next lemma we show that $\Lambda_u$ is Hölder-continuous in time.
Finally, the next result proves the uniform continuity of $\Lambda_u$.
Proof. The thesis (B.7) follows directly from the triangle inequality and from Lemma B.2 and Lemma B.3.