Sparse Control of Alignment Models in High Dimension

For high dimensional particle systems, governed by smooth nonlinearities depending on mutual distances between particles, one can construct low-dimensional representations of the dynamical system, which allow the learning of nearly optimal control strategies in high dimension with overwhelming confidence. In this paper we present an instance of this general statement tailored to the sparse control of models of consensus emergence in high dimension, projected to lower dimensions by means of random linear maps. We show that one can steer, nearly optimally and with high probability, a high-dimensional alignment model to consensus by acting at each switching time on one agent of the system only, with a control rule chosen essentially exclusively according to information gathered from a randomly drawn low-dimensional representation of the control system.


INTRODUCTION
In view of the increasing technical ability to collect large amounts of time-evolving data and to model them as high-dimensional dynamical systems, the controllability of complex multiagent interactions has become a pressing challenge of paramount importance due to its social and economic impact. In this paper, we shall investigate the applicability of the following meta-theorem.
Meta-theorem. For high-dimensional particle systems, governed by smooth nonlinearities depending on mutual distances between particles, one can construct low-dimensional representations of the dynamical system, which allow the learning of nearly optimal control strategies in high dimension with overwhelming confidence.
As control is usually goal-oriented, hence highly dependent on the specific dynamical system, investigating the qualitative applicability of this statement in its full generality risks diluting its quantitative understanding. Thus we shall prove in this paper a specific instance of it, which nonetheless conveys all the relevant aspects and technical issues potentially encountered in other situations. In particular we shall focus on alignment models inspired by the seminal work of Cucker and Smale [10,11]. In this class of dynamical systems the particles influence each other according to a positive rate of communication $a(\|x_i(t) - x_j(t)\|)$, depending on the mutual distance, towards the alignment of the entire group to a common conduct, and they read
$$\dot x_i(t) = v_i(t), \qquad \dot v_i(t) = \frac{1}{N}\sum_{j=1}^{N} a(\|x_j(t) - x_i(t)\|)\,(v_j(t) - v_i(t)), \qquad i = 1, \dots, N.$$
The classically mentioned inspiring application is the modeling of the emergence of a flock moving with the same velocity in a group of migrating birds. However, the emergence of a common direction may depend on whether the initial conditions lie within a corresponding basin of attraction, and such conditional pattern formation has been fully explored, for instance, in [6,7,17]: for
$$X(t) := \frac{1}{2N^2}\sum_{i,j=1}^{N}\|x_i(t) - x_j(t)\|^2, \qquad V(t) := \frac{1}{2N^2}\sum_{i,j=1}^{N}\|v_i(t) - v_j(t)\|^2,$$
the following result holds.
Theorem 0.1. If the initial conditions satisfy
$$\int_{\sqrt{X(0)}}^{\infty} a(\sqrt{2N}\,r)\,dr \ge \sqrt{V(0)},$$
then the solution tends to consensus.
For those initial conditions where
$$\int_{\sqrt{X(0)}}^{\infty} a(\sqrt{2N}\,r)\,dr < \sqrt{V(0)},$$
and the convergence towards alignment is no longer guaranteed, despite being desirable (for instance when it comes to unanimous decisions in assemblies), one may wonder whether the application of a parsimonious external control can nevertheless lead to consensus emergence. This issue has been recently explored in the series of papers [6,7], where the sparse controllability of alignment models towards consensus has been established (see Figure 1) regardless of the dimensionality of the problem, see also [3,4] for extensions and generalizations. In particular, alignment should not be interpreted exclusively relative to motion in the three-dimensional Euclidean space: there are several instances of "abstract alignment" which may occur in high dimension, for instance in [1] the authors consider an application of alignment models to predict the collective phenomenon of asset pricing and volatilities in financial markets. Therefore, in those circumstances where the dimensionality of the dynamics is very high, it becomes a relevant question whether it is possible to define control strategies of the dynamics by observing instances of the system in lower dimension. In recent years, several techniques have been developed in order to reduce the dimensionality of time-evolving point clouds, such as diffusion maps applied to networks changing in time [9] and geometric multiscale dimensionality reductions [5], just to mention a few.
Key words and phrases. Cucker-Smale model, consensus emergence, sparse control, Johnson-Lindenstrauss embedding, dimensionality reduction.
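Besides these perhaps involved methods based on computationally demanding nonlinear embeddings of the high-dimensional clouds in lower dimension, Johnson-Lindenstrauss embeddings, introduced in the seminal work [18], have the remarkable property of being simple linear operators $M \in \mathbb{R}^{k \times d}$ preserving the distances between points in the cloud $P \subset \mathbb{R}^d$ up to an $\varepsilon$-distortion:
$$(1 - \varepsilon)\,\|x - \tilde x\| \le \|Mx - M\tilde x\| \le (1 + \varepsilon)\,\|x - \tilde x\| \qquad \text{for all } x, \tilde x \in P,$$
where $k \sim \varepsilon^{-2}\log(\#P)$.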
As Johnson-Lindenstrauss embeddings with such scaling of the low-dimension are constructed by generating random projections, the quasi-isometry property on the point cloud is usually stated with a certain (high) probability. The random linear projection of high-dimensional systems governed by smooth nonlinearities depending on mutual distances has been investigated in [14]: roughly speaking, given a dynamical system in highdimension d ≫ 1 governed by locally Lipschitz functions f i : R N×N and its lower-dimensional counterparṫ where M : R d → R k is a Johnson-Lindenstrauss linear embedding for k ∼ ε −2 log(N), the following finite time approximation holds ζ i (t) − Mz i (t) ≤ C T ε, for all t ∈ [0, T ], with high probability. If we applied such linear projections verbatim to each equation of a Cucker-Smale system, we would obtain the following approximation leading to the formulation of the low-dimensional system in R k   ẏ with initial conditions (y(0), w(0)) = (Mx(0), Mv(0)). The first result of this paper, refining and generalizing those in [14], is roughly summarized as follows.
As we highlight in detail in Section 3, not only does the approximation (1) hold for finite time, but, remarkably, the lower-dimensional representation also shows a rather impressive faithfulness in terms of the asymptotic (long time) detection of collective behavior emergence, i.e., with high probability, global alignment occurs in lower dimension k if and only if it occurs in high dimension d. The key technical tool for proving this result and the ones following is a weak form of the Johnson-Lindenstrauss Lemma, formulated below in Lemma 2.4, valid for continuous trajectories and not only for clouds of points. Similar results appear, to some extent in greater generality, in [2,13], but not in the weak form we consider here.
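Additionally we combine the analysis of [14] with the sparse controllability results in [6] and show that a high-dimensional dynamical system of Cucker-Smale type can be nearly optimally stabilized towards consensus, with high probability, by means of a control strategy completely identified by the optimal control strategy in low dimension. More formally, we consider for a given $(x(0), v(0))$ the high-dimensional controlled system
$$\dot x_i(t) = v_i(t), \qquad \dot v_i(t) = \frac{1}{N}\sum_{j=1}^{N} a(\|x_j(t) - x_i(t)\|)\,(v_j(t) - v_i(t)) + u^h_i(t), \qquad i = 1, \dots, N,$$
and its low-dimensional counterpart
$$\dot y_i(t) = w_i(t), \qquad \dot w_i(t) = \frac{1}{N}\sum_{j=1}^{N} a(\|y_j(t) - y_i(t)\|)\,(w_j(t) - w_i(t)) + u^\ell_i(t), \qquad i = 1, \dots, N,$$
with initial data $(y(0), w(0)) = (Mx(0), Mv(0))$. The sparse control strategies applied to the systems are defined as follows: fix $\theta > 0$, write $v_i^\perp = v_i - \bar v$ and $w_i^\perp = w_i - \bar w$ for the deviations from the respective means, let $\hat\iota(t)$ be the smallest index such that $\|w^\perp_{\hat\iota}(t)\| \ge \|w^\perp_j(t)\|$ for all $j = 1, \dots, N$, and define
$$u^\ell_i(t) = \begin{cases} -\theta\, \dfrac{w_i^\perp(t)}{\|w_i^\perp(t)\|} & \text{if } i = \hat\iota(t), \\ 0 & \text{otherwise,} \end{cases} \qquad u^h_i(t) = \begin{cases} -\theta\, \dfrac{v_i^\perp(t)}{\|v_i^\perp(t)\|} & \text{if } i = \hat\iota(t), \\ 0 & \text{otherwise.} \end{cases}$$
Notice that the control $u^h$ is sparse (all the components are zero except one) and defined exclusively through the following information: the index $\hat\iota$, which is computed from the low-dimensional control problem; the consensus parameter $v_{\hat\iota}$, which is actually the only information to be observed in high dimension; and the mean consensus parameter $\bar v(t) = \bar v(0) + \frac{1}{N}\sum_{i=1}^{N}\int_0^t u^h_i(s)\,ds$, which one computes by integrating and summing the previous controls. Our main result reads as follows.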
Theorem 0.3. Let M ∈ R k×d and Θ > 0. Assume that (x, v) and (y, w) are solutions of the d-dimensional and k-dimensional so controlled Cucker-Smale systems with initial values (x(0), v(0)) and (Mx(0), Mv(0)), respectively. Further assume that M is a Johnson-Lindenstrauss matrix for a certain distortion ε > 0 and low dimension k, which depends exponentially on the number of agents, but not on the dimension d. Then both controlled Cucker-Smale systems (a) stay close to each other after the projection of the high-dimensional trajectories; (b) reach the consensus region of Theorem 0.1 in finite time, and (c) reach the consensus region, when a certain parameter of the low-dimensional systems falls below a known threshold.
We consciously do not wish to be more detailed at this point than this rather general and perhaps rough explanation because the precise statements appear in the rest of the paper in a rather technical form and we wish here, in the introduction, mainly to convey their fundamental message. Let us stress again that in our view the content of this paper is of technical nature towards a proof of concept and we expect our main results actually to extend similarly to other high-dimensional dynamical systems whose nonlinearities depend smoothly on mutual Euclidean distances. We refer to [14] for more examples of relevant dynamical systems of this type. While in this paper we consider the sparse controllability of alignment systems for d → ∞, we mention also the related investigations towards a sparse mean-field optimal control for N → ∞ in [15,16].
The paper is organized as follows: Section 1 presents the Cucker-Smale model and some of its main features. Section 2 deals with Johnson-Lindenstrauss embeddings, which shall be used extensively to obtain low-dimensional counterparts of Cucker-Smale models retaining all the information about the asymptotic behavior of the system for large times. Section 3 studies the interplay between a high-dimensional Cucker-Smale model and the low-dimensional system obtained via Johnson-Lindenstrauss embeddings: in particular, in Theorem 3.2 we derive an error estimate for the approximation of the projected high-dimensional system by the low-dimensional one. Section 4 introduces the sparse control strategy we shall exploit to enforce alignment in the high-dimensional system using only information gathered from the low-dimensional system and presents Theorem 4.5, the main result of this paper. In Section 5 we discuss the appropriate size of the dimension onto which we should project a given high-dimensional system and the construction of suitable Johnson-Lindenstrauss embeddings fulfilling the conditions stated in the main result. Finally, Section 6 shows a series of numerical experiments and compares the sparse control strategy to several other possible stabilization procedures.

THE CUCKER-SMALE MODEL
In the following, we shall work in the ambient space $\mathbb{R}^d$ equipped with the Euclidean norm $\|\cdot\|_{\ell_2^d}$, omitting the subscript if the dimensionality of the norm can be retrieved from the context. Consider a system of N agents, whose state is described by a pair $(x_i, v_i)$ of vectors of $\mathbb{R}^d$, where $x_i$ represents the main state of the agent and $v_i$ its consensus parameter. The alignment model as presented in [17] assumes that the dynamics of the i-th agent of the group evolves according to the following system of ordinary differential equations
$$\dot x_i(t) = v_i(t), \qquad \dot v_i(t) = \frac{1}{N}\sum_{j=1}^{N} a(\|x_j(t) - x_i(t)\|)\,(v_j(t) - v_i(t)), \qquad (2)$$
for every $i = 1, \dots, N$, where a is a non-increasing positive Lipschitz function on $[0, \infty)$. In this model, at any time every agent adjusts its consensus parameter to match those of the other agents according to a weighted average of the differences: how much the i-th agent aligns with the j-th agent depends on their Euclidean distance, meaning that the i-th agent is more influenced by the agents close to it than by those far away.
As a prominent example, the Cucker-Smale models considered in the seminal paper [10] are governed by a function a of the form
$$a(r) = \frac{K}{(\sigma^2 + r^2)^{\beta}}, \qquad (3)$$
where the parameters $K > 0$, $\sigma > 0$, and $\beta \ge 0$ tune the social interaction in the group of agents.
Definition 1.1. We say that a solution $(x(t), v(t))$ of system (2) tends to consensus if the consensus parameter vectors tend to the mean $\bar v = \frac{1}{N}\sum_{i=1}^{N} v_i$, i.e., if $\lim_{t \to +\infty}\|v_i(t) - \bar v(t)\| = 0$ for every $i = 1, \dots, N$.
The mean $\bar v$ is a conserved quantity for a system of the type (2), but we shall consider below controlled systems for which $\bar v(t)$ is possibly time dependent.
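To make the dynamics concrete, the following minimal Python sketch integrates an uncontrolled alignment system with an explicit Euler scheme. It assumes the standard Cucker-Smale right-hand side with communication rate a(r) = K/(σ² + r²)^β; the step size, the initial data, and the helper names (`comm_rate`, `euler_step`, `spread`) are illustrative choices and do not come from the paper. It then compares the mean consensus parameter and the mean-square deviation from the mean before and after the integration.

```python
import math

def comm_rate(r, K=1.0, sigma=1.0, beta=1.0):
    """Assumed communication rate a(r) = K / (sigma^2 + r^2)^beta."""
    return K / (sigma ** 2 + r ** 2) ** beta

def dist(p, q):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(p, q)))

def mean(vs):
    """Componentwise mean of a list of vectors."""
    return [sum(col) / len(vs) for col in zip(*vs)]

def spread(vs):
    """Mean-square deviation from the mean, (1/N) sum_i |v_i - mean|^2."""
    m = mean(vs)
    return sum(dist(v, m) ** 2 for v in vs) / len(vs)

def euler_step(x, v, dt):
    """One explicit Euler step of
       x_i' = v_i,  v_i' = (1/N) sum_j a(|x_j - x_i|) (v_j - v_i)."""
    n = len(x)
    force = []
    for i in range(n):
        acc = [0.0] * len(v[i])
        for j in range(n):
            w = comm_rate(dist(x[i], x[j])) / n
            acc = [s + w * (vj - vi) for s, vj, vi in zip(acc, v[j], v[i])]
        force.append(acc)
    new_x = [[p + dt * q for p, q in zip(xi, vi)] for xi, vi in zip(x, v)]
    new_v = [[q + dt * f for q, f in zip(vi, fi)] for vi, fi in zip(v, force)]
    return new_x, new_v

# Hypothetical initial data: three agents in the plane.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
v = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
mean_start, spread_start = mean(v), spread(v)
for _ in range(200):
    x, v = euler_step(x, v, dt=0.05)
mean_end, spread_end = mean(v), spread(v)
```

Because the interaction weights are symmetric in i and j, the Euler update leaves the mean consensus parameter unchanged up to rounding, while the spread shrinks for a sufficiently small step size.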
Given a solution (x(t), v(t)) of system (2), we reformulate the convergence to consensus by means of the following quantities is a bilinear form on the space (R d ) N , and ·, · denotes the usual scalar product on R d . If we denote with , v(t)) of system (2) the following are equivalent: = 0 for every i = 1, . . . , N; (2) lim t→+∞ v ⊥ i (t) = 0 for every i = 1, . . . , N; A sufficient condition for a solution of system (2) to converge to consensus can be given using the following functional ) be a solution of system (2). Then X(t) and In particular, if the initial datum ( then the solution of (2) with initial data (x 0 , v 0 ) tends to consensus.

Remark 1.4.
A simple proof of this crucial observation can be found in the Appendix of [7]. Notice that it follows immediately that V is decreasing.
we say that the system is in the consensus region at the time t.

A CONTINUOUS JOHNSON-LINDENSTRAUSS LEMMA
As will be made clear below, we intend to reduce the computational effort of extracting fundamental features of the dynamical system (2), for instance about its asymptotic behavior, by projecting it to a k-dimensional space for $k \ll d$ by a linear mapping $M \in \mathbb{R}^{k \times d}$. In particular, we apply such a matrix M to each equation of (2) and, by setting $y_i = Mx_i$ as well as $w_i = Mv_i$ for $i = 1, \dots, N$, we obtain the system
$$\dot y_i(t) = w_i(t), \qquad \dot w_i(t) = \frac{1}{N}\sum_{j=1}^{N} a(\|y_j(t) - y_i(t)\|)\,(w_j(t) - w_i(t)),$$
where we formally applied the equivalences
$$\|x_j - x_i\| \approx \|Mx_j - Mx_i\| = \|y_j - y_i\|. \qquad (5)$$
For (5) to hold, at least approximately, we need that M is nearly an isometry (here we further refine and extend results from [14, Section 3]).
Definition 2.1. Let $M \in \mathbb{R}^{k \times d}$, $\delta > 0$, and $\varepsilon \in (0, 1)$. We say that M fulfills the weak Johnson-Lindenstrauss property of parameters $\varepsilon$ and $\delta$ at $x \in \mathbb{R}^d$ if either
$$(1 - \varepsilon)\,\|x\| \le \|Mx\| \le (1 + \varepsilon)\,\|x\| \qquad (6)$$
or
$$\|x\| \le \delta \quad \text{and} \quad \|Mx\| \le \delta. \qquad (7)$$
We say that M fulfills the (strong) Johnson-Lindenstrauss property of parameter $\varepsilon$ at $x \in \mathbb{R}^d$ if (6) holds at x.
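The two alternatives in Definition 2.1 can be written as a pair of predicates on the norms of x and Mx. The sketch below assumes that the strong property reads (1 − ε)‖x‖ ≤ ‖Mx‖ ≤ (1 + ε)‖x‖ and that the weak alternative requires both ‖x‖ ≤ δ and ‖Mx‖ ≤ δ, which is how the weak property is spelled out in Lemma 2.5; the function names are hypothetical.

```python
def strong_jl(norm_x, norm_mx, eps):
    """Strong property: ||Mx|| lies within a (1 +/- eps) factor of ||x||."""
    return (1 - eps) * norm_x <= norm_mx <= (1 + eps) * norm_x

def weak_jl(norm_x, norm_mx, eps, delta):
    """Weak property: the strong one holds, or both norms are at most delta."""
    return strong_jl(norm_x, norm_mx, eps) or (norm_x <= delta and norm_mx <= delta)

# A well-preserved vector, a tiny vector collapsed to zero, and a badly distorted one.
well_preserved = weak_jl(1.0, 0.9, eps=0.2, delta=0.01)
tiny_collapsed = weak_jl(0.005, 0.0, eps=0.2, delta=0.01)
badly_distorted = weak_jl(1.0, 0.5, eps=0.2, delta=0.01)
```

The second case is the reason for the relaxation: a difference of trajectories that shrinks towards zero may fail the multiplicative bound while still being harmless at scale δ.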

Remark 2.2.
The earliest result providing the existence of matrices M for which (6) holds for every $x \in P$, $P \subseteq \mathbb{R}^d$ such that $N = \#P$, for the dimensionality k scaling as
$$k \sim \varepsilon^{-2}\log N, \qquad (8)$$
is the celebrated Johnson-Lindenstrauss Lemma from the seminal paper [18]. We refer to [13] for a rather general version of this result and to the references therein for an extended literature.
The only construction of a matrix M fulfilling the (strong) Johnson-Lindenstrauss property with scaling (8) known up to now is stochastic, i.e., the matrix is randomly generated and satisfies (6) with high probability. One of the remarkable features of these embeddings, which we exploit extensively in this paper, is that for their construction there is no need to know the specific points in advance: given a fixed cloud of points (not necessarily explicitly given!), a random matrix drawn according to certain distributions will fulfill the (strong) Johnson-Lindenstrauss property with high probability. Let us recall briefly some well-known instances of such distributions:
(S1) $k \times d$ matrices M whose entries $m_{ij}$ are independent realizations of Gaussian random variables, i.e., $m_{ij} \sim \mathcal{N}(0, 1/k)$;
(S2) $k \times d$ matrices M whose entries $m_{ij}$ are independent realizations of scaled Bernoulli random variables, i.e., $m_{ij} = +1/\sqrt{k}$ with probability 1/2 and $m_{ij} = -1/\sqrt{k}$ with probability 1/2;
(S3) $k \times d$ matrices M which are random projections and are scaled by a factor $\sqrt{d/k}$, see [12].
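As an illustration of constructions (S1) and (S2), the following sketch draws a Gaussian and a scaled Bernoulli matrix and measures the ratio ‖M(p − q)‖ / ‖p − q‖ over all pairs of a small random cloud. The normalizations (variance 1/k for the Gaussian entries, values ±1/√k for the Bernoulli entries) are the usual ones and are assumed here; the dimensions, the seed, and the helper names are arbitrary choices made for the example.

```python
import math
import random

def gaussian_matrix(k, d, rng):
    """(S1)-type matrix: independent N(0, 1/k) entries (assumed normalization)."""
    s = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, s) for _ in range(d)] for _ in range(k)]

def bernoulli_matrix(k, d, rng):
    """(S2)-type matrix: independent entries +-1/sqrt(k), each with probability 1/2."""
    s = 1.0 / math.sqrt(k)
    return [[s if rng.random() < 0.5 else -s for _ in range(d)] for _ in range(k)]

def project(M, x):
    """Matrix-vector product M x."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def distortion_ratios(M, points):
    """Ratios ||M(p - q)|| / ||p - q|| over all pairs of distinct points."""
    ratios = []
    for i in range(len(points)):
        for j in range(i):
            diff = [a - b for a, b in zip(points[i], points[j])]
            ratios.append(norm(project(M, diff)) / norm(diff))
    return ratios

rng = random.Random(0)
d, k, n_points = 300, 150, 6
points = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_points)]
ratios_gauss = distortion_ratios(gaussian_matrix(k, d, rng), points)
ratios_bern = distortion_ratios(bernoulli_matrix(k, d, rng), points)
```

The matrices are generated without looking at `points`, which mirrors the remark above that the cloud need not be known in advance; the ratios are expected to concentrate around 1 as k grows.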
then there exists a matrix M ∈ R k×d for k ∼ ε −2 log(d · ρ · ε −1 ) such that (10) ( for all t ∈ [0, 1]. As already announced at the beginning of this section, we would like to use (10) for being (x i (t), v i (t)) the trajectory of the i-th agent in (2). Unfortunately, (9) does not hold in this case even if we assume that x i (0) − x j (0) ≥ c > 0 for all i = j: Let us consider, for instance, Example 1 from [6] of a Cucker-Smale system of the type (2) with communication function (3) of two agents moving on the real line with positions and velocities at time t given by (x 1 (t), v 1 (t)) and (x 2 (t), v 2 (t)). Let us assume that β = 1, K = 2 as well as σ = 1. We indicate by x(t) = x 1 (t) − x 2 (t) the relative main state and by v(t) = v 1 (t) − v 2 (t) the relative consensus parameter. The system can be reformulated in terms of relative variables with initial conditions given by . Its solution can be characterized by integration by the following differential equation Let us stress that (9) is a necessary condition for (10) to hold (see [14,Remark 1]). This motivates the relaxation of the strong Johnson-Lindenstrauss property to its weak version in Definition 2.1. Hence we prove a result based on the more general weak Johnson-Lindenstrauss property which will be sufficient for us in the following.
In the rest of the paper, given a Lipschitz function ϕ : Then the matrix M fulfills the weak Johnson-Lindenstrauss property of parameters ε and δ at ϕ(t) for every t ∈ [0, 1] with the same high probability, i.e., either Since ε ∈ (0, 1), by (11) we have that Using this latter inequality and the Lipschitz continuity of ϕ we obtain and also Let us now assume ϕ(t j ) > δ /2.
Using again the Lipschitz continuity of ϕ we obtain the estimate where in the last inequality we used that This estimate of the distance ϕ(t) − ϕ(t j ) and the (strong) Johnson-Lindenstrauss property at ϕ(t j ) enable us to extend the (strong) Johnson-Lindenstrauss property at ϕ(t) as well, as a direct application of [14, Lemma 3.2], i.e., Both cases together show the (weak) Johnson-Lindenstrauss property at ϕ(t) for every t ∈ [0, 1].
We show in the following lemma that the mean-square norm and the relative order of the magnitudes of points in a cloud in high dimension are nearly preserved when projected in lower dimension by a weak Johnson-Lindenstrauss embedding.
Lemma 2.5. Let $a_1, \dots, a_N \in \mathbb{R}^d$, $b_1, \dots, b_N \in \mathbb{R}^k$ and $M \in \mathbb{R}^{k \times d}$ be such that there is $\Delta > 0$ with the following properties:
(i) The matrix M fulfills the weak Johnson-Lindenstrauss property with $\varepsilon = 1/2$ and $\delta = \Delta$ for the points $a_i$, i.e., either
$$\tfrac{1}{2}\,\|a_i\| \le \|Ma_i\| \le \tfrac{3}{2}\,\|a_i\| \qquad (12)$$
or
$$\|a_i\| \le \Delta \quad \text{and} \quad \|Ma_i\| \le \Delta, \qquad (13)$$
for all $i \in \{1, \dots, N\}$.
(ii) We have the approximation bound $\|b_i - Ma_i\| \le \Delta$ for all $i \in \{1, \dots, N\}$.
Letι be the smallest index such that bˆι ≥ b j for all j = 1, . . . , N and let This shows the first estimate of the first part of the lemma. Let us address the second estimate. Let j ∈ {1, . . . , N} for j =ι. If b j ≥ 2∆, then, using the same argument as above, we have Ma j ≥ ∆ and thus by (12) we get On the other hand, if b j < 2∆, then Ma j ≤ 3∆. Then either (12) holds and we have a j ≤ 2 Ma j ≤ 6∆, (15) or (13) holds and automatically a j ≤ ∆. Now we can estimate the mean-square norm A. We obtain where A 1 is the index set of all j ∈ {1, . . . , N} \ {ι} such that b j ≥ 2∆ and A 2 is the index set of all j ∈ {1, . . . , N} for which b j < 2∆. Using (14) and (15) we obtain ≤ 289 aˆι 2 N using the maximality of bˆι and the first part of the lemma. Furthermore, we have Let now √ B ≤ 2∆. We can argue in the same way as for the second estimate of the first part: If b j ≥ 2∆, then as in (14) a j ≤ 3 b j .
If b j ≤ 2∆, then by (15) and the arguments thereafter we get a j ≤ 6∆.
Putting both estimates together and using the notationÃ 1 for the index set of all j ∈ {1, . . . , N} such that b j ≥ 2∆ as well asÃ 2 for the index set of all j ∈ {1, . . ., N} such that b j < 2∆ yield Taking the square root on both sides finishes the proof.

DIMENSION REDUCTION OF THE CUCKER-SMALE MODEL WITHOUT CONTROL
In this section we consider the projection of the Cucker-Smale system without control. We compare two quantities. First, we calculate the trajectory of the high-dimensional Cucker-Smale system and then project the agents' parameters by $M \in \mathbb{R}^{k \times d}$. Second, we project the initial configurations to dimension k by applying M, and then compute from these initial values the trajectories of the corresponding low-dimensional Cucker-Smale system. In the upcoming Theorem 3.2 we give a precise upper bound on the distance between the two k-dimensional trajectories computed as described above.
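The two routes compared in this section can be sketched numerically as follows. The example assumes the uncontrolled Cucker-Smale dynamics with a(r) = 1/(1 + r²), an explicit Euler discretization, and a Gaussian random matrix with N(0, 1/k) entries; all sizes, the seed, and the function names are hypothetical. It only illustrates how the discrepancy between the projected high-dimensional consensus parameters and the low-dimensional ones is measured, and it is not a substitute for the bound of Theorem 3.2.

```python
import math
import random

def a(r):
    """Assumed communication rate with K = sigma = beta = 1."""
    return 1.0 / (1.0 + r * r)

def norm(z):
    return math.sqrt(sum(t * t for t in z))

def sub(p, q):
    return [s - t for s, t in zip(p, q)]

def project(M, z):
    """Matrix-vector product M z."""
    return [sum(m * t for m, t in zip(row, z)) for row in M]

def simulate(x, v, dt, steps):
    """Explicit Euler integration of the uncontrolled alignment system."""
    n = len(x)
    for _ in range(steps):
        force = []
        for i in range(n):
            acc = [0.0] * len(v[i])
            for j in range(n):
                w = a(norm(sub(x[i], x[j]))) / n
                acc = [s + w * t for s, t in zip(acc, sub(v[j], v[i]))]
            force.append(acc)
        x = [[p + dt * q for p, q in zip(xi, vi)] for xi, vi in zip(x, v)]
        v = [[q + dt * f for q, f in zip(vi, fi)] for vi, fi in zip(v, force)]
    return x, v

rng = random.Random(1)
N, d, k = 4, 60, 20
M = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(d)] for _ in range(k)]
x0 = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(N)]
v0 = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(N)]

# Route 1: evolve in dimension d, then project the consensus parameters.
xT, vT = simulate(x0, v0, dt=0.02, steps=100)
projected_vT = [project(M, vi) for vi in vT]

# Route 2: project the initial data, then evolve in dimension k.
y0 = [project(M, xi) for xi in x0]
w0 = [project(M, vi) for vi in v0]
yT, wT = simulate(y0, w0, dt=0.02, steps=100)

# Largest per-agent discrepancy at the initial and at the final time.
err_start = max(norm(sub(wi, project(M, vi))) for wi, vi in zip(w0, v0))
err_end = max(norm(sub(wi, pvi)) for wi, pvi in zip(wT, projected_vT))
```

By construction the discrepancy vanishes at the initial time; how fast it can grow is what the error estimate of this section quantifies.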
More formally, given We introduce the low-dimensional analogues of X and V by Here the bilinear form B is intended to act on R k instead of R d , but with the same meaning of the symbol as before.
Remark 3.1. By Lemma 1.3 we know that V and W are decreasing. Hence for all i, j ∈ {1, . . . , N} An analogous estimate holds for V and v. Furthermore, we have Define the following errors: Furthermore, let L a be the Lipschitz constant of the function a, set Proof. We estimate the decay of E x 2 (t) and E v 2 (t) in order to use Gronwall's Lemma. For the following estimates we may assume that -without loss of generality -e v i (t) = 0 for t ∈ [0, T ] and for every i = 1, . . . , N: if this is not the case, either e v i ≡ 0 in a neighborhood of t or, by continuity, the estimates will also hold true at t. Hence we may assume that e v i is differentiable at t ∈ [0, T ]. By Cauchy-Schwarz inequality it holds Using triangle inequality, the Lipschitz continuity of a and its monotonicity, we obtain We now estimate the derivative of E v 2 . First of all, again by Cauchy-Schwarz inequality it follows If we insert (17) into the last inequality and we use triangle as well as Cauchy-Schwarz inequality in sequence, we get Let us now estimate now the first term of the sum. It holds and Furthermore, for the second sum we have Hence our computation yields Now we apply the assumptions on the matrix M: For every i, j ∈ {1, . . ., N} and t ∈ [0, T ] either (6) holds, and then or (7) holds, and then Using (vector-valued) Minkowski inequality and observing that, by Lemma 1.3, V and W are decreasing, we derive On the other hand, in the same way as in (18), we obtain be as in the statement of the theorem. Then, rearranging the previous calculations in vector form and integrating from 0 to t, we get the inequality Notice that Now we apply the ℓ 1 -norm to the inequality and we use Gronwall's Lemma, see Lemma 7.2, to deduce and hence we have Together with (23), we deduce the upper bound Using the trivial estimate of the ℓ ∞ -norm by the ℓ 2 -norm we conclude as well the estimate and

Remark 3.3.
In the proof we used that V and W are decreasing. When we consider controlled systems below, we even have a better estimate on the integral of V . In particular we use the following: Assume additionally that t 0 2V (s) ds ≤ α for all t ≤ T, for a fixed α > 0. Then for all t ≤ T we have To verify the latter estimate, just consider the boundedness of t 0 2V (s) ds within the inequality (21) in the proof of Theorem 3.2, and then proceed further as before.

Remark 3.4.
Among the hypotheses of Theorem 3.2, we assumed the existence of a matrix M ∈ R k×d fulfilling the weak Johnson-Lindenstrauss property for all curves of the form We show now that M is such a matrix provided that it fulfills the (strong) Johnson-Lindenstrauss property for all the (finite) vectors of the form and that the target dimension k is sufficiently large.
Indeed, we can adapt the proof of Lemma 2.4 in order to obtain a result valid simultaneously for all the curves ϕ i j : For each of these curves we have the Lipschitz estimate thus L ϕ i j (0, T ) ≤ 2NV (0). In order for the argument of the proof to work, for each curve ϕ i j we need N · T points (where N is as in (24), and the factor T is due to stretching the dynamics from a reference time domain [0, 1] to [0, T ]) at which the (strong) Johnson-Lindenstrauss property must hold, bringing the total number of points N at which that property must be true to N ′ · T · N 2 . So it holds Thus, if M is a k × d matrix fulfilling the (strong) Johnson-Lindenstrauss property of parameter ε at these N points, where k ≥ k 0 with then M satisfies also the hypothesis of Theorem 3.2 for any δ > 0.
Remark 3.5. In the remark above we calculated the necessary minimal dimension k 0 for a matrix M to satisfy the weak Johnson-Lindenstrauss property for all curves of the form The dependency of k 0 on N and ε is quite natural, but the dependency on the dimension d, even only logarithmically, is perhaps not desirable. But one can circumvent the dependence on the dimension using certain direct estimates within the proof of Theorem 3.2.
In analogy to what we did before, take t m = m/N ′ with m = 0, . . . , ⌈T · N ′ ⌉ − 1 and N ′ -the number of sampling points -is to be chosen large enough later on. Furthermore, we assume that the matrix M fulfills the (strong) Johnson-Lindenstrauss property at t m , i.e., we require that M satisfies . We start at the estimate (19): where and (y i − y j )(·) on [0, T ], respectively. Furthermore, using the (strong) Johnson-Lindenstrauss property of M at t m we get the same estimates as in (20), only with t m instead of t: For the estimate of the last two terms in (26) we choose N ′ large enough so that Thus we arrive at Following the steps of the proof of Theorem 3.2, we can get an analogue of (22): So, the main difference is the replacement of E x 2 (t) with E x 2 (t m ) on the right-hand sides, with t m = m/N ′ . At this point in the proof of Theorem 3.2 we applied Gronwall's Lemma, see the estimates before (23). Now here we intend to use its discrete version, Lemma 7.3: let again K 1 = L a NW (0) 2X(0), K 2 = 2L a NW (0), and K 3 = 1/2 · L a NW (0) 2V (0). Integrating between t m and t we get Now applying the ℓ 1 -norm and Lemma 7.3 we get This is a slightly worse estimate than the original one of Theorem 3.2 by a factor 2 in the exponential, since So eventually we obtain At the cost of a slightly worse estimate, we gain, however, that the admissible lower dimensionality k of the matrix M does not depend anymore on the higher dimension d: indeed we make use of the (strong) Johnson-Lindenstrauss property on N = 2 · ⌈T · N ′ ⌉ · N 2 points. Hence, it suffices to take the minimal target dimension k 0 such that M ∈ R k×d with k ≥ k 0 for Actually, in oder to verify the independence of the dimension d, we have to estimate the number of sampling points N ′ independently of it. By (25) in Remark 3.4, we know that Hence we obtain so that we confirmed that there is no asymptotic dependence on d.
Remark 3.6. The estimate of Theorem 3.2 explains the plot presented in [14,Fig. 3.5], where surprisingly the error for large time was shown to decrease instead of exploding according to classical Gronwall's estimates. Indeed, since V (t) and W (t) are decreasing functions, there is a time when the bound swaps from the exponential Gronwall-type bound to the decreasing curve given by Moreover, if both the high-dimensional and the low-dimensional trajectories entered the consensus region already, then V (t) and W (t) approach 0 as t tends to +∞, forcing E v (t) to tend to 0. The vanishing of the discrepancy between the low-dimensional trajectory (w k (t)) N k=1 of the consensus parameters and the projected trajectory (Mv k (t)) N k=1 is a remarkable property of the Cucker-Smale system (2) as the initial mean-consensus parameter w(0) = Mv(0) is actually a conserved quantity.

DIMENSION REDUCTION OF THE CUCKER-SMALE MODEL WITH CONTROL
It was proven in [7] that a system of type (2) can be driven to the consensus region using a sparse control strategy, i.e., a control acting at every instant only on one agent, whose consensus parameter is the farthest away from the mean consensus parameter. However, if the dimension d of each agent is very large, the numerical simulation of such a dynamical system and its sparse control becomes computationally demanding.
In this section we consider a k-dimensional Cucker-Smale system, where k ≪ d, having as initial conditions the projection of the initial configuration of the original d-dimensional system. The projection will be done by a matrix M ∈ R k×d fulfilling the (strong) Johnson-Lindenstrauss property for a certain amount of points. We shall show that the solution of the k-dimensional system obtained in this way will stay close to the projected dynamics of the original d-dimensional system via the matrix M. This, in turn, shall allow us to prove our main result: If we gather the information of which is the farthest agent away from consensus in the k-dimensional system and we control this agent in the original high-dimensional system by the sparse strategy presented in [7], then we will still be able to drive the high-dimensional system to the consensus region in finite time and with a near-optimal rate.
One of the main consequences of this fact is that simulations following this strategy will save a relevant amount of computational time with respect to approaching directly the problem in high dimension: indeed, we present in Section 6 numerical examples, which show that we can take k even conspicuously smaller than d and still be able to implement a successful sparse control strategy steering the dynamics to the consensus region nearly optimally.
Formally, let us now consider a controlled version of the high-dimensional system   ẋ with initial datum (x(0), v(0)) ∈ (R d ) N × (R d ) N , and of the associated low-dimensional system   ẏ with initial condition (y(0), w(0)) ∈ (R k ) N × (R k ) N , where y i (0) = Mx i (0) and w i (0) = Mv i (0) for every i = 1, . . . , N, and M ∈ R k×d is a matrix fulfilling the (strong) Johnson-Lindenstrauss property at certain points of the high-dimensional trajectories.
We have already stated that the control u h in high dimension shall depend on u ℓ , the low-dimensional one. Since the latter control is a function of the low-dimensional dynamics determined by the initial datum (y(0), w(0)), which in turn depends on M, the trajectories of the high-dimensional dynamics depend on M as well.
As already stated, given a set of N points, not necessarily explicitly, a random matrix generated by one of the constructions reported in Remark 2.2 fulfills the Johnson-Lindenstrauss property at these N points with a certain high probability. Unfortunately, in the current situation, and differently from the one encountered in Section 3, the points on the trajectories at which the Johnson-Lindenstrauss property has to hold seem to depend on the matrix M that we have generated! As we shall see in detail in Section 5, we can resolve this dependency of the high-dimensional trajectories on the generated matrix M by observing that the realization of the trajectories actually depends on a finite number of control switchings. Hence, for the moment, we just assume that the Johnson-Lindenstrauss property holds at certain points of the trajectory and we postpone to Section 5 the explanation of how this assumption can in fact hold true.
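In what follows, we shall always indicate with $\theta > 0$ the maximal amount of resources that the external policy maker is allowed to spend at every instant to keep the system confined. This means that our controls $u^h$ and $u^\ell$ will satisfy, respectively, the $\ell_1^N(\ell_2^d)$-constraint $\sum_{i=1}^{N}\|u^h_i(t)\|_{\ell_2^d} \le \theta$ and the $\ell_1^N(\ell_2^k)$-constraint $\sum_{i=1}^{N}\|u^\ell_i(t)\|_{\ell_2^k} \le \theta$.
Definition 4.1. Let V(t) and W(t) be as in (4) and (16), respectively. Let us fix a $\Gamma \ge 0$ and define $T^c_0 := \inf\{t \in [0, T] : W(t) \le \Gamma\}$ if the set is non-empty, otherwise set $T^c_0 := T$. We define the componentwise feedback controls $u^h$ and $u^\ell$ as follows:
• if $t \le T^c_0$, let $\hat\iota(t) \in \{1, \dots, N\}$ be the smallest index such that
$$\|w^\perp_{\hat\iota(t)}(t)\| \ge \|w^\perp_j(t)\| \quad \text{for all } j = 1, \dots, N, \qquad (29)$$
and set
$$u^\ell_{\hat\iota(t)}(t) = -\theta\,\frac{w^\perp_{\hat\iota(t)}(t)}{\|w^\perp_{\hat\iota(t)}(t)\|}, \qquad u^h_{\hat\iota(t)}(t) = -\theta\,\frac{v^\perp_{\hat\iota(t)}(t)}{\|v^\perp_{\hat\iota(t)}(t)\|}, \qquad u^\ell_j(t) = 0,\ u^h_j(t) = 0 \ \text{for } j \ne \hat\iota(t); \qquad (30)$$
• if $t > T^c_0$, then $u^h(t) = 0$ and $u^\ell(t) = 0$. We set $\hat\iota(t) := 0$.
We say that the trajectory in low dimension has entered the consensus region given by the threshold $\Gamma$ if $t \in [T^c_0, T)$. Let us stress now the following observation.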

Remark 4.2.
Notice that the control $u^h$ is sparse (all the components are zero except one) and defined exclusively through the following information: the index $\hat\iota$, which is computed from the low-dimensional control problem according to (29); the consensus parameter $v_{\hat\iota}$, which is actually the only information to be observed in high dimension and enters the definition (30); and the mean consensus parameter $\bar v(t) = \bar v(0) + \frac{1}{N}\sum_{i=1}^{N}\int_0^t u^h_i(s)\,ds$, which one easily computes by integrating and summing the previous controls, and which is also used in (30).
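The information flow described in this remark can be sketched as follows. The code assumes that the controlled agent is the one whose low-dimensional consensus parameter deviates most from the low-dimensional mean, and that the control pushes that agent towards the mean with strength θ, i.e. u = −θ v⊥/‖v⊥‖ on the selected agent and zero elsewhere; this form follows the sparse strategy of [7] as we read it and should be treated as an assumption. The coordinate-dropping "projection" in the usage part is a toy stand-in for M, not a Johnson-Lindenstrauss matrix, and indices are 0-based.

```python
import math

def norm(z):
    return math.sqrt(sum(t * t for t in z))

def deviations(vs):
    """Deviation of each vector from the componentwise mean."""
    m = [sum(col) / len(vs) for col in zip(*vs)]
    return [[t - mt for t, mt in zip(v, m)] for v in vs]

def unit_push(p, theta):
    """Vector of length theta pointing against p (zero if p vanishes)."""
    n = norm(p)
    return [-theta * t / n for t in p] if n > 0 else [0.0] * len(p)

def sparse_controls(v_high, w_low, theta):
    """Select the agent in low dimension, act on that agent in both systems."""
    w_perp = deviations(w_low)
    sizes = [norm(p) for p in w_perp]
    idx = sizes.index(max(sizes))  # list.index returns the smallest maximizing index
    v_perp = deviations(v_high)
    u_high = [[0.0] * len(v_high[0]) for _ in v_high]
    u_low = [[0.0] * len(w_low[0]) for _ in w_low]
    u_high[idx] = unit_push(v_perp[idx], theta)
    u_low[idx] = unit_push(w_perp[idx], theta)
    return idx, u_high, u_low

# Toy data: three agents in R^3; the stand-in projection keeps two coordinates.
v_high = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-4.0, -1.0, 3.0]]
w_low = [v[:2] for v in v_high]
theta = 0.5
idx, u_high, u_low = sparse_controls(v_high, w_low, theta)
alignment = sum(s * t for s, t in zip(u_high[idx], deviations(v_high)[idx]))
```

Only the selected agent's high-dimensional consensus parameter (together with the mean) is needed to form `u_high`; the comparison of all agents happens in the low-dimensional system.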
There are situations where the computation of the controls u h and u ℓ from Definition 4.1 turns out to be problematic. For instance, if there are only three agents and in the low-dimensional system their consensus parameters form an equiangular and equinormal set of vectors at a certain time t, then u ℓ (and thus u h ) is not pointwise computable after t because of chattering effects. A method to avoid chattering in such situations is the use of sampling solutions, as defined in [8].
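The selection rule described in Remark 4.2 can be sketched in a few lines. Formulas (29) and (30) are not restated in this section, so the sketch assumes that the controlled index is the smallest one maximizing the norm of the low-dimensional deviation from the mean, and that the control has norm θ and points from the selected consensus parameter towards the mean; the function name `dr_sparse_control` and the array layout are illustrative choices, not notation from the paper.

```python
import numpy as np

def dr_sparse_control(v, w, theta, v_mean=None):
    """Sketch: index chosen in low dimension, control applied in high dimension.

    v      : (N, d) array of high-dimensional consensus parameters
    w      : (N, k) array of low-dimensional consensus parameters
    theta  : control strength
    v_mean : mean consensus parameter; if None it is recomputed from v

    Assumption: the controlled agent is the one whose low-dimensional
    deviation from the mean has the largest norm (ties -> smallest index).
    """
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    w_perp = w - w.mean(axis=0)
    # np.argmax returns the first maximizer, i.e. the smallest index
    idx = int(np.argmax(np.linalg.norm(w_perp, axis=1)))
    if v_mean is None:
        v_mean = v.mean(axis=0)
    deviation = v[idx] - v_mean          # only v[idx] is read in high dimension
    u = np.zeros_like(v)
    norm = np.linalg.norm(deviation)
    if norm > 0:
        u[idx] = -theta * deviation / norm   # norm theta, pointing to the mean
    return idx, u
```

Only the row `v[idx]` and the mean enter the high-dimensional control, which mirrors the observation that a single consensus parameter has to be observed in high dimension.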
be continuous in x and u as well as locally Lipschitz in x uniformly on every compact subset of R m ×U. Given a feedback control function u : R m → U, τ > 0, and x 0 ∈ R m we define the sampling solution associated with the sampling time τ of the differential system $\dot x = f(x, u(x))$ as the function obtained by solving $\dot x(t) = f(x(t), u(t))$ in the interval t ∈ [nτ, (n + 1)τ] recursively for n ∈ N, where u(t) = u(x(nτ)) is constant for t ∈ [nτ, (n + 1)τ]. As the initial value x(nτ) we use the endpoint of the solution of the preceding interval and start with x(0) = x 0 .
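The recursive construction of a sampling solution can be illustrated by a short sample-and-hold routine. This is a minimal sketch: the feedback is evaluated once at the start of each interval and kept constant, and the inner ODE is integrated by explicit Euler steps, which is an illustrative choice and not part of the definition. The names `sampling_solution`, `feedback` and `substeps` are hypothetical.

```python
import numpy as np

def sampling_solution(f, feedback, x0, tau, n_intervals, substeps=100):
    """Sample-and-hold trajectory of x' = f(x, u).

    On each interval [n*tau, (n+1)*tau] the control is frozen at
    u = feedback(x(n*tau)); the endpoint of one interval is the initial
    value of the next one. Returns the states at the sampling times.
    """
    x = np.asarray(x0, dtype=float)
    h = tau / substeps
    samples = [x.copy()]
    for _ in range(n_intervals):
        u = feedback(x)              # evaluated once, at the sampling time
        for _ in range(substeps):    # explicit Euler on the frozen-control ODE
            x = x + h * f(x, u)
        samples.append(x.copy())
    return np.array(samples)

# Example: x' = u with feedback u(x) = -x and tau = 0.5.
# The control is held on each interval, so the decay is piecewise linear.
states = sampling_solution(lambda x, u: u, lambda x: -x, [1.0], 0.5, 2)
```

In the example the state moves from 1 to 0.5 to 0.25, whereas the continuous feedback x' = -x would give exponential decay; the difference is exactly the effect of holding the control between sampling times.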
Let us fix a sampling time τ > 0. In the following we shall consider d-dimensional and k-dimensional Cucker-Smale systems for k ≪ d and feedback controls u h and u ℓ , respectively, as introduced in Definition 4.1. We shall focus on their sampling solutions (x, v) and (y, w) associated with τ as defined in Definition 4.3, hence for t ∈ [nτ, (n + 1)τ). Since we are only able to change the control at times which are multiples of τ, we define the switch-off time of the sampled control associated with the threshold Γ as T s 0 := inf {nτ : n ∈ N 0 , W (nτ) ≤ Γ}, and we set T s 0 := T if the set whose infimum is taken is empty. Because in the rest of the paper we shall deal only with sampled controls, we shall write T 0 for T s 0 , omitting the superscript. In the following, we shall show an estimate of the error between the projection of the sampled controlled high-dimensional system and the sampled controlled low-dimensional system, under the crucial assumption of the validity of the weak Johnson-Lindenstrauss property for M for the differences of trajectories of the system.
This result is the controlled counterpart of Theorem 3.2.

Suppose that W is non-increasing in time and that there exists a constant
Let ε ′ ∈ (0, 1) be so small that Define the following errors: Then it holds Proof. We argue by induction: we want to show that if (32) holds true at t ∈ {0, τ, . . . , nτ}, then it is also true for t ∈ [nτ, (n + 1)τ], in particular at t = (n + 1)τ, as long as nτ ≤T and nτ < T 0 , i.e., the control is not switched off before (n + 1)τ. Clearly, (32) holds for n = 0, that is, at t = 0, and the base step is verified by arguing exactly as in the inductive step below.
So, let t ∈ [nτ, (n + 1)τ] for n ∈ N 0 . First, we consider the estimate on the agent on which the control is acting. We shall estimate the decay in order to use Gronwall's Lemma as in Theorem 3.2. We have For i ∈ {1, . . . , N} and i ≠ ι̂ we have We now focus on the control term: Since by assumption nτ < T 0 and (32) holds at nτ by the inductive hypothesis, it follows From assumption (JL2) and (37) it follows that the (strong) Johnson-Lindenstrauss property with parameter Inserting these estimates into (35) and using (36) we get Now we add the estimates for the derivatives of e v ι̂ in (33) and e v i for i ≠ ι̂ in (34). By (JL1) the weak Johnson-Lindenstrauss property of parameter ε = ε ′ and δ = min ε ′ √ 2X(0)+α 2 for t ∈ [0, (n + 1)τ). Hence, for the first (uncontrolled) part of (33) and (34) we can use the same estimate as in (21). Thus, setting By integrating the estimates for d/dt E v 2 (t) and d/dt E x 2 (t) between nτ and t we get Now we apply the ℓ 1 -norm to the inequality and obtain The discrete Gronwall Lemma 7.3 applied for ∆ because the initial time errors are 0 by definition of the low-dimensional system. Hence using a trivial estimate of the ℓ 2 -norm by the ℓ ∞ -norm we conclude the induction and also the proof:

Now we are in the position of showing that we can steer both the low- and high-dimensional systems simultaneously to the consensus region using the control defined in Definition 4.1 and Definition 4.3. We repeat that this means that we choose the index of the agent on which the sparse control acts from the low-dimensional system and use the same index for the control in the high-dimensional system. The challenge here is to ensure that the control coming out of this procedure drives the high-dimensional system to consensus as well. For this we need the estimates from Proposition 4.4, which show that the projection of the high-dimensional system and the low-dimensional system stay close to each other.
Additionally, from [7] it is known that the low-dimensional system will be steered optimally to the consensus region in finite time using the sampled version of the control introduced in Definition 4.1.

associated with the sampling time τ and (y(t), w(t)) be the sampling solution of the R k -projected Cucker-Smale system
it holds that at T 0 both the high-dimensional and the projected low-dimensional systems are in the consensus region defined by Lemma 1.3. Furthermore, we have the estimates

Proof. First step: Let We shall prove the following implication for every n ∈ N such that nτ ≤T : if W (mτ) > 2∆ for every m = 0, . . . , n and the following assumptions P 1 (n), P 2 (n), and P 3 (n) depending on n hold, then also P 1 (n + 1), P 2 (n + 1), and P 3 (n + 1) hold true. So let us assume W (mτ) > 2∆ for every m = 0, . . . , n, which means that T 0 ≥ (n + 1)τ by definition of T 0 , and assume P 1 (n), P 2 (n), and P 3 (n). We begin by computing the derivatives of V and W for t ∈ [nτ, (n + 1)τ]: The same computation yields
As we want to prove that P 1 (n + 1) holds, we need to deduce suitable lower bounds on φ h (t) and φ ℓ (t) to estimate the right-hand side of (45). To this end we first need to derive auxiliary bounds on the growth of V (t) and W (t), see formula (46) below: The general estimate with arbitrary vectors a and b 1 , . . . , b N , and B = 1 for all s ∈ [nτ, (n + 1)τ]. We use these bounds to estimate the right-hand side of (45) An integration between nτ and s ∈ [nτ, (n + 1)τ] yields With the help of (46) we now work out lower bounds for φ ℓ and φ h . It holds We now estimate the integrand. From for all s ∈ [nτ, (n + 1)τ).
Using (46) we get Plugging the last inequality into (47) we deduce The same calculations give us Together with (45) this yields By the assumption τ ≤ τ 0 in (41) and by assumption (JL3) we have Applying this and the fact that W is decreasing in [0, nτ], which follows from P 1 (n), we use (48) to deduce the following upper bound Since we assumed that W (nτ) > 2∆, this shows that W is decreasing on [nτ, (n + 1)τ]. Additionally, using the same assumption we can also estimate for all t ∈ [nτ, (n + 1)τ]. Together with P 1 (n) this shows the stated assertion for W ′ (t) in P 1 (n + 1). In order to conclude the statement of P 1 (n) for V ′ (t) we need to take advantage of the estimates of the low-dimensional dynamics, of Proposition 4.4, and of Lemma 2.5. By assumption P 3 (n) it holds that This estimate and assumption (JL2) allow us to use Lemma 2.5 for the vectors a i = v ⊥ i (nτ) and b i = w ⊥ i (nτ).
Second step: We shall prove that there exists an n * ∈ N 0 such that n * τ ≤T + τ and W (n * τ) ≤ 2∆ holds, whereT is defined as in (40). By definition of the threshold Γ = (2∆) 2 , this implies the switching of the control to 0 at time n * τ. Assume on the contrary that W ((n + 1)τ) > 2∆ (50) for all n ∈ N 0 with nτ ≤T . In the first step we showed that this yields in particular for t ∈ [0, (n + 1)τ) the estimates Hence for all t ∈ [0, (n + 1)τ) it holds Taking n 0 ∈ N 0 such that n 0 τ ≤T < (n 0 + 1)τ and using (JL3) we have This contradicts assumption (50). Thus there exists an n * ∈ N 0 such that n * τ ≤T + τ and it holds

Third step: We shall show that (52) implies that the trajectories of both the low- and high-dimensional systems are in the consensus region identified by Lemma 1.3 at time n * τ, i.e., W (n * τ) ≤ γ(Y (n * τ)) and V (n * τ) ≤ γ(X(n * τ)).
We start by considering the low-dimensional system. Since by (JL3) it holds and by the fact that the constant c from Lemma 2.5 is smaller than 1, we can estimate $Y = 2Y(0) + \frac{2N^2}{\theta^2} W(0)^2$ from below by 4X, where $X = 2X(0) + \frac{2N^2}{c^2\theta^2} V(0)^2$. This together with (52), the definition of ∆ in (39), and P 2 (n * ) leads to It remains to prove that the high-dimensional system is in the consensus region identified by Lemma 1.3. Again, the conditions of Lemma 2.5 for the vectors a i = v ⊥ i (n * τ) and b i = w ⊥ i (n * τ) are fulfilled: as in the first step we have by Proposition 4.4 Mv ⊥ i (n * τ) − w ⊥ i (n * τ) ≤ ∆, and property (JL2) holds at n * τ. Thus, an application of Lemma 2.5 shows Hence the definition of ∆ in (39) and P 2 (n * ) yield V (n * τ) ≤ C∆ ≤ γ(X) ≤ γ(X(n * τ)).
We conclude that both the trajectories of the systems are in the consensus region at time n * τ. By Lemma 1.3 we are allowed to switch the control to 0 and both systems tend to consensus autonomously.
Fourth step: In the second and third steps we have proven that both systems enter the consensus region at time T 0 = n * τ, where n * τ ≤T + τ. By the computations in (51), we have the following estimate Moreover, by P 2 (n * ) we have This shows (43) and the proof is concluded.

HOW TO FIND A JOHNSON-LINDENSTRAUSS MATRIX
The main ingredient of Proposition 4.4 and Theorem 4.5 is the existence of a Johnson-Lindenstrauss matrix M ∈ R k×d for the trajectories. Let ∆ and ε ′ be as in Theorem 4.5 and let us recall what we explicitly needed. Assume thatT is an upper estimate for T 0 , the time to switch off the control. Then we need to define a matrix M ∈ R k×d such that the following properties hold:

(JL2) Let ε = ε ′ and δ = ∆. For all n = 0, . . . , ⌊T /τ⌋ + 1 and i ∈ {1, . . . , N} either we have

(JL3) Let ε = 1/2. Then for all i ∈ {1, . . . , N} we have

In order to prove conditions (JL2) and (JL3) one can directly invoke the Johnson-Lindenstrauss Lemma as discussed in Remark 2.2, while for (JL1) one can use its continuous version, Lemma 2.4, which boils down again to the application of the Johnson-Lindenstrauss Lemma to points sampled from the trajectories.
However, the Johnson-Lindenstrauss Lemma applies to points which are fixed a priori, before the matrix M ∈ R k×d is randomly generated. At first sight, due to the fact that the high-dimensional controls depend on the low-dimensional ones, which depend on the matrix M, the points to which we apply the Johnson-Lindenstrauss Lemma may be seen as directly depending on M as well.
In order to resolve this apparent paradox, we want to clarify that actually, due to the finite number of sampling times of the control and the finite number of agents, the number of possible realizable trajectories, and consequently the number of possible sampling points for the Johnson-Lindenstrauss Lemma, is finite and, actually, independent of the choice of the matrix M. Hence we are now left with the tasks of counting the number of such trajectories and of verifying that they fulfill the necessary Lipschitz continuity assumptions for applying Lemma 2.4.
Let us state again that the lower dimension k of M ∈ R k×d scales as where ε ∈ (0, 1) is the allowed distortion and N is the number of sampling points on all possible trajectories.
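To make the role of the distortion ε on a finite point set concrete, the following sketch draws a Bernoulli random matrix, the type of matrix used later in the numerical experiments, and measures the worst relative change of the Euclidean norm over a given set of points. The normalization of the entries by 1/√k, the helper names `bernoulli_jl_matrix` and `max_distortion`, and the values of d, k and the number of points are assumptions made for illustration only.

```python
import numpy as np

def bernoulli_jl_matrix(k, d, rng):
    """k x d matrix with i.i.d. entries +1/sqrt(k) or -1/sqrt(k)."""
    return rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

def max_distortion(M, points):
    """Largest value of | ||M p|| / ||p|| - 1 | over the nonzero rows p."""
    norms = np.linalg.norm(points, axis=1)
    keep = norms > 0
    projected = np.linalg.norm(points[keep] @ M.T, axis=1)
    return float(np.max(np.abs(projected / norms[keep] - 1.0)))

rng = np.random.default_rng(0)
d, k, n_points = 500, 50, 20          # illustrative sizes
points = rng.standard_normal((n_points, d))
M = bernoulli_jl_matrix(k, d, rng)
eps = max_distortion(M, points)       # empirical distortion on this point set
```

The points here are fixed before M is drawn, which is precisely the situation in which the Johnson-Lindenstrauss Lemma applies; repeating the experiment with larger k should typically reduce `eps`.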
We focus first in (53) on the dependence of ε = min{ε ′ , 1/2} on N, the number of agents, and the dimension d. According to (42) in Theorem 4.5 the estimate on ε ′ scales exponentially with N, i.e., ε ′ ∼ e −N , sinceT scales (at least) linearly with N, see (40) (for θ independent of N and d).
The positive aspect is that the estimate for ε ′ does not involve the dimension d.
In order to compute N in (53) we need first of all to estimate the number of realizable trajectories. Since we are insisting on sparse controls acting on at most one agent at a time, at every switching time nτ with nτ ≤ T 0 , i.e., as long as the control is not switched off, there are precisely N possible controls and hence N possible branches of future developments of the trajectories. By Theorem 4.5 it holds T 0 ≤T + τ and thus we can estimate the number P of possible paths by Surprisingly, accounting for all the possible future branchings is sufficient to show that we can deterministically fix, a priori, the points to which we later apply an independently drawn random matrix!
(1) In order to fulfill (JL1) for every possible trajectory, an application of Lemma 2.4 yields an estimate of the number of necessary sampling points where the factor P · (T + τ) accounts for the number of trajectories and their time length, the factor N 2 accounts for the number of space trajectory differences x i − x j , and L x is an upper estimate for the individual Lipschitz constant, given by an estimate similar to (25) and the result from Theorem 4.5 that V is decreasing until T 0 as follows:

(2) To fulfill (JL2) we shall now count the necessary sampling points at every switching time nτ. For n = 0 we have to consider N sampling points. For n = 1 there are already N possible paths to take into account and hence we need to take N 2 = N ·N sampling points. Going on in this way, at time nτ we have N n possible outcomes of the dynamical system and hence we have to take N n+1 = N n · N sampling points, as long as nτ ≤T + τ. Summing up the number of sampling points, we conclude

(3) To fulfill (JL3) we need only N 3 = 2N sampling points.

Hence we can eventually estimate N from above by Thus, we can choose the dimension k of a Johnson-Lindenstrauss matrix M ∈ R k×d as Since the estimate on ε ′ scales exponentially in N, i.e., ε ′ ∼ e −N , the dimension k grows exponentially in N. However, the positive aspect is that the estimate of k only scales logarithmically with the dimension d.
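The count in item (2) is simple enough to be evaluated directly. The sketch below sums N^(n+1) over all switching times nτ with n = 0, . . . , ⌊T/τ⌋ + 1, where the argument `t_hat` stands for the upper estimate of the switch-off time; the function name and the sample values are hypothetical and only meant to show how quickly the number of sampling points grows.

```python
import math

def switching_sampling_points(num_agents, tau, t_hat):
    """Number of sampling points needed at the switching times.

    At time n*tau there are num_agents**n possible outcomes, each
    contributing num_agents points, for n = 0, ..., floor(t_hat/tau) + 1.
    """
    n_max = math.floor(t_hat / tau) + 1
    return sum(num_agents ** (n + 1) for n in range(n_max + 1))

# Three agents, three switching times (n = 0, 1, 2): 3 + 9 + 27 points.
small_case = switching_sampling_points(3, tau=1.0, t_hat=1.0)
```

Since the sum is geometric in the number of agents, it is dominated by its last term, which is the source of the exponential growth of k discussed above.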
Hence we have shown that, at least for very large dimension d ≫ 1 and a relatively small number of agents N, our dimensionality reduction approach will pay off. As we show in Section 6, these theoretical bounds turn out to be far too pessimistic and, surprisingly, this method of dimensionality reduction for computing optimal controls can work effectively with lower dimensions k conspicuously smaller than d. Moreover, we show below ways to circumvent the exponential dependency of k with respect to N at the cost of using sequences of Johnson-Lindenstrauss matrices, see Remark 5.2 and Remark 5.3.
Remark 5.1. The log(d)-dependency only comes into play when we derive (JL1) from Lemma 2.4. One can actually use a similar argument as in Remark 3.5 in order to get rid of this logarithmic dependency. We do not elaborate further on this issue, which appears to us a mere, and perhaps unnecessary, technicality at this point.

Remark 5.2.
We observed that, at least in the worst-case scenario considered here, the dimension k of the Johnson-Lindenstrauss matrix blows up exponentially with the number of agents N. A practical approach to circumvent this problem is to use not only one but a whole family of matrices M 0 , . . . , M ℓ . The matrix M 0 is used from time 0 up to a certain time t 0 and thus only needs to fulfill the Johnson-Lindenstrauss property in this short time interval. At time t 0 a new matrix M 1 is chosen. We have to observe the positions as well as the consensus parameters in high dimension and project the system to low dimension again, using M 1 , at t 0 . Then we use the new low-dimensional system to calculate the index of the control for the high-dimensional system from time t 0 up to time t 1 ; we then repeat the procedure with a new matrix M 2 , and so on. This approach has the advantage that it requires the Johnson-Lindenstrauss properties for M i , i = 1, . . . , ℓ, only for a short time interval. The disadvantage is that we have to observe the high-dimensional system and project it to low dimension again at every time t i , i = 0, . . . , ℓ − 1.

Remark 5.3.
There is additionally the possibility to get rid of the mutual dependency of the matrix and the points of the trajectories using another family of matrices.
First, we take a matrix M 0 having the Johnson-Lindenstrauss properties (JL2) at t = 0 and (JL3). We compute the index i 0 of the control (as defined in Definition 4.1) at t = 0 using the projection M 0 .
Then we choose a matrix M 1 having the Johnson-Lindenstrauss properties (JL1) for all t ∈ [0, τ), (JL2) at t = τ, and (JL3). We compute the low-dimensional system using the projection M 1 in [0, τ] and let the control act on the agent i 0 calculated by M 0 . This is the main trick of the procedure: The points of the high-dimensional system in [0, τ] are not influenced by the matrix M 1 and hence the mutual dependency is removed, which means that there is no need of considering all trajectories P anymore, in contrast to (54). Now, from the low-dimensional system, computed by M 1 with control acting on i 0 in [0, τ], we choose the agent i 1 at τ on which the control will act in the next interval [τ, 2τ].
This procedure can be carried on using a family of matrices {M p : p = 0, . . . , ℓ} fulfilling the Johnson-Lindenstrauss properties (JL1) for all t ∈ [0, pτ), (JL2) at t = pτ, and (JL3). The agent i p on which the control shall act in the interval [pτ, (p + 1)τ) is computed at pτ using the low-dimensional system projected by M p , while the control acts on i q in [qτ, (q + 1)τ) for q = 0, . . . , p − 1. Therefore, in [0, pτ] the index of the controlled agent and hence the trajectories of the high-dimensional system are independent of M p .
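The procedure of Remark 5.3 can be sketched as a loop in which a fresh matrix is drawn at every sampling time and the low-dimensional system is replayed from time 0 with the previously recorded indices. The sketch rests on several assumptions that are not fixed by this section: the interaction kernel a(r) = 1/(1 + r²)^β, a control of norm θ pointing towards the mean consensus parameter, Bernoulli matrices scaled by 1/√k, and explicit Euler integration. All function names are hypothetical.

```python
import numpy as np

def cs_rhs(x, v, u, beta=0.5):
    """Controlled Cucker-Smale right-hand side (assumed kernel 1/(1+r^2)^beta)."""
    diff = x[:, None, :] - x[None, :, :]
    a = 1.0 / (1.0 + np.sum(diff ** 2, axis=2)) ** beta      # a[i, j]
    dv = (a[:, :, None] * (v[None, :, :] - v[:, None, :])).mean(axis=1) + u
    return v, dv

def sparse_control(v, idx, theta):
    """Control of norm theta on agent idx, pointing towards the mean (assumed)."""
    u = np.zeros_like(v)
    deviation = v[idx] - v.mean(axis=0)
    norm = np.linalg.norm(deviation)
    if norm > 0:
        u[idx] = -theta * deviation / norm
    return u

def advance(x, v, idx, theta, tau, substeps=20):
    """One sampling interval with the control held fixed (explicit Euler)."""
    u = sparse_control(v, idx, theta)
    h = tau / substeps
    for _ in range(substeps):
        dx, dv = cs_rhs(x, v, u)
        x, v = x + h * dx, v + h * dv
    return x, v

def farthest_agent(w):
    """Smallest index maximizing the distance from the mean."""
    return int(np.argmax(np.linalg.norm(w - w.mean(axis=0), axis=1)))

def fresh_matrix_control(x0, v0, k, theta, tau, steps, rng):
    """Choose i_p at time p*tau from a system projected by a fresh M_p."""
    d = x0.shape[1]
    x, v = x0.copy(), v0.copy()
    indices = []
    for p in range(steps):
        M = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)  # fresh M_p
        y, w = x0 @ M.T, v0 @ M.T
        for q in range(p):        # replay [0, p*tau] with recorded indices
            y, w = advance(y, w, indices[q], theta, tau)
        indices.append(farthest_agent(w))   # i_p from the M_p-projected system
        x, v = advance(x, v, indices[p], theta, tau)
    return indices, x, v
```

By construction, the high-dimensional trajectory on [0, pτ] is determined by the indices chosen with M 0 , . . . , M p−1 only, so it does not depend on M p . The replay makes the cost grow quadratically in the number of sampling intervals, which is the price paid for removing the mutual dependency.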

NUMERICAL EXPERIMENTS
In this section we present some numerical experiments to confirm the theoretical observations on the interplay between the Cucker-Smale system, the dimension reduction by a Johnson-Lindenstrauss matrix, and the quality of the control chosen from the low-dimensional (projected) system as defined in Definition 4.1.
For every ℓ = 0, 1, . . . we recursively solve the Cucker-Smale system with x 1 , . . . , and define the control as and u i (ℓτ) = 0 for every i ≠ ι̂ as long as V (ℓτ) > γ(X(ℓτ)) 2 . As soon as V (nτ) ≤ γ(X(nτ)) 2 is satisfied for some n ∈ N 0 , we set T 0 := nτ and the control to zero. This control was shown to be optimal in the work [7, Section 4] in terms of maximizing the rate of convergence to the consensus region, and shall therefore be employed as a benchmark to test the effectiveness of the other controls.

(U) Uniform control: this control strategy acts on every agent simultaneously using a control pointing towards the mean consensus parameter with norm equal to θ/N as long as V (ℓτ) > γ(X(ℓτ)) 2 . This means Again, as soon as V (nτ) ≤ γ(X(nτ)) 2 is satisfied for some n ∈ N 0 , we set T 0 := nτ and the control to zero.

(R) Random sparse control: as long as V (ℓτ) > γ(X(ℓτ)) 2 , at every sampling time ℓτ we choose an index j ∈ {1, . . . , N} at random following a uniform distribution. Then we define the control as and u i (ℓτ) = 0 for every i ≠ j. As in the above controls, as soon as V (nτ) ≤ γ(X(nτ)) 2 is satisfied for some n ∈ N 0 , we set T 0 := nτ and the control to zero.

(DR) Dimension reduction sparse control chosen by the low-dimensional projected system: here u i (ℓτ) = u h i (ℓτ) is defined as in Definition 4.1. In order to test the performance of this control, and to avoid the stability complications arising from finite precision approximation, we calculate the trajectories of both the high- and the low-dimensional system: if the high-dimensional system enters the consensus region first (i.e., V (nτ) ≤ γ(X(nτ)) 2 for some n ∈ N 0 ), then we set the control to zero and T 0 := nτ. Instead, if the system in low dimension reaches the consensus region first (i.e., W (ℓτ) ≤ γ(Y (ℓτ)) 2 for some ℓ ∈ N 0 ), then we switch the control for the high-dimensional system to the random sparse control strategy (R) until V (nτ) ≤ γ(X(nτ)) 2 is eventually satisfied for some n ∈ N 0 .
Notice that all the controls above are time sparse, and only the uniform control strategy (U) is not componentwise sparse.
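The three high-dimensional strategies (SP), (U) and (R) can be put side by side in a short sketch. Only the verbal description of (U) is fully explicit in the text; for (SP) we assume that the controlled agent is the one farthest from the mean consensus parameter, and for (SP) and (R) we assume a control of norm θ pointing towards the mean. The function names are illustrative.

```python
import numpy as np

def deviations(v):
    """Deviations of the consensus parameters from their mean."""
    v = np.asarray(v, dtype=float)
    return v - v.mean(axis=0)

def sparse_control_sp(v, theta):
    """(SP), assumed form: act on the agent farthest from the mean."""
    dev = deviations(v)
    norms = np.linalg.norm(dev, axis=1)
    j = int(np.argmax(norms))
    u = np.zeros_like(dev)
    if norms[j] > 0:
        u[j] = -theta * dev[j] / norms[j]
    return u

def uniform_control(v, theta):
    """(U): every agent gets a control of norm theta/N towards the mean."""
    dev = deviations(v)
    norms = np.linalg.norm(dev, axis=1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)    # avoid division by zero
    return -(theta / len(dev)) * dev / safe

def random_sparse_control(v, theta, rng):
    """(R), assumed form: act on one agent drawn uniformly at random."""
    dev = deviations(v)
    j = int(rng.integers(len(dev)))
    u = np.zeros_like(dev)
    norm = np.linalg.norm(dev[j])
    if norm > 0:
        u[j] = -theta * dev[j] / norm
    return u
```

In all three cases the sum of the norms of the rows of the control is at most θ, so the strategies spend the same amount of resources and differ only in how it is distributed among the agents.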
Remark 6.1. The reasons for using random sparse control in the final phase of (DR), in the case that the low-dimensional system reaches the consensus region before the high-dimensional one, are of numerical and computational nature. In fact, the step width τ computed in Theorem 4.5 to ensure convergence to the consensus region in finite time is often far too small, and in our numerical experiments we need to exceed it. Moreover, as soon as the high-dimensional system enters the consensus region, the difference between consensus parameters becomes so small as to render, for such a large time step, the choice of the sparse control highly inaccurate, leading to inefficient chattering phenomena, without steering the high-dimensional system to consensus.
As an alternative, we employ the random sparse control as soon as the low-dimensional system has reached the consensus region (if this happens before the high-dimensional system does). This procedure has the advantage of always steering the system to the consensus region, and it only slightly affects the time that the high-dimensional system takes to reach the consensus region, since it is usually necessary for a very short time (provided that the dimension of the Johnson-Lindenstrauss matrix is sufficiently large).
6.1. Content of the Numerics. The following are the driving issues concerning the controls introduced above: (1) Does the control steer the system to the consensus region as defined in Definition 1.5 in finite time?
(2) How long does it take to steer the system to the consensus region?
In the following, for every experiment we fix the number of agents N, the dimension d, the control strength θ , the power of the interaction kernel β as in (3), the step width τ, and in particular the configuration (x 0 , v 0 ) at the beginning. We report the maximal step width τ 0 (theoretically) allowed by formula (41), and the estimate from above for the time to consensusT (taken from Theorem 4.5). We also report the quantity V (0) − γ(X(0)) 2 , accounting for the discrepancy of the original configuration from the consensus region.
For every configuration we shall present a table containing the performances of the different controls, measured by the time T 0 employed by the high-dimensional system to reach the consensus region, and the time T 0.5 it takes to halve the "distance" to the consensus region: this means that T 0.5 is the minimal time satisfying

To test the performances of the control (DR) we shall use a variety of Bernoulli random matrices M ∈ R k×d for different dimensions k. For any of these dimensions, we also report the initial discrepancy W (0) − γ(Y (0)) 2 from the consensus region of the projected system, and the switching time T S at which the random sparse control replaces the original dimension reduction control strategy (if the high-dimensional system enters the consensus region before the low-dimensional one, we set T S := T 0 ). Figure 2 shows the first two coordinates of the initial configurations used in each section.

6.2.1. Configuration with one outlier. We consider N = 9 agents in dimension d = 100 for which the j-th spatial component of the i-th agent is given by the formula The result obtained is a set of points non-homogeneously distributed over an almost spherical configuration, which, projected in R 2 , resembles an ellipse. A similar configuration is used for the consensus parameter of each agent, for which we have the initial consensus parameter of the N-th agent is instead the vector with all entries set equal to 10. We first observe that if the system is left alone, with no control acting on it, the quantity V (t) − γ(X(t)) 2 decreases only from 1031.3 to 946.2 at time 100, from which we can infer that the system would not reach the consensus region without an external intervention. Notice that the Sparse Control (SP) is the fastest; this shall be a common feature of all our experiments, as expected from its optimality shown in [7, Proposition 3].
The uniform control (U) and the random control (R) perform similarly and both take more than three times longer than (SP) to reach the consensus region. The control (DR) has comparable performances to (SP), and, very surprisingly, even when projecting to dimension k = 1 the system reaches the consensus region faster than with the controls (U) and (R). In Figure 3, we illustrate the time T 0 the system takes to reach the consensus region as a function of the projected dimension k for the control (DR). If multiple tests are made with the same dimension k, we consider an average of the results. We also report, in different colors, the values of T 0 we obtain with the control (SP) and the control (R) (blue and green line, respectively). It can be seen that the performance of (DR) is basically the same as that of (SP) even if we reduce the dimensionality by 80%.
Up to now, we do not have any procedure to test whether the randomly generated matrix we use to implement the control (DR) satisfies the requested properties of Theorem 4.5. Moreover, to get a precise answer, we would need to gather information which belongs to the high-dimensional system beyond time 0, something which we are not allowed to know in advance. We claim, however, that the quantity, which we call the exactness of the matrices at 0, is a measure of how good the matrix M is. To show that, we have considered six different M ∈ R k×d for k = 10 and their respective time to the consensus region: we report in Figure 4 the time to consensus for the system as a function of the exactness of the matrices at 0. A correlation between the closeness of E M to zero and the effectiveness of the control is clearly visible.

6.2.2. Configuration generated by a geometric distribution. In this section we consider a system where the locations are distributed as in the example before, while the consensus parameters are given by the formula This results in a more heterogeneous situation at the beginning. We also increase the dimension d to 500, the strength of the force θ to 20 and β to 0.65. If we let the system evolve freely, the quantity V (t) − γ(X(t)) 2 decreases only from 1195.5 to 1122.3 at time 30. The slowness of the decay implies the necessity of a control. The uniform control (U) and the random control (R) perform similarly, as in the example before. However, the control (DR) outperforms both when the projected dimension is large enough (k ≥ 10). Figure 5 shows the performance of (DR) as a function of k and compares it with (R) and (SP).

6.2.3. Configuration generated by a Cauchy distribution. For the system considered in this section, the initial configuration is calculated as follows: the j-th spatial component of the i-th agent is a sample from a normal distribution with expected value 0 and standard deviation 1, independently selected for different i and j.
The j-th component of the consensus parameter of the i-th agent is ruled by a Cauchy distribution, whose density is given by .
We choose the height to be b = 1/40 (to get a reasonably large V (0) in the computations). The initial configuration is generated once and then fixed for all the experiments with the different controls (SP), (R), (U) and (DR). Below we list the parameters we fix for this section: As in the examples before, the control (DR) clearly outperforms both (R) and (U), and in this case even for k = 1. Figure 6 compares the effectiveness of the controls (DR) (as a function of k), (R) and (SP). We point out that, even in this situation, a control is necessary to steer the system to consensus, since the quantity V (t) − γ(X(t)) 2 decreases only from 464 to 436.5 at time 50 if no control is applied.
6.3. Examples in which the performances of (R) and (U) are comparable to (DR).
6.3.1. Configuration generated by a normal distribution. In this example, the j-th spatial (resp., consensus parameter) component of the i-th agent is independently generated by a normal distribution with expected value 0 and standard deviation 10 (resp., 8). As in Section 6.2.3, we generate the initial configuration once and we use it for all the experiments with the controls.
The parameters used for this configuration are listed in the table below, and after it we report the performances of the various controls: This time the controls (R) and (U) are quasi-optimal, performing in almost the same way as the benchmark control (SP). Figure 7 shows that the control (DR) behaves similarly to (R) and (SP) up to a reduced dimension k = 50 (hence up to 10% of the original dimension): from that point on the efficiency rapidly deteriorates, making the control unfeasible.
Time to consensus for (DR) as a function of the projected dimension k, and comparison with (SP) and (R).

6.3.2. Uniform configuration. As a last example we consider a configuration similar to the one of Section 6.2.1: the j-th spatial and consensus parameter components of the i-th agent are both given by (x i ) j = (v i ) j = cos(i + j√2) for j = 1, . . . , d and i = 1, . . . , N.
The following tables report the parameters of the configuration taken into account and the outcomes with the different controls: As before, (R) and (U) perform similarly to (SP); (DR) is able to compete up to a dimension reduction of 25% of the original dimension (k = 50). From there on, its efficiency steadily declines. This phenomenon can be witnessed in Figure 8.
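The uniform configuration of Section 6.3.2 is fully determined by its closed formula and can be generated as follows. This is a minimal sketch: the function name is hypothetical, and the number of agents and the dimension are left as arguments since the parameter table is not restated here.

```python
import numpy as np

def uniform_configuration(num_agents, dim):
    """Initial data with (x_i)_j = (v_i)_j = cos(i + j*sqrt(2)).

    Indices are 1-based as in the text: i = 1..num_agents, j = 1..dim.
    Returns the position array and an independent copy for the
    consensus parameters, both of shape (num_agents, dim).
    """
    i = np.arange(1, num_agents + 1)[:, None]
    j = np.arange(1, dim + 1)[None, :]
    x0 = np.cos(i + j * np.sqrt(2.0))
    return x0, x0.copy()

x0, v0 = uniform_configuration(4, 6)   # small illustrative sizes
```

Because √2 is irrational, the values i + j√2 are equidistributed modulo 2π, which is consistent with the "relative homogeneity" of the consensus parameters observed for this example.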

Conclusions from the experiments.
In this section we summarize the conclusions that can be drawn from the list of experiments reported in this numerical section.
(1) A common feature of all the experiments is that the control (DR) is highly competitive with respect to the benchmark control (SP) up to a reduced dimension which is 10% of the original one. Indeed, in this case (DR) takes between 5% and 22% more time than (SP) to steer the system to consensus. This suggests that the approach of dimension reduction works in general much better in practice than in theory, and that our analysis in Theorem 4.5 is quite conservative.

(2) The dimension of the matrix is not the only necessary ingredient to obtain a competitive control: a matrix should also fulfill the Johnson-Lindenstrauss property for certain points of the high-dimensional system. Since to check the latter condition we need information regarding the future development of the system, we need to design different criteria to distinguish "good" matrices from "bad" ones. In Section 6.2.1, we have seen that an efficient sieve is the notion of exactness of a matrix at 0: the smaller this value is, the better the control shall perform, according to the empirical data we have gathered.

(3) There is no proof yet that random sparse control (R) forces the system to enter the consensus region almost surely for every configuration, but numerical experiments suggest this behavior. Furthermore, it is interesting to notice that the time to consensus obtained by the use of the uniform control (U) is always very close to the one we get by using the random sparse control strategy (R): this strongly hints that the expected value of the time to consensus of the random control (R) could be very near or even equal to the one of (U).

(4) A common feature of the last two examples is the "relative homogeneity" of the consensus parameters with respect to the mean consensus parameter: by this we mean that the consensus parameters of all the agents compete to be the farthest away from it, and thus the sparse control will jump from one to another continuously, showing a chattering behavior. In contrast, all the first three experiments feature a relatively small subgroup of agents whose consensus parameters are the farthest away from the mean consensus parameter by a considerable margin. These are the cases where the controls (SP) and (DR) are substantially more efficient than (R) and (U): by firmly acting on the most "badly behaving" agents, we are able to steer the system to consensus faster than by employing control strategies which are blind to the structure of the group. It is thus advisable to use sparse strategies only when the consensus parameters of the agents are sufficiently "asymmetric" at the starting point.

APPENDIX
We need a technical lemma which can also be found in [7]. With a slightly different argument, however, we can improve the inequalities there and get rid of a factor N 2 , which is important for estimating the time of entrance in the consensus region of controlled Cucker-Smale systems, depending on N.

Proof. Integrating the first assumption one has Furthermore, to prove the second statement of the lemma we observe On the other hand, using the (vector-valued) Minkowski inequality in the second step and furthermore by (x + y) 2 ≤ 2x 2 + 2y 2 it follows $X(t) \le 2X(0) + \frac{2}{\eta^2} V^2(0)$.

Gronwall's estimates and variations on the theme.
We need to employ Gronwall's estimates at several places. However, besides the classical one, we need to develop a variation for piecewise continuous evolutions. Both are reported as follows. (1) ρ is non-decreasing and bounded on I, (2) β 1 is non-decreasing and continuous on I, (3) β 2 is non-negative and continuous on I and (4) u is non-negative and continuous on I. Assume that for every t ∈ [0, T ] it holds: Let n ∈ N 0 such that nτ ≤ t < (n + 1)τ and assume u(t) ≤ (ρ(t) − ρ(nτ)) + (1 + β 1 (t) − β 1 (nτ))u(nτ) +