Filippov trajectories and clustering in the Kuramoto model with singular couplings

We study the synchronization of a generalized Kuramoto system in which the coupling weights are determined by the phase differences between oscillators. We employ the fast-learning regime in a Hebbian-like plasticity rule so that the interaction between oscillators is enhanced as their phases approach each other. First, we study the well-posedness problem for the singular weighted Kuramoto systems, in which Lipschitz continuity is lost. We present the dynamics of the system equipped with singular weights in the subcritical, critical and supercritical regimes of the singularity. A key fact is that solutions in the most singular cases must be considered in Filippov's sense. We characterize sticking of phases in the subcritical and critical cases and we exhibit a continuation criterion for classical solutions after any collision state in the supercritical regime. Second, we prove that strong solutions to these systems of differential inclusions can be recovered as singular limits of regular weights. We also establish the emergence of synchronous dynamics for the singular and regular weighted Kuramoto models.


Introduction
Synchronization is the natural collective behavior arising from agent-based interactions governed by periodic rules. These rhythmical motions can be easily observed in various biological complex systems such as the flashing of fireflies, the beating of cardiac cells, etc. One of the most significant examples of synchronization appears in neurons. Associative or Hebbian learning [24] proposes an explanation for the adaptation of neurons in the brain during the learning process. Such a mechanism is founded on the assumption that synchronous activation of cells (firing of neurons) leads to selectively pronounced increases in synaptic strength between those cells. The consequence is that the pattern of activity will become self-organized. In Hebb's words: "Any two cells or systems of cells that are repeatedly active at the same time will tend to become associated, so that activity in one facilitates activity in the other." In neuroscience, these processes provide the neuronal basis of unsupervised learning of cognitive function in neural networks and can explain the phenomena that arise in the development of the nervous system.
Since Kuramoto proposed a mathematical model for coupled oscillators in [28,29], synchronization has received a lot of attention and has been studied extensively in various disciplines from this point of view [1]. In the classical Kuramoto model, the system of oscillators has an all-to-all coupling with uniform weights given by a constant coupling strength $K$:
\[ \dot\theta_i = \Omega_i + \frac{K}{N}\sum_{j=1}^{N} \sin(\theta_j - \theta_i), \qquad i = 1, \ldots, N, \tag{1.1} \]
where the $\Omega_i$ are the natural frequencies of the oscillators. However, uniform and constant couplings are too restrictive to explain the complexity of the phenomena. Thus, it is more interesting to consider a generalization of the Kuramoto model equipped with plastic couplings, introduced in [3,18,21,34,39,41,42]:
\[ \dot\theta_i = \Omega_i + \frac{1}{N}\sum_{j=1}^{N} K_{ij} \sin(\theta_j - \theta_i), \qquad i = 1, \ldots, N, \tag{1.2} \]
where $K_{ij}$ is the coupling between the $i$-th and $j$-th oscillators, which has its own dynamics depending on the phase configuration. The coupling $K_{ij}$ is assumed to be
\[ K_{ij} = K a_{ij}, \]
where $a_{ij} \in [0, 1]$ measures the degree of connectedness between the $i$-th and $j$-th oscillators. These weights are allowed to vary adaptively, depending on the associated phases $\theta_i$ and $\theta_j$, via the dynamic learning law
\[ \dot a_{ij} = \eta\,\bigl(\Gamma(\theta_j - \theta_i) - a_{ij}\bigr), \tag{1.3} \]
for some plasticity function $\Gamma$. Here, $\eta$ is regarded as the learning rate parameter, so that a small $\eta$ delays the adaptation of the weight $a_{ij}$. Depending on the choice of the function $\Gamma$, the dynamics of the system (1.2) shows various scenarios. In neural network systems, Hebbian-type dynamics is considered for the learning algorithm of couplings between oscillators. Such a learning law amounts to saying that the weight of the coupling increases if the phases of the oscillators are close to each other. For example, in [21,34,42], $\Gamma$ is assumed to be $\Gamma(\theta) = \cos\theta$, so that attraction between nearby oscillators is reinforced whereas repulsive interaction arises between distant phases. On the other hand, anti-Hebbian types are also considered, such as $\Gamma(\theta) = |\sin\theta|$ in [21,41]. In this case, synchronization emerges slowly due to the reduction of the weight for nearby oscillators. Other types of adaptive rules are considered in [18,39].
We will consider a Hebbian-like Γ for the dynamics of the adaptive coupling so that the coupling is enhanced as phases approach each other. Assume that the Hebbian-like plasticity function Γ is given by where $\sigma \in (0, \pi)$, $\zeta \in (0, 1]$ and $|\theta|_o$ is the orthodromic distance (to zero) over the unit circle, which can be defined by $|\theta|_o := |\bar\theta|$ for $\bar\theta \equiv \theta \mod 2\pi$, $\bar\theta \in (-\pi, \pi]$.
Here, the parameter $c_{\alpha,\zeta} := 1 - \zeta^{-1/\alpha}$ has been chosen so that whenever two phases $\theta_i$ and $\theta_j$ stay at orthodromic distance $\sigma$ or larger, the adaptive function Γ predicts a maximum degree of connectedness not larger than $\zeta$ between such oscillators.
Since the plasticity function Γ in (1.4) is Lipschitz-continuous, we can apply Tikhonov's theorem [27] to (1.2)-(1.3) in order to rigorously derive the fast learning regime $\eta \to +\infty$. Then, we arrive at the following Kuramoto model with weighted coupling structure:
\[ \dot\theta_i = \Omega_i + \frac{K}{N}\sum_{j=1}^{N} \Gamma(\theta_j - \theta_i)\,\sin(\theta_j - \theta_i), \qquad i = 1, \ldots, N, \tag{1.5} \]
which will play a central role in our work. If either $\alpha = 0$ or $\zeta = 1$, then our plasticity function (1.4) becomes 1 everywhere. In such a case, our system (1.5) reduces to the classical Kuramoto model (1.1). Hence, we will assume that $\alpha > 0$ and $\zeta \in (0, 1)$ from now on. Our main interest is to analyze the system (1.5)-(1.6) with singular weights. In the next section we will derive this new singular model from the regular one through a singular limit of the parameters. In the regular case (1.4), Γ is a Lipschitz-continuous function and the system (1.5) becomes the Kuramoto model with regular weights depending on the phase configuration. Then, the well-posedness of global-in-time classical solutions is standard. However, in the singular case (1.6), the system (1.5) has a singular weight and we must deal with a non-Lipschitz right-hand side, where the Cauchy-Lipschitz theorem cannot guarantee the existence and uniqueness of global-in-time solutions. We will deal with three different regimes of the singularity, $\alpha \in (0, \tfrac12)$, $\alpha = \tfrac12$ and $\alpha \in (\tfrac12, 1)$, which we respectively call the subcritical, critical and supercritical cases.
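The fast learning regime can be probed numerically. The following Python sketch integrates an adaptive system of the form (1.2)-(1.3) with explicit Euler steps and compares the weights $a_{ij}$ with $\Gamma(\theta_j - \theta_i)$ for a large learning rate. The relaxation form of the learning law written in the code, the function name `simulate_adaptive`, the step size and the use of Γ = cos (the Hebbian example cited above) are assumptions made for illustration only.

```python
import numpy as np

def simulate_adaptive(theta0, omega, K, eta, gamma, dt=1e-3, steps=5000):
    """Explicit Euler for the assumed adaptive system
         theta_i' = omega_i + (K/N) * sum_j a_ij * sin(theta_j - theta_i)
         a_ij'    = eta * (gamma(theta_j - theta_i) - a_ij)
    Returns the final phases and the final weight matrix."""
    theta = np.array(theta0, dtype=float)
    n = theta.size
    a = np.ones((n, n))  # start from uniform weights (assumption)
    for _ in range(steps):
        diffs = theta[None, :] - theta[:, None]  # diffs[i, j] = theta_j - theta_i
        dtheta = omega + (K / n) * (a * np.sin(diffs)).sum(axis=1)
        da = eta * (gamma(diffs) - a)
        theta = theta + dt * dtheta
        a = a + dt * da
    return theta, a

# Usage: with a large learning rate the weights track gamma of the phase differences.
rng = np.random.default_rng(0)
theta0 = rng.uniform(-np.pi, np.pi, 5)
omega = rng.uniform(-1.0, 1.0, 5)
theta, a = simulate_adaptive(theta0, omega, K=1.0, eta=200.0, gamma=np.cos)
gap = np.max(np.abs(a - np.cos(theta[None, :] - theta[:, None])))
print("max |a_ij - Gamma(theta_j - theta_i)| =", gap)
```

The step size must satisfy `eta * dt < 2` for the explicit scheme to remain stable, which is why a small `dt` is paired with the large `eta`.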
The main results of this paper are listed as follows. First, we study the well-posedness of the singular weighted system. Depending on the value of α, the properties of the right-hand side of (1.5) vary. Specifically, in the subcritical regime, we deal with systems of ODEs with Hölder-continuous right-hand side, while we face discontinuous right-hand sides of bounded and unbounded type in the critical and supercritical cases, respectively. In addition, the type of uniqueness that we can expect in these systems is one-sided. Namely, a cluster of phases may eventually arise after a finite-time collision and the oscillators belonging to such a cluster might stay stuck together. This phenomenon was recently found in other types of agent-based systems, like the Cucker-Smale model with singular weights, see [37,38].
Our second result characterizes the explicit conditions for sticking in the subcritical and critical regimes. In the former case, we show that only clusters of oscillators with the same natural frequencies can stick together. In the latter case, by contrast, clusters of oscillators with different natural frequencies may stick together as long as such frequencies fulfill an appropriate condition. Regarding the supercritical case, the analogous sticking condition becomes trivial and we can show a continuation procedure for classical solutions after finite-time collisions. Namely, after a cluster is formed in finite time, the cluster stays stuck together regardless of the natural frequencies of the oscillators involved.
The third result consists in showing that these singular weights are physically relevant. Specifically, we will show that the system (1.5)-(1.6) with singular weights can be obtained as a rigorous singular limit of the regular model (1.4)-(1.5). Again, the strategy will differ in each of the regimes. For the subcritical case, tools similar to those in [37,38] for the singular Cucker-Smale model can be adapted. Moreover, we can even obtain an analogous gain of extra piecewise $W^{1,1}$ regularity of the frequencies of oscillators. For the critical and supercritical cases we cannot resort to the same ideas. Hence, we use the underlying gradient-flow structure to gain compactness of frequencies. Identifying the limit will be the heart of the matter in this part.
Our last result faces the emergence of synchronization in each regime of the parameter α. For identical oscillators, we show the emergence of complete phase synchronization in finite-time under appropriate assumptions on the initial diameter of phases. At least in the subcritical regime, where frequencies become more regular, we study the asymptotic emergence of complete frequency synchronization of non-identical oscillators. Also, we study the stability properties of collision-less phase-locked states in all the three regimes.
The techniques are first inspired by a combination of results for the classical Kuramoto model, but they require a new perspective that allows for singular interactions. To this end, we introduce a well-posedness result "à la Filippov" that is valid for systems of ODEs with discontinuous right-hand sides. Specifically, we will rely on the study of absolutely continuous solutions of the differential inclusions associated with Filippov's set-valued map. The values of such a map are convex polytopes that are bounded and unbounded in the critical and supercritical cases, respectively. Hence, the classical theory can be used in the former case whereas new ideas are developed for the latter case. Also, we prove some one-sided uniqueness results for non-Lipschitz interactions that rely on the structure of the interaction kernel near the points of loss of Lipschitz-continuity. For the stability of equilibria, Lyapunov's first method entails a scenario similar to that of the classical Kuramoto model in the critical and supercritical regimes. On the other hand, the subcritical regime requires a center manifold approach that yields the stability of the corresponding equilibria. What is more interesting is that we can still get some accurate control of the diameter of the system of singularly weighted coupled oscillators. Such control yields the corresponding finite-time and asymptotic synchronization for the identical and non-identical cases. Unfortunately, the emergence of phase-locked states independently of the initial configuration cannot be derived as in previous results for the classical Kuramoto model (see [19]) because it is not clear whether the Łojasiewicz gradient inequality [32] holds for non-analytic systems with gradient structure like this one.
Regarding the singular limit of the regular coupling weights, the main goal is to prove that solutions of the regularized system converge towards absolutely continuous trajectories that fulfill the differential inclusion. For that, an appropriate H-representation (half-space representation) of such convex polytopes is obtained through convex analysis techniques. Then, the preceding gain of compactness of frequencies along with such geometric representation of the Filippov map will provide the necessary tools for the singular limit to work in the critical and supercritical regimes.
The rest of the paper is organized as follows. In Section 2, we present definitions, basic properties of the weighted Kuramoto model, the underlying gradient-flow structure, the passage from regular to singular plasticity function and the expected macroscopic equations. In Section 3, we study the system with singular weights and we prove the well-posedness theory in each regime. In Section 4, we prove the rigorous singular limit in every regime and compare the model with previous results derived in other agent-based systems, in particular we compare with Cucker-Smale models. In Section 5, we show the synchronization for the singular weighted system. In Appendix A, we will recover some classical tools of the Kuramoto models that we apply to show the emergence of synchronization in the regular weighted system for the sake of clarity. Appendix B shows the proofs of the H-representation of the Filippov set-valued map in the critical and supercritical cases. Finally, Appendix C introduces the explicit characterization of the sticking conditions.

Basic properties and definitions.
In this section, we study the basic properties of the weighted Kuramoto system and introduce some related results that will be useful in the following sections. For simplicity, let us denote the interaction kernel by $h(\theta) := \Gamma(\theta)\sin\theta$ (here Γ can be any even function, e.g., (1.4) or (1.6)). Then the system (1.5) can be expressed as
\[ \dot\theta_i = \Omega_i + \frac{K}{N}\sum_{j=1}^{N} h(\theta_j - \theta_i), \qquad i = 1, \ldots, N. \tag{2.1} \]
We shall sometimes use vector notation in (2.1). We define the vector field $H = H(\Theta) = (H_1(\Theta), \ldots, H_N(\Theta))$ whose components read
\[ H_i(\Theta) := \Omega_i + \frac{K}{N}\sum_{j=1}^{N} h(\theta_j - \theta_i). \tag{2.2} \]
Then, (2.1) can be restated as
\[ \dot\Theta = H(\Theta). \tag{2.3} \]
Since $h$ is an odd function, by taking sums on both sides of (2.1), we have
\[ \sum_{i=1}^{N} \dot\theta_i = \sum_{i=1}^{N} \Omega_i, \]
i.e., the average of frequencies is conserved. Thus, without loss of generality, we may assume that the average of natural frequencies is zero, $\frac{1}{N}\sum_{i=1}^{N} \Omega_i = 0$, in order to focus on the fluctuation from the constant average motion.
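The vector field and the conservation of the average frequency can be checked numerically. The sketch below is only an illustration: it assumes the weighted Kuramoto form with coupling $K/N$ and an odd kernel $h$, and it uses as an example the singular kernel $\sin\theta / |\theta|_o^{2\alpha}$ studied later in the paper. The function names and the convention $h = 0$ at collisions are assumptions of this sketch.

```python
import numpy as np

def singular_kernel(theta, alpha):
    """Illustrative kernel sin(theta) / |theta|_o^(2*alpha).
    Assumption: the value at collisions (|theta|_o = 0) is set to 0."""
    theta = np.asarray(theta, dtype=float)
    wrapped = np.angle(np.exp(1j * theta))  # representative in (-pi, pi]
    dist = np.abs(wrapped)                  # orthodromic distance to zero
    out = np.zeros_like(dist)
    mask = dist > 0
    out[mask] = np.sin(wrapped[mask]) / dist[mask] ** (2 * alpha)
    return out

def vector_field(theta, omega, K, h):
    """H_i(Theta) = omega_i + (K/N) * sum_j h(theta_j - theta_i)."""
    diffs = theta[None, :] - theta[:, None]  # diffs[i, j] = theta_j - theta_i
    return omega + (K / theta.size) * h(diffs).sum(axis=1)

# Usage: since h is odd, the interaction terms cancel when summed over i.
rng = np.random.default_rng(1)
theta = rng.uniform(-3.0, 3.0, 6)
omega = rng.normal(size=6)
H = vector_field(theta, omega, 2.0, lambda d: singular_kernel(d, 0.25))
print(H.sum() - omega.sum())  # zero up to rounding
```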
For the discussion in Section 4, we briefly introduce the second order augmentation of the Kuramoto model, see [16]. By taking one more derivative in the system (2.1), we obtain the second order model
\[ \dot\theta_i = \omega_i, \qquad \dot\omega_i = \frac{K}{N}\sum_{j=1}^{N} h'(\theta_j - \theta_i)\,(\omega_j - \omega_i), \qquad i = 1, \ldots, N. \tag{2.4} \]
For both systems (2.1) and (2.4) we have the following equivalence. (1) If $\Theta = (\theta_1, \ldots, \theta_N)$ is a solution to (2.1) with initial data $\Theta_0$, then $(\Theta, \omega := \dot\Theta)$ is a solution to (2.4) with well-prepared initial data $(\Theta_0, \omega_0)$:
\[ \omega_{i,0} = \Omega_i + \frac{K}{N}\sum_{j=1}^{N} h(\theta_{j,0} - \theta_{i,0}). \]
(2) If $(\Theta, \omega)$ is a solution to (2.4) with initial data $(\Theta_0, \omega_0)$, then $\Theta$ is a solution to (2.1) with natural frequencies:
\[ \Omega_i := \omega_{i,0} - \frac{K}{N}\sum_{j=1}^{N} h(\theta_{j,0} - \theta_{i,0}). \]
For the regular cases (1.4), the proof can be found in [16]. However, one has to take special care with the time regularity of solutions in the singular cases (1.6) before taking derivatives in (2.1). In that latter case with $\alpha \in (0, \tfrac12)$, the solutions to be considered for (2.1) are absolutely continuous solutions while, for (2.4), solutions have to be taken in the weak sense with $C^1$ and piecewise $W^{2,1}$ regularity (see [37] for this concept of solution for the discrete Cucker-Smale model with singular influence function). The well-posedness of both singular systems (2.1) and (2.4) will be established in Sections 3 and 4 (see Theorems 3.1, 3.3, 4.1, 4.2 and Remark 4.1) and comparisons with Cucker-Smale models with singular influence function will be given in Subsection 4.4. For the sake of completeness, we recall the different definitions of synchronization [15].
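Let $\Theta = \Theta(t) = (\theta_1(t), \ldots, \theta_N(t))$ be the phase configuration of oscillators whose dynamics is governed by the system (1.5).
(1) The system shows complete phase synchronization asymptotically if, and only if, the following condition holds:
\[ \lim_{t\to\infty} |\theta_i(t) - \theta_j(t)| = 0, \quad \text{for all } i \neq j. \]
(2) The system shows complete frequency synchronization asymptotically if, and only if, the following condition holds:
\[ \lim_{t\to\infty} |\dot\theta_i(t) - \dot\theta_j(t)| = 0, \quad \text{for all } i \neq j. \]
(3) The system shows the emergence of a phase-locked state asymptotically if, and only if, there exist constants $\theta^\infty_{ij}$ such that
\[ \lim_{t\to\infty} (\theta_i(t) - \theta_j(t)) = \theta^\infty_{ij}, \quad \text{for all } i \neq j. \]
Analogous definitions of synchronization will be considered if, instead of asymptotically, the emergent dynamics takes place in some finite time $T$. In such a case, $\infty$ will be replaced by this finite time $T$ in the above definitions.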
We note that the complete phase-synchronization is a special case of phase-locked state. It is obvious that if the solution shows the emergence of phase-locked state, then it implies the complete frequency synchronization. However, the converse is valid when the frequency synchronization occurs fast, i.e., integrable decay of frequency differences.
2.2. Singular weighted model. In this part, we introduce the formal derivation of the Kuramoto model with singular weights as singular limit of the regular weighted model. We note that the regular weighted model is (2.1) with interaction kernel given by Recall that the degree of connectedness is smaller than ζ for interparticle distances larger than σ and α imposes the fall-off of the interactions. Consequently, σ measures the effective range of interactions. Similarly, the parameter K measures the maximum strength of interactions. Hence, one can propose the following scaling Or more specifically, using the change of variables where ε is a dimensionless parameter, we arrive at the next scaled system where the scaled interaction kernel now reads If we formally take limits when ε → 0, then we arrive at the desired singular weighted Kuramoto model, whose singular interaction kernel is All these arguments are heuristic. However they might become rigorous depending on the value of α. For a rigorous derivation of the singular limit in all the subcritical, critical and supercritical regimes, see Section 4.

2.3. Emergence of clusters: collision and sticking of oscillators. In this part we introduce some notation that will be used throughout the paper. We will denote the set of pairwise collisions of the $i$-th and $j$-th oscillators by
\[ \mathcal{C}_{ij} := \{\Theta = (\theta_1, \ldots, \theta_N) \in \mathbb{R}^N : \ \bar\theta_i = \bar\theta_j\}, \]
where $\bar\theta$ denotes again the representative of $\theta$ in $(-\pi, \pi]$. Then, the set of collisions reads
\[ \mathcal{C} := \bigcup_{i \neq j} \mathcal{C}_{ij}. \]
Consider any phase configuration of the $N$ oscillators, i.e., $\Theta = (\theta_1, \ldots, \theta_N) \in \mathbb{R}^N$. We will say that the $i$-th oscillator collides with the $j$-th oscillator when $\Theta \in \mathcal{C}_{ij}$ and we will say that $\Theta$ is a collision state when $\Theta \in \mathcal{C}$. In order to handle collisions, let us define the following binary relation:
\[ i \sim j \iff \bar\theta_i = \bar\theta_j. \]
Since it is an equivalence relation, we can denote its equivalence classes by
\[ \mathcal{C}_i(\Theta) := \{j \in \{1, \ldots, N\} : \ \bar\theta_j = \bar\theta_i\}. \]
As is apparent from the definition, $\mathcal{C}_i(\Theta)$ is the set of indices of collision with the $i$-th oscillator. Then, $\Theta$ is a collision state when one of its equivalence classes is non-trivial. Consequently, each of the equivalence classes can be regarded as a cluster of oscillators. Let us denote by $\mathcal{E}(\Theta)$ the family of all the different equivalence classes,
\[ \mathcal{E}(\Theta) = \{E_1(\Theta), \ldots, E_{\kappa(\Theta)}(\Theta)\}, \]
that is, clusters. It is apparent that $\mathcal{E}(\Theta)$ establishes a partition of $\{1, \ldots, N\}$, which we will call the collisional type of $\Theta$. For simplicity of notation, we will enumerate the equivalence classes in such a way that the minimal representatives in each of them, i.e., $\iota_k(\Theta) := \min E_k(\Theta)$, are increasingly ordered. $\kappa(\Theta) := \#\mathcal{E}(\Theta)$ will denote the total number of clusters in such a phase configuration $\Theta$ and we will denote the size of the $k$-th cluster, that is, the number of particles forming the $k$-th cluster, by $n_k(\Theta) := \#E_k(\Theta)$, for each $k = 1, \ldots, \kappa(\Theta)$.
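The collisional type of a configuration can be computed directly from the definition. The following Python sketch groups oscillator indices whose phases coincide modulo 2π, listing clusters in increasing order of their minimal index as described above. The function names, the 0-based indexing and the numerical tolerance `tol` are assumptions of this sketch; with a nonzero tolerance the relation is only approximately transitive, so the tolerance should be kept far below the phase gaps of interest.

```python
import math

def wrap(theta):
    """Representative of theta modulo 2*pi in the interval (-pi, pi]."""
    r = math.fmod(theta, 2 * math.pi)  # lies in (-2*pi, 2*pi)
    if r > math.pi:
        r -= 2 * math.pi
    elif r <= -math.pi:
        r += 2 * math.pi
    return r

def collision_classes(phases, tol=1e-12):
    """Partition the indices 0..N-1 into clusters of colliding oscillators.
    Clusters are ordered by their minimal index because indices are
    scanned in increasing order."""
    classes = []
    for i, theta in enumerate(phases):
        for cls in classes:
            # Compare with the first (minimal) representative of the cluster.
            if abs(wrap(theta - phases[cls[0]])) <= tol:
                cls.append(i)
                break
        else:
            classes.append([i])
    return classes

# Usage: oscillators 0 and 2 collide, and so do oscillators 1 and 3.
phases = [0.0, 1.0, 2 * math.pi, 1.0 - 2 * math.pi, 3.0]
clusters = collision_classes(phases)
print(clusters)                    # list of clusters
print(len(clusters))               # number of clusters, kappa(Theta)
print([len(c) for c in clusters])  # cluster sizes n_k(Theta)
```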
Assume now that not only do we know some phase configuration at a particular time, but a whole absolutely continuous trajectory $t \mapsto \Theta(t) = (\theta_1(t), \ldots, \theta_N(t)) \in \mathbb{R}^N$ governing the dynamics of the $N$ oscillators. Then, as long as it is clear from the context, we will simplify the notation and will denote $\mathcal{C}_i(t) := \mathcal{C}_i(\Theta(t))$, $E_k(t) := E_k(\Theta(t))$, $\kappa(t) := \kappa(\Theta(t))$ and $n_k(t) := n_k(\Theta(t))$. Similarly, time may be omitted in our notation for simplicity. Apart from collisions into clusters, it is important to characterize when those clusters remain stuck together. If the $i$-th and $j$-th oscillators have collided at time $t$, we will say that they stick together when Then, we can define the set of indices of sticking with the $i$-th oscillator by In Section 3 we will introduce some results about the clustering and sticking behavior of solutions to our singular weighted Kuramoto model (2.5) with $\varepsilon = 0$.
2.4. Gradient flow structure. In this part, let us remark that our system (2.1) can be equivalently turned into a gradient flow system:
\[ \dot\Theta = -\nabla V(\Theta), \]
governed by a potential $V = V(\Theta)$ that is defined by
\[ V(\Theta) := -\sum_{i=1}^{N} \Omega_i\theta_i + V_{int}(\Theta), \qquad V_{int}(\Theta) := \frac{K}{2N}\sum_{i,j=1}^{N} W(\theta_i - \theta_j). \]
Here, $W$ is the primitive function of $h$ such that $W(0) = 0$, i.e.,
\[ W(\theta) := \int_0^{\theta} h(\theta')\, d\theta'. \]
The function $W$ can be regarded as the interaction potential of binary interactions while $V_{int}$ stands for the total interaction potential due to binary interactions. This approach is formal and relies on specifying the regularity of the plasticity function Γ. For instance, if we choose Γ to be analytic, then (2.1) can be regarded as a gradient flow system with analytic potential $V$. In such a particular case, one can considerably simplify the proof of emergence of synchronization, as in the classical Kuramoto model, see [17]. Specifically, some boundedness property of the trajectory is all we need to ensure the exponential convergence towards a phase-locked state by virtue of the Łojasiewicz inequality for analytic functions. For the choices of plasticity function of interest in this paper, i.e., (1.4) and (1.6), analyticity is missing and the same approach does not necessarily work. Nevertheless, we will focus on values of the parameter α in the range $\alpha \in (0, 1)$ and, consequently, $V$ will be globally a continuous function that is smooth outside the set of collisions. Since in general we are missing either analyticity or convexity of $V$, the gradient flow structure will not be used in this paper, except in Subsections 4.2 and 4.3.
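The gradient structure can be sanity-checked with finite differences. The Python sketch below assumes the normalization $V(\Theta) = -\sum_i \Omega_i\theta_i + \frac{K}{2N}\sum_{i,j} W(\theta_i - \theta_j)$ with $W' = h$, and tests it on the smooth case $\Gamma \equiv 1$, where $h = \sin$ and $W(\theta) = 1 - \cos\theta$. The helper names and the step size are hypothetical; another kernel can be plugged in together with its own primitive $W$.

```python
import numpy as np

def vector_field(theta, omega, K, h):
    """H_i(Theta) = omega_i + (K/N) * sum_j h(theta_j - theta_i)."""
    diffs = theta[None, :] - theta[:, None]
    return omega + (K / theta.size) * h(diffs).sum(axis=1)

def potential(theta, omega, K, W):
    """Assumed potential: -sum_i omega_i theta_i + K/(2N) sum_{i,j} W(theta_i - theta_j)."""
    diffs = theta[:, None] - theta[None, :]
    return -np.dot(omega, theta) + K / (2 * theta.size) * W(diffs).sum()

def numerical_gradient(f, x, step=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = step
        grad[i] = (f(x + e) - f(x - e)) / (2 * step)
    return grad

# Usage with Gamma = 1: h = sin and W(theta) = 1 - cos(theta).
rng = np.random.default_rng(2)
theta = rng.uniform(-1.0, 1.0, 4)
omega = rng.normal(size=4)
V = lambda x: potential(x, omega, 1.5, lambda d: 1.0 - np.cos(d))
residual = numerical_gradient(V, theta) + vector_field(theta, omega, 1.5, np.sin)
print(np.max(np.abs(residual)))  # small: -grad V agrees with H
```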

2.5. Kinetic formulation of the problem. In this part, we formally introduce the expected kinetic models associated with (2.5). The classical arguments to rigorously prove the mean field limit $N \to \infty$ are based on the analysis of propagation of chaos in the system as the number $N$ of particles becomes large, see [26,31]. On the one hand, for every $\varepsilon > 0$ the mean field limit is governed by the following Vlasov-McKean equation with regular kernels for the distribution function $f_\varepsilon = f_\varepsilon(t, \theta, \Omega)$ of oscillators where periodic boundary conditions in the variable $\theta$ are assumed. Similarly, when $\varepsilon = 0$ the corresponding mean field limit is governed by the corresponding Vlasov-McKean equation with singular kernels for $f(t, \theta, \Omega)$, namely, with analogous periodic conditions in $\theta$. The derivation of the mean field limit is much more involved in this latter case. Indeed, it requires a sharp analysis, in the same sense as in related singular models like [9,23,33], and will differ for each of the regimes of the exponent α. Let us briefly comment on the main idea supporting the above mean field limit. Fix the following empirical measure as initial condition in (2.13) associated with some initial configuration $\Theta^N_0 = (\theta^N_{1,0}, \ldots, \theta^N_{N,0})$. Because of the results in this paper, the Filippov solution $\Theta^N(t) = (\theta^N_1(t), \ldots, \theta^N_N(t))$ to the singular discrete model allows considering the next measure-valued solution to (2.13) The remaining effort is to show that any weak limit $f$ of $\mu^N$ is a measure-valued solution, in some generalized sense, to the singular macroscopic system. For a comprehensive analysis of the singular macroscopic model (2.13) see [36]. See [33], where a related approach has been developed for the Cucker-Smale model in the smaller range of parameters $\alpha \in (0, \tfrac14)$ of the subcritical regime. Analogous results for aggregation models and the classical Kuramoto model have been studied in [9,11,12] and [7,31], respectively.

Well-posedness of singular interaction
We now consider the Kuramoto model with singular coupling Γ, which we introduced in Section 2 as a singular limit of the regular weighted coupling:
\[ \dot\theta_i = \Omega_i + \frac{K}{N}\sum_{j=1}^{N} h(\theta_j - \theta_i), \qquad i = 1, \ldots, N. \tag{3.1} \]
Recall that in the limit $\varepsilon \to 0$ of the regular kernel $h_\varepsilon$ we recover the singular interaction kernel of the model, i.e., For simplicity, we will forget about the constant $c = c_{\alpha,\zeta} = 1 - \zeta^{-1/\alpha}$. Then, we can rewrite the system (3.1) into
\[ \dot\theta_i = \Omega_i + \frac{K}{N}\sum_{j=1}^{N} h(\theta_j - \theta_i), \qquad h(\theta) := \frac{\sin\theta}{|\theta|_o^{2\alpha}}. \tag{3.2} \]
Regarding the parameter α, it belongs to the interval $(0, 1)$ to allow for mild singularities. Note that the kernel is continuous for $\alpha \in (0, \tfrac12)$, it exhibits a jump discontinuity at $\theta \in 2\pi\mathbb{Z}$ for $\alpha = \tfrac12$, and it shows essential discontinuities for $\alpha \in (\tfrac12, 1)$, see Figure 1. In this section, we will focus on developing the well-posedness theory of such a system (3.1) of coupled ODEs. Note that uniqueness is not trivial even in the subcritical case. Indeed, due to the choice of singular plasticity function, the right-hand side of the system (3.2) is not Lipschitz-continuous in any of the subcritical, critical and supercritical regimes. Thus, we need to examine the existence and uniqueness of solutions to the system (3.1) before we proceed to the study of synchronization. For the following discussion, we recall the definition of the vector field $H = H(\Theta)$ in (2.2), which allows dealing with the system (3.2) in the vector form (2.3).
3.1. Well-posedness in the subcritical regime. In the subcritical case, namely $\alpha \in (0, \tfrac12)$, the vector field $H = H(\Theta)$ in (2.2) is continuous. Therefore, it is a clear consequence of Peano's theorem that (3.1) has a local-in-time solution for every initial configuration $\Theta(0) = \Theta_0 \in \mathbb{R}^N$. Unfortunately, $h(\theta)$ exhibits an infinite slope at the phase values $\theta \in 2\pi\mathbb{Z}$ and hence the classical Cauchy-Picard-Lindelöf theorem does not apply, since $H = H(\Theta)$ is no longer a Lipschitz-continuous vector field. Nevertheless, one can still use a simple device: it is enough to show that near the points of loss of Lipschitz-continuity our vector field can be locally split into the sum of a decreasing vector field and a Lipschitz-continuous vector field. This ensures the local one-sided Lipschitz condition, which is enough to obtain a one-sided uniqueness result.
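The infinite slope at zero is precisely what allows finite-time collisions and restricts uniqueness to the forward direction. As a sketch under simplifying assumptions, take $N = 2$ identical oscillators in (3.2) with kernel $h(\theta) = \sin\theta/|\theta|_o^{2\alpha}$: the phase difference $\varphi = \theta_1 - \theta_2 \in (0, \pi)$ then satisfies $\dot\varphi = -K\sin\varphi/\varphi^{2\alpha}$. For small $\varphi$, replacing $\sin\varphi$ by $\varphi$ gives $\frac{d}{dt}\varphi^{2\alpha} = -2\alpha K$, hence the estimate $t^* \approx \varphi_0^{2\alpha}/(2\alpha K)$ for the collision time. The Python code below compares this estimate with an explicit Euler integration; the function name, step size and stopping rule are choices made for this illustration.

```python
import math

def collision_time(phi0, alpha, K=1.0, dt=1e-5, max_steps=10_000_000):
    """Explicit Euler for phi' = -K * sin(phi) / phi**(2*alpha), started at
    phi0 in (0, pi). Returns the first time at which phi is no longer positive."""
    phi, t = phi0, 0.0
    for _ in range(max_steps):
        if phi <= 0.0:
            return t
        phi -= dt * K * math.sin(phi) / phi ** (2 * alpha)
        t += dt
    raise RuntimeError("no collision detected within max_steps")

# Usage: numerical collision time versus the small-angle estimate.
phi0, alpha, K = 0.1, 0.25, 1.0
t_num = collision_time(phi0, alpha, K)
t_est = phi0 ** (2 * alpha) / (2 * alpha * K)
print(t_num, t_est)
```

Since both $\varphi \equiv 0$ and the trajectory arriving from $\varphi_0 > 0$ pass through the collision state at time $t^*$, solutions cannot be unique backwards in time, which is consistent with the one-sided uniqueness discussed above.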
for every couple $x, y \in V$. Then, the following initial value problem (IVP) associated with any initial configuration $x_0 \in \mathbb{R}^N$ enjoys one global-in-time solution that is unique forward in time:
\[ \dot x = F(x), \quad t \geq 0, \qquad x(0) = x_0. \]
Since the proof is classical, we omit it here. Let us now apply this result to our case of interest. To do so, it is enough to introduce a decomposition of the vector field $H = H(\Theta)$ in the Kuramoto model (3.2). We first set the following splitting of the interaction function $h = h(\theta)$. First, consider $\bar h$ and $\bar\theta \in (0, \tfrac{\pi}{2})$ such that $\bar h := \max_{0<r<\pi} h(r)$ and $2\alpha\sin\bar\theta = \bar\theta\cos\bar\theta$.
is not a Lipschitz-continuous function because of the infinite slope at θ ∈ 2πZ, one can locally decompose it around such values in terms of a decreasing function f (θ) and a Lipschitz-continuous function g(θ).
Finally, consider any value Θ * = (θ * 1 , . . . , θ * N ) ∈ R N to locally decompose H around it. For Θ = (θ 1 , . . . , θ N ) in a small enough neighborhood V of Θ * in R N , we set where we recall that C i (Θ * ) stands for the set of indices of collision with the i-th oscillator in the phase configuration Θ * , see Subsection 2.3.
Proof. The decomposition of $H$ is clear by virtue of the decomposition (3.3) and the definitions (3.4)-(3.5). Let us then focus on the last three properties. First, consider $\Theta = (\theta_1, \ldots, \theta_N)$, $\widetilde\Theta = (\tilde\theta_1, \ldots, \tilde\theta_N) \in \mathbb{R}^N$ in a small enough neighborhood of $\Theta^*$. Without loss of generality, we will directly assume that $\theta_i - \theta_j$ and $\tilde\theta_i - \tilde\theta_j$ belong to $(-\pi, \pi]$. Otherwise, we just need to work with the representatives. On the one hand, Changing the indices $i$ and $j$ we obtain where the properties of the sets $\mathcal{C}_i(\Theta^*)$ along with the antisymmetry of $f$ have been used in the last line. Taking the mean value of both expressions and using that $f$ is decreasing, we arrive at and, as a consequence, at the monotonicity of $F$. On the other hand, Since $g$ is Lipschitz-continuous in $(-\pi, \pi)$ and $h$ is locally Lipschitz-continuous in $(-\pi, \pi) \setminus \{0\}$, there exists some constant $M = M(V)$ so that for every index $i \in \{1, \ldots, N\}$, thus yielding the Lipschitz-continuity of $G$ in $V$. The last part is a simple consequence: namely, consider $x, y \in V$ and note that where the preceding two properties have been used along with the Cauchy-Schwarz inequality.
Finally, putting together Lemma 3.1 and Proposition 3.1, one concludes the following well-posedness property.
Theorem 3.1. There is one global-in-time strong solution to the system (3.2), with $\alpha \in (0, \tfrac12)$, which is unique forwards in time, for any initial configuration.
The next result is a simple consequence of the above well-posedness theorem and characterizes the eventual emergence of sticking in a cluster after a potential collision. Assume that two oscillators collide at $t^*$, i.e., $\bar\theta_i(t^*) = \bar\theta_j(t^*) = \theta^*$ for some $i \neq j$. Then, the following two statements are equivalent: (1) $\theta_i$ and $\theta_j$ stick together at $t^*$.
with initial data given by A similar technique to that in Theorem 3.1 clearly yields a global-in-time solution to such initial value problem. Hence, the following two trajectories in are both solutions to (3.2) such that at t = t * they take the value (θ * , θ * , θ 3 (t * ), . . . , θ N (t * )).

3.2. Well-posedness in the critical regime. In the critical case, i.e., $\alpha = \tfrac12$, the vector field $H = H(\Theta)$ is no longer continuous and the Peano existence theorem does not apply. Nevertheless, in such a case $H$ is still a measurable and essentially bounded vector field. Consequently, one can apply Filippov's existence criterion, see [4,14]. We introduce the necessary notation that will be used from here on: $2^{\mathbb{R}^N}$ stands for the power set of $\mathbb{R}^N$, $|\mathcal{N}|$ for the Lebesgue measure of any measurable set $\mathcal{N} \subseteq \mathbb{R}^N$, $co(A)$ is the convex hull of $A$ and $\overline{co}(A) = \overline{co(A)}$ is its closure. For every convex set $C$ we denote by $m(C)$ the element of minimal norm of $C$, i.e., $m(C) = \pi_C(0)$, where $\pi_C$ is the orthogonal projection operator onto the convex set $C$. The main ingredient will be the Filippov set-valued map of a given single-valued measurable map. (1) $\mathcal{F}(x)$ is a closed and convex set for every $x \in \mathbb{R}^N$.
(4) If $\mathcal{F}$ takes non-empty values, then $\mathcal{F}$ has closed graph. (5) If $\mathcal{F}$ has closed graph and $m(\mathcal{F})(U_x)$ lies in a compact set for some neighborhood $U_x$ of each $x \in \mathbb{R}^N$, then $\mathcal{F}$ is upper semicontinuous. (6) If $F$ is locally essentially bounded, then $\mathcal{F}$ is upper semicontinuous, it takes non-empty values and $m(\mathcal{F})(U_x)$ lies in a compact set for some neighborhood $U_x$ of each $x \in \mathbb{R}^N$. (7) If $F$ is essentially bounded, then $\mathcal{F}$ is upper semicontinuous, it takes non-empty values and $m(\mathcal{F})(\mathbb{R}^N)$ lies in a compact set.
Here $m(\mathcal{F})$ stands for the map $m(\mathcal{F})(x) := m(\mathcal{F}(x))$ for every $x \in \mathbb{R}^N$. Lemma 3.3. Let $\mathcal{F} : \mathbb{R}^N \to 2^{\mathbb{R}^N}$ be any set-valued map with non-empty, closed and convex values. Assume that $\mathcal{F}$ is upper semicontinuous and consider the following initial value problem (IVP) associated with any given initial datum $x_0 \in \mathbb{R}^N$:
\[ \dot x \in \mathcal{F}(x), \qquad x(0) = x_0. \]
(1) If $m(\mathcal{F})(U_x)$ lies in a compact set for some neighborhood $U_x$ of any $x \in \mathbb{R}^N$, then (IVP) has an absolutely continuous local-in-time solution.
(2) If $m(\mathcal{F})(\mathbb{R}^N)$ lies in a compact set, then (IVP) has an absolutely continuous global-in-time solution.
Putting together Lemmas 3.2 and 3.3 we arrive at the next result.
Lemma 3.4. Let $F : \mathbb{R}^N \to \mathbb{R}^N$ be any measurable map and consider its Filippov set-valued map $\mathcal{F}$. Consider the following initial value problem (IVP) associated with any given initial datum $x_0 \in \mathbb{R}^N$:
\[ \dot x \in \mathcal{F}(x), \qquad x(0) = x_0. \]
(1) If $F$ is locally essentially bounded, then (IVP) has an absolutely continuous local-in-time solution.
(2) If, in addition, F is globally essentially bounded, then such a solution is indeed global.
The solutions to such a differential inclusion are called solutions in Filippov's sense to the original discontinuous dynamical system. To deal with uniqueness we first introduce the next technical result. Lemma 3.5. Let $F : \mathbb{R}^N \to \mathbb{R}^N$ be a measurable and locally essentially bounded map and consider its associated Filippov set-valued map $\mathcal{F} : \mathbb{R}^N \to 2^{\mathbb{R}^N}$. If $F$ verifies the one-sided Lipschitz condition a.e., then $\mathcal{F}$ also verifies it in the set-valued sense. Namely, there exists a positive constant $M$ such that
\[ (X - Y) \cdot (x - y) \leq M |x - y|^2, \]
for every $x, y \in \mathbb{R}^N$ and every $X \in \mathcal{F}(x)$, $Y \in \mathcal{F}(y)$.
Proof. Consider any couple x, y ∈ R N and fix X ∈ F(x), Y ∈ F(y). Also fix any δ > 0 (assume δ < 1 without loss of generality) and any negligible set N . Using the definition of H, the following properties hold true Then, one can take a couple of sequences {X n } n∈N ⊆ R N and {Y n } n∈N ⊆ R N such that X n → X, Y n → Y and for every n ∈ N. Therefore, the Caratheodory theorem from convex analysis allows restating X n and Y n as a convex combination Note that By defining the constants M x := ess sup |F (z)| and M y := ess sup Since the above property holds for arbitrary n ∈ N and 0 < δ < 1, we obtain Lemma 3.6. Let F : R N −→ R N be a measurable and essentially bounded vector field and consider the Filippov set-valued map F : R N −→ 2 R N . In addition, assume that F verifies the local one-sided Lipschitz condition. Then, the following initial value problem (IVP) associated with any initial configuration x 0 ∈ R N enjoys one global-in-time absolutely continuous solution, that is unique forwards in time Proof. The existence of global-in-time Filippov's solutions follows from Lemma 3.4. Let us just discuss the uniqueness of solution. We consider two Filippov solutions x 1 = x 1 (t) and x 2 = x 2 (t) with the same initial datum x 0 and define Our main goal is to prove that T = +∞ by contradiction. We assume that T < +∞. Let us define x * := x 1 (T ) = x 2 (T ) and take a small enough neighborhood V of x * so that F verifies the one-sided Lipschitz condition in it. By continuity there is some ε > 0 so that By the one-sided Lipschitz condition, there exists some constant M depending on x * such that d dt for every t ∈ [T, T + ε]. By Gronwall's inequality, one then obtains x 1 (t) = x 2 (t), for every t ∈ [T, T + ε], and this contradicts the assumption on T < +∞.
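The Gronwall step that closes the uniqueness argument can be written out as follows. This is a sketch of the standard computation, assuming the set-valued one-sided Lipschitz condition of Lemma 3.5 with constant $M$ on the neighborhood of $x^*$; the auxiliary function $\delta$ is notation introduced here.

```latex
% delta(t) measures the squared distance between the two Filippov solutions.
\[
\delta(t) := \tfrac{1}{2}\,|x_1(t) - x_2(t)|^2, \qquad \delta(T) = 0 .
\]
% For a.e. t in [T, T + eps], \dot x_k(t) belongs to the Filippov set at x_k(t),
% so the one-sided Lipschitz condition applies to the pair of velocities:
\[
\frac{d}{dt}\,\delta(t)
  = \bigl(\dot x_1(t) - \dot x_2(t)\bigr)\cdot\bigl(x_1(t) - x_2(t)\bigr)
  \le M\,|x_1(t) - x_2(t)|^2 = 2M\,\delta(t).
\]
% Gronwall's inequality then forces delta to vanish on the whole interval:
\[
0 \le \delta(t) \le \delta(T)\,e^{2M(t - T)} = 0,
\qquad t \in [T, T + \varepsilon].
\]
```

Only an upper bound on the derivative of $\delta$ is used, so the argument gives uniqueness forwards in time and says nothing about the backward direction.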
Let us now explicitly compute the Filippov set-valued map $\mathcal{H} = \mathcal{H}(\Theta)$ of our particular vector field $H = H(\Theta)$ in the critical case $\alpha = \frac{1}{2}$. Recall Subsection 2.3 for the collision equivalence relation and the notation needed to deal with clusters of oscillators.
Since the proof is clear by definition of the Filippov set-valued map, we omit it here.
Remark 3.2. Notice that every $(\omega_1, \ldots, \omega_N) \in \mathcal{H}(\Theta)$ satisfies $\sum_{i=1}^N \omega_i = \sum_{i=1}^N \Omega_i$. Hence, Filippov solutions in the critical case still preserve the average frequency, just as classical solutions do in the subcritical case or in the original Kuramoto model.
In order to gain some intuition about those sets, let us exhibit some particular examples: is the polytope consisting of the points $(\omega_1, \omega_2, \omega_3) \in \mathbb{R}^3$ such that for some $y_{12}, y_{13}, y_{23} \in [-1, 1]$. The second and third cases yield line segments and the last one is a regular hexagon, as depicted in Figure 3.
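The hexagon can be checked numerically. The sketch below assumes that, at a total collision with $N = 3$, the points of the Filippov set have the form $\omega_i = \Omega_i + \frac{K}{N} \sum_{j \neq i} y_{ij}$ with $Y$ skew-symmetric and $y_{ij} \in [-1, 1]$; this formula and the helper names `filippov_point` and `hexagon_vertices` are illustrative assumptions, not taken from the text. It uses the figure's normalization (zero natural frequencies, $K = 1$).

```python
from itertools import product
from math import isclose, sqrt


def filippov_point(y12, y13, y23, K=1.0, omega=(0.0, 0.0, 0.0)):
    """Point of the (assumed) Filippov set at a total collision, N = 3.

    Assumption: omega_i = Omega_i + (K/N) * sum_j y_ij with y_ji = -y_ij.
    """
    N = 3
    w1 = omega[0] + K / N * (y12 + y13)
    w2 = omega[1] + K / N * (-y12 + y23)
    w3 = omega[2] + K / N * (-y13 - y23)
    return (round(w1, 12), round(w2, 12), round(w3, 12))


def hexagon_vertices():
    """Images of the extreme choices y_ij = +-1, with the origin discarded."""
    pts = {filippov_point(*signs) for signs in product((-1.0, 1.0), repeat=3)}
    return sorted(p for p in pts if any(abs(c) > 1e-12 for c in p))


if __name__ == "__main__":
    for v in hexagon_vertices():
        print(v, "sum =", sum(v), "norm =", sqrt(sum(c * c for c in v)))
```

Under these assumptions, the eight sign choices collapse to six distinct non-zero points on the plane $\omega_1 + \omega_2 + \omega_3 = 0$, all at the same distance from the origin, which is consistent with a regular hexagon.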
Finally, let us apply Lemma 3.6 to construct the unique Filippov solutions of our particular system (3.2) in the critical case $\alpha = \frac{1}{2}$. The approach is similar to that in the preceding Subsection 3.1 and relies on a good decomposition of $-h$. Define the pair of functions $f = f(\theta)$ and $g = g(\theta)$ in $(-\pi, \pi)$ as follows

Figure 3 (caption, partially recovered): in Figure 3b, $N = 3$ and the polytope is a regular hexagon. For simplicity, the natural frequencies are set to zero and $K = 1$ in the figures.
Although $-h$ is not a continuous function, because of its jump discontinuities at $\theta \in 2\pi\mathbb{Z}$, one can locally decompose it around such values in terms of a decreasing function $f(\theta)$ and a Lipschitz-continuous function $g(\theta)$.
Finally, for every $\Theta^* = (\theta^*_1, \ldots, \theta^*_N) \in \mathbb{R}^N$ we locally decompose $H$ around it as follows where the above functions are defined almost everywhere (note that $f$ is not defined at $0$, thus $F_i$ only makes sense a.e.). Again, we recall that, for any $\theta \in \mathbb{R}$, $\bar\theta$ denotes its representative modulo $2\pi$ in the interval $(-\pi, \pi]$.
Proof. The proof is analogous to Proposition 3.1.
Finally, putting Lemmas 3.4-3.6 and Proposition 3.3 together, one concludes the following well-posedness result. for any initial configuration, that is unique forwards in time.
Again, we can characterize the eventual emergence of sticking of a cluster after a potential collision in a similar way as we did in Theorem 3.2. We introduce the following notation.
For any $N \in \mathbb{N}$, each $1 \le m \le N$ and every permutation $\sigma$ of $\{1, \ldots, N\}$ we define the following pair of $m \times m$ matrices: i.e., $M^\sigma_m(\Omega)$ stands for the matrix of relative natural frequencies of the $m$ oscillators with indices $i = \sigma_1, \ldots, \sigma_m$ and $J_m$ is the $m \times m$ matrix all of whose entries are equal to one.
Assume that t * is some collision time and fix any cluster E k (t * ) ≡ E k with k = 1, . . . , κ(t * ). Then, the following two statements are equivalent: (1) The n k (t * ) = #E k (t * ) oscillators in such cluster stick all together at time t * .

Consequently, the following two trajectories in
are Filippov solutions to (3.2) such that they take the same value at t = t * , namely, By uniqueness they agree and, in particular, for all t ≥ t * and every i = 1, . . . , n.
The sticking condition (3.11) can be characterized in a much more explicit manner by convex analysis techniques supported by Farkas' alternative. See Appendix C and, in particular, the characterization of condition (3.11) in Lemma C.2. Such ideas can be arranged in the next result. (1) The n k oscillators in the cluster E k stick all together at time t * .
(2) We have for every 1 ≤ m ≤ n k and every I ⊆ E k such that #I = m.
Remark 3.4. Notice that in Theorem 3.4 and Corollary 3.1 we have characterized when the whole cluster $E_k$ remains stuck together, but not when a subcluster of a given size instantaneously splits from the remaining oscillators of the cluster. The main obstacle to extending the above proof is that it is hard to quantify the way in which an oscillator splits from the subcluster. Specifically, it is possible that an oscillator departs from the cluster exhibiting a left accumulation of switches of state where it instantaneously splits and collides with the formed subcluster. Although this accumulation phenomenon will cause some problems throughout the paper, we will show how we can overcome them. Let us mention that such a phenomenon is called left Zeno behavior in the literature. It appears in the Filippov solutions of some systems like the reversed bouncing ball. For instance, in [14, p. 116] Filippov proposed a discontinuous first order system with solutions exhibiting Zeno behavior. In [14, Theorem 2.10.4], the same author considered the absence of Zeno behavior as part of a set of sufficient (but not necessary) conditions guaranteeing forward uniqueness. We skip the analysis of Zeno behavior here and will address it in future work.

3.3. Well-posedness in the supercritical regime. Recall that in the supercritical regime, i.e., $\alpha > \frac{1}{2}$, the vector field $H = H(\Theta)$ is not only discontinuous at the collision states but also unbounded near those points, see Figure 1. Thus, the classical theory for well-posedness cannot be applied either, and one might seek a notion of generalized solutions in the same sense as in the critical case $\alpha = \frac{1}{2}$ (see Subsection 3.2). Hence, one strategy could be to turn the differential equation of interest again into an augmented differential inclusion given by the associated Filippov set-valued map. An analysis similar to that in Proposition 3.2 yields the following characterization of the Filippov set-valued map for the supercritical regime.
Proposition 3.4. In the supercritical regime $\alpha > \frac{1}{2}$, the Filippov set-valued map $\mathcal{H} = \mathcal{H}(\Theta)$ associated with $H = H(\Theta)$ is the convex and unbounded polytope consisting of the points $(\omega_1, \ldots, \omega_N) \in \mathbb{R}^N$ such that The Filippov set-valued map enjoys similar expressions in the critical and supercritical regimes except for a "slight" change. In the former case, the coefficients $y_{ij}$ range in the interval $[-1, 1]$ whereas in the latter case they take values in the whole of $\mathbb{R}$. Indeed, the same examples for $\alpha = \frac{1}{2}$ in Example 3.1 can be considered for $\alpha > \frac{1}{2}$. For instance, polytopes similar to those in Figure 3 are obtained at the total collision phase configurations when the corresponding polygon is replaced by its affine envelope. Those similarities ensure that any Filippov solution to (3.2) with $\alpha > \frac{1}{2}$ also conserves the average frequency, as in Remark 3.2. What is more, since $\mathcal{H}(\Theta)$ is clearly non-empty, Lemma 3.2 shows that $\mathcal{H}$ takes values in the non-empty, closed and convex sets and has closed graph in the set-valued sense. However, the unboundedness in $y_{ij}$ entails a severe change of behavior. Specifically, it breaks the local compactness of $m(\mathcal{H})$ and, as a consequence, the existence result in Lemma 3.3 fails to work. Such loss of compactness is fateful and implies that the supercritical regime $\alpha > \frac{1}{2}$ lies in a setting where the "classical" assumptions ensuring global existence and one-sided uniqueness do not hold. The literature about the abstract analysis of unbounded differential inclusions is scarce, see [25,45]. In addition, all those results require some sort of relaxed set-valued Lipschitz condition and linear growth that do not hold in our particular problem. Nevertheless, we will show that in some cases we can still construct a Filippov solution, which is unique under some conditions. Remark 3.5.
Notice that, despite the lack of uniqueness results in the supercritical case, the approach in Theorem 3.4 may still be used to obtain a partial answer. Namely, it might give a sufficient condition on the natural frequencies to ensure that, after a collision of a classical solution, we can continue it as a Filippov solution with sticking of the formed cluster. Since we will elaborate on this idea later, we skip it here and just focus on the study of a necessary condition of sticking like in (3.11). Indeed, consider some Filippov solution $\Theta = (\theta_1, \ldots, \theta_N)$ to (3.2) with $\alpha > \frac{1}{2}$ and assume that it is defined in an interval $[0, T)$ and that $t^* \in (0, T)$ is some collision time. Then, we might fix a cluster $E_k(t^*) \equiv E_k$ and assume that the $n_k(t^*) \equiv n_k$ oscillators in such cluster stick all together at time $t^*$. Hence, a proof similar to that of Theorem 3.4 would entail the existence of some bijection $\sigma : \{1, \ldots, n_k\} \to E_k$ and some $Y \in \mathrm{Skew}_{n_k}(\mathbb{R})$ such that One might want to obtain again a more explicit characterization of such condition. We can resort to similar ideas coming from Farkas' alternative, see Lemma C.1 in Appendix C. That lemma ensures that (3.13) is equivalent to the condition (C.2) for every $i, j, k = 1, \ldots, n_k$, where $m_{ij}$ denotes the $(i, j)$ component of the matrix $M^\sigma_m(\Omega)$. Let us look into the particular structure of $M^\sigma_{n_k}(\Omega)$ to restate the above condition (see (3.10)) Then, the necessary sticking condition is automatically satisfied for every given configuration of natural frequencies. This suggests that, independently of the chosen natural frequencies, any classical solution in the supercritical case that stops at a collision state might always be continued as a Filippov solution with sticking of the cluster. For this, we will need some accurate control of the behavior of such classical solutions at the maximal time of existence.
The solution does not blow up at t * , i.e., The solution converges towards a collision state, i.e., there exists Θ * ∈ C such that In addition, the trajectory t → Θ(t) remains absolutely continuous up to the collision time Proof. We split the proof into three parts. The first part is devoted to show that the classical trajectories verify the following fundamental inequalities: for every t ∈ [0, t * ). Here, V int (Θ) is the second term of the potential V (Θ) in (2.10) and we set the constant We will show in the second step that such inequalities (3.14) and (3.15) infer the next ones for every t ∈ [0, t * ). Finally, the third part will focus on proving the assertions in the statement of the Lemma via such fundamental inequalities (3.14)-(3.18).
• Step 1: Recall that in Section 2, the classical solution t −→ Θ(t) of (3.2) equivalently solves a gradient flow system (2.9), i.e., for every t ∈ [0, t * ). Taking integrals in time, we obtain for every t ∈ [0, t * ). Recall that the function W in (2.11) involved in the potential (2.10) is a primitive function of h. Then, W ≥ 0 as a consequence of the antisymmetry of h and our choice W (0) = 0 and, in particular, V int ≥ 0. This, together with the Cauchy-Schwarz inequality, yield for every t ∈ [0, t * ). Using Young's inequality in the first term of (3.20), we arrive at the first fundamental inequality (3.14). The second inequality (3.15) is standard, but let us sketch it for the sake of clarity for every t ∈ [0, t * ) and integrating with respect to time yields (3.15).
• Step 2: First, taking limits t → t * in (3.14), we clearly obtain (3.16). Also, the finite length of the trajectory (3.17) holds true by virtue of the Cauchy-Schwarz inequality and Young's inequality both applied to the preceding one. Finally, inequalities (3.15) and (3.17) entail (3.18).
• Step 3: The classical trajectory $t \mapsto \Theta(t)$ is defined up to a finite maximal time $t^*$. Hence, classical results show that either it blows up at $t = t^*$ or there exist some sequence $\{t_n\}_{n\in\mathbb{N}} \nearrow t^*$ and some $\Theta^* \in C$ such that $\{\Theta(t_n)\}_{n\in\mathbb{N}} \to \Theta^*$. Since the former option is prevented by (3.18), the latter must hold true. Let us prove that the whole trajectory converges towards that collision state $\Theta^*$. Otherwise, there exist another sequence $\{s_n\}_{n\in\mathbb{N}} \nearrow t^*$ and some $\varepsilon_0 > 0$ such that for all $n \in \mathbb{N}$. Without loss of generality we can assume that the sequences $\{t_n\}_{n\in\mathbb{N}}$ and $\{s_n\}_{n\in\mathbb{N}}$ are ordered as follows and that Thus, the trajectory would have infinite length, which contradicts (3.17). Hence, we find Such conclusion shows that, as expected, it is plausible to continue classical solutions by Filippov solutions (hence absolutely continuous) after a possible collision. The explicit method of continuation is exhibited in the following result.
and, according to Lemma 3.7, let us consider the collision state Θ * ∈ C such that lim t→t * Θ(t) = Θ * . Then, there exists some ε > 0 so that the classical trajectory t → Θ(t) can be continued by a Filippov solution to (3.2) in a short interval [t * , t * + ε) in such a way that oscillators belonging to the same cluster of the collision state Θ * remain all stuck together.
Proof. Let E k be the k-th cluster of oscillators with n k = #E k for k = 1, · · · , κ. We consider a bijection σ k : {1, . . . , n k } −→ E k , for every k = 1, . . . , κ. Since the necessary condition (3.13) is automatically satisfied as discussed in Remark 3.5, then there exists some matrix Y k ∈ Skew n k (R) such that for every couple of indices i, j ∈ {1, . . . , n k }. Let us define the following system of κ differential equations for k = 1, . . . , κ, with initial data given by Since the initial datum is a non-collision state in a lower dimension space R κ of phase configurations, then there exists a unique classical solution to such problem that is defined in a maximal existence interval [t * , t * * ) and such that if t * * < ∞, then (ϑ 1 , . . . , ϑ κ ) converges towards a new collision state by virtue of Lemma 3.7 (merge of clusters). The same result ensures that for every i ∈ E k and k = 1, . . . , κ. Both trajectories glue in a W 1,2 way and it is clear, by virtue of the definition of H k in (3.24) and Ω k in (3.23) along with the explicit expression of the Filippov map in Remark 3.6. It is clear that the above procedure can be repeated as many times as needed after each collision time of the classical solutions to the reduced systems (3.24)-(3.25). Indeed, by Remark 3.5 the necessary condition (3.13) is automatically satisfied. Since there can only be N − 1 collision of oscillators with sticking, we may apply Theorem 3.5 finitely many times to obtain global-in-time Filippov solutions to (3.2) in the supercritical case. However, one may wonder whether this global-in-time continuation procedure is unique or oscillators may also be allowed to split instantaneously after a collision. Although answering the general question for any number N of oscillators and any collision state is really convoluted, let us give some particular answer for the case N = 2: Consider the relative phase θ := θ 2 − θ 1 and relative natural frequency Ω := Ω 2 − Ω 1 . 
Then, the associated dynamics of a classical solution is governed by the next equatioṅ Here,θ stands for the unique (unstable) equilibrium of the system, see Proposition 5.2 in the subsequent Section 5. Without loss of generality, we will fix the initial relative phase so that θ(0) ∈ (0,θ) (the other cases are similar). Then, we arrive at a collision of oscillators at t = t * i.e., lim t→t * θ(t) = 0.
(1) Let us assume by contradiction that there was another Filippov solution in $[t^*, t^{**})$ consisting of two particles that instantaneously split again after $t = t^*$. Such a split can arise in only two different manners: (a) (Sharp split) There exists some small $\varepsilon > 0$ such that $\theta(t) \neq 0$, for every $t \in (t^*, t^* + \varepsilon)$. In such case, either $\theta(t) > 0$, for all $t \in (t^*, t^* + \varepsilon)$, or $\theta(t) < 0$, for all $t \in (t^*, t^* + \varepsilon)$. (b) (Zeno split) There exist a couple of sequences $\{t_n\}_{n\in\mathbb{N}} \searrow t^*$ and $\{s_n\}_{n\in\mathbb{N}} \searrow t^*$ such that $\theta(s_n) = 0$ but $\theta(t_n) \neq 0$, for every $n \in \mathbb{N}$ (recall the left accumulations of switches or Zeno behavior in Remark 3.4). Replacing $t^*$ by a suitable time, it is apparent that the second type of split at $t^*$ guarantees the first one at a (possibly) later time. Let us then focus just on the first case. Looking at the profile of $\Omega - K h(\theta)$ in Figure 5, we would then arrive at the following conclusion: either $\dot\theta(t) < 0$ and $\theta(t) > 0$ for all $t \in (t^*, t^* + \varepsilon)$, or $\dot\theta(t) > 0$ and $\theta(t) < 0$ for all $t \in (t^*, t^* + \varepsilon)$. In any case, we obtain a contradiction.
(2) Hence, the only choice for the oscillators after the collision state is to stick together.
Let us define the phase of the reduced system, see (3.23) where $Y \in \mathrm{Skew}_2(\mathbb{R})$ is any matrix verifying the necessary condition (3.13). Indeed, there exists just one such matrix $Y$, whose entries read $y_{12} = -y_{21} = \frac{\Omega_2 - \Omega_1}{2}$. Then, the reduced natural frequency is $\frac{\Omega_1 + \Omega_2}{2}$ and the reduced system (3.24) reduces to $\dot\vartheta = \frac{\Omega_1 + \Omega_2}{2}$. Consequently, the only Filippov solution to (3.2) evolves through (3.26)-(3.27) up to the collision time $t^*$. After it, both oscillators stick together and move with constant frequency equal to the average natural frequency. For general $N$, notice that it is not clear whether (b) in the first item above can be reduced to (a). Namely, we cannot guarantee that along a whole time interval $(t^*, t^* + \varepsilon)$ all the formed subclusters splitting from the given cluster remain at positive distance. The main reason is the possible Zeno behavior, which accumulates time events with switches of the collisional type.
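The finite-time collision in the case $N = 2$ can be observed numerically. The following is a minimal sketch, assuming that the relative phase obeys $\dot\theta = \Omega - K h(\theta)$ with $h(\theta) = \sin\theta / |\theta|_o^{2\alpha}$, where $|\theta|_o$ is the distance from $\theta$ to $2\pi\mathbb{Z}$; the explicit Euler scheme, the step size and the function names are illustrative choices, not the authors' method.

```python
from math import sin, pi


def h(theta, alpha):
    """Singular coupling, assumed to be sin(theta) / |theta|_o^(2*alpha)."""
    # |theta|_o: distance from theta to the lattice 2*pi*Z (assumed notation).
    dist = abs((theta + pi) % (2 * pi) - pi)
    return sin(theta) / dist ** (2 * alpha)


def collision_time(theta0, omega, K, alpha, dt=1e-5, t_max=10.0):
    """Explicit Euler for theta' = omega - K*h(theta), started at theta0 > 0.

    Returns the first time at which theta has reached (or crossed) zero,
    or None if no collision is detected before t_max.
    """
    theta, t = theta0, 0.0
    while t < t_max:
        if theta <= 0.0:
            return t
        theta += dt * (omega - K * h(theta, alpha))
        t += dt
    return None


if __name__ == "__main__":
    # Supercritical exponent, identical natural frequencies (Omega = 0).
    print(collision_time(theta0=0.5, omega=0.0, K=1.0, alpha=0.75))
```

After the detected collision time, the discussion above indicates that the two oscillators stick together, so the scheme is simply stopped there rather than continued through the singularity.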

Rigorous limit towards singular weights
In the previous section, we studied the existence and one-sided uniqueness of absolutely continuous solutions to the singular weighted first order Kuramoto model in all the subcritical, critical and supercritical cases. Because of the continuity of the kernel for $\alpha \in (0, \frac{1}{2})$, we can show that in that case the solutions are indeed $C^1$, although we cannot say the same either for the critical case $\alpha = \frac{1}{2}$ or for the supercritical case $\alpha \in (\frac{1}{2}, 1)$. Also, these results do not necessarily provide any extra regularity of the frequencies $\omega_i = \dot\theta_i$ for an augmented second order model to make sense.
Let us recall that in Subsection 2.2, the singular Kuramoto model was formally obtained as the singular limit $\varepsilon \to 0$ of the scaled regular model (2.5)-(2.6). Notice that if, beyond this heuristic derivation, we rigorously proved the limit $\varepsilon \to 0$, then we would be led to an alternative existence result for the singular models. In this section, we will inspect to what extent such an idea works and for which exponents $\alpha$ the technique applies. We will recover the existence results in Section 3. Indeed, this technique will yield a gain of piecewise $W^{1,1}$ regularity of the frequencies $\omega_i$ in the subcritical case and will provide an equation for them in the weak sense, which will be discussed and related to similar models in Subsection 4.4. However, such an idea fails for the more singular cases, where the compactness of frequencies is very weak. While the singular limit for the subcritical case is straightforward, we need to derive new ideas to deal with the limiting set-valued Filippov map in the critical and supercritical cases along with the loss of strong compactness of the frequencies in such cases.
Proof. All the properties directly follow from the first one along with the Ascoli-Arzelà theorem. Recall that there is some constant M > 0 such that for every θ, θ 1 , θ 2 ∈ R and every ε > 0. Then, the first property is also a straightforward consequence of such uniform-in-ε boundedness and Hölder-continuity of the kernel.
The following result holds true as a clear consequence of the uniform equicontinuity of the sequence h ε along with the compactness of the sequence {Θ ε } ε>0 . In what follows, we will see that such procedure provides us with extra a priori estimates for the "acceleration" (derivatives of frequencies). Also, such procedure will allow us to derive a "piecewise weak equation" for them. This is the rest of the content of this subsection.
Notice that, by virtue of Theorem 3.2, a necessary and sufficient condition for two oscillators $\theta_i$ and $\theta_j$ that collide at some time to stick together is that $\Omega_i = \Omega_j$. In some sense, those two oscillators are identified in a unique cluster with a bigger "mass". Then, we can quantify the times of "pure collisions" as follows. Starting with $T_0 = 0$, we define for every $k \in \mathbb{N}$. Recall the notation in Subsection 2.3, and see [38] for related notation in the discrete Cucker-Smale model with singular influence function. Then, taking derivatives in (2.5)-(2.6) we can obtain the next split The idea is to show that we can pass to the limit in the above expressions in $L^1([T_{k-1}, \tau])$-weak, for every $k \in \mathbb{N}$ and for every $\tau \in (T_{k-1}, T_k)$. This is the content of the next theorem. Before going on, let us discuss the possible scenarios for the sequence $\{T_k\}_{k\in\mathbb{N}}$ and how we can cover the whole interval $[0, +\infty)$ with them in any case, so that our dynamics can be reduced to each of them: (1) It might happen that there exists some $k_0 \in \mathbb{N}$ such that $T_{k_0+1} = +\infty$ (then, $T_k = +\infty$ for every $k > k_0$). This is the case when either all particles have stuck together in finite time or after some finite time there are no more collisions. In this case and at each interval there is no collision. (2) It might also happen that the sequence $\{T_k\}_{k\in\mathbb{N}}$ is infinite and unbounded, i.e., $T_k \nearrow +\infty$. Hence, and there is no collision in each interval. (3) Finally, it might also be the "odd" case that the sequence $\{T_k\}_{k\in\mathbb{N}}$ is infinite but bounded. In such case, there exists some $T_\infty \in \mathbb{R}^+$ with right Zeno behavior, i.e., $T_k \nearrow T_\infty$. Then, a straightforward argument involving the mean value theorem shows that $T_\infty$ is a sticking point. Then we can split the dynamics up to time $T_\infty$ through Taking $T_\infty$ as our initial time, we can repeat each of the steps 1, 2 and 3 above so that we can globally recover the whole dynamics.
Notice that since there just can be N − 1 times of sticking, then there just can be N − 1 times like T ∞ .
For simplicity in our arguments, we will assume that we lie in case 2, although the same results apply to any of the other cases. Before going to the heart of the result, let us summarize some good properties of the kernel $h_\varepsilon$. Lemma 4.2. Consider any value $\alpha \in (0, \frac{1}{2})$. Then, the following properties hold true: (1) Formula for the derivative: (2) Upper bound by an $L^1(\mathbb{T})$-function: (4) Weighted Hölder-continuity: for every couple of exponents $\beta, \gamma \in (0, 1)$ such that $\gamma = 2\alpha + \beta$.
Proof. The first two results are straightforward and the third one is a clear consequence of the dominated convergence theorem. The fourth property follows from a direct application of the mean value theorem and the fifth one is a standard property of mildly singular kernels (one can show that $M = \alpha/\beta$).
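The strong $L^1(\mathbb{T})$ convergence of the kernels can be illustrated numerically. The sketch below assumes the forms $h(\theta) = \sin\theta / |\theta|_o^{2\alpha}$ and $h_\varepsilon(\theta) = \sin\theta / (\varepsilon^2 + |\theta|_o^2)^{\alpha}$ on $(-\pi, \pi)$; these formulas (in particular the unit constant inside the regularization) and the function names are assumptions made for illustration only.

```python
from math import sin, pi


def h(theta, alpha):
    """Singular kernel on (-pi, pi), assumed form sin(theta)/|theta|^(2*alpha)."""
    return sin(theta) / abs(theta) ** (2 * alpha)


def h_eps(theta, alpha, eps):
    """Regularized kernel, assumed form sin(theta)/(eps^2 + theta^2)^alpha."""
    return sin(theta) / (eps ** 2 + theta ** 2) ** alpha


def l1_distance(alpha, eps, n=20000):
    """Midpoint-rule approximation of the L^1(-pi, pi) norm of h_eps - h.

    With n even, no midpoint coincides with theta = 0, so h is never
    evaluated at its singular point.
    """
    dx = 2 * pi / n
    total = 0.0
    for k in range(n):
        theta = -pi + (k + 0.5) * dx
        total += abs(h_eps(theta, alpha, eps) - h(theta, alpha)) * dx
    return total


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, l1_distance(alpha=0.25, eps=eps))
```

For a subcritical exponent such as $\alpha = 0.25$, the computed distances decrease as $\varepsilon$ decreases, in line with the dominated convergence argument invoked in the proof.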
• Step 1: In the first case, fix any $i \in \{1, \ldots, N\}$ and $j \notin C_i(T_{k-1})$. There exists (by definition) some positive constant $\delta_0 = \delta_0(k, \tau) < \pi$ such that Then, by the uniform convergence in Lemma 4.1 there exists some $\varepsilon_0 > 0$ such that for every $\varepsilon \in (0, \varepsilon_0)$. Consequently, by adding and subtracting cross terms we have for every $t \in [T_{k-1}, \tau]$. Hence, both terms converge to zero uniformly in $[T_{k-1}, \tau]$ as $\varepsilon \to 0$. This is due to (4.3), the third property in Lemma 4.2, the uniform continuity of $h$ in compact sets away from $2\pi\mathbb{Z}$ and the uniform convergence of the phases in Lemma 4.1. This ends the proof of the first part.
• Step 2: In the second case, $i \in \{1, \ldots, N\}$ and $j \in C_i(T_{k-1}) \setminus S_i(T_{k-1})$. Then, Thus, it is clear that we again have $|\theta_j(t) - \theta_i(t)|_o > 0$, for $t \in [\tau^*, \tau]$ and for every $\tau^* \in (T_{k-1}, \tau)$. This amounts to saying that the preceding argument again holds in $[\tau^*, \tau]$ and consequently, for every $\tau^* \in (T_{k-1}, \tau)$. Then, we just need to prove the weak convergence in some interval $[T_{k-1}, \tau^*]$. Let us set $\tau^*$. Since $\dot\theta_j(T_{k-1}) \neq \dot\theta_i(T_{k-1})$, we can assume without loss of generality that $\delta_0 := \dot\theta_j(T_{k-1}) - \dot\theta_i(T_{k-1}) > 0$. By continuity of $\dot\theta_j$ and $\dot\theta_i$, there exists some small $\tau^* \in (T_{k-1}, \tau)$ such that Then, by the uniform convergence of the frequencies (see Lemma 4.1), we can take a small enough $\varepsilon_0 > 0$ such that if $\varepsilon \in (0, \varepsilon_0)$ then In particular, we have well defined inverses of $\theta_j - \theta_i$ and $\theta^\varepsilon_j - \theta^\varepsilon_i$ in $[T_{k-1}, \tau^*]$, for every $\varepsilon \in (0, \varepsilon_0)$. Indeed, the inverse function theorem states that: and a similar statement holds for $\theta^\varepsilon_j - \theta^\varepsilon_i$. In order to show the weak convergence in $L^1([T_{k-1}, \tau^*])$, we equivalently claim that the following assertions are true: i.e., there exists some constant $M > 0$ such that (2) Convergence of the mean values over finite intervals, i.e., for every $\tau^{**} \in (T_{k-1}, \tau^*)$. Let us then prove such claim. Regarding the first assertion, we just focus on $h_\varepsilon(\theta^\varepsilon_j - \theta^\varepsilon_i)$ (the other case is similar). Due to a simple change of variables $\theta = (\theta^\varepsilon_j - \theta^\varepsilon_i)(t)$ and (4.5)-(4.6) Then the assertion under consideration follows from the second item in Lemma 4.2. Regarding the second assertion we split into two terms where, The same change of variables as above allows us to restate $I_\varepsilon$ in the following way
Then, estimate (4.5) along with the strong $L^1(\mathbb{T})$ convergence of the kernels in (3) of Lemma 4.2 shows that $I_\varepsilon$ vanishes when $\varepsilon \to 0$: For the term $II_\varepsilon$, we use the fourth item in Lemma 4.2 to show Then, a new change of variables along with the equations (4.5)-(4.6) and the local integrability in one dimension of an inverse power of order $\gamma$ entail the existence of a non-negative constant $C$ that does not depend on $\varepsilon$ such that Then, the second step follows from the uniform convergence of the phases in Lemma 4.1.
• Step 3: In the third case, consider $i \in \{1, \ldots, N\}$ and $j \in S_i(T_{k-1})$. By the uniqueness in Theorem 3.1, we can ensure that $\theta_j(t) = \theta_i(t)$ for all $t \geq T_{k-1}$. Then, the uniform convergence of the kernels $h_\varepsilon$ along with the uniform convergence of the phases in Lemma 4.1 shows that

The solutions in the subcritical case $\alpha \in (0, \frac{1}{2})$, which we constructed in Theorem 3.1, satisfy $\theta_i \in C^{1,1-2\alpha}([0, \infty), \mathbb{R}^N)$ and the frequencies $\dot\theta_i$ exhibit higher regularity. Indeed, they are piecewise $W^{1,1}$ in the sense that $\dot\theta_i \in W^{1,1}([T_{k-1}, \tau])$, for every $k \in \mathbb{N}$ and every $\tau \in (T_{k-1}, T_k)$. In addition, they verify the following equation in the weak sense. Throughout the proof of the above result we have just used the local integrability in one dimension of any inverse power of order smaller than 1. However, one might have tried to use that such inverse powers actually belong to $L^p_{loc}$ in order to show that in Step 2 the convergence takes place in $L^p([T_{k-1}, \tau])$-weak for any $1 \le p < \frac{1}{2\alpha}$. In this way, the gain of regularity is in reality higher, namely $\dot\theta_i \in W^{1,p}([T_{k-1}, \tau])$, for every $1 \le p < \frac{1}{2\alpha}$.
In the following, we will discuss the corresponding singular limit in the critical and supercritical case. Since the Filippov set-valued map is relatively simpler in that latter case, we will start with that supercritical case. Later, we will adapt the ideas therein to show a parallel result in the critical regime.

4.2. Limit in the supercritical case. Using a vector notation similar to that in (2.3) for the singular weighted model, our regularized system (2.5)-(2.6) can be restated as where the components of the vector field $H_\varepsilon$ read for every $\Theta \in \mathbb{R}^N$ and every $i \in \{1, \ldots, N\}$. Then, one can mimic the ideas in Section 2 to show that the regularized system can also be written as a gradient flow where the regularized potential now reads for every $\Theta \in \mathbb{R}^N$. Again, $W_\varepsilon$ is the antiderivative of $h_\varepsilon$ such that $W_\varepsilon(0) = 0$, i.e., Also, it is clear that $W_\varepsilon \ge 0$ in the supercritical case, for every $\varepsilon > 0$. Then, the following result holds true.

Lemma 4.3. In the supercritical case $\alpha \in (\frac{1}{2}, 1)$, consider the unique global-in-time classical solution $\Theta^\varepsilon$ to the regularized system (4.8). Then, for every $t > 0$ and every $\varepsilon > 0$, where The above result shows that $\{\Theta^\varepsilon\}_{\varepsilon>0}$ is bounded in $H^1((0, T), \mathbb{R}^N)$, for every $T > 0$. Then, there exists some subsequence, which we denote in the same way, such that $\{\Theta^\varepsilon\}_{\varepsilon>0}$ weakly converges to some $\Theta \in H^1_{loc}((0, \infty), \mathbb{R}^N)$ in $H^1((0, T), \mathbb{R}^N)$ for every $T > 0$. The Sobolev embedding and the definition of weak convergence ensure that for every $T > 0$. Before we obtain the desired convergence result of (4.8) towards a Filippov solution, let us introduce the following split of the frequencies: where, componentwise, each term reads as follows Then, it is clear by definition that for every $T > 0$, and $y^\varepsilon(t) \in \mathcal{H}(\Theta(t))$, for every $t \ge 0$. As a consequence, we infer that $\Theta^\varepsilon$ becomes a Filippov approximate solution in the following sense: $\dot\Theta^\varepsilon(t) \in \mathcal{H}(\Theta(t)) + x^\varepsilon(t)$. (4.11)

Remark 4.2. Recall that $\mathcal{H}(\Theta(t))$ is a closed set, for every $t \ge 0$, see Proposition 3.2. Consequently, in order to prove that the limiting $\Theta(t)$ yields a Filippov solution, it would be enough to show the almost everywhere convergence of the sequence $\{\dot\Theta^\varepsilon\}_{\varepsilon>0}$ towards $\dot\Theta$. Unfortunately, it is well known that weak convergence in $L^2$ is not enough for that purpose. Hence, we must work with such weak convergence only.
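A standard textbook example, unrelated to the specific sequence of frequencies above, shows why weak $L^2$ convergence gives no pointwise control: $f_n(t) = \sin(nt)$ converges weakly to zero in $L^2(0, 2\pi)$, yet its $L^2$ norm stays constant, so it cannot converge to zero almost everywhere along the whole sequence in any strong sense. The sketch below checks both facts by quadrature; the test function $\varphi(t) = t$ and the helper names are illustrative choices.

```python
from math import sin, pi


def integrate(f, a, b, n=200000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


def pairing(n_freq, phi):
    """Pairing of sin(n_freq * t) with a test function phi on (0, 2*pi)."""
    return integrate(lambda t: sin(n_freq * t) * phi(t), 0.0, 2 * pi)


def norm_sq(n_freq):
    """Squared L^2(0, 2*pi) norm of sin(n_freq * t)."""
    return integrate(lambda t: sin(n_freq * t) ** 2, 0.0, 2 * pi)


if __name__ == "__main__":
    for n_freq in (10, 100, 1000):
        print(n_freq, pairing(n_freq, lambda t: t), norm_sq(n_freq))
```

The pairings shrink as the frequency grows while the squared norms remain close to $\pi$, which is the kind of oscillation that the measurable-selection arguments below are designed to handle.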
Before going to the heart of the matter, we need to exhibit another characterization of the Filippov set-valued map in terms of implicit equations. The next technical lemma will be used for that. For the sake of clarity, a proof has been provided in Lemma B.1 of Appendix B.
Lemma 4.4. Consider any n ∈ N and any vector x ∈ R n . Then, the following assertions are equivalent: (1) There exists some Y ∈ Skew n (R) such that (2) The following implicit equation holds true where j stands for the vector of ones.
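One direction of this equivalence admits an explicit construction. The sketch below assumes that assertion (1) reads $x = Y j$ for some skew-symmetric $Y$ and that assertion (2) reads $x \cdot j = 0$ (the displays are not reproduced above, so both readings are assumptions). Given a zero-sum vector $x$, the choice $y_{ik} = (x_i - x_k)/n$ is one possible skew-symmetric matrix with row sums $x$; it is not claimed to be the one used in Appendix B. Conversely, $j \cdot (Y j) = 0$ for any skew-symmetric $Y$.

```python
from math import isclose


def skew_from_zero_sum(x):
    """Return a skew-symmetric matrix Y (list of rows) with Y j = x.

    Assumes sum(x) == 0; uses the particular choice y_ik = (x_i - x_k) / n.
    """
    n = len(x)
    return [[(x[i] - x[k]) / n for k in range(n)] for i in range(n)]


def row_sums(Y):
    """Product Y j, where j is the vector of ones."""
    return [sum(row) for row in Y]


if __name__ == "__main__":
    x = [1.0, -3.0, 2.0]  # components add up to zero
    Y = skew_from_zero_sum(x)
    print(Y)
    print(row_sums(Y))
```

The row sums reproduce $x$ because $\sum_k (x_i - x_k)/n = x_i - \frac{1}{n}\sum_k x_k = x_i$ whenever the components of $x$ add up to zero.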
Hence, we are ready to obtain the above-mentioned characterization.
Proof. By Proposition 3.4, H(Θ) consists of the set of points (ω 1 , . . . , ω N ) ∈ R N such that for every k = 1, . . . , κ there exist a skew symmetric matrix Y k ∈ Skew n k (R) and a bijection σ k : {1, . . . , n k } −→ E k such that the following equations hold true for every i = 1, . . . , n k . Then, the result follows by applying Lemma 4.4 to each of the above sets of n k equations to the particular vectors x k ∈ R n k (Θ) with components: when we equivalently restate it using the notation in Subsection 2.3.

Remark 4.3. Recall that in the subcritical case $\alpha \in (0, \frac{1}{2})$ in Subsection 4.1, any strong limit $\Theta$ already yielded a solution to the limiting system (3.2). Indeed, there can be one and only one such strong limit by the one-sided uniqueness of the limiting system (3.2). Also, in that subcritical case one can find a nice split of the dynamics into a sequence of intervals where no collision happens. Thus, on every such interval, the kind of collisional state of our trajectory remains unchanged. Let us remember that the reason why that sequence fills the whole half line in the subcritical case relies on the following facts: first, by uniqueness we can characterize the sticking of oscillators, and once they stick during some time they remain stuck for all times. In particular, only $N - 1$ sticking times can exist. Second, when an accumulation of collisions takes place, it has to be at a sticking time. Hence, there can only be $N - 1$ such accumulations of collisions, thus recovering the whole half line.
Unfortunately, at this step we do not know, for the critical and supercritical cases $\alpha \in [\frac{1}{2}, 1)$, whether any limit $\Theta$ becomes a Filippov solution to the limiting system (3.2). Thus, despite the fact that we have clear characterizations of sticking of such solutions, we cannot apply them to any such limit $\Theta$. In addition, the behavior of any $H^1$ weak limit can be very wild. Specifically, a possible scenario for an $H^1_{loc}$ trajectory is that sticking might happen just for a short period of time and, after it, the cluster splits. Also, "pure collisions" might accumulate at a non-sticking time exhibiting Zeno behavior (recall Remark 3.4). Thereby, a split of the dynamics into countably many intervals $(T_k, T_{k+1})$ like in the above Subsection 4.1, where the collisional state remains unmodified, is not viable.
Since the above remark prevents us from splitting the dynamics into countably many time intervals that fill the whole half-line, each of them exhibiting an unvaried collisional state, we will develop a new approach supported by the above explicit H-representation of the Filippov set-valued map at any collision state. One of our main tools will be the Kuratowski-Ryll-Nardzewski measurable selection theorem [30], which applies to set-valued Effros-measurable maps. For the sake of completeness, we include the statement of such result, adapted to a finite-dimensional setting.
Lemma 4.5 (Kuratowski-Ryll-Nardzewski). Consider any n, m ∈ N and any set-valued map F : R^n → 2^{R^m} with values in the non-empty and closed subsets of R^m. Assume that F is Effros-measurable, that is, for every open set U ⊆ R^m, the set {x ∈ R^n : F(x) ∩ U ≠ ∅} is measurable. Then, F has a measurable selection, i.e., there exists a measurable function f : R^n → R^m such that f(x) ∈ F(x) for every x ∈ R^n.

Sometimes, it is helpful to control how many of these single-valued measurable selections of the Effros-measurable set-valued map we need in order to have the whole set-valued map "represented" in some sense. This is the content of an intimately related result: the Castaing representation theorem, see [13, Theorem III.30].
Lemma 4.6 (Castaing). Consider any n, m ∈ N and any set-valued map F : R^n → 2^{R^m} with values in the non-empty and closed subsets of R^m. Assume that F is Effros-measurable. Then F has a Castaing representation, i.e., there exists a sequence {F_n}_{n∈N} of measurable maps F_n : R^n → R^m such that F(x) = cl{F_n(x) : n ∈ N} (the closure of the set of values), for a.e. x ∈ R^n.

Such results will be directly applied to the critical case in the next Subsection 4.3. However, for the supercritical case, we will need a refinement of the above theorem that allows for integrable representations of the set-valued map. To that end, Effros-measurability has to be strengthened to an integrability condition for set-valued maps. We will focus on the next result.
Lemma 4.7. Consider any n, m ∈ N and any set-valued map F : R^n → 2^{R^m} with values in the non-empty and closed subsets of R^m. Assume that F is Effros-measurable and strongly integrable, that is, the single-valued map |F| is integrable, where |F| is defined by |F|(x) := sup{|y| : y ∈ F(x)}, for a.e. x ∈ R^n.
Then, every measurable selection of F is integrable. In particular, F enjoys a Castaing representation consisting of integrable selections.
Proof. Let us take any measurable selection f of the set-valued map F, which exists by Lemma 4.5. Then, by definition of |F| we obtain |f(x)| ≤ |F|(x) for a.e. x ∈ R^n. Since |F| is integrable, the first part of the result holds true. The second part is a simple consequence of the first one along with Lemma 4.6.
Remark 4.4. Notice that the same ideas as in the above result in Lemma 4.7 also yield similar statements for the spaces L 1 loc (R n ) and L ∞ (R n ). Namely, (1) If F is locally strongly integrable, i.e., |F| ∈ L 1 loc (R n ), then every measurable selection belongs to L 1 loc (R n ). (2) If F is strongly essentially bounded, i.e., |F| ∈ L ∞ (R n ), then each measurable selection belongs to L ∞ (R n ). where each of the P l (t) stands for the hyperplane P l (t) := {x ∈ R N : a l (t) · x = b l (t)}. Here, the above vector and scalar functions a l (t) and b l (t) read as follows for almost every t ≥ 0. However, it is not clear whether B is strongly locally integrable since we expect eventual switches of the collisional type of the limiting Θ(t), thus on its coefficients b l (t).
• Step 3: Strong local integrability of B. Let us show that the above wild behavior still does not prevent us from our goal. Consider the regularized coefficients We can associate a similar set-valued map B ε : for every l = 1, . . . , κ(t) since j / ∈ C i (t) in their definitions and, at those θ j (t) − θ i (t), the limiting kernel h is continuous. Since both B(t) and B ε (t) consist of finitely many terms, we deduce that By definition, it is clear thaṫ where we have cancelled the terms with j ∈ E l (t) in the last step by the antisymmetry of h ε . Then, our set-valued maps are strongly dominated as follows Putting (4.17) into 4.16 we obtain Here, we have used the Cauchy-Schwarz inequality in the second step and the a priori bound in Lemma 4.3 in the last one. Then, Remark 4.4 yields the existence of a Castaing representation {B n } n∈N ⊆ L 1 loc (0, +∞) of the map B. Again, we conclude that (4.18) {b l (t) : l = 1, . . . , κ(t)} = {B n (t) : n ∈ N}, for almost every t ≥ 0.
• Step 4: Conclusion. Since y_ε(t) ∈ H(Θ(t)) for every ε > 0 and every t ≥ 0, the H-representation (4.13) along with the essentially bounded and locally integrable representations (4.14) and (4.18) yield the equations for almost every t ≥ 0. In particular, for every ε > 0, each ϕ ∈ C_c(R^+) and any n ∈ N. Notice that the boundedness and local integrability of our selectors allow such expressions to make sense. We can now use the weak convergence in L^2 of y_ε towards Θ̇ to obtain for every ϕ ∈ C_c(R^+) and each n ∈ N. The fundamental lemma of the calculus of variations, along with the Castaing representations in (4.14) and (4.18) and the H-representation in (4.13), allows us to conclude the desired result.

4.3. Limit in the critical case. In this Subsection, we address the singular limit of the regularized system (2.5)-(2.6) towards a Filippov solution to (3.2) in the critical regime α = 1/2. We will mostly follow the approach used in the supercritical regime. Nevertheless, there are several novelties that make the study slightly different. First, we will show that we actually enjoy a better W^{1,∞} a priori estimate, apart from the above H^1 bound in Lemma 4.3. Second, the explicit expression of the Filippov map in Proposition 4.1 in terms of intersections of hyperplanes will be adapted to this case.

Lemma 4.8. In the critical regime α = 1/2, consider the unique global-in-time solution Θ_ε to the regularized system (4.8). Then,

We omit the proof since it is a clear consequence of the boundedness of h in the critical case. As a consequence of Lemma 4.8, we infer the existence of a subsequence of {Θ_ε}_{ε>0}, which we denote in the same way, that converges weakly-* to some Θ in W^{1,∞}((0, T); R^N) for every T > 0. In addition, the same split as in (4.10) can be considered and we obtain and y_ε(t) ∈ H(Θ(t)), for every t ≥ 0 and ε > 0. Hence, Θ_ε becomes an approximate solution in the same sense as in (4.11). What is more, Remark 4.2 applies here as well: again, we cannot ensure pointwise convergence of Θ̇_ε. In order to obtain an analogous characterization of the Filippov map, we will need the next technical lemma.
Lemma 4.9. Consider any n ∈ N and any vector x ∈ R^n. Then, the following two assertions are equivalent: (1) There exists some Y ∈ Skew_n([−1, 1]) such that x = Y · j, where j stands for the vector of ones. (2) We have |x_{σ_1} + ··· + x_{σ_k}| ≤ k(n − k) for every permutation σ of {1, ..., n} and any k ∈ {1, ..., n}.
A complete proof is provided in Appendix B. The following result is a consequence of Lemma 4.9 along with the explicit formula in Proposition 3.2.
Then, we move to the main result, i.e., the convergence of the singular limit towards a Filippov solution to the critical system.
where m = #I. Now, the coefficients are clearly uniformly bounded. Then, a straightforward application of Remark 4.4 leads to the existence of essentially bounded selectors for the coefficients. Namely, we can give an ordering such as for almost every t ≥ 0. Recall that y ε (t) ∈ H(Θ(t)), for every ε > 0 and every t ≥ 0. Then, by virtue of (4.20), (4.21) and (4.22), we equivalently have A n (t) · y ε (t) ≤ B +,n (t) and A n (t) · y ε (t) ≥ B −,n (t), for all n ∈ N, each ε > 0 and almost every t ≥ 0. In particular, for all n ∈ N, each ε > 0 and any non-negative ϕ ∈ C c (R + ). Then, using the weak-* convergence in L ∞ we obtain that

4.4. Comparison with previous results about singular weighted systems. In the previous parts, we studied the existence and one-sided uniqueness for the singular weighted first order Kuramoto model in the subcritical, critical and supercritical regimes. We now compare our results with previous research on the singular weighted Cucker-Smale model, which is a second order system describing the flocking behavior of interacting particles. In order to set these relations, let us recall Section 2, where the first order Kuramoto model (2.1) was shown to be equivalent to its second order augmentation (2.4). On the one hand, this is clear for regular weights as studied in Theorem 2.1, see [16,22]. What is more, it remains true in our case, which is characterized by singular weights. However, we must be especially careful with the time regularity in order for such heuristic arguments to become rigorous. Let us focus on the subcritical regime, where the rigorous equivalence between (2.1) and (2.4) follows from Remark 3.6 by virtue of the one-sided uniqueness in both models. Indeed, in the subcritical case, the "influence function" of the augmented flocking-type model is the derivative h′ in (4.23), which enjoys mild singularities of order 2α < 1. This singular second order model (2.4)-(4.23) shares some similarities with the Cucker-Smale model (4.24) with singular weights, where the communication weight ψ is given by (4.25) ψ(r) := 1/r^β, for r > 0 and β > 0. Although some results regarding the asymptotic behavior of such system have been established [20], the well-posedness theory was not addressed until very recently, in [37,38] for the microscopic model and in [8,33,40,43] for some first and second order kinetic and macroscopic versions of the model. Regarding the microscopic system (4.24)-(4.25), the existence of global C^1, piecewise weak W^{2,1} solutions (x_1, ..., x_N) has been established in [37] for β ∈ (0, 1), which corresponds to α ∈ (0, 1/2) in our setting (see Theorem 3.1, Theorem 4.1 and Remark 4.1). Also, in the weakly singular regime β ∈ (0, 1/2) (i.e., α ∈ (0, 1/4)), the same author proved in [38] that the velocities (v_1, ..., v_N) are indeed absolutely continuous. Consequently, the C^1 weak solutions (x_1, ..., x_N) are actually W^{2,1}_loc in the latter case. This property was proved through a differential inequality.
The method of proof is similar to ours in Section 4 and relies on a regularization process of the second order model near the collision times. In our case, we have obtained a similar regularization process of the first order model, entailing the corresponding regularization of the augmented second order model. Indeed, such a method has proved successful not only in our subcritical case, but also in the critical and supercritical cases. Also, we have obtained the well-posedness results in an alternative way, based on the gain of continuity of the kernel in the first order model along with its particular structure near the points of loss of Lipschitz continuity. Indeed, we have succeeded in introducing an analogous well-posedness theory in Filippov's sense for the endpoint case α = 1/2 and the supercritical case α > 1/2.

Regarding the more singular cases β ≥ 1 (i.e., α ≥ 1/2), one can show that there exists some class of initial data for (4.24)-(4.25) such that collisions are avoided and the solutions remain smooth for all times. Indeed, such solutions exhibit asymptotic flocking dynamics, see [2]. Very recently, it was shown in [10] that the loss of integrability of the kernel when β ≥ 1 actually ensures the avoidance of collisions for general initial data. In such a regime, the asymptotic flocking behavior is not guaranteed for every initial datum. However, such ideas for (4.24)-(4.25) fail in our model (2.4)-(4.23) because the kernel h′ with α ≥ 1/2 no longer behaves like the communication weight ψ with β ≥ 1. Specifically, ψ is always a positive and decreasing function whereas h′ is negative and increasing (see Figure 6). Then, we do expect our solutions to exhibit finite-time collisions, as depicted in the results in the next Section 5. This is the reason for the generalized theory in Filippov's sense to come into play in the critical and supercritical cases.
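The sign and monotonicity contrast between the two kernels can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the singular kernel has the form h(θ) = sin θ / |θ|^{2α} on (0, π), so that the influence function of the augmented model is its derivative h′, and it uses the correspondence β = 2α between the two models. The function names and the grid are hypothetical choices made for illustration.

```python
import math

def h_prime(theta, alpha):
    """Derivative of the ASSUMED kernel h(theta) = sin(theta) / theta**(2*alpha), theta in (0, pi]."""
    p = 2.0 * alpha
    return (theta * math.cos(theta) - p * math.sin(theta)) / theta ** (p + 1.0)

def psi(r, beta):
    """Singular Cucker-Smale communication weight psi(r) = 1 / r**beta."""
    return r ** (-beta)

alpha = 0.75          # supercritical exponent, as in the Figure 6 caption
beta = 2.0 * alpha    # correspondence beta = 2*alpha used in the comparison

# Uniform grid on [0.05, pi]; the left endpoint stays away from the singularity at 0.
grid = [0.05 + k * (math.pi - 0.05) / 400 for k in range(401)]
hp_values = [h_prime(t, alpha) for t in grid]
psi_values = [psi(t, beta) for t in grid]

h_prime_negative = all(v < 0.0 for v in hp_values)
h_prime_increasing = all(a < b for a, b in zip(hp_values, hp_values[1:]))
psi_positive = all(v > 0.0 for v in psi_values)
psi_decreasing = all(a > b for a, b in zip(psi_values, psi_values[1:]))
```

On this grid the four flags record that h′ is negative and increasing while ψ is positive and decreasing, which is the qualitative difference invoked above to explain why collision-avoidance arguments do not transfer.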

Synchronization of the singular weighted system
We now analyze the collective behavior in the system (3.2). We first consider the system of two interacting oscillators and then extend the argument to the N-oscillator system.

Figure 6. Comparison of the functions h′(θ) and ψ(θ) with α = 0.75.

5.1. Two oscillator case. In this part, we consider the dynamics of two oscillators. For two oscillators, the system (3.1) takes the form (5.1). Recall that in the critical and supercritical cases we do expect collisions, see Subsections 3.2 and 3.3. Then, the above representation of the system is only valid before the first collision. After that, the right-hand side has to be replaced with the corresponding Filippov set-valued map. At this step, we shall focus on the dynamics before the first collision. Let us define the relative phase and natural frequency by θ := θ_2 − θ_1 and Ω := Ω_2 − Ω_1. Then, the system (5.1) can be rewritten into the following form: (5.2) θ̇ = Ω − K h(θ).

Proposition 5.1. Let θ : [0, T) → R be a maximal classical solution to the differential equation (5.2) with α ∈ (0, 1) such that the oscillators are identical, i.e., Ω = 0, and with initial datum 0 < |θ_0| < π. Then, the maximal time of existence T lies in the interval [t_min, t_max], where In addition, the following lower and upper estimates hold, for all t ∈ [0, T), and lim_{t→T} θ(t) = 0. Hence, two identical oscillators confined to the half-circle exhibit finite-time phase synchronization.
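The finite-time collapse of two identical oscillators can be illustrated with a short computation. This is a sketch under stated assumptions rather than the paper's own estimate: the kernel is assumed to be h(θ) = sin θ / |θ|^{2α}, the relative phase is assumed to obey θ̇ = −K h(θ) with Ω = 0, and the bracketing times below come from the elementary bounds sin(θ_0)/θ_0 ≤ sin(θ)/θ ≤ 1 on (0, θ_0], so they need not coincide with the t_min and t_max of Proposition 5.1. All function names are hypothetical.

```python
import math

def collision_time(theta0, K, alpha, n=20000):
    """Time for theta' = -K*sin(theta)/theta**(2*alpha) to reach 0 from theta0 in (0, pi).

    With u = theta**(2*alpha) one gets du/dt = -2*alpha*K*sin(theta)/theta, hence
    T = (1/(2*alpha*K)) * integral over u in (0, theta0**(2*alpha)) of theta/sin(theta),
    a bounded integrand that the midpoint rule handles without trouble.
    """
    p = 2.0 * alpha
    u0 = theta0 ** p
    du = u0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * du
        theta = u ** (1.0 / p)
        total += theta / math.sin(theta)
    return total * du / (p * K)

def time_bracket(theta0, K, alpha):
    """Elementary lower and upper bounds for the collision time (derived in the lead-in)."""
    p = 2.0 * alpha
    t_lo = theta0 ** p / (p * K)
    t_hi = t_lo * theta0 / math.sin(theta0)
    return t_lo, t_hi

if __name__ == "__main__":
    for a in (0.25, 0.5, 0.75):
        print(a, time_bracket(2.0, 1.5, a), collision_time(2.0, 1.5, a))
```

The substitution removes the singularity at θ = 0, which is why quadrature is used here instead of a time-stepping scheme: the collision time is finite for every α ∈ (0, 1) because θ^{2α} decreases at a rate bounded away from zero.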
Denote y = |θ|^{2α+1}; then the equation becomes (5.3). We now consider upper and lower estimates for (5.3) separately.
In particular, the above lower estimate shows that

• Upper estimate: As long as 0 ≤ y < π^{2α+1}, the solution y is non-increasing, i.e., dy/dt ≤ 0. Since the initial datum θ_0 satisfies |θ_0| < π, we have y_0 < π^{2α+1}, thus y(t) ≤ y_0 for t > 0. Hence, we have the following inequality This is equivalent to Again, the upper estimate shows that

Assume that the oscillators are non-identical, Ω = Ω_2 − Ω_1 > 0, and that the system (5.1) has a phase-locked state (θ̄_1, θ̄_2) satisfying 0 < θ̄_2 − θ̄_1 < π. Then, the equation (5.2) has an equilibrium θ̄ = θ̄_2 − θ̄_1 ∈ (0, π) such that To guarantee the existence of such an equilibrium, we need the following conditions on the coupling strength K: where h̄ := max_{0<r<π} h(r). Note that the equilibrium exists in the case α > 1/2 without any condition on the coupling K > 0. We now investigate the stability of the equilibria in each case.
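The existence claim for the equilibrium in the supercritical case can be illustrated numerically. The sketch below is an illustration under assumptions that are not stated in this section: the kernel is taken to be h(θ) = sin θ / |θ|^{2α}, and the equilibrium of the relative phase is taken to solve K h(θ̄) = Ω on (0, π). For α ≥ 1/2 this assumed kernel is decreasing on (0, π), so plain bisection applies. The function name `phase_locked_gap` and the numerical tolerances are hypothetical.

```python
import math

def h(theta, alpha):
    """ASSUMED singular kernel h(theta) = sin(theta) / |theta|**(2*alpha)."""
    return math.sin(theta) / abs(theta) ** (2.0 * alpha)

def phase_locked_gap(omega, K, alpha, tol=1e-12):
    """Bisection for K*h(theta) = omega on (0, pi); intended for alpha >= 1/2,
    where the assumed kernel is decreasing on (0, pi)."""
    def residual(theta):
        return K * h(theta, alpha) - omega

    lo, hi = 1e-12, math.pi - 1e-12
    if residual(lo) <= 0.0 or residual(hi) >= 0.0:
        raise ValueError("no sign change of K*h - omega on (0, pi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid   # still above the target value: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Supercritical example with K < omega: a root is still found because h blows up at 0.
gap = phase_locked_gap(omega=1.0, K=0.5, alpha=0.75)
```

In the critical case α = 1/2 the assumed kernel stays below 1, so the same call with Ω > K raises `ValueError`, in line with the condition |Ω_1 − Ω_2| ≤ K that the text attaches to that regime.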
Proposition 5.2. Let θ be a solution of (5.2). We have the following stability results.
Moreover, due to the monotonic increase of θ, we obtain the lower estimate for the frequency: Hence, there exists a finite time t_1 < (2π − θ_0)/(Ω − K h(θ_0)) at which the solution reaches 2π. • Case 2 (θ_0 < θ̄): We can apply an analogous argument in this case. Since the function h is decreasing, we deduce h(θ) > h(θ̄) for θ ∈ (0, θ̄). Thus, we have This monotonic decrease of the phase yields the upper estimate for the frequency: So, there exists a finite time t_2 < θ_0/|Ω − K h(θ_0)| at which the solution reaches zero.
(2) For the case α < 1/2, we consider two steps for the asymptotic convergence to the equilibrium: • Step 1: We first show that the solution moves into the interval (0, θ̄) in finite time when the initial datum θ_0 is located in (−2π + θ̄*, 0] ∪ [θ̄, θ̄*). As long as the solution θ is located in [θ̄, θ̄*), we have h(θ) > h(θ̄). Thus, the solution is non-increasing: Moreover, the non-increase of the solution, θ(t) ≤ θ_0, gives an upper bound of the frequency: • Step 2: We will show that the solution converges to the stable equilibrium θ̄ asymptotically when the initial datum is in (0, θ̄). Suppose the initial datum is located in (0, θ̄). Then, the following inequality holds for the function h. Thus, the solution satisfies the differential inequality By Grönwall's lemma, we obtain Similarly, if the initial datum θ_0 is in (θ̄, θ̄*), the function h satisfies Then, we have the following differential inequality: Hence, by Grönwall's lemma, we find

Remark 5.1. In the subcritical case α ∈ (0, 1/2), the emergence of a phase-locked state for two non-identical oscillators occurs asymptotically (see Proposition 5.2), whereas the phase synchronization of two identical oscillators appears in finite time (see Proposition 5.1). However, in the critical and supercritical cases α ∈ [1/2, 1), phase synchronization always appears in finite time, as depicted in Propositions 5.1 and 5.2, as long as the initial phase configuration does not agree with the unstable phase-locked state. Namely, in the supercritical case both oscillators stick together into a unique cluster moving at the constant frequency (Ω_1 + Ω_2)/2, independently of the chosen natural frequencies. However, in the critical case, the same only happens under the assumption |Ω_1 − Ω_2| ≤ K. Otherwise, the formed cluster will instantaneously split.

5.2. N-oscillator case. In this subsection, we consider the system of N interacting oscillators. We will first focus on the dynamics in the simpler subcritical case α ∈ (0, 1/2), where solutions have been proved to be classical, see Theorem 3.1. The reason to start with this case is that the right hand side of (3.2) can be considered in the single-valued sense there. The dynamics in the critical case α = 1/2 and some intuition about the dynamics in the supercritical regime α ∈ (1/2, 1) will be provided at the end of this Subsection. Let Θ = (θ_1, ..., θ_N) be the solution to the system (3.2). We first study the phase synchronization for identical oscillators. First, let us set the indices M and m to satisfy

Theorem 5.1. Let Θ = (θ_1, ..., θ_N) be the solution to (3.2) with α ∈ (0, 1/2) for identical oscillators (Ω_i = 0 for i = 1, ..., N). Assume that the initial configuration Θ_0 is confined to a half circle, i.e., 0 < D(Θ_0) < π. Then, there is complete phase synchronization at a finite time not larger than T_c, where

Proof. We consider the dynamics of the phase diameter: Since h(θ_j − θ_M) < 0 and h(θ_j − θ_m) > 0 as long as D(Θ) < π, we have d/dt D(Θ) ≤ 0 and D(Θ(t)) ≤ D(Θ_0) < π, for t > 0.
Due to the contraction of the phases, and the fact that θ ∈ (0, π) ↦ h(θ)/θ is decreasing, we have Thus, we attain the following differential inequality: By Grönwall's lemma, we obtain

Notice that h(θ) behaves like θ^{1−2α} near the origin. Indeed, it is easy to prove that for every θ* ∈ (0, π) The main idea is to show that the mapping Since the phase diameter D(Θ) is bounded above by D(Θ_0), we can take θ* = D(Θ_0) and apply the above lower estimate for h to attain the following estimate of the phase diameter for every t ≥ 0. In the last inequality we have used that 1 − 2α ∈ (0, 1) and, consequently, (a + b)^{1−2α} ≤ a^{1−2α} + b^{1−2α} for every couple of nonnegative numbers a, b ∈ R. Then, integrating the above differential inequality implies for all t ≥ 0. This implies the convergence to zero at a finite time not larger than T_c.
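The contraction of the phase diameter for identical oscillators can be observed in a small simulation. This is a minimal sketch under assumptions not stated in this section: the kernel is taken to be h(θ) = sin θ / |θ|^{2α} with h(0) = 0, the system is taken to be θ̇_i = (K/N) Σ_j h(θ_j − θ_i), and a plain explicit Euler scheme with a hypothetical step size is used. The horizon T = 8 is chosen above the value D(Θ_0)^{2α} / (2αK sin(D(Θ_0))/D(Θ_0)) ≈ 6.2, which is what the differential inequality sketched in the proof gives for this data under the assumed kernel; it need not coincide with the paper's T_c.

```python
import math

def h(theta, alpha):
    """ASSUMED kernel; continuous with h(0) = 0 in the subcritical case alpha < 1/2."""
    if theta == 0.0:
        return 0.0
    return math.sin(theta) / abs(theta) ** (2.0 * alpha)

def diameter(phases):
    """Phase diameter D(Theta) of a configuration confined to a half circle."""
    return max(phases) - min(phases)

def simulate_identical(phases, K, alpha, dt, steps):
    """Explicit Euler for theta_i' = (K/N) * sum_j h(theta_j - theta_i), identical oscillators."""
    n = len(phases)
    theta = list(phases)
    history = [diameter(theta)]
    for _ in range(steps):
        velocity = [K / n * sum(h(tj - ti, alpha) for tj in theta) for ti in theta]
        theta = [ti + dt * vi for ti, vi in zip(theta, velocity)]
        history.append(diameter(theta))
    return theta, history

initial = [0.0, 0.4, 0.9, 1.5, 2.0]          # hypothetical data with D(Theta_0) = 2 < pi
final, history = simulate_identical(initial, K=1.0, alpha=0.25, dt=1e-3, steps=8000)
mean_drift = abs(sum(final) / len(final) - sum(initial) / len(initial))
```

Because the assumed kernel is odd, the mean phase is conserved, so the cluster forms at the initial average. After the collapse the Euler iterates only jitter at a scale far below the step size, which is a discretization artifact and not a feature of the continuous dynamics.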
Proof. First, we show that the equilibria θ̄_i are mutually distinct, i.e., Since Θ̄ is an equilibrium, it satisfies for every i = 1, ..., N. If there existed two oscillators having the same equilibria θ̄_i = θ̄_j, then we would have which contradicts Ω_i ≠ Ω_j. We now show the ordering property. From (5.8), we have where the coefficients c_j read They are properly defined because all the equilibria are mutually distinct, and they are positive because h is strictly increasing in (−θ̄, θ̄). Thus, the order Ω_{i+1} > Ω_i yields the order of the equilibria θ̄_{i+1} > θ̄_i.
In the subcritical case, we can attain the uniform boundedness of phase differences under sufficiently large coupling strength.
We set indices F and S so thaṫ for each time t. We define the frequency difference so that We note that By taking time derivative on D(Θ), we obtain As long as D(Θ) < D ∞ , we have Thus, we get We combine (5.9) and (5.10) to obtain Setting y(s) := t 0 D(Θ(s)) ds, we can rewrite (5.11) into y (t) ≤ y (0) − Kh (D ∞ )y(t).
Hence, we have .
Since D(Θ(t * )) = D ∞ and K > which is a contradiction. Thus, we have the desired uniform bound for phase difference D(Θ(t)) < D ∞ , for t ≥ 0.
Remark 5.2. Note that, in the preceding proof, the solution Θ = Θ(t) is C^1 but not necessarily C^2 because of the essential discontinuity of h′. Then, one cannot directly argue with two time derivatives in the computation of d/dt D(Θ). However, the preceding arguments can be made rigorous because the C^1 solution of (3.2) is a piecewise W^{2,1} solution of the augmented model (2.4)-(4.23), as discussed in Remark 4.1 in the preceding Section 4. Another possible approach is to directly show the Grönwall inequality (5.11) in integral form.
In the following result, we show the collision avoidance when the oscillators are initially well-ordered.
In the sequel, we study the stability of the phase-locked state for the system of nonidentical oscillators. We use the center manifold theorem to investigate the stability of linearized system. (1) There exists a center manifold for (5.14), The flow on the center manifold is governed by the n-dimensional system: (2) Assume the zero solution of (5.15) is stable (respectively asymptotically stable/unstable). Then, the zero solution of (5.14) is stable (respectively asymptotically stable/unstable).
(1) If α ≥ 1/2, then the phase-locked state Θ̄ is unstable. (2) If α < 1/2, then the phase-locked state Θ̄ is stable.

Proof. (1) We first linearize the system (3.2): where the elements of the matrix A = [a_ij] are determined by If α ≥ 1/2, we find a_ij < 0 for i ≠ j, and hence a_ii > 0 for i = 1, ..., N. Thus, A is a Laplacian-type matrix, all of whose eigenvalues are non-negative. Since A represents an all-to-all connected network, zero is an eigenvalue of multiplicity one and all the other eigenvalues are positive, which implies the instability of the equilibrium.
(2) We now assume α < 1/2. Since the equilibrium satisfies max_{i,j} |θ̄_i − θ̄_j| < θ̄ and θ̄_i ≠ θ̄_j for i ≠ j, the elements of the matrix have signs such that a_ij > 0 for i ≠ j and a_ii < 0 for i = 1, ..., N. By a similar argument as above, we obtain that the eigenvalues of A are non-positive and that zero is an eigenvalue of multiplicity one. Let λ_1 = 0 and λ_2, ..., λ_N < 0 be the eigenvalues of the matrix A and let v_1, ..., v_N be the corresponding left eigenvectors, so that v_i A = λ_i v_i for i = 1, ..., N.
We note that v_1 = (1, ..., 1). We set the matrices P and D so that Then, we can diagonalize the matrix A: We change the variables from Θ = (θ_1, ..., θ_N) to X = (x_1, ..., x_N) such that Then, we can rewrite the system (5.20) in the following form: Consider the center manifold in Lemma 5.3, which can be written as follows and consider the equation By the Center Manifold Theorem, the stability of (5.22) implies the stability of the system (5.21). Since the equality (5.19) yields x_1 = θ_1 + ··· + θ_N and we have Thus, the right hand side R_1 ≡ 0 and the dynamics of (5.22) is stable. Therefore, the phase-locked state Θ̄ is stable for α < 1/2.

Finally, we are ready to show the emergence of a phase-locked state for non-identical oscillators.
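Returning to the linearization used in the stability proof above, the sign pattern of the matrix A can be inspected directly. The sketch below is an illustration under assumptions: the kernel is taken to be h(θ) = sin θ / |θ|^{2α}, the off-diagonal entries are taken to be a_ij = (K/N) h′(θ_j − θ_i) with a_ii = −Σ_{j≠i} a_ij, and the configuration `demo_phases` is a hypothetical set of distinct, closely spaced phases and not an actual phase-locked state. Only the signs of the entries, which depend on the phase differences, are being illustrated.

```python
import math

def h_prime(theta, alpha):
    """Derivative of the ASSUMED kernel h(theta) = sin(theta)/|theta|**(2*alpha); even in theta."""
    t = abs(theta)
    p = 2.0 * alpha
    return (t * math.cos(t) - p * math.sin(t)) / t ** (p + 1.0)

def jacobian(phases, K, alpha):
    """Linearization matrix: a_ij = (K/N) h'(theta_j - theta_i), a_ii = -sum_{j != i} a_ij."""
    n = len(phases)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i][j] = K / n * h_prime(phases[j] - phases[i], alpha)
        A[i][i] = -sum(A[i][j] for j in range(n) if j != i)
    return A

demo_phases = [0.0, 0.1, 0.25, 0.4]               # hypothetical, mutually distinct phases
A_sub = jacobian(demo_phases, K=1.0, alpha=0.25)  # subcritical: positive off-diagonal entries
A_sup = jacobian(demo_phases, K=1.0, alpha=0.75)  # supercritical: negative off-diagonal entries
```

Both matrices are symmetric with zero row sums, so the vector of ones spans a zero eigendirection. With positive off-diagonal entries, −A is a weighted graph Laplacian and the remaining eigenvalues of A are negative; with negative off-diagonal entries the signs flip, which is the dichotomy used in parts (1) and (2) of the proof.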
If the coupling strength is sufficiently large such that then we can show the emergence of a phase-locked state. Moreover, if the oscillators have mutually distinct natural frequencies, i.e., Ω_i ≠ Ω_j for i ≠ j, then the synchronization occurs asymptotically.
Proof. By applying Grönwall's lemma to (5.10), we obtain an exponentially decaying upper estimate on the frequency diameter: This exponential decay implies the emergence of a phase-locked state. Assume the oscillators have mutually distinct natural frequencies. Since Proposition 4.6 gives the structure of the phase-locked state, the oscillators line up in descending order of natural frequencies in finite time. After this time, by Lemma 4.7, we have a positive lower bound ε_∆ > 0 on the distance between oscillators. Then, we have By Grönwall's lemma, we have a lower estimate on the frequency diameter:

Let us now get some insight into the behavior of the Filippov solutions to (3.2) (see Theorems 3.3 and 3.5) in the most singular cases α = 1/2 and α ∈ (1/2, 1). Looking at Remark 5.1 for the dynamics of two oscillators, we expect global synchronization in finite time for N oscillators. Specifically, in the supercritical case, the emerged global cluster is expected to stay stuck independently of the chosen natural frequencies. In the critical case, the sticking conditions (3.12) are required for the cluster to remain stuck. To start with, let us prove the finite-time global phase synchronization of identical oscillators in the critical and supercritical cases. To that end, we need the following technical results.
Proof. The main idea is to handle the approximate sequence {Θ ε } ε>0 obtained as solutions to the regularized system (4.8) and to take limits ε → 0 in the phase diameter estimates. First, notice that by virtue of the assumed initial condition on the diameter one has that d dt D(Θ ε ) ≤ 0 and D(Θ ε (t)) ≤ D(Θ 0 ) < π, for t > 0.
Indeed, note that we can obtain an explicit decay rate for the diameter by mimicking the ideas in Theorem 5.1. Namely, choosing θ * = D(Θ 0 ) and β = 2α in Lemma 5.4, we notice that c(α, β) = 0. Consequently, the lower bound of the kernel h ε is valid in the whole interval [0, D(Θ 0 )]. Then, Let us integrate the above differential inequality. We need to distinguish the cases α = 1 2 and α ∈ ( 1 2 , 1): for every t ≥ 0. Recall that by virtue of Lemmas 4.3 and 4.8, we obtained Θ ε * Θ in H 1 ((0, T ); R N ). In particular, Θ ε → Θ in C([0, T ], R N ). Then, we can take the limit ε → 0 in the above estimates to attain the desired result.
Under the assumptions in the preceding Lemma 5.5 one obtains exponential decay of the diameter in the critical case and algebraic decay in the supercritical regime. However, a finite-time global synchronization is expected. This is the content of the following result.
Remark 5.3. Notice that Theorem 5.4 also works in the supercritical case. However, the same proof as in Theorem 5.5 is not valid to show finite-time complete phase synchronization of identical oscillators for α ∈ ( 1 2 , 1). The reason is that at this point we cannot guarantee whether the Filippov solutions in Θ obtained as singular limit of the regularized solutions Θ ε to system (4.8) in Theorem 4.3 agrees with the solution obtained in Remark 3.6 via the "sticking after collision" continuation procedure of classical solutions. However, if the limiting Θ obtained in Theorem 4.3 satisfies such "sticking after collision" property, we can mimic Theorem 5.5 to show that it exhibits complete phase synchronization at a finite time not larger than .

Appendix A. Regular interactions
In this Appendix, we study the Kuramoto model with regular coupling weights: where we denote c ≡ c α,ζ = 1 − ζ −1/α for simplicity. Recall that such model comes from the choice (1.4) of Γ as the Hebbian plasticity function in (1.5). Since the right hand side of (A.1) is Lipschitz continuous, then the system (A.1) has a unique solution by Cauchy-Lipschitz theory in this case. For positive σ, we get the following bounds for Γ: Note that ε σ converges to zero as σ → 0. We will study the emergence of synchronization for identical and non-identical oscillators and, we will use the idea of [15] for the proof of synchronization.
A.1. Identical oscillators. Consider the Kuramoto model (A.1) for identical oscillators, which have the same natural frequency. Without loss of generality, we may assume Ω_i = 0 for all i = 1, ..., N. The system (A.1) becomes: We can show asymptotic complete phase synchronization for (A.2) under a constraint on the initial configuration. Let us recall the notation θ_M(t) and θ_m(t) in (5.6) for the indices of the largest and smallest phases, and D(Θ) for the phase diameter defined in (5.7).
By applying (A.4) and (A.5) to (A.3), we attain the following differential inequality: Grönwall's lemma yields the desired upper estimate. Similarly, from (A.5) and sin x ≤ x for 0 ≤ x ≤ π, we have which gives the lower estimate.
Indeed, we get We first show the boundedness of phase differences.
Lemma A.1. Assume that D(Θ 0 ) < D ∞ , for some small D ∞ < π 2 , and that the coupling strength is sufficiently large so that .
Proof. Assume that there exists a time for which D(Θ(t)) ≥ D ∞ . Then, due to the continuity is positive and finite and D(Θ(t * )) = D ∞ . We set indices F and S so thaṫ By taking time derivative on D(Θ), we get Then, we get the following couple of upper and lower bounds By applying (A.8) and (A.9) into (A.7), we deduce Here, C := K Γ (θ + ) sin D ∞ + Γ(D ∞ ) cos D ∞ and t ∈ [0, t * ]. Then, we find for all t ∈ [0, t * ]. However, since D(Θ(t * )) = D ∞ , we get , which yields to a contradiction. Thus, D(Θ(t)) < D ∞ , for all t ≥ 0.
We are ready to prove the frequency synchronization for non-identical oscillators.
Theorem A.2. Assume that D(Θ 0 ) < D ∞ , for some small D ∞ < π 2 , and that the coupling strength is sufficiently large so that .
By Gronwall's lemma, we achieve the exponential estimates for the frequency synchronization.
Since the decay rate of the asymptotic frequency synchronization is exponential, then the solution Θ shows the emergence of a phase-locked state.

Appendix B. H-representation of the Filippov set-valued maps
In this appendix, we exhibit the proofs of the technical Lemmas 4.4 and 4.9. Recall that such results were respectively applied in Propositions 4.1 and 4.2 in order to characterize explicitly some H-representation of the Filippov set-valued map in the supercritical and critical cases. We introduce some notation that will be used here on.
Definition B.1. Consider n ∈ N. For every i, j ∈ {1, . . . , n} we define the linear operator By definition, the following relations hold true L ik and L = (L 1 , . . . , L n ).
First, we give the simpler proof of Lemma 4.4: Lemma B.1. Consider any n ∈ N and any vector x ∈ R^n. Then, the following assertions are equivalent: (1) There exists some Y ∈ Skew_n(R) such that (2) The following implicit equation holds true where j stands for the vector of ones.
Proof. Let us define the following linear operator Then, the thesis of this lemma is equivalent to On the one hand, it is clear that the inclusion ⊆ in (B.1) holds by virtue of the properties of skew symmetric matrices. On the other hand, let us define the matrices E_{i,j} := e_i ⊗ e_j − e_j ⊗ e_i for every i ≠ j, where {e_i : i = 1, ..., n} is the standard basis of R^n and ⊗ denotes the Kronecker product. Notice that L(E_{i,j}) = e_i − e_j. Hence, {L(E_{i,i+1}) : i = 1, ..., n − 1} = {e_i − e_{i+1} : i = 1, ..., n − 1} consists of n − 1 independent vectors. Consequently, L has rank larger than or equal to n − 1. Since j^⊥ has dimension equal to n − 1, the full identity in (B.1) holds true.

Now, we focus on the proof of Lemma 4.9. Our main tool in this part will be the Farkas alternative from convex analysis, which we recall in the subsequent result. (1) There exists v ∈ V such that (2) There exists q ∈ R^n with q_i ≥ 0 for all i = 1, ..., k such that This result has several equivalent representations in the literature and it is sometimes called the Theorem of Alternatives. Our version can be inferred from [44, Lemma 2.54]. We are now ready to give a proof of Lemma 4.9.
Lemma B.3. Consider any n ∈ N and any vector x ∈ R n . Then, the following two assertions are equivalent: (1) There exists some Y ∈ Skew n ([−1, 1]) such that (2) There exists some Y ∈ Skew n (R) such that holds, for every Q ∈ M n (R + 0 ) and λ ∈ R n such that q ij + λ i = q ji + λ j .
Proof. For the sake of simplicity in our arguments, we will split the proof into two parts. In the first part, we establish the equivalence of the first three assertions in the statement. The main tool to be used in that part is the above Lemma B.2. In the second part, we will focus on the more involved equivalence between the first group of equivalent assertions and the last assertion.
• Step 1: Equivalence of the first three assertions. On the one hand, the first two assertion are perfectly equivalent by virtue of Definition B.1. Then, our problem is a system of affine inequalities in the vector space Skew n (R) of skew symmetric matrices. Hence, by Farkas alternative (see Lemma B.2) such assertions amounts to saying that whenever q ij , q + i , q − i are non-negative coefficients verifying n i,j=1 Defining λ i = q + i − q − i , we can simplify an equivalent assertion: for every Q ∈ M n (R + 0 ) and λ ∈ R n such that Thus, the equivalence with the third assertion follows by evaluating the identity (B.4) on every matrix in the canonical basis of Skew n (R), i.e., {e i ⊗ e j − e j ⊗ e i : 1 ≤ i < j ≤ n} , and noticing that we obtain the condition q ij + λ i = q ji + λ j in such third assertion.
• Step 2: Equivalence with the last assertion. On the one hand, let us assume that the first assertion is satisfied, i.e., $x = Y \cdot \mathbf{j}$ for some $Y \in \mathrm{Skew}_n([-1, 1])$. Taking any permutation $\sigma$ of $\{1, \dots, n\}$ and any $1 \leq k \leq n$ we obtain
$$\sum_{i=1}^{k} x_{\sigma_i} = \sum_{i,j=1}^{k} y_{\sigma_i \sigma_j} + \sum_{i=1}^{k} \sum_{j=k+1}^{n} y_{\sigma_i \sigma_j}.$$
Since the first term vanishes (by skew-symmetry) and the second term consists of $k(n-k)$ terms with values in $[-1, 1]$, then $\sum_{i=1}^{k} x_{\sigma_i} \in [-k(n-k), k(n-k)]$.
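The forward implication above lends itself to a quick numerical sanity check. The following is a minimal sketch, not part of the original argument: it assumes that $\mathbf{j}$ denotes the all-ones vector, and the helper names `random_skew` and `partial_sums_within_bounds` are hypothetical. Since the first $k$ entries of a permuted vector range over all $k$-element subsets, the check enumerates subsets instead of permutations.

```python
import itertools

import numpy as np


def random_skew(n, rng):
    """Random skew-symmetric n x n matrix with entries in [-1, 1]."""
    upper = np.triu(rng.uniform(-1.0, 1.0, size=(n, n)), k=1)
    return upper - upper.T


def partial_sums_within_bounds(x, tol=1e-9):
    """Check |sum of any k entries of x| <= k (n - k) for every 1 <= k <= n.

    The case k = n forces the total sum of x to vanish.
    """
    n = len(x)
    for k in range(1, n + 1):
        bound = k * (n - k)
        for subset in itertools.combinations(range(n), k):
            if abs(x[list(subset)].sum()) > bound + tol:
                return False
    return True


rng = np.random.default_rng(0)
n = 5
Y = random_skew(n, rng)
# Assumption: j is the all-ones vector, so x_i is the i-th row sum of Y.
x = Y @ np.ones(n)
print(partial_sums_within_bounds(x))
```

A vector such as $(5, 0, 0, 0, -5)$ fails the test for $n = 5$, because a single entry can be at most $n - 1 = 4$ in absolute value.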
Conversely, assume that the last assertion is true and let us prove (B.3) in the third assertion. Consider $Q \in M_n(\mathbb{R}_0^+)$ and $\lambda \in \mathbb{R}^n$ such that
$$q_{ij} - q_{ji} = \lambda_j - \lambda_i. \tag{B.5}$$
Without loss of generality we will assume that q ii = 0, for every i = 1, . . . , n (notice that otherwise (B.3) is even larger), and let us split λ i x i =: I 1 + I 2 .
On the one hand, let us rewrite $I_2$ and notice that for every $j = 1, \dots, n$. Since the sum of all the $x_i$ vanishes by hypothesis, taking averages with respect to all the indices $j = 1, \dots, n$ we obtain that Finally, exchanging the indices $i$ and $j$ and taking the average of both expressions we can equivalently write Thus, substituting (B.5) into $I_2$ and putting it together with $I_1$ we can rewrite Let us consider a permutation $\sigma$ of $\{1, \dots, n\}$ that orders the coefficients $\lambda_i$ in a non-decreasing way, i.e.,
$$\lambda_{\sigma_1} \leq \lambda_{\sigma_2} \leq \cdots \leq \lambda_{\sigma_n}. \tag{B.7}$$
Then, It is clear that $I_4$ is non-negative. Hence, we will focus on showing that so is $I_3$. By virtue of (B.5), it is easy to show that for every $i < j$. Thereby, where in the last step we have used (B.5) again and the coefficients $a_k$ read a k := i≤k j≥k+1 Bearing in mind that the sum of all the $x_i$ vanishes by hypothesis, then Then, $a_k \geq 0$ by hypothesis. Since we have chosen $\sigma$ so that (B.7) holds, the result follows from the above expression (B.8) for $I_3$.
Appendix C. Characterizing the sticking conditions
Our purpose in this appendix is to give explicit conditions on the weights that are necessary and sufficient for the sticking of particles, namely conditions (3.12) and (3.13) in Subsections 4.3 and 4.1 respectively. The first part is devoted to the latter condition for the supercritical case and the second part will focus on the former condition for the critical case.
Apart from the linear operators in Definition B.1 we will need the following ones. Then, the next result yields a characterization for the sticking condition (3.13) to hold.
Lemma C.1. Consider any $n \in \mathbb{N}$ and any matrix $M \in \mathrm{Skew}_n(\mathbb{R})$. Then, the following assertions are equivalent:
(1) There exists some $Y \in \mathrm{Skew}_n(\mathbb{R})$ such that $M = Y \cdot J + J \cdot Y$.
(2) There exists some $Y \in \mathrm{Skew}_n(\mathbb{R})$ such that $T_{ij}(Y) \leq m_{ij}$ and $-T_{ij}(Y) \leq -m_{ij}$. holds, for every $1 \leq i < j < k \leq n$.
Proof. First, it is clear that the first two assertions are equivalent. Second, let us briefly show that (C.1) and (C.2) are equivalent. On the one hand, it is clear that (C.1) is a particular case of (C.2). On the other hand, let us assume that (C.1) holds. Then, in particular, we have the following three equations for $1 \leq i < j < k \leq n$:
$$m_{1i} + m_{ij} + m_{j1} = 0, \qquad m_{1j} + m_{jk} + m_{k1} = 0, \qquad m_{1k} + m_{ki} + m_{i1} = 0.$$
Taking the sum of these equations, the terms involving the index 1 cancel in pairs by virtue of the skew-symmetry of $M$, and we obtain (C.2). Hence, let us just concentrate on proving the equivalence between the second and third assertions. By Lemma B.2, the second assertion amounts to saying that whenever $\Lambda \in M_n(\mathbb{R})$ verifies Hence, if we define $p_{ij} = \lambda_{ij} - \lambda_{ji}$ we can conclude that the second assertion of this lemma is equivalent to the fact that whenever $P \in \mathrm{Skew}_n(\mathbb{R})$ verifies in Lemma 4.4 shows that those matrices $P \in \mathrm{Skew}_n(\mathbb{R})$ fulfilling (C.3) agree with the matrices that lie in the kernel of the operator $L = (L_1, \dots, L_n)$. Recall that by virtue of such result, $L$ has rank equal to $n - 1$. Since $\mathrm{Skew}_n(\mathbb{R})$ is a vector space of dimension $d_1 := n(n-1)/2$, we know that
$$d_2 := \dim(\ker L) = \frac{n(n-1)}{2} - (n-1) = \frac{(n-1)(n-2)}{2}.$$
Hence, the following subset $\mathcal{P} := \{P_{ij} : 2 \leq i < j \leq n\} \subseteq \ker L$ consists of $(n-1)(n-2)/2$ different elements, which can be listed according to the lexicographic order of the multi-indices $(i, j)$. Let us show that all of them are linearly independent, thus generating the whole kernel. We first consider the basis of skew-symmetric matrices and, again, we can list them ordered with respect to the lexicographic order. Let us consider the matrix $M \in M_{d_2 \times d_1}(\mathbb{R})$ of coordinates of the elements in $\mathcal{P}$ with respect to the basis $\mathcal{B}$. Then, by the definition (C.5) one infers that the $d_2 \times d_2$ identity matrix appears as the submatrix of $M$ consisting of all the $d_2$ rows but just the last $d_2$ columns. Hence, $\operatorname{rank} M = d_2$ and, consequently, $\ker L = \operatorname{span}(\mathcal{P})$.
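The dimension count $d_2 = (n-1)(n-2)/2$ used in this step can be verified numerically for small $n$. The sketch below is only an illustration under an assumption of ours: it takes $L(Y) = Y \cdot \mathbf{j}$ with $\mathbf{j}$ the all-ones vector, so that $L_i(Y)$ is the $i$-th row sum of $Y$. The function names `skew_basis`, `operator_matrix` and `kernel_dimension` are hypothetical.

```python
import numpy as np


def skew_basis(n):
    """Canonical basis E_ij = e_i e_j^T - e_j e_i^T (i < j), lexicographic order."""
    basis = []
    for i in range(n):
        for j in range(i + 1, n):
            E = np.zeros((n, n))
            E[i, j], E[j, i] = 1.0, -1.0
            basis.append(E)
    return basis


def operator_matrix(n):
    """Matrix of Y -> Y @ ones in the canonical basis; column for E_ij is e_i - e_j."""
    ones = np.ones(n)
    return np.column_stack([E @ ones for E in skew_basis(n)])


def kernel_dimension(n):
    """Rank-nullity: dim ker L = n(n-1)/2 - rank L."""
    A = operator_matrix(n)
    return A.shape[1] - np.linalg.matrix_rank(A)


for n in range(2, 8):
    print(n, kernel_dimension(n), (n - 1) * (n - 2) // 2)
```

Each column of the operator matrix sums to zero, which reflects the inclusion of the range of $L$ in $\mathbf{j}^\perp$, and the computed rank equals $n - 1$ in every case tested.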
• Step 2: Here, we characterize the condition (C.4), which clearly amounts to showing that $\sum_{i,j=1}^{n} p_{ij} m_{ij} = 0$ for every $P \in \mathcal{P}$. Taking $P = P_{ij}$ for $2 \leq i < j \leq n$ we get n i,j=1 and this concludes the full proof of our result.
Finally, we focus on the sticking condition (3.11) in the critical case. The next result exhibits an explicit characterization that follows similar techniques to those in Lemma B.3.
(3) The following inequality
$$\sum_{i,j=1}^{n} q_{ij} + \frac{1}{2} \sum_{i,j=1}^{n} p_{ij} m_{ij} \geq 0$$
holds for every $P \in \mathrm{Skew}_n(\mathbb{R})$ and $Q \in M_n(\mathbb{R}_0^+)$ such that $\sum_{k=1}^{n} (p_{ik} - p_{jk}) + q_{ij} - q_{ji} = 0$ for any $i, j = 1, \dots, n$.

Proof. Assertions 1 and 2 are clearly equivalent due to the definition of the linear operators involved. Also, properties 2 and 3 are equivalent by virtue of an application of Lemma B.2 that is analogous to that in the proof of Lemma B.3; hence, we skip the proof for simplicity. Thereby, we will only focus on the equivalence with the former assertion. First, let us assume that for some $Y \in \mathrm{Skew}_n([-1, 1])$ the first assertion holds true, i.e., Since it is $n$ times the sum of $m(n - m)$ numbers in $[-1, 1]$, then the condition (C.6) is also satisfied. Conversely, let us assume that both (C.2) and (C.6) hold and take any $P \in \mathrm{Skew}_n(\mathbb{R})$ and $Q \in M_n(\mathbb{R}_0^+)$ such that
$$\sum_{k=1}^{n} (p_{ik} - p_{jk}) + q_{ij} - q_{ji} = 0, \tag{C.7}$$
for any couple of indices $i, j = 1, \dots, n$. Without loss of generality we can assume that $q_{ii} = 0$ for every $i = 1, \dots, n$. Also, let us define the coefficients $\lambda_i := \sum_{k=1}^{n} p_{ik}$ and consider a permutation $\sigma$ of $\{1, \dots, n\}$ so that the $\lambda_{\sigma_i}$ are ordered in a non-decreasing way, i.e.,
$$\lambda_{\sigma_1} \leq \lambda_{\sigma_2} \leq \cdots \leq \lambda_{\sigma_n}. \tag{C.8}$$
Using (C.2) in the second term we can write for any k = 1, . . . , n. Let us take the average with respect to k in the above expression for any j = 1, . . . , n, where (C.7) has been used in the last step. Taking the average with respect to j we arrive at (m σ i σ k − m σ j σ k ) (q σ j σ i − q σ i σ j ). (C.9) On the other hand (C.10) I 2 = j>i q σ i σ j + i<j (q σ i σ j − q σ j σ i ).
Finally, notice that for every $i < j$, the condition (C.7) entails Here, (C.2) has been used again in the last identity. Since the coefficients $a_m$ are all non-negative by (C.6) and the $\lambda_{\sigma_i}$ are ordered by (C.8), we can conclude that $I \geq 0$ and this ends the proof.