Proliferation of non-linear excitations in the piecewise-linear perceptron

We investigate the properties of local minima of the energy landscape of a continuous non-convex optimization problem, the spherical perceptron with piecewise linear cost function and show that they are critical, marginally stable and displaying a set of pseudogaps, singularities and non-linear excitations whose properties appear to be in the same universality class of jammed packings of hard spheres. The piecewise linear perceptron problem appears as an evolution of the purely linear perceptron optimization problem that has been recently investigated in [1]. Its cost function contains two non-analytic points where the derivative has a jump. Correspondingly, in the non-convex/glassy phase, these two points give rise to four pseudogaps in the force distribution and this induces four power laws in the gap distribution as well. In addition one can define an extended notion of isostaticity and show that local minima appear again to be isostatic in this phase. We believe that our results generalize naturally to more complex cases with a proliferation of non-linear excitations as the number of non-analytic points in the cost function is increased.

We investigate the properties of local minima of the energy landscape of a continuous non-convex optimization problem, the spherical perceptron with piecewise linear cost function and show that they are critical, marginally stable and displaying a set of pseudogaps, singularities and non-linear excitations whose properties appear to be in the same universality class of jammed packings of hard spheres. The piecewise linear perceptron problem appears as an evolution of the purely linear perceptron optimization problem that has been recently investigated in [1]. Its cost function contains two non-analytic points where the derivative has a jump. Correspondingly, in the non-convex/glassy phase, these two points give rise to four pseudogaps in the force distribution and this induces four power laws in the gap distribution as well. In addition one can define an extended notion of isostaticity and show that local minima appear again to be isostatic in this phase. We believe that our results generalize naturally to more complex cases with a proliferation of non-linear excitations as the number of non-analytic points in the cost function is increased.
Marginal stability of hard sphere packings at jamming has been the subject of an intensive line of studies in the last twenty years [2,3]. This stream of works has culminated in the exact solution of the statistical mechanics of dense glassy hard spheres in infinite spatial dimensions [4]. This has allowed a detailed description of the critical behavior observed at the jamming transition point. In particular, the critical pseudogaps in the distribution of contact forces between spheres as well as the divergence of the gap distribution for small gaps have been completely characterized in infinite dimension. Remarkably, the mean field predictions have been shown, within numerical precision, to hold down to two dimensional hard sphere packings, see [5] for a review, something that has pushed towards a statement about the upper critical dimension for the jamming transition to be two [6][7][8]. Furthermore, these predictions have been shown to agree with the real space scaling argument description of marginal stability of jammed packings [3].
The critical behavior observed at jamming was believed to be peculiar of the transition point. Instead, very recently it has been shown that there is nothing special about jamming. In [1,9] it was performed a systematic investigation of the properties of soft spheres interacting with a purely linear repulsive potential as well as a mean field version of the same optimization problem, namely the spherical perceptron with linear cost function. In particular it has been shown that in the non-convex jammed phase, where both models display finite energy density local minima, both systems self-organize into marginally stable, critical, configurations. The corresponding properties appear to be remarkably close to the ones of amorphous jammed packing of hard spheres implying that the criticality emerging at the jamming transition is not so special after all. In particular, local minima are described by a set of non-linear excitations. As a difference with respect to jamming, such excitations are richer in nature, yet the critical exponents controlling their density appear to be the same (within numerical precision) to the ones of hard spheres. It follows that jamming criticality is in-herently linked to the non-analyticity of the interaction potential. In the jammed phase, this becomes evident since, despite the fact that the energy is positive, packings sit on minima in which there is an isostatic number of spheres that just touch (contacts). Therefore jamming criticality, meaning the type of marginal stability found at the jamming transition, survives in the whole jammed glassy phase.
In this work we explore what happens if the interaction potential has several linear ramps with different slopes separated by non-analytic points. We show that if we consider a piecewise generalization of the linear potential studied in [1,9], we obtain again that the jammed phase of the corresponding optimization problem is made of marginally stable minima whose properties are again very close to hard spheres at jamming. Remarkably, we get that isostaticity still holds but we need to extend its notion to include the fact that gaps can sit in different non-analytic points of the interaction potential. Furthermore we show that for each non-analytic point of the cost function, two pseudogaps emerge whose critical exponents appear to be the same as the ones controlling the jamming transition. This implies a proliferation of nonlinear excitations that can trigger plastic events when the system is perturbed in some way [10]. Our results reinforce the fact that jamming criticality does not pertain only to the jamming point but it is rather related to two concomitant ingredients: the singular nature of the cost function and the non-convex nature of the problem.

I. THE MODEL
We consider the spherical perceptron optimization problem with a piecewise linear cost function. The model is a mean field model for the corresponding optimization problem for spheres interacting with piecewise linear cost function in finite dimension. Despite the fact that the study we perform here can be extended verbatim to piecewise linear spheres, we leave this for future work. However, given the results of Ref. [9], we expect that the conclusions we will draw from the analysis of the perceptron problem will apply also to finite dimensional spheres.
The perceptron optimization problem [11,12] is defined by an N dimensional vector x which lives on the N -dimensional sphere |x| 2 = N . In addition, one extracts M = αN N -dimensional random vectors ξ µ with µ = 1, . . . , M . Every component of all these random vectors is a Gaussian random variable with zero mean and unit variance. Given the set of random vectors, also called patterns, and the state vector x, one can define a set of gap variables defined as h µ = ξ µ · x/ √ N − σ, being σ and α control parameters of order one. The optimization problem is defined in terms of such gap variables. One constructs the cost function and asks to find the value of x that minimizes it. In this work we consider the piecewise linear cost function defined as where H 0 is a positive constant of order one that is taken to be fixed. In Fig. 1 we sketch the form of the corresponding potential. The model admits a satisfiable phase that happens when, given α, one chooses a sufficiently small σ. In this case one can find a configuration of x such that h µ > 0 for all µ = 1, . . . , M . Conversely, as soon as one increases σ, fixing α, one finds a point (that may be algorithm-dependent) beyond which finding configurations where all gaps are positive becomes algorithmically impossible. This corresponds to the jamming transition of the model. It is clear that the properties of the configurations at jamming are not expected to depend on the properties of the cost function one uses in correspondence of negative gaps, since up to jamming such gaps are not present. For this reason we are not interested in studying jamming which has been fully analyzed in [12,13]. Instead we want to look at the system beyond the jamming point. In this case local minimization algorithms such as gradient descent get stuck in local or global minima, depending on the convexity of the problem. We want to characterize the properties of such minima. We note that the spherical perceptron problem with purely linear potential studied in [1] can be obtained from Eq. (2) by taking the limit H 0 → ∞.
In order to characterize local minima of the energy landscape we look to the distribution of gap variables. In the purely linear perceptron case of [1] it was found that the jammed non-convex/glassy phase contains minima where the distribution of gap variables contains a Dirac delta peak at h = 0. The weight of the peak is equal to N which is the number of degrees of freedom in the problem. This implies that local minima have an isostatic number of gaps that are strictly equal to zero. This is the version of isostaticity that emerges when the cost function is purely linear. The presence of this isostatic peak is accompanied by an isostatic set of contact forces that can be thought as Lagrange multipliers needed to enforce that the corresponding gaps vanish.
In the present case we will show that we get a similar phenomenology. It is clear that in the glassy jammed phase the piecewise linear cost function induces the appearance of two Dirac delta peaks in the distribution of gaps centered in h = 0 and h = −H 0 . Correspondingly one will have two sets of contact forces. The main questions we are interested in are: is the system going to be isostatic? What is the version of isostaticity that applies to this case? What are the properties of the contact forces? And what is the behavior of the distribution of gap variables in the jammed glassy phase of the model?
For what follows it will be convenient to introduce some notation to saparate the different types of gaps and contacts. We define the gaps that are less than −H 0 by

II. NUMERICAL SIMULATIONS
In order to understand the properties of local minima of the model, we performed numerical simulations. We used the algorithm presented in [10] adapted now to the broken linear potential case.
In oder to take into account the gaps that may end up being either exactly in zero or in −H 0 , we define the where we have added the contact forces f µ that take into account the gaps that eventually fall in h = 0 or in h = −H 0 . In addition we have introduced a Lagrange multiplier µ that is needed to enforce the spherical constraint on the vector x. The last term is added to change the control parameter from σ to the pressure p. Given the Lagrangian L, a local minimum satisfies the variational equations with respect to both x as well as the contact forces and σ (which is no more a control parameter in the problem, and it is fixed essentially by the pressure). The constitutive equations for local minima are It is clear from the two slopes of the linear parts of the interaction potential that a physical solution to the variational equations (4) requires that If a solution has contact forces that are outside the corresponding stability intervals, such solutions identify an unstable configuration. We observe that, as it happens for the purely linear case [10], the Lagrangian L is effectively linear in all variables except for the term proportional to µ. Therefore the convexity of the problem is self-generated and is fixed by the sign of µ. If µ < 0 we are in the non-convex phase with multiple minima and a glassy landscape while if µ > 0 we are in a convex phase with just one minimum.
We choose to work at fixed α and to explore the jammed phase by incresing the pressure p. We note that the value we have chosen for α corresponds to the situation in which jamming happens in a non-convex marginally stable situation and therefore it is in the same universality class as hard spheres [12]. Conversely if we choose α ≤ 2, jamming appears to be in a convex regime and is not critical anymore. Since we are interested in the properties of the non-convex/glassy phase, we fix α = 4.
In Fig.2 we plot the behavior of the Lagrange multiplier µ as a function of σ − σ J being σ J the jamming point. It is clear that as soon as we enter the jamming phase, the landscape is strictly non-convex being the Lagrange multiplier negative. When compressing the system further,  Figure 2. The Lagrange multiplier µ as a function from the distance to jamming. The jump observed at σ = σj is due to finite size effects. Indeed in a finite system the configuration at jamming is stable for a finite amount of pressure before being destabilized and entering in the jammed phase [10]. The Lagrange multiplier is negative in the non-convex phase while it is positive when the landscape becomes convex. The figure has been produced simulating the model at α = 4 and N = 256. The errorbars represent sample to sample fluctuations.
it undergoes a topology trivialization transition towards a convex phase where the landscape is made by just one unique minimum and the Lagrange multiplier µ becomes positive. We expect that this transition can also be found by analyzing the problem with the replica method and corresponds to the point where replica symmetry breaking appears (coming from the convex/non-glassy phase). This behavior mirrors the one found for the purely linear case [1,10]. We now focus on the properties of the local minima in the glassy phase. We first measure the cardinality of the sets C 0 and C H0 . This is plotted in Fig. 3. At the beginning of the compression protocol the system contains N gaps in zero and therefore the system is isostatic with a number |C 0 | = c 0 N = N . As soon as we enter the jammed phase, contacts in −H 0 start to appear. Remarkably we find that if we define |C H0 |/N = c H0 we have that c 0 + c H0 = 1 (6) which implies that the system is isostatic only globally. The number of gaps in h = 0 or in h = −H 0 fluctuates but the total sum is equal to the degrees of freedom in the problem. Remarkably the sample to sample fluctuations of c 0 and c H0 seem to be normal yet completely anticorrelated in order to have Eq. (6) satisfied even at finite N . The system self-organizes in such a way that only the sum of the number of gaps in zero and −H 0 is isostatic.
To understand this fact we use the following argument. Let us imagine that we smooth out the non-analytic corners in the cost function of Eq. (2) by two small quadratic interpolation parts. Let us denote by the amplitude of the interpolated region. As for the purely linear case [1], the smoothing removes the degeneracy of the contacts and allows for a real space description of the contact forces that appear as the gap contained within the smoothed regions. Since now the cost function admits an harmonic expansion, we can define the corresponding (rescaled) Hessian. Remarkably it takes contribution from both the contacts in h = 0 as well as the ones in h = −H 0 and it is given by which is, neglecting correlations, a Wishart random matrix shifted on the diagonal [14]. In the glassy phase where µ < 0 we need to have that the Wishart content of the Hessian matrix should be full-rank in order to have stable minima. Therefore we have that If marginal stability holds, the bound is saturated and we get isostaticity [3]. This argument tells that the number of contacts in h = 0 and h = −H 0 can fluctuate but in a correlated way in order to enforce Eq. (8). Now we turn to the analysis of the force and gap distribution. We define the empirical distribution of gap variables as In Fig.4 we plot the histogram of ρ(h) at p = 4 which corresponds to the point where c 0 ∼ c H0 . It is clear from this qualitative picture that the two Dirac delta functions in h = 0, −H 0 are surrounded by four power law divergences. In order to characterize those divergences, in Fig.5 we plot the cumulative distribution function of the gaps, starting from h = 0 and h = −H 0 . Within our numerical precision we clearly see that where the exponent γ 0.41 . . . coincide (within our numerical precision) with the one characterizing the distribution of positive gaps at the jamming transition point [15,16] and the As are constants. Finally we look at the contact forces. In Fig. 6 we plot the empirical distribution of contact forces both in h = 0 and in h = −H 0 . We clearly see that there are four pseudogaps appearing close to the edges of the support of f µ .
In order to quantitatively analyze the behavior close to the four edges of the stability supports, we look at the cumulative distribution functions that we plot in Fig.7. We clearly see within our numerical precision that around the edges of the stability supports the force distribution has four pseudogaps where the Bs are constants of order one and the exponent  θ = 0.42 . . . is close to the one controlling the small forces at the jamming transition point [15,16].

III. DISCUSSION
All in all, our numerical results suggest that again the glassy phase is isostatic and marginally stable. At variance with the purely linear potential and the jamming transition, here we have four pseudogaps characterizing CDF forces Figure 7. The cumulative distribution functions for the contact forces close to the edges of their stability supports. We see that the apparent prefactors look very similar and the dots are rather one onto the other, because we measured the forces at the rather symmetric point where the number of gaps in zero and in −H0 is roughly the same. We do not expect such prefactors to be universal but to depend on the point of the phase diagram where local minima are probed. The plot has been produced looking at minima at p = 4 for α = 4 and N = 2048 Errorbars are obtained from sample to sample fluctuations.
contact forces and four power law divergences in the gap distribution. This implies a proliferation of non-linear excitations due to the fact that any perturbation can open and close all sorts of contacts. We believe that perturbing local minima of the system, as it happens for the purely linear potential case [10], will lead to system spanning avalanches and crackling noise. This is a manifestation of the emergent self-organized criticality of the non-convex/glassy phase. It is clear that our results may be generalized by adding more piecewise linear terms to the cost function. For each point where the potential has a kink, the corresponding gap distribution will get a Dirac delta peak. The argument about the stability of local minima suggests that in the non-convex phase, isostaticity is required to ensure marginal stability of local minima and that the isostatic condition involves a global sum-rule of the number of gaps that end up in one of the kinks of the cost function. This global topological constraint will enforce critical pseudogaps on both forces and gaps for each kink giving rise to a proliferation of non-linear excitations.
We expect, based on our experience with linear spheres [9], that the same results will hold for spheres interacting with piecewise linear potential down to two dimensions. In this case, one has the additional possibility to have localized non-linear excitations whose density decreases when increasing the packing fraction. Moreover we expect that dense piecewise linear spheres, at finite temperature, will display strong Gardner phenomenology [17] upon cooling. Finally it would be interesting to see what happens for deeper models beyond the perceptron architecture [18].
Beyond the isostaticity argument, an analytical understanding of the critical exponents arising in each kink of the interaction potential can go with the replica treatment of the model. Following [1],we expect that as soon as the model has a ground state which has a continuous RSB solution at least close to the leaves of the ultrametric tree of pure states [19], a generalization of the scaling theory developed in [1] for the fullRSB equations should give rise to the exponents of the jamming transition, in agreement with the current numerical simulations. Despite the fact that however this replica approach holds strictly speaking for the ground state of the problem, as it happens for other problems, notably the Sherrington-Kirkpatrick model [19], it gives a prediction for the critical exponents arising in local minima that are obtained with greedy gradient descent algorithms as the ones we are using. While the derivation of such exponents from a purely dynamical perspective is an open problem, message passing algorithms are expected to be tracked by such replica scaling theory [20,21], see also [22], and therefore to show the criticality we found here. This theoretical analysis points to the fact that the critical behavior is inherited from the non-analyticities of the cost function. Finally it would be interesting to understand what happens if one considers cost functions that have different types of non-analyticities and whether this could give rise to different critical behaviors.