SciPost Submission Page
Proliferation of non-linear excitations in the piecewise-linear perceptron
by Antonio Sclocchi, Pierfrancesco Urbani
This Submission thread is now published as SciPost Phys. 10, 013 (2021)
Submission summary
Authors (as registered SciPost users): | Antonio Sclocchi |
Submission information | |
---|---|
Preprint Link: | https://arxiv.org/abs/2010.10253v2 (pdf) |
Date accepted: | 2020-12-29 |
Date submitted: | 2020-12-18 01:50 |
Submitted by: | Sclocchi, Antonio |
Submitted to: | SciPost Physics |
Ontological classification | |
---|---|
Academic field: | Physics |
Approaches: | Theoretical, Computational |
Abstract
We investigate the properties of local minima of the energy landscape of a continuous non-convex optimization problem, the spherical perceptron with piecewise linear cost function and show that they are critical, marginally stable and displaying a set of pseudogaps, singularities and non-linear excitations whose properties appear to be in the same universality class of jammed packings of hard spheres. The piecewise linear perceptron problem appears as an evolution of the purely linear perceptron optimization problem that has been recently investigated in [1]. Its cost function contains two non-analytic points where the derivative has a jump. Correspondingly, in the non-convex/glassy phase, these two points give rise to four pseudogaps in the force distribution and this induces four power laws in the gap distribution as well. In addition one can define an extended notion of isostaticity and show that local minima appear again to be isostatic in this phase. We believe that our results generalize naturally to more complex cases with a proliferation of non-linear excitations as the number of non-analytic points in the cost function is increased.
Author comments upon resubmission
Dear Editor,
we would like to resubmit our manuscript entitled "Proliferation of non-linear excitations in the piecewise-linear perceptron" for publication in SciPost Physics.
We would like to warmly thank the reviewers for the critical assessment of the manuscript. We found all the comments and requested changes extremely useful and we feel that the manuscript has really improved after the revision. We would like to express our gratitude to both of them. In the following we will respond to all the points raised by the reviewers.
REPORT 1.
First of all we would like to warmly thank the reviewer for the detailed and complete critical assessment of our work. All the comments and requested changes have allowed us to improve the quality of the manuscript as well as the clarity of our presentation. We are very grateful for that.
In the following we will respond to the points raised by the reviewer.
Requested changes:
1) In the introduction we have written "In particular it has been shown that in the jammed phase, when the potential energy is non-convex with respect to the degrees of freedom, both systems \emph{self-organize} into marginally stable, critical, configurations at finite energy density." and added the footnote 1 to explain the concept of isostaticity.
2) We have indicated all parameters in the figures.
3) We thank the referee for pointing this out. At jamming, excitations correspond to the opening of contacts. In other words, when a system of jammed spheres is perturbed (for example strained, as was done in Combe and Roux, PRL 85, 3628 (2000)), the dynamics is intermittent and avalanches happen whenever a force crosses zero. This corresponds to a contact becoming a positive gap and creating a floppy mode. For linear spheres above jamming one can have additional excitations: in addition to contacts becoming positive gaps, contacts can become overlaps, which corresponds to a force that becomes larger than one. Again, this creates a floppy mode which triggers an avalanche. For the purely linear perceptron case this has been discussed in a recent preprint, S. Franz, A. Sclocchi, P. Urbani, arXiv:2010.02158. In the present case, with the piecewise linear potential, one can open both contacts in $h=0$ and contacts in $h=-H_0$. These contacts can become either overlaps of all sorts or positive gaps. Therefore one can trigger avalanches in many more ways than at jamming or in the purely linear case. We have included a discussion in the introduction to clarify this point, which was previously treated only in the concluding section.
4) We thank the referee. When discussing isostaticity, we now mention that we use the fact that the sign of $\mu$ is directly linked to the convexity of the landscape.
5) We included the number of samples used to produce the statistics.
6) We thank the referee. In the end we found it useful to discuss the smoothed potential in more depth, since we also performed simulations with the L-BFGS algorithm, which runs on it. With this addition in Section 3 (see Eq. (6)) we can now discuss the form of the Hessian in more detail. We hope that this makes the presentation clearer. We also added a reference to our previous work (with S. Franz) on spheres, as suggested by the reviewer.
7) Concerning the exponents in Eq. (11), which are the same on both sides of each singularity: we note that this is the same as in the purely linear case. There, the singularity at $h=0$ has two power laws that are the same on both sides. Here we find the same phenomenology, with exactly the same exponents, in $h=-H_0$. Concerning the prefactors, we believe that this numerical symmetry is purely accidental. Indeed we decided to look at the statistics of forces and gaps at a value of the pressure such that the number of contacts in $h=0$ is approximately equal to the number of contacts in $h=-H_0$. We made this choice because we wanted a point with enough statistics to see the behavior of gaps and forces at both corners of the interaction potential. This point is therefore rather symmetric, as one can clearly see in Figure 4 and Figure 6. However, when moving around the phase diagram the density of gaps smaller than $-H_0$ changes and, for example, decreases to zero when going towards jamming. Therefore we do not expect any universal symmetry for the prefactors. We included in Fig. 5 the cumulative distributions of gaps at pressure $p=2$, where the accidental numerical symmetry is lost and the prefactors appear to be different.
8) We thank the reviewer for asking us to add the relevant references; we included them in the revised version of the manuscript.
9) We thank the referee for pointing out that what we wrote was not clear. The referee is right. One can consider non-analyticities that give rise either to discontinuities in the forces or to singularities, for example forces diverging at a point. As an example one could consider models of the type $v(h)=|h|^{1/2}\theta(-h)$ as well as $v(h)=|\ln|h|| \theta(-h)$. We mentioned this in the conclusions.
10) We apologize for the typos and thank the referee for pointing them out.
11) We did not compute in detail the phase diagram of the model, which needs a replica treatment, but we expect that it is different from the purely linear case. A simple yet rather clear piece of evidence for this is the following. The topology trivialization transition for $\alpha=4$ happens when $\mu$ changes sign, and in the piecewise linear case this happens at $\sigma -\sigma_J\simeq 1.2$. The jamming transition is at $\sigma_J\simeq -0.4$ and therefore the topology trivialization transition is at $\sigma_{dAT} \simeq 0.8$. In the purely linear case the topology trivialization at $\alpha=4$ is at $\sigma\simeq 0.6$. Note that the jamming point for the algorithm we are using does not change between the purely linear and the piecewise linear model.
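The arithmetic behind the comparison in point 11) can be written out explicitly. The block below only restates the numbers quoted above for $\alpha=4$; the label $\sigma_{dAT}^{\rm lin}$ for the purely linear value is introduced here for illustration and is not notation from the manuscript.

```latex
\begin{align}
  \sigma_{dAT} &= \sigma_J + (\sigma_{dAT} - \sigma_J)
               \simeq -0.4 + 1.2 = 0.8
               && \text{(piecewise linear, } \alpha = 4\text{)} \\
  \sigma_{dAT}^{\rm lin} &\simeq 0.6
               && \text{(purely linear, } \alpha = 4\text{)}
\end{align}
```

The two values differ by about $0.2$ while $\sigma_J$ is the same in both models, which is the evidence quoted above for a different phase diagram.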
REPORT 2
We thank the reviewer for the critical assessment of our manuscript. The requested changes as well as the comments have been very useful to improve the clarity of the manuscript and we are very grateful for that.
Below we give a point by point list of answers to the requested changes.
1) We clarify the connection between the power laws and pseudogaps observed in the statistics of local minima and the "proliferation of non-linear excitations, avalanches and crackling noise" we discussed in the text. The main reason why the pseudogaps and power laws in the force and gap distributions are important is that these properties control the response of the system to external perturbations. This has been verified at jamming for hard spheres (see the review by Müller and Wyart, Annu. Rev. Condens. Matter Phys. 2015) as well as in the purely linear perceptron case (see S. Franz, A. Sclocchi, P. Urbani, arXiv:2010.02158). Imagine compressing the system. Isostatic systems stay stable up to the point where the force balance condition can no longer be satisfied. This happens when forces exit their stability intervals, which in our case are (0,1) and (1,2). This instability is not controlled by the Hessian in local minima (this is why we referred to these excitations as "non-linear", since they are not described by harmonic linear response). When one triggers an excitation, the system loses isostaticity and undergoes an avalanche in which there is a rearrangement of contacts. These avalanches have been shown to be large and power law distributed in the linear perceptron case. We did not investigate them in this work, even though we believe that they will show up again in the piecewise linear case because of the same mechanisms described in S. Franz, A. Sclocchi, P. Urbani, arXiv:2010.02158.
2) We thank the reviewer for the important remark. Indeed the manuscript was unclear on this point. We improved the presentation in the following way. First we clarify the algorithms we used. We first perform a sort of gradient descent minimization on the smoothed cost function (we use a BFGS library routine; BFGS is an efficient quasi-Newton method that, like conjugate gradient, uses only gradient information). We observe that the minima that we find are always isostatic, meaning that the sum of the number of contacts in $h=0$ and in $h=-H_0$ is exactly $N$. The fact that the fluctuations of the total number of contacts are essentially zero is in line with what is observed at jamming (see D. Hexner, P. Urbani, F. Zamponi, PRL 2019) as well as in dense, jammed linear spheres (see S. Franz, A. Sclocchi, P. Urbani, SciPost Phys. 9, 012 (2020)). Once we have established that minima are isostatic, we give the theoretical argument showing that isostaticity is required for minima to be marginally stable. Finally we use the compression algorithm developed in S. Franz, A. Sclocchi, P. Urbani, arXiv:2010.02158 to produce the plots of the number of contacts and of the behavior of the Lagrange multiplier $\mu$ as a function of the distance from jamming. Note that this algorithm finds minima that are isostatic when $\mu<0$ and whose average numbers of contacts in $h=0$ and $h=-H_0$ are the same as those found by the BFGS algorithm.
3) About the weakness related to the lack of a replica treatment of the model: we agree with the referee that having the full replica treatment of the model is important. However we feel that this is not a simple step, even though we expect that we can follow the same route proposed in S. Franz, A. Sclocchi, and P. Urbani, PRL 123, 115702 (2019). We should first construct the phase diagram, and then build up the machinery of full-replica-symmetry-breaking (fRSB). Then we should find the fRSB scaling solution along the lines of S. Franz, A. Sclocchi, and P. Urbani, PRL 123, 115702 (2019). The program is clear but not straightforward and we wanted to focus on the numerical findings. We leave the replica treatment for future work. One of the other reasons why we did not push the replica approach is related to the criticism the referee raises. Indeed our algorithms may find minima that are not close to the ground state (strictly speaking it is unclear where the minima are, and a systematic exploration of the landscape is needed to clarify this point). We agree that the minima we are sampling may be far, high-energy, excited states. As we explained in the discussion section of the manuscript, the replica treatment could be adapted to study excited states, since one could show that a family of message passing algorithms is tracked by the replica equations and therefore should land on minima whose properties are those predicted through replicas, see Ref. [20-22]. However the algorithms we use are not of this type, and the referee is right in underlining that the RSB construction would not be enough to answer the question of why such algorithms seem to find minima with properties described by replicas.
We do not have much to say about that, except that everything behaves as if the scaling theory that emerges from replicas (and that encodes the universal features of gaps and forces) describes the landscape that is accessible to local algorithms, and should therefore be universal in a strong sense. We do not have any proof of this conjecture for the moment. A tentative route to attack this problem has been proposed in S. Franz, A. Sclocchi, P. Urbani, arXiv:2010.02158.
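The stability intervals of point 1) and the contact counting of point 2) can be illustrated with a minimal sketch. It is not the code used for the manuscript. It assumes a cost function with slope $0$ for $h>0$, $-1$ for $-H_0<h<0$ and $-2$ for $h<-H_0$, which is consistent with the force intervals (0,1) and (1,2) quoted above but is not spelled out in this reply. The function names and the tolerance `tol` used to identify contacts are hypothetical.

```python
def cost(h, H0):
    """Assumed piecewise-linear cost with kinks at h = 0 and h = -H0.

    Slopes: 0 for h > 0, -1 for -H0 < h < 0, -2 for h < -H0 (an assumption).
    """
    v = 0.0
    if h < 0.0:
        v += -h           # linear penalty below the first kink
    if h < -H0:
        v += -(h + H0)    # extra linear penalty below the second kink
    return v


def force_is_stable(force, kink):
    """Check that a contact force lies inside its stability interval.

    kink = "zero" for a contact at h = 0   -> interval (0, 1)
    kink = "H0"   for a contact at h = -H0 -> interval (1, 2)
    """
    lo, hi = (0.0, 1.0) if kink == "zero" else (1.0, 2.0)
    return lo < force < hi


def count_contacts(gaps, H0, tol=1e-8):
    """Count gaps sitting at each kink, within a hypothetical tolerance."""
    n_zero = sum(1 for h in gaps if abs(h) < tol)
    n_H0 = sum(1 for h in gaps if abs(h + H0) < tol)
    return n_zero, n_H0


def is_isostatic(gaps, H0, N, tol=1e-8):
    """Isostaticity as stated in point 2): contacts at both kinks sum to N."""
    n_zero, n_H0 = count_contacts(gaps, H0, tol)
    return n_zero + n_H0 == N
```

In this picture an excitation is triggered when a perturbation pushes some contact force out of its interval, so that `force_is_stable` becomes false for that contact. On a smoothed cost function contacts sit only approximately at the kinks, so the choice of `tol` would have to be matched to the smoothing scale.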
Published as SciPost Phys. 10, 013 (2021)