SciPost Submission Page
Exact full-RSB SAT/UNSAT transition in infinitely wide two-layer neural networks
by Brandon L. Annesi, Enrico M. Malatesta, Francesco Zamponi
Submission summary
Authors (as registered SciPost users): Enrico Malatesta

| Submission information | |
|---|---|
| Preprint Link: | scipost_202410_00050v1 (pdf) |
| Date submitted: | 2024-10-21 16:26 |
| Submitted by: | Malatesta, Enrico |
| Submitted to: | SciPost Physics |

| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Theoretical, Computational |
Abstract
We analyze the problem of storing random pattern-label associations using two classes of continuous non-convex weight models, namely the perceptron with negative margin and an infinite-width two-layer neural network with non-overlapping receptive fields and generic activation function. Using a full-RSB ansatz we compute the exact value of the SAT/UNSAT transition. Furthermore, in the case of the negative perceptron we show that the overlap distribution of typical states displays an overlap gap (a disconnected support) in certain regions of the phase diagram defined by the value of the margin and the density of patterns to be stored. This implies that some recent theorems that ensure convergence of Approximate Message Passing (AMP) based algorithms to capacity are not applicable. Finally, we show that Gradient Descent is not able to reach the maximal capacity, irrespective of the presence of an overlap gap for typical states. This finding, similar to what occurs in binary weight models, suggests that gradient-based algorithms are biased towards highly atypical states, whose inaccessibility determines the algorithmic threshold.
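As a concrete illustration of the storage problem described above, here is a minimal sketch of the two constraint-satisfaction settings. The Gaussian patterns, ±1 labels, spherical normalization $\|w\|^2 = N$, sign activation, and the helper names `negative_perceptron_satisfied` and `tree_committee_satisfied` are illustrative assumptions, not the paper's exact conventions.

```python
import numpy as np

def negative_perceptron_satisfied(w, X, y, kappa):
    """Check whether w (with ||w||^2 = N) gives every pattern a stability
    y^mu * (w . x^mu) / sqrt(N) of at least kappa (kappa may be negative)."""
    N = w.shape[0]
    stabilities = y * (X @ w) / np.sqrt(N)
    return bool(np.all(stabilities >= kappa))

def tree_committee_satisfied(W, X, y, kappa, activation=np.sign):
    """Finite-size tree committee machine with K non-overlapping receptive
    fields; W has shape (K, N // K), second-layer weights are fixed."""
    K, n = W.shape
    blocks = X.reshape(X.shape[0], K, n)                 # each hidden unit sees its own block
    pre = np.einsum('pkn,kn->pk', blocks, W) / np.sqrt(n)
    out = activation(pre).sum(axis=1) / np.sqrt(K)       # sum over hidden units
    return bool(np.all(y * out >= kappa))

# Example: N inputs, P = alpha * N random pattern-label pairs, negative margin
rng = np.random.default_rng(0)
N, alpha, kappa = 1000, 0.5, -0.5
P = int(alpha * N)
X = rng.standard_normal((P, N))
y = rng.choice([-1.0, 1.0], size=P)
w = rng.standard_normal(N)
w *= np.sqrt(N) / np.linalg.norm(w)                      # enforce the spherical constraint
print(negative_perceptron_satisfied(w, X, y, kappa))
```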
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Report
See the attached report.
Recommendation
Ask for major revision
Strengths
1- This work provides a full-RSB treatment of the storage capacity problem for the tree committee machine, improving on the previous literature, which had studied only RS and 1RSB upper bounds.
Weaknesses
1- The presentation could be improved, both by extending some of the discussion and by fixing the typos.
2- The paper lacks a discussion of the consequences of the results and a clear "take-home message".
Report
Please refer to the report attached for details.
Requested changes
1- Fix all typos listed in the report.
2- Add more details on the CLT argument (possibly also in the appendix).
Recommendation
Ask for minor revision
Strengths
1 - The analysis of the full-RSB phase diagram is a significant result and addresses a technically challenging problem.
2 - The connection with optimization is very interesting and links different research communities.
3 - The paper is clearly written and provides a pedagogical presentation of the results.
Weaknesses
1 - The discussion on optimization algorithms, especially on Gradient Descent, is somewhat limited and could be expanded to give more insights.
Report
The authors investigate two continuous non-convex constraint satisfaction problems: the negative-margin perceptron and the infinite-width tree-committee machine. They perform a static analysis in the full Replica Symmetry Breaking (fRSB) regime, computing the SAT-UNSAT transition line for these models.
For the negative perceptron, they identify a Gardner transition, where the typical solutions develop an "overlap gap." This feature implies that Approximate Message Passing (AMP) algorithms aren't guaranteed to converge in this regime.
Additionally, the authors provide numerical evidence that Gradient Descent (GD) fails to find solutions before the models reach capacity, irrespective of the overlap gap. This observation suggests the presence of an algorithmic threshold that separates the theoretical capacity from practical solvability with GD.
The computation of the transition lines in the fRSB phase is an important technical result, as it requires a non-trivial numerical approach. The implications of the phase diagram for AMP algorithms are very interesting and establish connections with the optimization literature.
The GD data suggesting the presence of an algorithmic gap is also very interesting. However, I think it would benefit from an expanded discussion providing additional details and insights.
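To make the questions below concrete, the following is a rough sketch of the kind of experiment under discussion: repeated GD runs at fixed constraint density $\alpha = P/N$, counting the fraction that reach zero training error. The hinge-type loss, learning rate, epoch budget, and the name `gd_solve_probability` are illustrative assumptions, not the authors' actual protocol.

```python
import numpy as np

def gd_solve_probability(N, alpha, kappa, n_runs=50, lr=0.05, epochs=2000, seed=0):
    """Estimate the fraction of GD runs that satisfy all constraints of a
    negative-margin perceptron at density alpha = P/N (illustrative only)."""
    rng = np.random.default_rng(seed)
    P = int(alpha * N)
    successes = 0
    for _ in range(n_runs):
        X = rng.standard_normal((P, N))
        y = rng.choice([-1.0, 1.0], size=P)
        w = rng.standard_normal(N)
        for _ in range(epochs):
            w *= np.sqrt(N) / np.linalg.norm(w)            # project onto the sphere ||w||^2 = N
            stab = y * (X @ w) / np.sqrt(N)
            viol = stab < kappa                             # unsatisfied constraints
            if not viol.any():
                successes += 1                              # a solution was found
                break
            # gradient of the hinge-type loss sum_mu max(0, kappa - stab_mu)
            grad = -(y[viol, None] * X[viol]).sum(axis=0) / np.sqrt(N)
            w -= lr * grad
    return successes / n_runs
```

Sweeping such a routine over $\alpha$ would produce probability curves of the type shown in Fig. 6, whose dependence on `n_runs`, `epochs`, and `lr` is exactly what the questions below probe.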
Here are some questions on Section 6:
- How many runs were performed to estimate the probability of finding solutions with GD?
- How robust are these probability curves to changing GD hyperparameters, such as the maximum number of training epochs? From the left panel of Fig. 6, it is unclear whether fine-tuning the hyperparameters could enable GD to reach $\alpha_c$.
- Is there a way to compute the algorithmic threshold for GD? The introduction mentions that the methods applicable to discrete weights do not generalize to the continuous case; is there an alternative approach?
- Is there a notion of Overlap Gap for the atypical solutions found by GD?
Requested changes
Here are some typos:
- page 3, line 1: missing reference
- page 4, Eq. (3): it should be $l \in [K]$
- page 7, Eq. (15b): it should be $\hat{\lambda}^b$ instead of $\hat{\lambda}^b_l$
- page 27, before Eq. (83): it should be "energetic term" instead of "entropic term"
- page 31, after Eq. (103): it should be "Fig. 8" instead of "Fig. 11"
Recommendation
Publish (easily meets expectations and criteria for this Journal; among top 50%)