SciPost Submission Page
Exact full-RSB SAT/UNSAT transition in infinitely wide two-layer neural networks
by Brandon L. Annesi, Enrico M. Malatesta, Francesco Zamponi
Submission summary
Authors (as registered SciPost users): Enrico Malatesta

| Submission information | |
|---|---|
| Preprint Link: | scipost_202410_00050v1 (pdf) |
| Date submitted: | 2024-10-21 16:26 |
| Submitted by: | Malatesta, Enrico |
| Submitted to: | SciPost Physics |

| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Theoretical, Computational |
Abstract
We analyze the problem of storing random pattern-label associations using two classes of continuous non-convex weight models, namely the perceptron with negative margin and an infinite-width two-layer neural network with non-overlapping receptive fields and generic activation function. Using a full-RSB ansatz we compute the exact value of the SAT/UNSAT transition. Furthermore, in the case of the negative perceptron we show that the overlap distribution of typical states displays an overlap gap (a disconnected support) in certain regions of the phase diagram defined by the value of the margin and the density of patterns to be stored. This implies that some recent theorems that ensure convergence of Approximate Message Passing (AMP) based algorithms to capacity are not applicable. Finally, we show that Gradient Descent is not able to reach the maximal capacity, irrespective of the presence of an overlap gap for typical states. This finding, similar to what occurs in binary weight models, suggests that gradient-based algorithms are biased towards highly atypical states, whose inaccessibility determines the algorithmic threshold.
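As a concrete illustration of the storage problem described above, here is a minimal sketch of the two constraint-satisfaction settings. The Gaussian patterns, ±1 labels, spherical normalization $\|w\|^2 = N$, sign activation, and the helper names `negative_perceptron_satisfied` and `tree_committee_satisfied` are illustrative assumptions, not the paper's exact conventions.

```python
import numpy as np

def negative_perceptron_satisfied(w, X, y, kappa):
    """Check whether w (with ||w||^2 = N) gives every pattern a stability
    y^mu * (w . x^mu) / sqrt(N) of at least kappa (kappa may be negative)."""
    N = w.shape[0]
    stabilities = y * (X @ w) / np.sqrt(N)
    return bool(np.all(stabilities >= kappa))

def tree_committee_satisfied(W, X, y, kappa, activation=np.sign):
    """Finite-size tree committee machine with K non-overlapping receptive
    fields; W has shape (K, N // K), second-layer weights are fixed."""
    K, n = W.shape
    blocks = X.reshape(X.shape[0], K, n)                 # each hidden unit sees its own block
    pre = np.einsum('pkn,kn->pk', blocks, W) / np.sqrt(n)
    out = activation(pre).sum(axis=1) / np.sqrt(K)       # sum over hidden units
    return bool(np.all(y * out >= kappa))

# Example: N inputs, P = alpha * N random pattern-label pairs, negative margin
rng = np.random.default_rng(0)
N, alpha, kappa = 1000, 0.5, -0.5
P = int(alpha * N)
X = rng.standard_normal((P, N))
y = rng.choice([-1.0, 1.0], size=P)
w = rng.standard_normal(N)
w *= np.sqrt(N) / np.linalg.norm(w)                      # enforce the spherical constraint
print(negative_perceptron_satisfied(w, X, y, kappa))
```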
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Report
See the attached report.
Recommendation
Ask for major revision
Strengths
1- This work provides a full-RSB treatment of the storage capacity problem for the tree committee machine, improving on the previous literature, which had studied only RS and 1RSB upper bounds.
Weaknesses
1- The presentation could be improved, both by extending some of the discussion and by fixing the typos.
2- The paper lacks a discussion of the consequences of the results and a clear "take-home message".
Report
Please refer to the report attached for details.
Requested changes
1- Fix all typos listed in the report.
2- Add more details on the CLT argument (possibly also in the appendix).
Recommendation
Ask for minor revision
Strengths
1 - The analysis of the full-RSB phase diagram is a significant result and addresses a technically challenging problem.
2 - The connection with optimization is very interesting and links different research communities.
3 - The paper is clearly written and provides a pedagogical presentation of the results.
Weaknesses
1 - The discussion on optimization algorithms, especially on Gradient Descent, is somewhat limited and could be expanded to give more insights.
Report
The authors investigate two continuous non-convex constraint satisfaction problems: the negative-margin perceptron and the infinite-width tree-committee machine. They perform a static analysis in the full Replica Symmetry Breaking (fRSB) regime, computing the SAT-UNSAT transition line for these models.
For the negative perceptron, they identify a Gardner transition, where the typical solutions develop an "overlap gap." This feature implies that Approximate Message Passing (AMP) algorithms aren't guaranteed to converge in this regime.
Additionally, the authors provide numerical evidence that Gradient Descent (GD) fails to find solutions before the models reach capacity, irrespective of the overlap gap. This observation suggests the presence of an algorithmic threshold that separates the theoretical capacity from practical solvability with GD.
The computation of the transition lines in the fRSB phase is an important technical result, as it requires a non-trivial numerical approach. The implications of the phase diagram for AMP algorithms are very interesting and establish connections with the optimization literature.
The GD data suggesting the presence of an algorithmic gap is also very interesting. However, I think it would benefit from an expanded discussion providing additional details and insights.
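To make the questions below concrete, the following is a rough sketch of the kind of experiment under discussion: repeated GD runs at fixed constraint density $\alpha = P/N$, counting the fraction that reach zero training error. The hinge-type loss, learning rate, epoch budget, and the name `gd_solve_probability` are illustrative assumptions, not the authors' actual protocol.

```python
import numpy as np

def gd_solve_probability(N, alpha, kappa, n_runs=50, lr=0.05, epochs=2000, seed=0):
    """Estimate the fraction of GD runs that satisfy all constraints of a
    negative-margin perceptron at density alpha = P/N (illustrative only)."""
    rng = np.random.default_rng(seed)
    P = int(alpha * N)
    successes = 0
    for _ in range(n_runs):
        X = rng.standard_normal((P, N))
        y = rng.choice([-1.0, 1.0], size=P)
        w = rng.standard_normal(N)
        for _ in range(epochs):
            w *= np.sqrt(N) / np.linalg.norm(w)            # project onto the sphere ||w||^2 = N
            stab = y * (X @ w) / np.sqrt(N)
            viol = stab < kappa                             # unsatisfied constraints
            if not viol.any():
                successes += 1                              # a solution was found
                break
            # gradient of the hinge-type loss sum_mu max(0, kappa - stab_mu)
            grad = -(y[viol, None] * X[viol]).sum(axis=0) / np.sqrt(N)
            w -= lr * grad
    return successes / n_runs
```

Sweeping such a routine over $\alpha$ would produce probability curves of the type shown in Fig. 6, whose dependence on `n_runs`, `epochs`, and `lr` is exactly what the questions below probe.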
Here are some questions on Section 6:
- How many runs were performed to estimate the probability of finding solutions with GD?
- How robust are these probability curves to changing GD hyperparameters, such as the maximum number of training epochs? From the left panel of Fig. 6, it is unclear whether fine-tuning the hyperparameters could enable GD to reach $\alpha_c$.
- Is there a way to compute the algorithmic threshold for GD? The introduction mentions that the methods applicable to discrete weights do not generalize to the continuous case; is there an alternative approach?
- Is there a notion of Overlap Gap for the atypical solutions found by GD?
Requested changes
Here are some typos:
- page 3, line 1: missing reference
- page 4, Eq. (3): it should be $l \in [K]$
- page 7, Eq. (15b): it should be $\hat{\lambda}^b$ instead of $\hat{\lambda}^b_l$
- page 27, before Eq. (83): it should be "energetic term" instead of "entropic term"
- page 31, after Eq. (103): it should be "Fig. 8" instead of "Fig. 11"
Recommendation
Publish (easily meets expectations and criteria for this Journal; among top 50%)