Simplified derivations for high-dimensional convex learning problems

David G. Clark; Haim Sompolinsky

SciPost Submission Page

This is not the latest submitted version.

This Submission thread is now published as

SciPost Phys. Lect. Notes 105 (2025)

Submission summary

Authors (as registered SciPost users):

David Clark

Submission information
Preprint Link:	https://arxiv.org/abs/2412.01110v4
Date submitted:	Feb. 11, 2025, 2:34 p.m.
Submitted by:	David Clark
Submitted to:	SciPost Physics Lecture Notes

Ontological classification
Academic field:	Physics
Specialties:	Condensed Matter Physics - Theory; Statistical and Soft Matter Physics
Approach:	Theoretical

Abstract

Statistical-physics calculations in machine learning and theoretical neuroscience often involve lengthy derivations that obscure physical interpretation. We present concise, non-replica derivations of key results and highlight their underlying similarities. Using a cavity approach, we analyze high-dimensional learning problems: perceptron classification of points and manifolds, and kernel ridge regression. These problems share a common structure--a bipartite system of interacting feature and datum variables--enabling a unified analysis. For perceptron-capacity problems, we identify a symmetry that allows derivation of correct capacities through a naïve method.

Current status:

Has been resubmitted

Reports on this Submission

Report #2 by Anonymous (Referee 2) on 2025-4-14 (Invited Report)

Cite as: Anonymous, Report on arXiv:2412.01110v4, delivered 2025-04-14, doi: 10.21468/SciPost.Report.11011

Report

This paper presents a clear and unified treatment of the cavity method for high-dimensional convex learning problems. It provides elegant and compact derivations of established results on the storage capacity of the perceptron with both random points and manifolds as inputs, and on the typical performance of kernel ridge regression with random kernel eigenfunctions. The approach nicely highlights the shared bipartite structure across different models, drawing insightful connections that will be particularly valuable to researchers learning these techniques.

Overall, I believe the paper makes a valuable contribution and should be accepted, provided that the authors address the comments below by incorporating a few clarifications and missing references.

Comments:

It would be helpful to expand the discussion on extensions to more realistic settings, such as: (i) correlations across input dimensions (e.g., Gaussian mixtures with general covariances), (ii) correlations across patterns, and (iii) unbalanced datasets with unequal label distributions.
It would be helpful to add a comment on the role of convexity in enabling the derivations, and discuss possible extensions to nonconvex settings.
It would be useful to add a comment on the validity of the spectrum assumption (right above Sec. 4.4) in real tasks.
It may be helpful to include a more explicit explanation of why the bipartite structure guarantees that perturbations influence only the variables on the opposite side.
Please double check the a,b,c,d indices on the right hand side of eq. 81.

Some references are missing:

Section 4.1: it would be appropriate to cite the seminal work [1] on the derivation of worst-case rates, and to include citations to [2,3] for the typical case.
Discussion: a very detailed derivation of the dynamical cavity method for the perceptron model is presented in [4].

[1] Caponnetto, Andrea, and Ernesto De Vito. "Optimal rates for the regularized least-squares algorithm." Foundations of Computational Mathematics 7 (2007): 331-368.
[2] Spigler, Stefano, Mario Geiger, and Matthieu Wyart. "Asymptotic learning curves of kernel methods: empirical data versus teacher–student paradigm." Journal of Statistical Mechanics: Theory and Experiment 2020.12 (2020): 124001.
[3] Cui, Hugo, et al. "Generalization error rates in kernel regression: The crossover from the noiseless to noisy regime." Advances in Neural Information Processing Systems 34 (2021): 10131-10143.
[4] Agoritsas, Elisabeth, et al. "Out-of-equilibrium dynamical mean-field equations for the perceptron model." Journal of Physics A: Mathematical and Theoretical 51.8 (2018): 085002.

Requested changes

Address comments and include missing references.

Recommendation

Publish (easily meets expectations and criteria for this Journal; among top 50%)

validity: top
significance: top
originality: high
clarity: high
formatting: perfect
grammar: excellent

Author: David Clark on 2025-09-20 [id 5839]

(in reply to Report 2 on 2025-04-14)

Disclosure of Generative AI use

The comment author discloses that the following generative AI tools have been used in the preparation of this comment:

Claude Sonnet 4 was used to check grammar and format LaTeX.

Dear Referee #2,
Thank you for your comments. We have implemented your helpful suggestions. Please see the attached PDF in which these modifications are enumerated.
Sincerely,
David Clark

Attachment:

Clark_Sompolinsky_SciPost_response_HX1ne6w.pdf

Report #1 by Anonymous (Referee 1) on 2025-2-25 (Invited Report)

Cite as: Anonymous, Report on arXiv:2412.01110v4, delivered 2025-02-24, doi: 10.21468/SciPost.Report.10719

Report

Summary: These lecture notes revisit three celebrated problems in high-dimensional statistical learning, first studied in their respective works [5,8,9] through the lens of the replica method of statistical physics, using a cavity approach. The computation presents the advantage of being less lengthy, and overall more intuitive. It leverages the observation that all these problems admit reformulations with a bipartite structure.

Evaluation: As such, these notes propose a concise and insightful approach, and will prove of interest to researchers working on these topics. The manuscript is very well written, and sufficient discussion of all technical steps is provided. I list a few minor presentation comments below, but recommend that the work be accepted, even in its current state.

Comments:
- More explanation of the self-averaging of the self-responses (e.g. below (33)) could prove helpful.
- To the best of my reading, the expression (58) for the number of supporting points is not established before (58), and would benefit from a brief discussion.
- "due to the bipartite structure, perturbations to other datum variables do not affect the [cavity variable]": is this statement true to leading order or in general? If the former, it would be clearer to state so explicitly.

Recommendation

Publish (easily meets expectations and criteria for this Journal; among top 50%)

validity: -
significance: -
originality: -
clarity: -
formatting: -
grammar: -

Author: David Clark on 2025-09-20 [id 5838]

(in reply to Report 1 on 2025-02-25)

Disclosure of Generative AI use

The comment author discloses that the following generative AI tools have been used in the preparation of this comment:

Claude Sonnet 4 was used to check grammar and format LaTeX.

Dear Referee #1,
Thank you for your comments. We have implemented your helpful suggestions. Please see the attached PDF in which these modifications are enumerated.
Sincerely,
David Clark

Attachment:

Clark_Sompolinsky_SciPost_response.pdf
