SciPost Submission Page
Simplified derivations for high-dimensional convex learning problems
by David G. Clark, Haim Sompolinsky
Submission summary
Authors (as registered SciPost users): David Clark
| Submission information | |
|---|---|
| Preprint Link: | https://arxiv.org/abs/2412.01110v4 (pdf) |
| Date submitted: | 2025-02-11 14:34 |
| Submitted by: | Clark, David |
| Submitted to: | SciPost Physics Lecture Notes |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approach: | Theoretical |
Abstract
Statistical-physics calculations in machine learning and theoretical neuroscience often involve lengthy derivations that obscure physical interpretation. We present concise, non-replica derivations of key results and highlight their underlying similarities. Using a cavity approach, we analyze high-dimensional learning problems: perceptron classification of points and manifolds, and kernel ridge regression. These problems share a common structure--a bipartite system of interacting feature and datum variables--enabling a unified analysis. For perceptron-capacity problems, we identify a symmetry that allows derivation of correct capacities through a naïve method.
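As a rough guide to the problem classes mentioned in the abstract, here is a standard textbook formulation of two of them; the notation is generic and is not taken from the manuscript. Perceptron capacity (points): given \(P = \alpha N\) random patterns \(\mathbf{x}^\mu \in \mathbb{R}^N\) with labels \(y^\mu = \pm 1\), one asks whether weights \(\mathbf{w}\) with \(\|\mathbf{w}\|^2 = N\) exist satisfying the margin constraints
\[
\frac{y^\mu \,\mathbf{w}\cdot\mathbf{x}^\mu}{\sqrt{N}} \;\ge\; \kappa, \qquad \mu = 1,\dots,P,
\]
and the capacity \(\alpha_c(\kappa)\) is the largest load \(\alpha\) at which such \(\mathbf{w}\) typically exist (\(\alpha_c(0)=2\) for random points). Kernel ridge regression: given training data \(\{(\mathbf{x}^\mu, y^\mu)\}_{\mu=1}^P\) and kernel \(k\), the predictor is
\[
\hat f(\mathbf{x}) \;=\; \mathbf{k}(\mathbf{x})^\top \left(K + \lambda I\right)^{-1}\mathbf{y}, \qquad K_{\mu\nu} = k(\mathbf{x}^\mu,\mathbf{x}^\nu), \quad k_\mu(\mathbf{x}) = k(\mathbf{x},\mathbf{x}^\mu).
\]
In both cases, one set of variables is attached to the N feature/weight directions and another to the P data constraints, which is the bipartite structure referred to above.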
Reports on this Submission
Report
This paper presents a clear and unified treatment of the cavity method for high-dimensional convex learning problems. It provides elegant and compact derivations of established results on the storage capacity of the perceptron with both random points and manifolds as inputs, and on the typical performance of kernel ridge regression with random kernel eigenfunctions. The approach nicely highlights the shared bipartite structure across different models, drawing insightful connections that will be particularly valuable to researchers learning these techniques.
Overall, I believe the paper makes a valuable contribution and should be accepted, provided that the authors address the comments below, incorporating a few clarifications and missing references.
Comments:
- It would be helpful to expand the discussion on extensions to more realistic settings, such as: (i) correlations across input dimensions (e.g., Gaussian mixtures with general covariances), (ii) correlations across patterns, and (iii) unbalanced datasets with unequal label distributions.
- It would be helpful to add a comment on the role of convexity in enabling the derivations, and discuss possible extensions to nonconvex settings.
- It would be useful to add a comment on the validity of the spectrum assumption (right above Sec. 4.4) in real tasks.
- It may be helpful to include a more explicit explanation of why the bipartite structure guarantees that perturbations influence only the variables on the opposite side.
- Please double-check the a, b, c, d indices on the right-hand side of Eq. (81).
Some references are missing:
- Section 4.1: it would be appropriate to cite the seminal work [1] on the derivation of worst-case rates, and to include citations to [2,3] for the typical case.
- Discussion: a very detailed derivation of the dynamical cavity method for the perceptron model is presented in [4].
[1] Caponnetto, Andrea, and Ernesto De Vito. "Optimal rates for the regularized least-squares algorithm." Foundations of Computational Mathematics 7 (2007): 331-368.
[2] Spigler, Stefano, Mario Geiger, and Matthieu Wyart. "Asymptotic learning curves of kernel methods: empirical data versus teacher–student paradigm." Journal of Statistical Mechanics: Theory and Experiment 2020.12 (2020): 124001.
[3] Cui, Hugo, et al. "Generalization error rates in kernel regression: The crossover from the noiseless to noisy regime." Advances in Neural Information Processing Systems 34 (2021): 10131-10143.
[4] Agoritsas, Elisabeth, et al. "Out-of-equilibrium dynamical mean-field equations for the perceptron model." Journal of Physics A: Mathematical and Theoretical 51.8 (2018): 085002.
Requested changes
Address comments and include missing references.
Recommendation
Publish (easily meets expectations and criteria for this Journal; among top 50%)
Report
Summary: These lecture notes revisit three celebrated problems in high-dimensional statistical learning, first studied in their respective works [5,8,9] through the lens of the replica method of statistical physics, using a cavity approach. The computation has the advantage of being shorter and, overall, more intuitive. It leverages the observation that all these problems admit reformulations with a bipartite structure.
Evaluation: As such, these notes offer a concise and insightful approach, and will prove of interest to researchers working on these topics. The manuscript is very well written, and sufficient discussion of all technical steps is provided. I list a few minor presentation comments below, but recommend that the work be accepted, even in its current state.
Comments:
- More explanation of the self-averaging of the self-responses (e.g., below Eq. (33)) could prove helpful.
- To the best of my reading, the expression (58) for the number of supporting points is not established before it appears, and would benefit from a brief discussion.
-"due to the bipartite structure, perturbations to other datum variables do not affect the [cavity variable]": is this statement true to leading order or in general ? If the former, it would be clearer to make the precision.
Recommendation
Publish (easily meets expectations and criteria for this Journal; among top 50%)