SciPost Submission Page
High-dimensional random landscapes: from typical to large deviations
by Valentina Ros
This is not the latest submitted version.
Submission summary
Authors (as registered SciPost users): | Valentina Ros |
Submission information | |
---|---|
Preprint Link: | scipost_202502_00002v1 (pdf) |
Date submitted: | Feb. 3, 2025, 12:24 p.m. |
Submitted by: | Ros, Valentina |
Submitted to: | SciPost Physics Lecture Notes |
for consideration in Collection: |
Ontological classification | |
---|---|
Academic field: | Physics |
Specialties: |
|
Approach: | Theoretical |
Abstract
We discuss tools and concepts that emerge when studying high-dimensional random landscapes, i.e., random functions on high-dimensional spaces. As an example, we con- sider a high-dimensional inference problem in two forms: matrix denoising (Case 1) and tensor denoising (Case 2). We show how to map the inference problem onto the opti- mization problem of a high-dimensional landscape, which exhibits distinct geometrical properties in the two Cases. We discuss methods for characterizing typical realizations of these landscapes and their optimization through local dynamics. We conclude by highlighting connections between the landscape problem and Large Deviation Theory.
Current status:
Reports on this Submission
Report
This manuscript deals with the properties of high-dimensional random functions, in particular the number and organization of their stationary points of various indices, and their consequences, in particular for dynamical evolution exploring them. It is well-written with a strong effort to make the presentation as pedagogical as possible, it is therefore complying with the standards of Scipost Physics Lecture Notes and I recommend its publication. I have a series of small remarks and suggestions that could be considered in a revision prior to publication.
-
the two examples of random landscapes treated in Sections 2 and 3 are "planted spin-glasses", in the sense that the distribution of the coupling constants are not simply independent random variables as in usual spin-glasses, but includes a low-rank perturbation to this independent noise background. The motivation for the study of this type of random functions comes from inference problems, as clearly explained in the manuscript, but this connection might be slightly more exploited (in the current version the emphasis is a bit more on the landscape than on its inference interpretation). For instance, the only estimator considered here is the Maximum Likelihood one, corresponding to priors on the signal uniform on the sphere, and with the objective to maximize the probability of the estimated signal in the posterior measure. A few more sentences could be added to discuss the effect of considering more general priors (sparse ones for instance) and of using the Minimal Square Error as a measure of the accuracy of the estimator, which is optimized in a Bayesian perspective by the posterior average. In this more general setting the purely spectral perspective taken in these notes is not necessarily the optimal one, algorithms of the Approximate Message Passing type being able to recover partly the signal even in the absence of outlier eigenvalues in the spectrum of the matrix. The discussion on page 22 is a bit misleading in this sense, as it seems to suggests that the spectral transition is the relevant one for a wide class of priors, whereas several cases escape from this situation. In the same line of thought, the discussion of the dynamics, for instance in Section 2.3.3 for the matrix case, concentrates on the behavior of the energy of the physical model. From the point of view of the inference problem what really matters is the overlap between the current configuration of the dynamics and the planted signal, not the energy per se, maybe this observable and its possible connection with the energy should be discussed a bit more.
-
in the introduction it could be useful to specify whether one should think of the variables as discrete or continuous: when speaking of the "landscape topology" in terms of stationary points one is led to envision continuous ones, but "the total number of configurations accessible to the system" seems to refer to a discrete setting.
-
page 2, "that can be agents in an economy, neurons in biological or artificial neural networks, species in ecosystems, particles or spins in materials and so on", it would sound natural to give here references to review papers on these various applications. Same comment applies after "the emergence of glassy phenomenology in a wide range of fields, including machine learning, quantum optimal control, ecology, and inference" on page 3.
-
page 5, figure 2, why are there two images of "signal + noise" in the leftmost panel? Is "adapted from the web" a sufficiently precise reference?
-
the name "matrix denoising" for the problem handled in section 2 might be slightly confusing, as it is usually associated to extensive-rank signals, maybe "low-rank matrix factorization" is a more common terminology in the inference context.
-
page 6, equation (2), it might be specified that i≤j and k≤l, otherwise the covariance contains additional symmetric terms. Same comment for equation (101) page 33.
-
page 6, after equation (4), the justification that the transformation J→JR has unit Jacobian is slightly more complicated than just detO=1 (and incidentally this determinant can also be -1 for an orthogonal matrix). Indeed the transformation J→JR acts on a number of variables of order N2, its Jacobian is not simply O, but in some sense the tensor product of O with itself.
-
page 6, "one finds that the distribution of the eigenvectors averaged over the GOE ensemble is equivalent to that of random vectors on the sphere", maybe one could be a bit more precise here: a single eigenvector is indeed uniformly distributed on the sphere, but the joint law of the eigenvectors does not factorize as independent random vectors, since they obey an orthogonality condition. Mentioning the Haar measure of the orthogonal group might be a better formulation of this statement.
-
page 6, "we have access to several instances of the noisy matrix M", I am puzzled by the "several", it seems that the inference problem considered here consists in trying to infer the signal v from a single instance of M, not several ones.
-
page 10, equation (18), one could specify that the noise η has zero average.
-
page 10, the articulation between the main text and the box [B3] could be improved, since the box uses equations and concepts (Lagrange multipliers) not yet encountered in the main text at the point where the box is first mentioned. In that box, after equation (28), one could explain that the last component in equation (28) vanishes only if one assumes that s is on the sphere.
-
page 11, after equation (21), one could mention that the eigenvectors u are normalized.
-
page 14, "refer to the cited references for such generalizations", at this point the only reference given on RMT is [10], maybe the other ones should be given here.
-
page 14, in the middle panel it would be more enlightening to have a density with a vertical slope at the edge, since the semi-circle (and most of the other densities encountered in RMT) have root singularities at their edges.
-
page 15, equation (37) the notation C− is not defined, its definition could be given a few lines later when "lower-half complex plane" is mentioned.
-
page 19, between equation (52) and (53), "Gaussian random variable", specify its mean and variance. Between equations (54) and (55), maybe recall the definition of ω which was given before (49). At the end of this paragraph one could mention the paper
S F Edwards and R C Jones, The eigenvalue spectrum of a large symmetric random matrix, Journal of Physics A: Mathematical and General 9, 1595 (1976)
that predates [18] and where the outlier transition for rank one perturbed GOE matrix was found.
-
page 20, in figure 4 the notation gN for the gaps between eigenvalues should be defined (the definition only appears afterwards in equation (66)), and a warning against the possible confusion with the Stieltjes transform could be given.
-
page 20, in equation (56) the Heaviside function θ should be defined. Right afterwards, is it really "straightforward" to obtain this from the r=0 case? Maybe a few additional explanations could be given, starting from the r=0 formula which might not be obvious to all readers.
-
page 20, the sentence before (57) is a bit confusing, the qα do not describe fully the eigenvectors, only their projections in one direction. Also when r=0 w is not really defined, it should be explained that the qα can then be chosen as the projections onto a fixed but arbitrary vector.
-
page 22, in the description of the phase transition, when r<rc(σ) is the phase a spin-glass or a paramagnet? Maybe a few words to justify the "spin-glass" qualification would be helpful.
-
page 25, the formulation "For the model we are looking at, the solution to these equations has been studied in detail in [32]. One finds that in the mean-field limit, the dynamics does not really depend on r" is somehow confusing since [32] did not consider the rank-one perturbation of the GOE. Also it could be useful to recall at this point that the summary of properties stated on pages 25 and 26 correspond to the zero-temperature limit of the dynamics.
-
page 27, could there be a reference for the justification of the result of equation (75), or some explanations (in particular on the exponent 3/2 for the algebraic prefactor) if it can be easily obtained from the results presented before in the manuscript?
-
page 28, does the scaling function of the critical regime defined in equation (76) depend on ω?
-
page 29 and later, in the p>2 case, some more details should be given on the symmetry properties of the tensors, for instance the variance in equation (79) is probably written assuming that all indices are distinct, by analogy with the GOE matrix case the variance should depend on the number of distinct indices among the p ones. Of course these distinctions do not modify the leading order behavior in the large N limit, but if lower order terms are neglected it would be more pedagogical to mention it. A similar comment apply to the third equation of box [B6] page 34, instead of multiplying by p! one should sum over all permutations between the indices i and j, unless you take as understood that i1<i2<...<ip everywhere, and then you should put the factorial in (79).
-
page 31, equation (91), Jensen's inequality does not necessarily imply that the annealed average of N is much larger than the typical value, if the quenched and annealed complexity coincide they can be of the same order.
-
page 34, first equation of box [B6], the slashed notation could be explained.
-
page 35, equation (106), shouldn't the exponent of (1−q2) be (N−4)/2 instead of (N−2)/2?
-
page 36, the justification of (110) by the self-averaging of ρN should be improved: in (109) ρN appears as exp(NρN), hence if the LDP of ρN was with speed N one would not have (110). Fortunately the LDP for ρN has speed N2, which allows to replace it by its average in (109) and thus justify (110).
-
page 36, equation (112), the function I(y) as defined by the first integral seems to be an even value of y, if I'm not mistaken. However the final expression does not seem to have this property, or maybe I'm missing some simplification?
-
page 42, paragraph "An paradigmatic solution", maybe say that what is found in [79] is a solution up to a reparametrization of time, not a fully analytical solution.
-
typos :
-
page 12, box [B3], "is and N−1" -> "is an N−1"
- page 13, box [B3], after equation 29, the "Riemannian Hessian" should be ∇2⊥Er(s) and not ∇2Er(s,λ)
- page 15, "finite N fluctuation are" -> "fluctuations"
- page 16, "obtained ad" -> "as"
- page 16, equation (41), isn't it 1/z2 instead of 1/z inside the square root?
- page 17, "computing its the large-N"
- page 18, "a transitions"
- page 23, equation (63), si and not s in the left hand side
- page 26, "taking in limit"
- page 28, "behaves a power law"
- page 29, I imagine that "As for the spherical case" should read "As for the matrix case"
- page 29, a global - sign is missing in the first line of equation (83)
- page 31, equation (90), there should be a E in the left hand side of the first equality
- page 38, caption of figure 7, "The BB implies"
- page 40, "no isolate eigenvalues" -> "isolated"
- page 42, equation (120), there is limt→∞ missing in the first two terms
- page 48, "initial and finite" -> "final"
- page 51, in the hint on Gaussian integrals, (K−1)lm instead of ˆK, and the exponent of the determinant is −1/2.
Recommendation
Publish (easily meets expectations and criteria for this Journal; among top 50%)
Report #1 by Bertrand Lacroix-A-Chez-Toine (Referee 1) on 2025-3-20 (Invited Report)
Report
These notes describe this topic in a very pedagogic and well-written way. They refer to very recent and top-level literature in this subject without getting too technical. It also provides illustrations and hands-on exercises for students to get familiar with the content and some technical aspects.
I highly recommend these lecture notes for publication in SciPost Physics Lecture Notes once the small list of changes below are implemented.
Requested changes
Find below a list of typos/comments:
1 - p6: "This properties allows us to draw" should be "This property allows us to draw"
2 - p12: "The Riemannian gradient on the hypersphere is and (N − 1)-dimensional vector" should be "The Riemannian gradient on the hypersphere is an (N − 1)-dimensional vector"
3 - p17: "by computing its the large-N expansion" should be "by computing its large-N expansion"
4 - p28: "An “hard" inference problem: noisy tensors" should be "A “hard" inference problem: noisy tensors"
5 - p30: In the section "In the case of quadratic landscape p = 2, the RMT results imply that:", it seems that point (ii) and (iii) are incompatible: If the variable N (ε)/N "converges to its average", its limiting distribution is a delta function which seems incompatible with the fact that it has a "well-defined limit when N →∞"
This point should be clarified in the revised version
6 - p37: "One can check that that there are values" should be "One can check that there are values"
7 - p40: "These stationary points are marginally stabile" should be "These stationary points are marginally stable"
8 - p40: "stationary points at the equator have no isolate eigenvalues" should be "stationary points at the equator have no isolated eigenvalues"
9 - p42: A limit t\to\infty should appear in Eq. (120) (after the limit N\to \infty is taken)
10 - p42: "An paradigmatic solution" should be "A paradigmatic solution"
Recommendation
Publish (surpasses expectations and criteria for this Journal; among top 10%)