SciPost Submission Page
The maximum-average subtensor problem: equilibrium and out-of-equilibrium properties
by Vittorio Erba, Nathan Malo Kupferschmid, Rodrigo Pérez Ortiz, Lenka Zdeborová
This is not the latest submitted version.
Submission summary
| Submission information | |
|---|---|
| Authors (as registered SciPost users): | Vittorio Erba |
| Preprint Link: | https://arxiv.org/abs/2506.15400v1 (pdf) |
| Code repository: | https://github.com/SPOC-group/maximum-avg-subtensors |
| Date submitted: | June 19, 2025, 3:55 p.m. |
| Submitted by: | Vittorio Erba |
| Submitted to: | SciPost Physics |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approach: | Theoretical |
Abstract
In this paper we introduce and study the Maximum-Average Subtensor ($p$-MAS) problem, in which one wants to find a subtensor of size $k$ of a given random tensor of size $N$, both of order $p$, with maximum sum of entries. We are motivated by recent work on the matrix case of the problem, in which several equilibrium and non-equilibrium properties have been characterized analytically in the asymptotic regime $1 \ll k \ll N$, and a puzzling phenomenon was observed involving the coexistence of a clustered equilibrium phase and an efficient algorithm which produces submatrices in this phase. Here we extend previous results on equilibrium and algorithmic properties for the matrix case to the tensor case. We show that the tensor case has an equilibrium phase diagram similar to the matrix case, and an overall similar phenomenology for the considered algorithms. Additionally, we consider out-of-equilibrium landscape properties using Overlap Gap Properties and Franz-Parisi analysis, and discuss the implications, or lack thereof, for average-case algorithmic hardness.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Strengths
- The paper tackles an interesting average-case tensor optimization problem in the Gaussian setting, extending a line of work that had previously been explored only in the matrix regime. The problem itself exhibits rich and intricate algorithmic and geometric phenomenology.
- The paper approaches this phenomenology from several perspectives, including: analysis of two algorithms (LSA and IGP), replica calculations of the free energy (yielding the freezing point), as well as variants of the Overlap Gap Property (OGP) and Franz–Parisi (FP) landscape analysis.
Weaknesses
- The analysis of the algorithms is underdeveloped and lacks rigor (see detailed comments below).
- The landscape-based explanations for algorithmic hardness remain inconclusive and do not provide a definitive account (see below).
Report
The paper considers the maximum k-subtensor problem for a Gaussian tensor, where the goal is to find the k×⋯×k subtensor (henceforth referred to as a solution) with the highest possible average value. The authors focus on the scaling k=m⋅n and then send m→0.
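To fix ideas, the kind of greedy local search the reports refer to can be sketched in the matrix case (p=2). The sketch below is our own illustration, not the authors' code: the function name, random initialization, and stopping rule are our assumptions; one common variant alternates between re-selecting the best rows for the current columns and vice versa.

```python
import numpy as np

def alternating_greedy(A, k, max_iter=100, seed=0):
    """Illustrative alternating greedy search for the matrix (p=2) case:
    given the current column set, select the k rows with the largest sums
    over those columns, then update the column set symmetrically, and
    repeat until the index sets stop changing (or max_iter is reached)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    cols = rng.choice(n, size=k, replace=False)  # random initial columns
    rows = np.array([], dtype=int)
    for _ in range(max_iter):
        new_rows = np.argsort(A[:, cols].sum(axis=1))[-k:]      # best k rows
        new_cols = np.argsort(A[new_rows, :].sum(axis=0))[-k:]  # best k columns
        if set(new_rows) == set(rows) and set(new_cols) == set(cols):
            break  # fixed point: a local maximum under row/column re-selection
        rows, cols = new_rows, new_cols
    return np.sort(rows), np.sort(cols), A[np.ix_(rows, cols)].mean()
```

Each update can only increase the submatrix sum, so the procedure converges to a local maximum; the question raised in the reports is which average value such local maxima attain in the asymptotic regime.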
This problem has previously been studied in the matrix case (ignoring concurrent work).
The authors (non-rigorously) identify:
1. the maximum possible value of a solution,
2. the freezing value (above which all solutions with that value are isolated),
3. the performance of two greedy algorithms for the task (LSA and IGP),
4. an annealed variant of the Overlap Gap Property (OGP), and
5. the Franz–Parisi landscape of the problem.
Parts (1), (2), (4), and (5) rely on replica calculations (followed by sending m→0). Part (3) analyzes the algorithms under the assumption that each iteration encounters a fresh Gaussian tensor; the authors suggest that this analysis might eventually be made rigorous.
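The "fresh Gaussian tensor at each iteration" assumption concerns incremental procedures of roughly the following shape (again p=2 for readability; this is our paraphrase of an incremental greedy procedure, not the paper's exact IGP):

```python
import numpy as np

def igp_sketch(A, k):
    """Illustrative incremental greedy sketch for p=2 (our reading, with
    assumed details): grow the row and column sets one index at a time,
    each time adding the outside row (then column) with the largest sum
    over the currently selected columns (rows)."""
    i0, j0 = np.unravel_index(np.argmax(A), A.shape)
    rows, cols = [i0], [j0]  # seed with the largest single entry
    while len(rows) < k:
        # best new row with respect to the current columns
        row_scores = A[:, cols].sum(axis=1)
        row_scores[rows] = -np.inf  # exclude already-selected rows
        rows.append(int(np.argmax(row_scores)))
        # best new column with respect to the (updated) rows
        col_scores = A[rows, :].sum(axis=0)
        col_scores[cols] = -np.inf
        cols.append(int(np.argmax(col_scores)))
    return np.array(rows), np.array(cols), A[np.ix_(rows, cols)].mean()
```

Because each selection conditions on entries already examined, the entries seen at later steps are not independent Gaussians; the "fresh tensor" assumption discards exactly this dependence, which is the point the first report flags.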
A key finding is that the freezing value is strictly smaller than the maximum possible value. Traditionally, this would suggest that polynomial-time algorithms should fail to surpass the freezing threshold. However, recent results have complicated this picture. In fact, the analysis of IGP suggests that an efficient algorithm does cross this threshold. The authors attempt to explain this apparent contradiction using landscape-based perspectives, but the discussion remains inconclusive.
Overall, I find the choice of problem compelling, as it highlights the gap between freezing thresholds and algorithmic performance. However, I have several concerns:
- Lack of rigor in the algorithmic analysis.
While the replica-based parts are standard in this line of work, the analysis of the algorithms is more problematic. For LSA, the authors rely on [1], which (i) only characterizes a typical local maximum, something LSA is not guaranteed to find, and (ii) applies only when k = O(1), whereas here k = mn with m → 0. For IGP, the assumption of independent iterations is too strong to simply assume. The claim that ideas from [2] can address this issue is unconvincing, since [2] only applies when k = o(log n), and it is unclear whether this generalizes to any other growing k, let alone to almost-linear k. In my view, the authors should make a serious effort to put the analysis of the algorithms on more rigorous grounds (or provide more evidence for their conclusions).
- Inconclusiveness of the OGP and Franz–Parisi analyses.
Both analyses leave something to be desired, as the main point of the paper, algorithmic hardness, remains unresolved. Assuming the IGP analysis is correct, one might ask whether the algorithm's ascending nature (systematically climbing the landscape) explains its ability to surpass the freezing value. The super-level-set FP analysis seems to suggest this possibility, but it is unclear whether the authors endorse such an interpretation. Establishing one would significantly strengthen the paper.
Requested changes
- Improve the analysis of the algorithms to support a more solid claim about their performance.
- Attempt to provide some explanation of the hardness, perhaps following my suggestion to investigate whether the ascending nature of IGP leverages some geometric property of the landscape.
Recommendation
Ask for major revision
Strengths
- The introduction is well written and explains some of the intricacies involved with predicting typical-case algorithmic hardness.
- The authors introduce a new model, which is natural and merits further exploration.
- The authors give a thorough analysis of the new model from various different angles.
Weaknesses
- For the most part, the findings are qualitatively the same as the previously-studied matrix case, so perhaps not so surprising. Still, this is interesting to know and could not have been predicted ahead of time.
- The results fall short of resolving the major question of pinpointing the algorithmic threshold. Still, I find it valuable to report on the attempts that were tried here.
Report
The introduction gives a nice account of typical-case algorithmic hardness. Various theoretical frameworks exist for predicting which solutions (or which objective values) are accessible by computationally efficient methods, but the relations among these are somewhat intricate and not completely understood. This paper is particularly concerned with exploring whether or not certain physics-style phase transitions such as "frozen 1-RSB" (where most solutions are isolated) are indicators of algorithmic hardness. The authors argue that the Maximum-Average Submatrix problem is a good testbed to explore this, since it is a simple model that was recently shown to exhibit a surprising phenomenon that challenged the conventional wisdom: a frozen 1-RSB phase that is nonetheless algorithmically accessible. This motivated the authors to consider the tensor case, which is a natural extension.
The paper carries out a rather thorough analysis of the phase transitions and algorithmic thresholds, from a few different angles. One finding is that, similar to the matrix case, there is a frozen 1-RSB phase which is nonetheless algorithmically accessible using a greedy algorithm called IGP. On the other hand, a different greedy algorithm called LAS stops working precisely at the freezing threshold. This leaves the question of whether IGP is optimal among efficient algorithms, and whether the threshold achieved by IGP coincides with any physics-style phase transition. The authors explore two different analytic approaches in an attempt to shed light on this. First, the authors obtain some non-matching upper and lower bounds on the algorithmic threshold via the Overlap Gap Property (OGP), and they mention that a concurrent and independent work has used a more refined version of OGP to pin down the precise threshold and prove that IGP is in fact optimal within a large class of methods. Second, the authors use a Franz-Parisi analysis; while this sheds light on the structure of certain rare clusters in the solution space, the authors discover that it does not appear to have any clear connection to the algorithmic threshold.
Requested changes
- pg 2: "an replica symmetric" -> a
- pg 3: "upper bound base on" -> based
- pg 3: "Table 1 subs up" -> sums
- pg 3: "still find [it] surprising that such a simple..."
- pg 4 (and throughout): I believe p would typically be called the "order" (not "rank") of the tensor, since "rank" has a different meaning similar to matrix rank
Recommendation
Publish (meets expectations and criteria for this Journal)
We thank Referee 1 for taking the time to read our work and provide feedback. They raised the following points, which we address below.
Results are qualitatively the same as the matrix case.
We agree with Referee 1 that the findings are not qualitatively different from the matrix case; while this is not surprising a posteriori, it is still non-trivial. For example, it is well known [1] that the classic binary p-spin model at finite magnetisation m behaves differently at p = 2 (RS to full-RSB transition) and at p > 2 (RS to 1-RSB to full-RSB transition). In this light, the fact that the nature of the transitions is the same for p = 2 and p > 2 for Boolean spins in the limit m → 0 is non-trivial.
Results do not shed full light on the algorithmic properties of the problem.
We actually find this point to be an important observation to share with the community, as it indicates a failure mode of the tools of the physics of disordered systems when applied to computational problems.
We find that understanding from first principles when (for which problems, under which assumptions) the standard statistical physics toolbox is predictive, and when it is not, is an important open problem; here we provide an analytically tractable negative result.
We will add further commentary in the resubmitted version of the manuscript (in the introduction) to better highlight these aspects.
Typos and nomenclature.
We will correct all typos and change the notation from "rank" to the more appropriate "order" in the resubmitted version.
References
[1] E. Gardner. Spin glasses with p-spin interactions. Nucl. Phys. B, 257:747, 1985.

Author: Vittorio Erba on 2025-11-12 [id 6022]
(in reply to Report 2 on 2025-10-04)
We thank Referee 2 for taking the time to read our work and provide feedback. Across their report, they raised the following points, which we address below.
The analysis of the algorithms is underdeveloped and lacks rigor.
Convergence of the LAS algorithm. The referee correctly points out that we only adapted the proof of the structure theorem from [1], and not the matching convergence proof of [2], for the LAS algorithm. We clarify this in the new version of the manuscript (Section 3.2). We do not think adapting the convergence result of [2] would bring any additional insight here, and we expect no roadblocks, just as there were none for the structure theorem.
Analysis of the IGP algorithm. The heuristic derivation we provide for the final energy of the IGP algorithm for $p \geq 2$ matches a concurrent rigorous derivation of the same value [4,5], which holds for $k = \exp(o(\log n))$, thus confirming the validity of our prediction essentially up to almost-linear scaling.
On the relationship between the settings $1 \ll k \ll N$ and $k = mN$ with $m \ll 1$. We agree with the referee that, a priori, the setting of sub-linearly sized subtensors $1 \ll k \ll N$ may behave differently from that of small, linearly sized subtensors $k = mN$ with $m \ll 1$. We remark that a similar exchange of limits has been used in several other problems (see [3], Appendices C.1 and C.2, for a recent example), where it has been empirically validated by simulations or by explicit computation of the complementary limit. Here, the computational complexity of dealing with tensors, scaling as $O(k^p)$, and the necessity of scaling $k$ with $N$ make such numerical verification very hard. Rigorously adapting the predictions of statistical physics tools to the $k \ll N$ regime is hard (we are not aware of any results in this regime, across all commonly studied spin glass models), and adapting the proof techniques used in the $k \ll N$ regime to the linear scaling also seems hard (see for example [5], Remark 2.8, posted on arXiv just after our submission of this work, where the linear-$k$ regime is only studied for large enough tensor order $p$, thanks to convergence to a simpler Random Energy Model behaviour). In conclusion, we think that the heuristic level of presentation we provide in this manuscript is the best we could do, and going beyond it to properly compare the $1 \ll k \ll N$ and $k = mN$, $m \ll 1$, regimes would require significant technical advances. We will comment on this in the resubmitted manuscript (Section 2.2).
Inconclusiveness of the OGP and Franz–Parisi analyses.
As discussed in our answer to Referee 1, we find that the inconclusiveness of our results is a negative result worth sharing, as it highlights the failure of standard statistical physics tools to predict algorithmic properties (at odds with what happens in other models, such as colouring of random graphs, and hence at odds with common wisdom). To rephrase: our results can be interpreted as "we exhibit a model where standard clustering (OGP) and Franz-Parisi tools do not shed light on the behaviour of optimal, simple greedy algorithms". We will add further commentary in the resubmitted manuscript (in the introduction) to better highlight these aspects.
Interpretation of the Franz-Parisi analysis.
Referee 2 states: "Assuming the IGP analysis is correct, one might ask whether the algorithm’s ascending nature—systematically climbing the landscape—explains its ability to surpass the freezing value. The super-level set FP analysis seems to suggest this possibility, but it is unclear whether the authors obtain such an interpretation or not." Our Franz-Parisi analysis was indeed motivated by this intuition, i.e. that clusters around subtensors of large energy (sampled uniformly under the equilibrium measure) might be surrounded by clusters of lower-energy configurations, which could act as "bridges" between easily accessible equilibrium energy states (below the freezing threshold) and those that are hard to find (above the freezing threshold). What we find, though, does not support this intuitive picture. The Franz-Parisi analysis shows that there are no such bridges (all clusters around hard-to-find equilibrium configurations merge at an energy higher than freezing), and in any case none of the thresholds we can identify in the Franz-Parisi analysis correlates with the IGP convergence energy. This means that IGP is certainly not probing configurations that are close in Hamming distance to large-energy equilibrium configurations, making this algorithmic behaviour crucially out-of-equilibrium. We will add further commentary in the resubmitted manuscript (in Section 5) to better highlight these aspects.
References
[1] S. Bhamidi, P. S. Dey, and A. B. Nobel. Energy landscape for large average submatrix detection problems in Gaussian random matrices. Probability Theory and Related Fields, 168(3):919–983, 2017.
[2] D. Gamarnik and Q. Li. Finding a large submatrix of a Gaussian random matrix. The Annals of Statistics, 46(6A):2511–2561, 2018.
[3] A. Maillard, E. Troiani, S. Martin, F. Krzakala, and L. Zdeborová. Bayes-optimal learning of an extensive-width neural network from quadratically many samples. Advances in Neural Information Processing Systems, 37:82085–82132, 2024.
[4] S. Bhamidi, D. Gamarnik, and S. Gong. Finding a dense submatrix of a random matrix: sharp bounds for online algorithms. arXiv preprint arXiv:2507.19259, 2025.
[5] Abhishek Hegade K. R. and E. C. Kızıldağ. Large average subtensor problem: ground-state, algorithms, and algorithmic barriers. arXiv e-prints, arXiv-2506, 2025.