Generating dense packings of hard spheres by soft interaction design

Packing spheres efficiently in large dimension $d$ is a particularly difficult optimization problem. In this paper we add an isotropic interaction potential to the pure hard-core repulsion, and show that one can tune it in order to maximize a lower bound on packing density. Our results suggest that exponentially many (in the number of particles) distinct disordered sphere packings can be effectively constructed by this method, up to a packing fraction close to $7\, d\, 2^{-d}$. The latter is determined by solving the inverse problem of maximizing the dynamical glass transition over the space of the interaction potentials. Our method crucially exploits a recent exact formulation of the thermodynamics and the dynamics of simple liquids in infinite dimension.


Introduction
The sphere packing problem consists in finding the densest arrangements of equal-sized spheres in the d-dimensional Euclidean space. This is a simply defined geometrical problem with intriguing and sometimes unexpected connections to other areas of mathematics, natural sciences and engineering. In this paper we shall study the sphere packing problem in the limit of large space dimension d. Far from being an abstract problem, this asymptotic limit is known, since Shannon's pioneering work, to be connected to the practical problem of designing error correcting codes in communication technology [1]. Another motivation for this study comes from statistical physics. In this context the large d limit provides a theoretical framework in which computations are simplified, and one can study analytically the complex phase diagram of interacting colloids [2][3][4].
In the sphere packing problem, the degrees of freedom are the positions of the centers of a set of impenetrable spheres. It can be thus naturally viewed as a Constraint Satisfaction Problem (CSP) [5]: the positions of the spheres form the variables, which have to fulfill a constraint for each pair of particles, namely that the distance between their centers is larger than their diameter. The largest packing density separates, in this setting, the satisfiable (SAT) regime in which all the constraints can be simultaneously satisfied (a packing exists) from the unsatisfiable (UNSAT) regime in which they cannot (no packing exists).
The term CSP is usually encountered in the context of problems with discrete variables, for instance graph q-coloring or k-satisfiability. These two examples are NP-complete decision problems [6] (for q, k 3); to characterize their typical complexity an important research effort has been devoted to the study of random ensembles of such CSPs, for instance the coloring of Erdös-Rényi random graphs, or the satisfiability of random formulas [7,8]. These problems can be recast as mean-field spin-glasses models [9,10], the randomness in their construction washing out any Euclidean geometry. The techniques developed in the statistical mechanics of disordered systems community, in particular the replica and cavity method, thus apply to these random CSPs as well, and have brought a lot of important quantitative and qualitative results [11][12][13][14]. In particular, the location of the SAT-UNSAT phase transition can be computed in this statistical mechanics framework. Even more importantly, it is found that other phase transitions, in the SAT phase, affect the structure of the subset of solutions in the configuration space. In particular, starting from an underconstrained situation and increasing the density of constraints, one generically encounters a "dynamic" phase transition where the solution space breaks apart into a very large number of disjoint clusters. At this dynamic transition, uniform sampling of solutions becomes exponentially hard in the number of variables [15].
These same techniques can also be used to study disordered hard sphere packings. At variance with spin-glasses and random CSP, sphere packings have no quenched disorder, but a phenomenon of self-induced disorder upon increasing density allows one to treat them in a similar way [3,[16][17][18]. Another difference is the finite-dimensionality d of the packing problem, to be contrasted with the mean-field character of random CSPs. However in the recent years a controlled d → ∞ theory has been set up [3,4], and the limiting theory was shown to share several qualitative properties with random CSPs. This analogy has been very fruitful and has allowed to treat in a unified way these seemingly different phenomena.
Consider a system of N hard spheres in a d-dimensional volume V with periodic boundary conditions, in the thermodynamic limit where N, V → ∞ at constant packing fraction ϕ = N v d /V (v d being the volume of a unit-diameter sphere), and let us recall the most important results obtained from this approach, in the large d limit (taken after the thermodynamic limit), that are relevant for present discussion: 1. For packing fractions ϕ < ϕ d , with ϕ d = 4.8067... d 2 −d , the liquid can be equilibrated in a time that scales polynomially in the number of particles N [19]. Equivalently, the (exponentially many in N ) available packings at ϕ < ϕ d can be sampled uniformly in polynomial time in N . On the contrary, for ϕ > ϕ d , the sampling takes a time growing exponentially in N .
2. The equilibrium configurations produced at ϕ = ϕ d can still be compressed out of equilibrium, up to a density ϕ j,d where their (out of equilibrium) pressure diverges and further compression becomes impossible (also called the jamming transition). The precise value of ϕ j,d has not been computed, but it can be approximately located around 3. Despite being exponentially hard to sample, disordered packings exist up to a density scaling as ϕ GCP = (ln d) d 2 −d , that corresponds to an equilibrium SAT-UNSAT phase transition (the acronym GCP standing for Glass Close Packing) [3].
It should be noted that, by construction, this approach is focused on disordered packings (defined as those that can be constructed by adiabatic compression from the liquid state), and therefore it does not provide any information on lattice packings [3].
Let us now describe the main idea developed in the present paper. The phase transitions described above explicitly referred to the Gibbs-Boltzmann distribution which is uniform over the solutions of the CSPs, i.e. the configurations that simultaneously satisfy all the constraints. In the particle systems language, this means a purely hard-sphere interaction, all positions of the spheres that lead to an overlap between them are assigned an infinite energy and thus forbidden, but all other configurations have the same vanishing energy. Consider now a situation where the constraints of the CSP are still imposed in a strict way, but the probability measure on the solutions can be biased to favour some solutions and disfavour others (for particles, this means a pairwise potential energy infinite below the diameter of the particles, but arbitrary at larger distance). By definition, the location of the (equilibrium) SAT-UNSAT transition is not altered by this modification, but the other structural phase transitions in the satisfiable phase will in general be. In particular, one can hope to increase in this way the density ϕ d of the dynamic transition, hence to enlarge the regime of parameters where solutions can be obtained efficiently (even if no more uniformly); recent works in the context of discrete CSP can be interpreted along this line [21][22][23]. Here we shall follow this idea in the context of particle systems, and look for the soft part of the interparticle potential that, in the mean-field infinite-dimensional analytical formulation, yields the highest possible packing fraction of the dynamic transition.
The outline of the rest of the paper is as follows. In Sec. 2 we provide a short review of existing results (rigorous and not) on the sphere packing problem. In Sec. 3, we briefly recall the theoretical background and the main formulas relating the dynamical liquid-glass transition to the interaction potential of simple liquids. Next, we describe the numerical strategy we used for solving the "inverse problem" of finding the potential that maximizes ϕ d , and its actual implementation for the special case of piecewise constant potentials. In Sec. 4 we present the main results of our work, and discuss their relevance. In Sec. 5, we close with some remarks and a discussion about improvements and other possible applications.

Review of previous results on the sphere packing problem
For completeness, we present here a brief review of some results, originating both from the mathematical and physical literature, concerning upper and lower bounds on the density of the densest sphere packings. More detailed reviews can be found, respectively, in [1,24], and in [3,4,25].

Rigorous results
The sphere packing problem in physical dimensions d = 2 and 3 is an emblematic example in mathematics of how a seemingly obvious fact may require very elaborated concepts to be actually proven. While the optimal packing fraction in d = 3 can be easily guessed, as Kepler did, by stacking as close as possible triangular layers 1 , it took almost four centuries and a genuine tour de force to prove that the conjecture is indeed correct [27][28][29]. There are further reasons that make the sphere packing challenging. First, as soon as we move above d = 3 there appears a sensitive dependence of the optimal packings on the space dimensionality. Every dimension has its own geometric oddities and, unlike the most favorable case d 3, what is known about the densest packings in a certain dimension d is generally useless for other dimensions, even those very close to d. The recent proof for the densest packing in d = 8 [30] and its extension to d = 24 [31] are a lucky exception in this respect. Second, when d grows large the unusual geometrical features of high dimensional spaces become more pronounced [32]: the volume of the unit diameter sphere vanishes exponentially with respect to the volume of the unit hypercube (see Figure 1 for a simple illustration). The empty spaces become predominant and optimal packings in large dimensions cannot be very dense: in this asymptotic limit only upper and lower bounds are known (see [24,25] for a more detailed review). A "greedy" argument shows that the volume occupied by a saturated packing 2 must satisfy ϕ 2 −d . This lower bound, obtained by Minkowski in 1905, was improved by a linear d factor by Rogers in 1947 [33], and it has been difficult to do much better since then: Vance's 2011 bound is ϕ (6/e)d 2 −d (for d divisible by 4) [34]. Only very recently Venkatesh has dramatically improved on it by a constant of 65963 instead of 6/e, and by a ln(ln d) factor for a nongeneric/sparse set of dimensions [35]. While most of these lower bounds are non-constructive, very recently Moustrou [36] proved that packings with density ϕ 0.89 ln(ln d) d 2 −d can be constructed with exp(1.5 d ln d) binary operations.
One might also be interested in counting the number of possible packings, or better, their entropy. Lebowitz and Penrose in 1964 [37] have proven that the virial series expansion of the entropy is convergent for ϕ < 0.14467... 2 −d , providing in this regime the exact result for the excess entropy per particle (over the one of a Poisson process, or an ideal gas of non-interacting particles) in the Gibbs-Boltzmann uniform measure 3 : This result was extended to higher densities in [38], where it was shown that at packing fraction ϕ = ln(2/ √ 3) d 2 −d , the entropy density of the Gibbs measure is s ex −2 d ϕ. Note Figure 1: The four circle paradox [32]. A small circle of radius √ 2 − 1 can fit the space left by four circles of unit radius. One can repeat the same construction in the d-dimensional Euclidean space by using 2 d hyperspheres of unit radius that surround a hypersphere of radius √ d − 1. Above d = 4 the darker sphere grows larger than the pale ones. In fact, the volume of the room left by the 2 d touching spheres tends asymptotically to 2 d , as for large d the volume of a hypersphere becomes vanishingly small, V d (r) = π d/2 r d /Γ(d/2 + 1), and tends to concentrate on its boundary, that this bound is compatible with the validity of Eq. (1), and the missing factor 2 is likely due to technical details of the proof (see section 2.2).
It is important to emphasize that the known best upper bound [39] on the densest packing fraction scales as 2 −0.5990... d , and therefore its ratio with the best lower bound grows exponentially in d, thus leaving a huge uncertainty in the location of the optimal packing density for large d. Moreover, it is by no means obvious that for large d there exist "universal features" of optimal packings, and that they should necessarily have a lattice structure [25]. In physical language, lattice packings represent crystalline configurations, disordered packings would correspond to liquid or glassy configurations. Disordered packings in large space dimension are therefore very interesting and their study could shed further light on this problem.

Physics results
We will approach the sphere packing problem from a statistical physics perspective. In this context, hard-sphere systems have been used since long to describe the behaviour of gases, most notably by van der Waals in 1873 [40], whose equation of state provided the first example of a thermodynamic phase transition, by Metropolis et al. in 1953 [41], who introduced the famous "Metropolis algorithm" to study the liquid equation of state in d = 2, and by Alder and Wainwright in 1957 [42] who showed the existence of a first order phase transition between the liquid and the crystal states in d = 3. Subsequently, starting from Bernal in 1960 [43], hard spheres have also been used to model the geometric structure of denser phases, such as liquids or crystals.
More recently, physicists have formulated a series of conjectures on the large d behavior of the hard sphere liquid and glass phases, that according to physics standard are believed to be exact, although they have not yet been proven rigorously. Frisch et al. in 1985 [44] (see also [45][46][47]) have investigated the hard sphere fluid in the limit d → ∞ and concluded that its entropy is formally given by the second-virial equation of state, Eq. (1), up to densities as high as ϕ < (e/2) d/2 2 −d . However, a careful study of the liquid dynamics in d → ∞, initiated in [2] and later fully developed in [19,48], shows that the dynamics becomes arrested at a critical packing fraction ϕ d = 4.8067... d 2 −d [19]. Above this critical density, the liquid is a collection of metastable glassy phases, or in other words, the allowed hard sphere configurations are organised in disjoint clusters in phase space, precisely as it happens in satisfiability problems [14]. Based on this analogy and on the results of [15], it is believed that sampling uniformly configurations of N hard spheres takes a time increasing polynomially in N for ϕ < ϕ d , and exponentially in N for ϕ > ϕ d . One can use advanced statistical mechanics techniques (mostly the replica method, see [3,4] for a review of these results), to show that the glassy clusters of configurations keep existing up to a density scaling as ϕ GCP = (ln d)d 2 −d , above which the liquid phase cannot be adiabatically continued to any other meaningful phase. Above this density, either no packings exist, or a first order transition to a non-translationally invariant phase (most likely a crystal) takes place. Note that numerical simulations in spatial dimensions 3 to 12 partially support these conjectures [49,50].
For a related statistical mechanics approach to lattice packings in large d, see [51], and [25,52] for a different statistical mechanics approach to disordered packings.

Setup of the problem 3.1 High-dimensional formalism for pairwise interacting particles
As discussed in the Introduction, the Gibbs-Boltzmann uniform measure over hard sphere configurations can be biased by adding an arbitrary interaction potential to the hard core. From a physical point of view, even a microscopically small bias in the hard-core potential is generally known to induce large-scale non-trivial phase behaviours, such as water-like anomalies and multiple liquid phases [53][54][55][56], reentrancy and multiple glass phases [57][58][59][60][61][62], and instabilities towards disordered or periodic microphases [63][64][65].
In this paper we will restrict ourselves to pairwise isotropic interactions between particles, with a potential energy for two particles at distance r of the form 4 where v + (r) is an arbitrary finite function that converges to 0 faster than r −d−1 when r → ∞, in such a way that the second virial coefficient remains finite. In the limit d → ∞, the interaction potential should be scaled as [66,67] v where h represents the typical O(1/d) fluctuations of the liquid particles during their time evolution. Indeed, it turns out that the distance between neighbouring particles that are effectively interacting is the diameter of the spheres plus this fluctuation. Note that in statistical physics, the inverse temperature β = 1/T multiplies the potential in the Gibbs measure; here, to simplify the notations, we will include this factor in the definition of the potential (hence, our v(r) corresponds to βv(r) in usual notations). Finally, it is convenient to introduce a rescaled packing fraction ϕ according to with ϕ of order one in the dense liquid regime where the dynamical arrest happens [4,19]. With these definitions, it has been shown in [3,19,67] that for an arbitrary extra potential v(h): • The excess entropy of the liquid is given by • The dynamical transition density is defined by the condition with For ϕ < ϕ d , configurations of the Gibbs-Boltzmann measure can be sampled in polynomial time in N , above it, exponential time in N is needed.
• The value of ∆ for which the maximum of F[∆|v(h)] is attained corresponds to the long time limit of the mean square displacement in the dynamically arrested glass phase at ϕ d : where r i (t) is the position of particle i at time t, and the dynamics start in equilibrium at time t = 0 [67].
• The disordered phase exists up to the density ϕ GCP = (ln d)d 2 −d , independently of the biasing potential [3].
The results recalled in section 2.2 correspond to the pure hard-sphere case,v(h) = 0. One can then naturally ask the question whether a judicious choice of the biasing potentialv(h) could push the dynamical transition to higher values, thus facilitating the task of finding hard sphere configurations. This question has already been positively answered in the simplest case in whichv(h) is a square-well short-ranged attractive potential [66,68],  Figure 2 shows the dynamical transition line obtained for this potential, for a value of the attraction range σ 0 = 0.0293 at which the effect of the potential is the most pronounced. One sees indeed that ϕ d has a maximum, as a function of U 0 , that is significantly larger than its purely hard-sphere value (corresponding to U 0 = 0), causing a reentrance phenomenon between the liquid and glass phases.
This effect generally suggests that the introduction of a suitable additional interaction leads the ergodic liquid region to penetrate quite deeply in the nonergodic glass phase, thereby increasing the density at which one can realize hard core configurations (disordered packings), although with a biased (non-uniform) measure. It is interesting to note that the dense packings constructed in this way are quite robust against noise or statistical fluctuations due to their thermodynamic stability and their finite entropy density, according to Eq. (5).
In the following, we show that it is possible to extend further the re-entrance of the liquid region deeper in the glass phase by adding a suitable potential to the hard-core repulsion. This is determined by solving the inverse problem of maximizing the dynamical liquid-glass transition ϕ d , given by Eq. (6), over the set of possible interaction potentialsv(h). By doing so we find that the reentrancy tip can reach a packing fraction value near 7 d 2 −d . Interestingly, the liquid state with the highest density turns out to lie on a multicritical manifold along with several qualitatively different glass states, whose number is close to the number of lengthscales that enter in the definition of the interaction potential.

The numerical strategy
The implicit dependency of ϕ d onv(h) in Eq. (6) makes it difficult to approach the solution of the inverse problem by purely analytical means, so we turn to a numerical optimization algorithm. To handle this variational problem we consider a trial family of potentials, namely the piecewise constant ones with n constant steps, each depending on a pair of variables, the width σ i and the height −U i of the step 5 . Explicitly, we generalize the single-step (n = 1) potential given in Eq.
Increasing n enlarges the family of potentials on which the maximization of ϕ d is performed, hence yields better and better lower bounds on its optimal value, that should be reached in the formal limit n → ∞ that allows to describe arbitrary continuous potentials (as the optimal one is expected to be). There are several advantages in the choice of the variational ansatz presented in Eq. (10). As the widths of the steps are free parameters, unlike setting e.g. a fixed discretization grid, it accounts more easily for both slow and fast-varying regions of the potential: a slow-varying region (of small derivative) can be roughly described as a single step whereas fast-varying regions require a finer description including many steps. Furthermore, this ansatz is "rangeagnostic" in the sense that one does not have to make any assumption about the range of the potential, as the ansatz should adapt itself when converging to the densest potential. Finally, the function q(∆, y) in Eq. (7) is known analytically for this choice ofv(h), in terms of error functions which are readily and efficiently implemented numerically: while the expression of ∂q/∂∆ has a similar form with instead a sum of Gaussian functions. Hence there is no need for an additional numerical integration procedure to compute the function q(∆, y) for each choice of its parameters, which speeds up significantly the numerical study.
In the following we present our results for different values of n, i.e. the largest ϕ d obtained within this variational ansatz and the corresponding potentialv(h). As said above, increasing the number of steps leads to monotonically increasing packing fractions; however, we shall see that within our numerical accuracy no sensible improvement can be actually achieved by going beyond n = 4 or 5. The extrapolated packing fraction saturates at a value near 7 d 2 −d .
Depending on the number of steps n, hence on the number of free parameters 2n (the width and depth of each step) we have followed two numerical strategies to get the highest ϕ d . First, one may perform numerically an extensive search by computing ϕ d for all possible values of each of the 2n parameters of the potential, for a fine enough discretization of the parameter space. This has been done for both single-step and two-step ansätze. For more than 4 parameters the numerical computation turns out to be prohibitively long, and is therefore not suitable to investigate the problem in full generality. This strategy is thus mainly used as a benchmark for the next one at n = 1 and 2. Indeed, in a more systematic way, we can proceed with a gradient descent in the parameter space of the function where the gradient is computed over the 2n parameters definingv(h). With a slight abuse of terminology we shall call in the following "landscape" the function F; the gradient descent procedure relies on the assumption that this landscape is smooth and not riddled with local minima. Note from Eq. (6) that minima of F correspond to maxima of the packing fraction ϕ d .

Case of a single step
The case of a single step coincides with a previous study by two of us [66,68], where the phase diagram of the system around its glass transition was studied in detail. The landscape being two dimensional here, it is easy to visualize. Besides, one can perform a full sampling of the parameters' space due to the small number of parameters.
First, for U 0 < 0 (a single positive step), we have sampled numerically the landscape ϕ d (U 0 , σ 0 ) and it turns out to be monotonically decreasing (as a function of σ 0 for U 0 < 0 fixed) from the pure hard sphere case σ 0 = 0, for which ϕ HS d 4.8067. This potential is known in the colloids' literature as the square-shoulder potential [62]. The function F(∆|U 0 , σ 0 ) has a single maximum over ∆ [66,68], computing the gradient with respect to (U 0 , σ 0 ) at this point yields a simple descent that always converges towards (U 0 , σ 0 ) → (0, 0) when the initial condition satisfies U 0 < 0. The physical interpretation is rather straightforward: the single positive step acts as a strictly repulsive region, thereby increasing the distances between all spheres, thus lowering the overall packing fraction.
For a negative step (U 0 > 0), the potential gets an attractive behaviour at short distances, leaving hope for an increase of the packing fraction. It is more usually called a square-well potential. A fine sampling of the parameters' space is plotted in figure 3. The evolution of the packing fraction is rather smooth, except at one very sharp line (yellow in figure 3) where the global maximum lies. Close to this region, the function F(∆|U 0 , σ 0 ) develops a second maximum, while away from it, it has only a single maximum. This was already noted in [66,68], where the secondary maximum is interpreted as a metastable glass phase (named either attractive or repulsive depending on the respectively small or large value of the corresponding mean-square displacement within a cage, ∆). When there are several maxima of F, the dynamics selects the highest maximum, i.e. the dynamical transition occurs at the lowest possible density [69,70]. Interestingly, the very sharp line of high density is the transition line of the two glass phases (called hereafter glass-glass, or GG line), namely the line where both maxima have the same value, meaning that the fluid to glass transition is here a tricritical dynamical transition line separating the fluid phase and the two glass phases. An important consequence of the appearance of this GG line is that the gradient ∇F in terms of the potential parameters is discontinuous on this line: the line separates locally the phase diagram in two sides where the function F has two maxima. On one side the attractive maximum dominates while on the other it is the repulsive one that dominates. Therefore on each side the gradient is computed on the dominant maximum, resulting in a discontinuity on the GG line where the gradient must jump from one side's value to the other. The landscape is thus singular at this line.
The consequence on a gradient descent is then clear: a standard gradient descent of the parameters (U 0 , σ 0 ) will always reach some point on the line (U (l) 0 , σ (l) 0 ), depending on its initial condition. Yet, it will not be able to find the global maximum since once on the line, the discontinuity of the gradient will make the trajectory point (U 0 , σ 0 ) jump off the GG line, resulting in a decrease of the packing fraction. This emphasizes the need of an improved gradient descent able to follow the GG line, performing a gradient descent within it. Another crucial aspect of this critical region is that the packing fraction varies significantly close to the line, so that this line may be easily missed by a rougher landscape sampling (this phenomenon also shows up in Figure 2 where the tip of the fluid phase is very narrow). This is why a full sampling becomes quickly impractical as the number of steps n grows. In this n = 1 case, both the full sampling and the gradient descent agree on the global maximum, found within the GG line. The gradient descent has been performed in the improved way able to follow the GG line, described in the next section. The values of the best-packing potential are given in table 2. The plot of the potential is shown in figure 5.

The gradient descent
For higher values of n, we tackle the problem of finding the highest packing fraction ϕ d by performing a gradient descent on the function F(M ) = max ∆ F(∆|M ) of the potential parameters M = (U 0 , σ 0 , . . . , U n−1 , σ n−1 ), to find its absolute minimum (if possible). The principle of the standard gradient descent is that at a point in parameter space M , after having computed the location ∆ * (M ) of the global maximum of F(∆|M ), that we assume for now to be unique and which constitutes the value of ∆ in the following, one follows the best direction to minimize F. Expanding the best direction is locally given by δ ∝ −∇ F(M ), thus the gradient descent consists in updating M −→ M − ε∇ F(M ) which decreases the value of F for small enough ε. Iterating this strategy one ends up in a local minimum depending on the initial starting point, if the function is smooth. As one can see in figure 3, the function F varies rather on a logarithmic scale of the parameters, and it is faster to use a gradient descent in terms of the logarithm of the potential parameters, replacing M by ln M (the logarithm being taken componentwise). As mentioned in the last section, there are several difficulties associated to the standard gradient descent: • First, the function may have several local minima, and depending on the initial conditions of the descent, one may therefore not find the global minimum. Besides, performing the logarithmic gradient descent prevents any change of sign of the parameters during the evolution. This is dealt with by running gradient descents with many different initial conditions (including different signs of the U i ), see the algorithm in appendix A.
• Second, we empirically find that the descents always evolve towards manifolds with several maxima of F(∆) with equal value, as mentioned in the case of the n = 1 GG line (with only two maxima) in last section. On these GG manifolds, the gradient is discontinuous, rendering the gradient descent pointless. The way around is that when the gradient descent encounters such discontinuities, it must switch to a gradient descent within this manifold where the largest packing fractions are found, as described below.
The general idea is the following: locally the GG manifold can be defined as being orthogonal to some subspace. The components of the gradients in this subspace calculated in different maxima are different (which is at the origin of the discontinuity). However, because the gradient finds the best slope locally, its components within the manifold (locally an Euclidean subspace with as many dimensions as the number of maxima minus one, as shown in appendix B) correspond to the best slope within the manifold, which we are looking for. The optimal direction within the manifold as well as a summary of the algorithm are detailed in appendices A and B.

Numerical results for several steps
Numerical investigations were performed for n 6 steps. Let us now comment on each case.

Two steps
For two steps, a fine sampling of the parameters' space can still be achieved in a reasonably short time. This method agrees with the best-packing potential found within the gradient descent. Fixing one of the two steps around some point, a plot similar to figure 3 can be obtained, and shows again that there exist very sharp regions of maximal densities, that necessitate a fine sampling to be detected. In parallel, one can use the gradient descent method, first performing a loose sampling of the parameters' space from which 50 highdensity initial conditions are drawn. Here, the gradient descent has 4 outcomes depending on the sign of the two steps U 0 , U 1 at the initial condition. Note that this is due to the fact that the logarithmic gradient descent prevents changes of sign of the potential parameters, and there appears to be a single global maximum for each possibility. The data is displayed in table 1.
One sees from these results and the extensive sampling that the function F develops up to three local maxima. There exist regions with two maxima of F having the same value, and even three in another region. An important remark is that when these regions exist, the descent always converges towards a region with two maxima of same value, and then following the GG manifold, it either finds a maximal density and stops, or finds its way to the coexistence of three maxima of equal height, where it evolves within this new manifold and finds the secondary maximum of the packing fraction's landscape. We note that the n = 2 best packer has a negative first step and a small positive second step. Physically, this means the potential is attractive at short distances while it has some repulsive behaviour at larger distances. This second step has the effect of deepening the well created by the first one, leading to a better "trapping" of the neighbouring spheres. There is a trade-off in this second step between the improved stickiness at short distance and the larger-ranged repulsive effect that must remain small.

More than two steps
For n > 2 the extensive search is computationally too heavy. With the experience gained from the n = 1 and 2 cases, we now rely on the gradient descents only.
The trial potential of Eq. (10), for a given n, contains as special cases (when some σ i 's vanish, or when some U i = U i+1 ) all the potentials with a strictly smaller number of steps. As a consequence its landscape contain local minima arising from these limits, that have a higher value of F (smaller value of the packing fraction) than the global optimum for this value of n. It turns out that, indeed, most of the gradient descent runs from the initial conditions get stuck to potentials that have one very small step for example, or small variations from a n < n potential minimum (or secondary minimum). Nevertheless, several gradient descents for each n could escape from these and find strictly better potentials. The densest packings for each n are displayed in table 2.
The last column indicates the number of equal-height maxima of the function F.  Table 2: Best-packing potentials found by the gradient descent procedure as a function of the number of steps n. The last column indicates the number of equal-height maxima of the function F(∆) for the corresponding potential.
is n = 2, where the densest packing is not found in the region with 3 glass phases. A plot of F for the potential yielding the densest packing found at n = 4 is displayed in figure 4. The shape of the corresponding potentials is plotted in figure 5. Their qualitative features are quite similar: a sticky attraction at short distances that continuously turns into a repulsive behaviour at the largest length scales, arising from a positive "tail" of the potential. As n increases, they seem to converge to a limit shape. In terms of the packing fraction ϕ d at the dynamic transition, we see that already at these small values of n the enhancement quickly converges; the gain with respect to the n = 1 best packer is quite limited and becomes of the order of 10 −2 once n > 3. For these reasons, going to a higher number of steps is rather pointless. Besides, the computational complexity increases (for example, at n = 6 even the loose grid sampling becomes quite inefficient in order to find relevant starting points). Actually, this strategy of running several (≈ 50 − 70) gradient descents from an initial loose sampling of the landscape was only performed for n 5. In the n = 6 case, because the potential has already well converged onto a limit shape, we simply ran 5 gradient descents from an initial potential given by the best-packing n = 5 potential where one step was splitted into two new steps, as it is likely that the best n = 6 potential should be close in shape to the optimal one found with n = 5. v (h) Figure 5: Plot of the different best potentials found by the gradient descent for n = 1 to 6. Inset on the left is zooming on the small distances logarithmically, while the right inset focuses on the tail of the potential.

Summary of our results
In this paper, we have shown that one can endow hard-core spherical particles with a suitable extra soft interaction potential which allows to maximize the dynamical glass transition density ϕ d of the packing. The optimal shape of the potential turns out to display a short-range attraction and weak repulsive tail. This amounts to bias the sampling measure of dense packings with a non-uniform distribution. By doing so we find that in a space of large dimension d a set of disordered liquid-like hard-sphere configurations does exist up to a packing fraction of ϕ d ≈ 7d 2 −d (see table 2 for the precise values). This set of configurations is equilibrated and it is possible to show that the configurational entropy density, which counts the number of distinct glass basins, is strictly positive at ϕ d . Interestingly, the first property implies that these configurations are thermodynamically stable against random errors in their algorithmic construction, while the second implies that the number of distinct glassy states, and of jammed configurations that can be constructed by compressing them, is exponentially large in the number of particles. Besides, we have explicitly checked that for all potentials in table 2, the excess entropy is s ex (ϕ d ) < 0 from (5), which means that the second virial correction to the reduced pressure, dominant in large d and given by (βP/ρ) − 1 = −s ex (ϕ) [44][45][46] is positive and monotonously increasing with ϕ, thus forbidding an instability caused by phase separation. Because the relaxation time of the equilibrium liquid is finite for ϕ < ϕ d , sampling equilibrium liquid configurations of N particles requires a time of the order of N d, the number of degrees of freedom in the system. This is because each degree of freedom must be updated a finite number of times (either by Monte Carlo or Molecular Dynamics) before equilibrium is reached. If the system of N particles is confined in a box of linear size L and volume V = L d with periodic boundary conditions, the resulting packing can be periodically extended to the infinite Euclidean space, for which it would be a periodic packing of period L with N 1 particles in the unit cell. For this procedure to be well defined, however, the linear size L of the system must be at least L = 2 + O(1/d), in such a way that a particle cannot collide with itself due to the periodic boundary conditions. Hence, the minimal number of particles must be of the order of where we used the relation v d = 2 −d Ω d /d for the volume of a particle of diameter 1, the Stirling approximation for the solid angle Ω d ∼ exp[ d 2 ln(2πe/d)], and we neglected all subexponential corrections. Note that this value of N diverges with d, and because the finite size corrections vanish as 1/N in the liquid phase [49], in the large d limit the resulting system is effectively in the thermodynamic limit despite the fact that its linear size 6 is L = 2. The results of our calculations can therefore be applied in this regime. The time needed to sample efficiently the liquid configurations, which scales proportionally to dN , is given by 6 Note, however, that a d-dimensional cube of side L = 2 has 2 d−1 diagonals of length 2 √ d. Hence, its linear size along the diagonals is very large when d → ∞.
where again we only kept the exponential dependency in d. Our main result can thus be summarized as follows: One can construct sphere packings of packing fraction ϕ ≈ 7d 2 −d in a time scaling as in Eq. (14), by sampling equilibrium liquid configurations in a volume V = (2 + O(1/d)) d , using the pair potential given in table 2.
We believe this statement to be exact in the sense of theoretical physics, even if of course we did not provide a mathematically rigorous proof. A reasonable hope that such a proof could be reached in the future relies on the many results that have been first obtained by the same heuristic statistical mechanics techniques we used and then rigorously confirmed (see for instance [71][72][73] for the static properties of mean-field spin-glasses, and [74,75] for their dynamics). With respect to these works, which dealt with models with an intrisic mean-field character in their definition, an additional difficulty that would need to be overcome for a rigorous proof of the above statement is that the mean-field properties of interacting particle systems only arise when the large dimension limit is taken.

Comparison with related results
First, it should be noted that our result is very similar to the (rigorous) one of Moustrou [36], who proved that packings with density ϕ Because in practical applications one would never reach such high dimensions, we believe that our results remain of comparable interest to Moustrou's. Also, our method can produce an exponential number of distinct packings (the complexity, or configurational entropy, being extensive at the dynamic transition [3]), that would also be robust to thermal noise. Second, it should be noted that the equilibrium packings produced by our method have finite pressures, hence the gaps between particles are positive. Such packings can thus be further compressed out of equilibrium, leading to an increase of the packing density. In the pure hard sphere case, this brings the density from ϕ d 4.8067 d 2 −d to ϕ j,d ≈ 7.4 d 2 −d . The potentials introduced in this paper have significantly increased the dynamic transition ϕ d , yet preliminary results seem to indicate that the gain on the final density after compression is quite weak. This could be rationalized through the following observation: the radial distribution function of our liquid packings, simply given here by g(r) = e −v(r) [67,76], exhibits oscillations reminiscent of the ones found for jammed configurations of hard spheres in 3 d 6 [77,78]. The configurations we find here may well be quite close to jamming, although they belong to the equilibrated liquid with potential v(r). We also emphasize that the jamming packing fraction ϕ j,d obtained 7 by compression from ϕ d does depend on the potentialv(h), and more generally on the protocol used to compress the system [4], unlike the equilibrium SAT-UNSAT transition (or GCP).
Third, one should keep in mind that packings do exist up to at least ϕ = 65963 d 2 −d , according to the rigorous result of Venkatesh [35], and up to at least ϕ ∼ (ln d) d 2 −d , according to the non-rigorous results of [3]. However, it is possible that constructing these packings would take a much larger time than the one in Eq. (14).

Perspectives
There are several open problems to be explored. We now discuss some of them along with possible future directions of research.
1. Early work on the square-well interaction potential in infinite dimension has shown that our approach reproduces quite in detail the phase diagram of attractive colloids in finite dimension, previously found in the mode-coupling theory approach, numerical simulations and experiments [57][58][59][60][61]. The phase diagrams of these systems include non-trivial features such as reentrant behaviour, multiple glasses, and higher-order bifurcation singularity [66,68]. The question that naturally arises is whether this correspondence is far more general and can be extended to more complex interaction potential as those studied in the present paper. The extension of the numerical approach to simulations of hard sphere systems up to d = 13 [49,50] should be in principle straightforward and might be useful to clarify this important problem. This would also shed some light on the issue of the number of operations required to construct a dense packing, which appears still quite prohibitive as d increases. For some related works in this direction on soft-core models, see [79,80].
2. The actual design of an experimental soft-matter system interacting with a finitedimensional analog of our optimized potential (displayed in table 2 and figure 5) might be a challenge to experimental physicists. While the simple square-well potential (n = 1) can be seen as a limiting case of the entropy driven (Asakura-Oosawa) depletion force in a strongly asymmetric binary mixture, it is not obvious that something similar can be achieved for a generic n-step interaction potential. Nevertheless, focusing on the relatively simpler n = 2 or n = 3 potentials displayed in table 2 would be already much interesting. In fact, some cases of this type have been already addressed quite extensively at moderate packing fractions, and they are known to exhibit intriguing features such as liquid-like anomalies and multiple liquid phases [53][54][55], and instabilities producing disordered or periodic microphases [63][64][65]. Our work shows that complex glassy features should appear in these systems at higher packing fractions, provided that the extension to finite dimension holds true.
3. From our data one surmises that a two-body interaction potential with spherical symmetry and with n steps (wells or shoulders) should have n + 1 distinct glass phases, each of which is associated to a privileged packing arrangements over a suitable length scale dictated by the range of wells/shoulders and depths/heights entering the potential (notice that the +1 comes from the short range hard core repulsion). Therefore, we naturally expect that the interplay of multiple glass phases gives rise to several glassglass transitions and higher-order critical points, as they occur in simpler schematic models [81,82]. In fact, in our search of the optimum potential we find many such features (see Figure 4), although we did not study them in detail. From a dynamical point of view we believe that the high-density liquid phase should exhibit very special physical properties and novel relaxation patterns, in particular, near the tip of the reentrancy region, which is surrounded by several competing glass phases. For this reason one can arguably expect that the time correlation decay features a multi-step relaxation mechanism (as far as aging dynamics is concerned, that would possibly correspond to a multi-step effective temperature scenario after a subcritical quench).
While most of the above perspectives concerns applications in soft matter, from the point of view of packing the challenge of proving rigorously our results remain, of course, an extremely important one. 5. To optimize the update, a linear search is performed: the values of ϕ with the updated parameters are computed for several decades of ε (i.e. for several points in logarithmic scale on the line given by the chosen direction) and the highest value of the packing fraction is kept as the updated value in parameter space, which constitutes the new estimate of the parameters M . The algorithm ends if no value of the packing fractions computed within the linear search is higher than the one given by the parameters before update. This linear search is parallelized to reduce run time.
B The gradient descent within the transition manifolds between glass phases Here we provide more details on the direction the gradient descent algorithm has to choose when m 2 maxima of F(∆|M ) become equal. We recall that the point M = (U 0 , σ 0 , . . . , U n−1 , σ n−1 ) represents the potential parameters.
In the following we assume that the algorithm arrives close to a region where m 2 maxima of F(∆|M ) become equal. We note the value of F in each of the local maxima Note that, a priori, these functions have no singularities unlike the gradient of F(M ). By substracting any two equations we get that δM is locally (i.e. at first order) orthogonal to any difference of gradients ∇f . . , e m−1 ) is thus the local orthogonal subspace to the manifold in which we wish to perform the gradient descent. We can uniquely expand the gradients as where a k ∈ O(M ) ⊥ is tangent to the manifold L . Since δM · ∇f k (M ) = δM · a k the minimization of f k (M ) inside the manifold, from (17), locally follows the direction of −a k . Since the newly reached point M is also in L , all other maxima will have the same value, it is thus also a direction that decreases F. Next, we notice that ∀(i, j) ∈ 1, m − 1 2 , ∀δM ∈ O(M ) ⊥ , we have a i and a j also in O(M ) ⊥ , and δM · a i = δM · a j , hence actually 8 a i = a j . This means we can take any −a k to compute the correct direction. This direction is easy to compute since a k = ∇f k (M ) − m−1 i=1 λ k i e i . The coefficients are obtained solving a linear equation [83], projecting (18) onto each e j . One has where Q is the symmetric matrix of the overlaps 9 Q ij = e i · e j . 9 If the dimension of the vectors is 1, this matrix is singular. If not, which is the case here since the gradients live in a 2n-dimensional space, there is no a priori reason for a singularity.