Gravity from thermodynamics: optimal transport and negative effective dimensions

We prove an equivalence between the classical equations of motion governing vacuum gravity compactifications (and more general warped-product spacetimes) and a concavity property of entropy under time evolution. This is obtained by linking the theory of optimal transport to the Raychaudhuri equation in the internal space, where the warp factor introduces effective notions of curvature and (negative) internal dimension. When the Reduced Energy Condition is satisfied, concavity can be characterized in terms of the cosmological constant $\Lambda$; as a consequence, the masses of the spin-two Kaluza-Klein fields obey bounds in terms of $\Lambda$ alone. We show that some Cheeger bounds on the KK spectrum hold even without assuming synthetic Ricci lower bounds, in the large class of infinitesimally Hilbertian metric measure spaces, which includes D-brane and O-plane singularities. As an application, we show how some approximate string theory solutions in the literature achieve scale separation, and we construct a new explicit parametrically scale-separated AdS solution of M-theory supported by Casimir energy.


Introduction
The mathematical field of optimal transport was originally inspired by the concrete problem of how to best move a distribution of mass from one configuration to another. In recent years, this field has grown in various directions, incorporating ideas from Riemannian geometry and information theory.
In this paper, we apply ideas from this field to the physics of gravity, and in particular to its compactifications. At a technical level, these applications stem from the fact that the particular tensor $(\mathrm{Ric}^N_f)_{mn} := R_{mn} - \nabla_m \nabla_n f + \frac{1}{n-N}\nabla_m f \nabla_n f$ plays a role in both contexts. In optimal transport, the function $f$ defines a measure $\sqrt{g}\,e^f$, and $\mathrm{Ric}^N_f$ controls the distortion of measures along Wasserstein geodesics which, as we will see, describe mathematically the optimal way to transport probability distributions. While $n$ is the actual dimension of space, we will see that the number $N \in \mathbb{R} \cup \{\infty\}$ will play the role of an effective dimension, for reasons related to the Raychaudhuri equation. In gravity compactifications, $\mathrm{Ric}^N_f$ appears in the internal Einstein equations: $f$ is proportional to the warping function multiplying $d$-dimensional macroscopic spacetime, $n$ is the internal dimension, and $N = 2 - d$.
The fact that the effective dimension $N$ is often negative for compactifications might look unsettling at first. In an earlier paper [1] we had reorganized the equations of motion so as to use the $N = \infty$ limit $\mathrm{Ric}^\infty_f$, and exploited this fact to find applications of optimal transport to Kaluza-Klein masses (with some initial steps provided in [2]). However, recent mathematical work has shown that the $N < 0$ case also makes sense and is rich in geometric/analytic consequences [3][4][5][6][7][8]; in this paper we will see that it leads to cleaner and broader results.
Although our motivations come from the study of compactifications of higher-dimensional gravitational theories, we stress that our results apply to general warped products in an arbitrary number of dimensions, such as (warped) $1+n$ decompositions of static space-times.

Gravity and entropy
To a distribution of particles on a space $M_n$, we can associate a probability density distribution $\rho(x)$. If the total mass is $m$, the mass in a region of space $U$ is $m \int_U dx\,\sqrt{g}\,\rho(x)$. One of the most intriguing results of optimal transport in curved spaces concerns the behavior of the Shannon entropy $S$ for a distribution of particles that move geodesically. The second derivative of $S$ with respect to time evolution turns out to be negative if and only if the ordinary Ricci tensor $R_{mn}$ is positive [9][10][11][12]. In other words, in this situation the entropy is concave as a function of time.¹ The Einstein equations now imply an inequality relating this second derivative to an integral of the stress-energy tensor. This inequality becomes in fact equivalent to the Einstein equations if we add the information that it can be saturated on delta-like distributions.² This striking result has been rigorously proved for the Lorentzian vacuum Einstein equations in [14,15]: [14] is focused on the Hawking-Penrose strong energy condition and time-like Ricci lower bounds, while [15] treats general upper and lower time-like Ricci bounds and thus the Einstein equations. We rederive such an optimal transport characterization of Einstein's equations formally³ in Sec. 4, using the tools of optimal transport we recall in Sec. 3.
Our description of this result in terms of $d^2S/dt^2$ is a bit of an oversimplification. In optimal transport one actually tends to focus on concavity rather than on the sign of $d^2S/dt^2$, because concavity makes sense even when $S$ is not smooth, which allows the treatment to include spaces with singularities. This is important for physics applications, as we will see below.
As we mentioned earlier, a natural modification in this context is to introduce a weighted measure $\sqrt{g}\,e^f$ that differs from the standard Riemannian volume measure $\sqrt{g}$. Concavity of the Shannon entropy now becomes equivalent to positivity of $\mathrm{Ric}^\infty_f$. It is also natural to study other notions of entropy considered in the literature. As we review in Sec. 5.1, a famous possibility is the Tsallis entropy [16], obtained by replacing one of the axioms characterizing the Shannon entropy (related to its extensive property) with a homogeneity property in terms of $\alpha$. Under the choice $\alpha = 1 - 1/N$, concavity of $S_\alpha$ is related to positivity of $\mathrm{Ric}^N_f$ (cf. [17][18][19] for $N \in (1, \infty)$, and [5] for $N < 0$). The aforementioned appearance of $\mathrm{Ric}^N_f$ in the internal Einstein equations suggests that they too might be reformulated in terms of generalized concavity properties for the Tsallis entropy $S_{1-1/N}$, with weighted measure $\sqrt{g}\,e^f$. We establish this new result at the formal level in Sec. 5.3, leaving a fully rigorous mathematical proof to a later publication [20]. In Sec. 5.2 we also show that the external Einstein equation can be reformulated in terms of the first derivative of the ordinary Shannon entropy.
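To make the role of $\alpha = 1 - 1/N$ concrete, the following LaTeX sketch shows how a Tsallis-type entropy reduces to the Shannon entropy as $N \to \infty$. The normalization of $S_\alpha$ used here is an assumption (one common convention); the definition adopted in Sec. 5.1 may differ by constants or signs.

```latex
% Assumed normalization of the Tsallis entropy for a density \rho with
% respect to the weighted measure \sqrt{g} e^f (hypothetical convention).
\begin{align*}
  S_\alpha[\rho] &:= \frac{1}{1-\alpha}\left( \int_{M_n} \sqrt{g}\, e^{f} \rho^{\alpha}\, dx - 1 \right),
  \qquad \int_{M_n} \sqrt{g}\, e^{f} \rho \, dx = 1 , \\
  % With \alpha = 1 - 1/N one has 1/(1-\alpha) = N:
  S_{1-1/N}[\rho] &= N \left( \int_{M_n} \sqrt{g}\, e^{f} \rho^{\,1-1/N}\, dx - 1 \right), \\
  % Expanding \rho^{-1/N} = 1 - \tfrac{1}{N}\ln\rho + O(N^{-2}):
  \lim_{N\to\infty} S_{1-1/N}[\rho] &= - \int_{M_n} \sqrt{g}\, e^{f} \rho \ln\rho \, dx .
\end{align*}
```

In this convention a negative $N$ corresponds to $\alpha > 1$, so the prefactor $N$ in the second line is negative.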
This reformulation also provides a rigorous mathematical definition of the low-energy Einstein equations for certain classes of singular space-times where the standard analytical geometrical definition breaks down. In Sec. 6 we showcase some of the advantages of this approach by proving rigorous theorems about the masses of the spin-two fluctuations around backgrounds that include localized classical sources, such as Dp-brane singularities in supergravity. This suggests that this mathematical definition agrees at least partially with the UV completion of classical supergravity provided by string theory.
In physics, it is more customary to define an entropy by integrating a probability distribution in phase space. Our integrals over $M_n$ are entropies in the more general sense of information theory: they parameterize our ignorance about the position of particles that propagate geodesically on $M_n$. When the latter is the internal space of a compactification, our entropy measures the ignorance of a low-dimensional observer regarding a particle distribution along the internal dimensions.
There is of course a long history of connections between gravity and thermodynamical ideas, starting with black hole physics. A famous argument derives the Einstein equations from the assumption that the entropy is proportional to the area of any local Rindler horizon [21]. A later argument derived them by using the Ryu-Takayanagi formula [22] for holographic entanglement entropy [23]. An even more ambitious idea views gravity as an entropic force [24]. In contrast, we stress that our reformulation only uses classical physics of probe particles in free fall (that is, only subject to gravity). Nevertheless, it would be very interesting to investigate any relationship with those earlier results.

Bounds on KK masses and separation of scales
A notable and frequent application of global inequalities on the Ricci tensor is to obtain bounds on the eigenvalues of various geometrical operators, such as the Laplace-Beltrami operator. In gravity compactifications, the masses of Kaluza-Klein (KK) fields are also obtained as eigenvalues of geometrical operators. While unfortunately there is no general expression for those (other than in simple classes such as Freund-Rubin), an exception is the tower of spin-two masses, which is obtained from a version of the Laplace-Beltrami operator weighted by the warping function (which is in turn proportional to the $f$ above).
In our reformulation of the Einstein equations, the second derivative of the entropy $d^2S/dt^2$ is related to an integral of a certain combination of the internal and external stress-energy tensor. This combination also appeared in [2], where it was shown to be positive for many common forms of matter; this was dubbed there the Reduced Energy Condition (REC).
This observation was already enough in [1,2] to prove several upper and lower bounds on the spin-two masses. However, those results used the $N = \infty$ effective dimension, and as a consequence many of the resulting bounds depended on the upper bound $\sup|dA|$ of the gradient of the warping function $A$. In some situations this can get large, making the bounds less useful. The inequality we obtain here in terms of negative effective dimensions is much simpler: $\mathrm{Ric}^{2-d}_f \geqslant \Lambda$ (2), with $f = (D-2)A$. This makes no reference to the warping, and in turn improves the bounds on the spin-two KK masses. For example, we exploit and generalize a result in the literature [25] to show rigorously that in general the smallest mass $m_1$ satisfies the lower bound (3) (Th. 6.16), where the diameter diam is the largest distance between any two internal points, and $\alpha(\mathrm{diam}\sqrt{-K}) > 0$ is a constant that only depends on the product $\mathrm{diam}\sqrt{-K}$, where $K < 0$ is a lower bound on the $N$-Ricci curvature. When $K = \Lambda = (1-d)/L^2_{\rm AdS}$ as in (2), $\mathrm{diam}\sqrt{-K} \propto \mathrm{diam}/L_{\rm AdS}$. For the applications, an important property of such a constant $\alpha$ is its limiting behaviour (Rem. 6.18). In particular, this proves that in any warped compactification with matter content that satisfies the REC there is a large hierarchy between $m_1$ and the scale of the cosmological constant when $\mathrm{diam}/L_{\rm AdS}$ is small. The lower bound (3) is of course intuitive (at least at the qualitative level), but here we are providing a precise statement with a general rigorous argument, valid for any higher-dimensional gravity with an Einstein-Hilbert kinetic term. Optimal transport plays a key role in the proof of Th. 6.16, based on the so-called "$L^1$-localization method": the basic idea is that using $L^1$-optimal transport (i.e. optimal transport with cost function given by the distance function), it is possible to partition the (possibly singular) space $X$ (up to a set of measure zero) into geodesics $\{X_\alpha\}_\alpha$, each $X_\alpha$ being endowed with a Borel nonnegative measure $m_\alpha$, and to reduce the proof of the desired inequality (3) in $X$ to proving a family of corresponding inequalities on the 1-dimensional weighted spaces $(X_\alpha, m_\alpha)$. Such a dimension reduction argument is very powerful: it has its roots in [26], was formalised in highly symmetric spaces in [27,28] by using iterative bisections, and was then developed via optimal transport tools for smooth Riemannian manifolds in [29] and for (possibly nonsmooth) metric measure spaces satisfying synthetic Ricci lower bounds and dimensional upper bounds in [30].
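As a small worked check of the proportionality between $\mathrm{diam}\sqrt{-K}$ and $\mathrm{diam}/L_{\rm AdS}$ quoted above, the following LaTeX sketch evaluates the product for $K = \Lambda$. It assumes $d \geqslant 2$ so that $K < 0$; the $d = 4$ line is only an illustrative special case.

```latex
% K = \Lambda with the AdS normalization quoted in the text (d >= 2 assumed).
\begin{align*}
  K = \Lambda = \frac{1-d}{L_{\rm AdS}^2} < 0
  \quad &\Longrightarrow \quad
  \mathrm{diam}\,\sqrt{-K} = \sqrt{d-1}\;\frac{\mathrm{diam}}{L_{\rm AdS}} , \\
  % Illustrative case d = 4:
  d = 4 \quad &\Longrightarrow \quad
  \mathrm{diam}\,\sqrt{-K} = \sqrt{3}\;\frac{\mathrm{diam}}{L_{\rm AdS}} .
\end{align*}
```

The argument of $\alpha$ is therefore small exactly when the internal diameter is small in units of the AdS radius.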
As we mentioned above, optimal transport can handle a class of singular spaces, called RCD spaces, by focusing on concavity properties for $S$ rather than on its second derivative [17,18,31,32,33,34]; it turns out that this applies to the famous string theory objects called D-branes, as was checked in [1] for the $N \in (n, \infty]$ case and extended here to $N < 0$ (Sec. 6.4). As mentioned above, the advantage of considering negative $N$ here is that it allows for a neat control of the weighted $N$-Ricci curvature lower bound (2). So our proof of (3) applies to compactifications with brane singularities as well.
Some interesting compactifications of string theory also contain a type of source called O-plane. Unfortunately these turn out to be outside the RCD class, as we rigorously show in Sec. 6.4.1.⁴ To handle them, we consider the broader class of infinitesimally Hilbertian metric measure spaces [32,33]. We are able to show that some of the KK lower bounds appearing in [1] are also valid in this larger class, and hence apply to compactifications with O-plane singularities as well. In particular, we obtain the bound (4) (Thm. 6.6), where $h_1$ is the so-called Cheeger constant of $(M_n, f)$, which is small when the weighted manifold has a 'neck' almost separating it in two pieces (as reviewed at length in [1]). We also prove some higher-order generalizations of the previous result, obtaining bounds on the whole tower of spin-two masses (Thm. 6.7 and Thm. 6.8).
While the mathematical study of the $N < 0$ case is developed enough for us to obtain the results presented here, it is at present not as mature as the $N > 0$ and $N = \infty$ cases. Because of this, in this paper we have not improved the upper bound on $m_1^2$ mentioned in [1, Th. 4.2] and first obtained in [35, Cor. 1.2], itself a generalization of the so-called Buser inequality. We hope to return to this in the future, as mathematical techniques improve further.
We end the paper in Sec. 7 with some considerations about the problem of scale separation. First we discuss how the bounds (3) and (4) on $m_1$ can be used to show that this mass is much larger than $|\Lambda|$ for certain approximate string theory vacua [36,37]. Second, we show how a simple violation of the REC, Casimir energy, can lead to such a scale separation as well, by constructing an $\mathrm{AdS}_4 \times T^7$ solution with a parametric hierarchy between the KK modes and the scale of the cosmological constant. This background appears to be in tension with conjectures in the literature discussing the behavior of the KK spectrum as $|\Lambda| \to 0$, inviting further study.

Equations of motion and weighted Ricci tensor
We start by showing how the equations of motion for general warped-product space-times are naturally organized in terms of a generalization of the Ricci curvature tensor that better captures the geometrical properties of the space when the warping is non-trivial. The results in this section are partially based on the analysis of the equations of motion performed in [2,Sec. 2], to which we refer the reader for more details.

Effective curvature and dimension
Our setup consists of any possible gravitational theory that at low energy reduces to $D$-dimensional Einstein gravity, with some prescribed matter content. In particular, our analysis applies to compactifications of string and M-theory but it is not restricted to those.
We normalize the Einstein-Hilbert term in the action as $S_{\rm EH} = 1\ldots$ In such a theory, the Einstein equations for the $D$-dimensional metric can be written as (5), where $T_{MN} := -\frac{2}{\sqrt{-g}}\frac{\delta S_{\rm mat}}{\delta g^{MN}}$ is the stress-energy tensor of the $D$-dimensional theory. We are interested in studying general, possibly warped, $d$-dimensional vacuum compactifications. That is, the $D$-dimensional space-time has the form (6).⁵ The warping function $A$ only varies over the $n$-dimensional internal space; $ds^2_d$ is a maximally-symmetric space, with curvature normalized as $R^{(d)}_{\mu\nu} = \Lambda g^{(d)}_{\mu\nu}$. Plugging (6) in (5), and specializing to external and internal directions, we get two sets of equations, which can be combined and re-organized as (7) and (8), where the hatted combinations $\hat T$ are defined in (9). Equation (8) highlights a particular combination of the Ricci tensor and the derivatives of the warping. For a given $N \in \mathbb{R}$ we can define the $N$-Bakry-Émery Ricci tensor $(\mathrm{Ric}^N_f)_{mn} := R_{mn} - \nabla_m\nabla_n f + \frac{1}{n-N}\nabla_m f\nabla_n f$ (10), and in terms of this object the equations of motion take the simple-looking form (11a), (11b). At this stage, the definition in (10) might appear purely algebraic and far from any geometrical meaning. However, crucially for the rest of our analysis, this generalization of the Ricci curvature tensor has already been considered and extensively studied in the literature of optimal transport, where (10) has been shown to be the notion of curvature that captures the analytic/geometric properties of weighted Riemannian manifolds with an effective notion of dimension $N < 0$ [3][4][5][6]. We will explore this in more detail in Sec. 3.4. Finally, we stress that, even though our main motivation is the study of vacuum compactifications, all our analysis and results apply to any space-time that can be written in the form (6), including, for example, $1 + n$ splittings of static space-times.
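To see why the effective dimension $N = 2 - d$ is natural for warped products, the following LaTeX sketch specializes the $N$-Bakry-Émery tensor to this value. It assumes the sign convention $(\mathrm{Ric}^N_f)_{mn} = R_{mn} - \nabla_m\nabla_n f + \frac{1}{n-N}\nabla_m f\nabla_n f$ appropriate to the measure $\sqrt{g}\,e^f$, together with $f = (D-2)A$ and $D = d + n$ as stated in the text.

```latex
% Specialization to N = 2 - d, using n - N = n + d - 2 = D - 2
% and f = (D-2) A (assumed conventions, see lead-in).
\begin{align*}
  (\mathrm{Ric}^{2-d}_f)_{mn}
  &= R_{mn} - \nabla_m\nabla_n f + \frac{1}{D-2}\,\nabla_m f\,\nabla_n f \\
  &= R_{mn} - (D-2)\,\nabla_m\nabla_n A + (D-2)\,\nabla_m A\,\nabla_n A \\
  &= R_{mn} - (D-2)\left( \nabla_m\nabla_n A - \nabla_m A\,\nabla_n A \right).
\end{align*}
```

The combination in the last line is the one generated by the conformal factor $e^{2A}$ in the internal components of the $D$-dimensional Ricci tensor, apart from terms proportional to $g_{mn}$, which is consistent with $\mathrm{Ric}^{2-d}_f$ being the object that appears in the internal equations.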

Reduced energy condition and Ricci lower bounds
From (11b) we see that lower bounds on $\hat T_{mn}$ directly translate to lower bounds on $(\mathrm{Ric}^{2-d}_f)_{mn}$, which we can then exploit to derive constraints on physical properties of vacuum compactifications.
At first sight, the peculiar combination of stress-energy tensors appearing in the definition (9) for $\hat T_{mn}$ can seem hard to estimate in general, as it might depend on the details of the energy sources. However, in [2] it has been noticed that for a large class of matter content it is actually non-negative. This condition, being like an energy condition naturally emerging from reducing the theory, has been named Reduced Energy Condition (REC), stated in (13).⁶ More precisely, [2] has shown that the REC is satisfied by higher-dimensional scalar fields, general $p$-form fluxes (including 0-forms such as the Romans mass in type mIIA) and localized sources with positive tension. Moreover, one can easily check that any potential for a collection of fields $\phi_i$ of the form independent of the metric, does not affect the REC since its contribution cancels from the combination (13).

⁵ We use upper-case Latin letters to denote $D$-dimensional indices, lower-case Greek letters for indices along the directions of the $d$-dimensional vacuum and lower-case Latin letters for indices in the $n$-dimensional internal space, with $n = D - d$. Also, compared to references [1,2] we are suppressing here the bar on top of $g_n$, i.e. $\bar g_n$ there.
When the REC is satisfied by the matter content of the $D$-dimensional theory, we have a simple lower bound (14) for the synthetic curvature, which we can exploit to bound physical properties of gravity compactifications in terms of $\Lambda$, as we will do in Sec. 6 where we bound the masses of spin-two Kaluza-Klein fluctuations around general vacua that satisfy the REC. However, many interesting physical sources violate the REC, such as O-planes in string theory or quantum effects. In Sec. 7 we will analyze explicit examples in which these sources allow the construction of scale-separated solutions, i.e. solutions for which the masses of the Kaluza-Klein modes are parametrically larger than $|\Lambda|$.

Optimal transport
As we anticipated, the tensor (10) appearing in the equations of motion has a natural interpretation in the field of optimal transport. In this section we will give a brief review of some aspects of this field that are important for the rest of the paper. In particular we will show why (10) is a natural combination, and in what sense N can be considered an effective dimension. In this section we will mostly consider smooth (weighted) spaces, while the non-smooth case will be treated later in Section 6.

Probability and optimal transport
Take a distribution of probe particles on a space $X$ that at time $t = 0$ has a certain shape $\mu_0(x)$. We can think of it as an actual mass distribution of many particles or as a probability distribution for a single particle; in both cases we normalize $\int_X \mu_0(x) = 1$. Assume that on $X$ there is a notion of distance and that each bit of the distribution starts moving along a geodesic. The initial shape $\mu_0$ is thus being distorted, and we call $\mu(x, t)$ the probability distribution at $t \geqslant 0$, with $\mu(x, 0) := \mu_0(x)$. Since the individual bits of mass are moving along geodesics, certainly $\mu(x, t)$ has some information about the geometry of $X$; we may wonder if it knows enough, for example to give information about the curvature of $X$. The answer turns out to be affirmative: specifying the evolution of the relative entropy $S[\mu]$ of $\mu$ with respect to the volume form on $X$ is equivalent to prescribing the Ricci tensor. This observation can then be exploited to encode all the Einstein equations in an evolution equation for $S[\mu]$; cf. [14,15].
To explicitly derive and state this equivalence we need tools to handle the evolution of probability distributions. Luckily, these have been extensively developed in the context of optimal transport theory. Most of the informal discussion below is based on [19, Chaps. 6, 14, 15, 16]: we adopt the formalism introduced by Otto [...]. To make contact with this framework, assume we are on a more general metric space $(X, d)$ and that we are given the task of moving mass on $X$ in order to morph an initial distribution $\mu_0(x)$ into a distribution $\mu_1(y)$, while minimizing the total cost, when the cost of moving one bit of mass from $x$ to $y$ is given by the squared distance $d(x, y)^2$. This problem induces a distance on the space $P_2(X)$ of Borel probability measures over $X$ with finite second moment, called the 2-Kantorovich-Wasserstein distance, or Wasserstein distance for short, given in (15), where a coupling is a probability distribution $\pi$ on $X \times X$ whose marginals $\int_X dy\,\pi(x, y)$ and $\int_X dx\,\pi(x, y)$ are respectively equal to $\mu_0(x)$ and $\mu_1(y)$. These requirements impose that all the mass that is going to $y$ comes from $\mu_0$ and that all the mass moved out from $x$ goes into $\mu_1$, i.e. no mass has been lost or created in the process.
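For orientation, the following LaTeX sketch writes out the standard definition of the 2-Wasserstein distance as an infimum over couplings, followed by the simplest example. The notation $\Pi(\mu_0,\mu_1)$ for the set of couplings is an assumption introduced here, and the form of (15) in the source may differ in notation.

```latex
% Standard definition; \Pi(\mu_0,\mu_1) denotes the set of couplings
% (notation assumed here, not taken from the source).
\begin{equation*}
  W_2(\mu_0,\mu_1)^2 := \inf_{\pi \in \Pi(\mu_0,\mu_1)} \int_{X\times X} d(x,y)^2 \, d\pi(x,y) .
\end{equation*}
% Example: two Dirac masses. The only coupling of \delta_{x_0} and
% \delta_{y_0} is \delta_{(x_0,y_0)}, so the infimum is trivial:
\begin{equation*}
  \mu_0 = \delta_{x_0}, \quad \mu_1 = \delta_{y_0}
  \quad \Longrightarrow \quad
  W_2(\mu_0,\mu_1) = d(x_0,y_0) .
\end{equation*}
```

In this sense the map $x \mapsto \delta_x$ embeds $(X, d)$ isometrically into $(P_2(X), W_2)$, which is one way to see why geodesics of very localized distributions probe the pointwise geometry of $X$.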
When $X$ is an $n$-dimensional Riemannian manifold equipped with a metric $g$, which induces the distance $d$ and an invariant measure $\sqrt{g}\,dx^n$, we can formally think of $P_2(X)$ as an infinite-dimensional Riemannian manifold equipped with a scalar product that induces the distance (15). We can write this metric in terms of its action on tangent vectors $\dot\mu$ on $P_2(X)$, which are characterized as follows. Since the total mass is being preserved in this process, a continuity equation holds on $X$, and we can specify $\dot\mu$ at a given time $t$ in terms of a vector field $\xi(\cdot, t) \in T(X)$ as $\dot\mu = -\nabla\cdot(\mu\,\xi)$ (16). The time-dependent vector field $\xi(x, t)$, which describes the direction on $X$ along which the bit of mass at $x$ is moving at time $t$, can be written as the gradient of a real function $\eta(\cdot, t)$.
With this definition, it can be shown that the Wasserstein distance can be represented as in (17). Thus, given two tangent vectors $\dot\mu_1$ and $\dot\mu_2$ (on $P_2(X)$) at $\mu$, their scalar product is given by (18). This defines our formal Riemannian metric on $P_2(X)$. Now that we have tangent vectors and a Riemannian metric on $P_2(X)$, we can ask when the curve $\mu(\cdot, t) : [0, 1] \to P_2(X)$ is a geodesic with respect to the metric (18), i.e. when it locally minimizes the distance (15). The answer is well-known: such geodesics are characterized in terms of solutions of the Hamilton-Jacobi equation on $X$. Specifically, $\mu(x, t)$ describes a geodesic in $P_2(X)$ if $\eta$ satisfies the Hamilton-Jacobi equation (19), with $\xi$ and $\eta$ defined from $\dot\mu$ through the continuity equation (16). In Appendix C we show how (19) implies that each bit of mass composing $\mu$ moves along a geodesic on $X$.
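The relations described in this subsection can be summarized in their standard Otto-calculus form, as in the LaTeX sketch below. Normalizations (for instance factors of $1/2$) are assumptions and may differ from those of (16)-(19) in the source; the last line checks that the Hamilton-Jacobi equation makes each bit of mass move geodesically.

```latex
% Standard Otto-calculus relations (assumed normalizations).
\begin{align*}
  &\text{continuity equation:}
    && \dot\mu + \nabla\cdot(\mu\,\nabla\eta) = 0 , \\
  &\text{dynamical form of the distance:}
    && W_2(\mu_0,\mu_1)^2 = \inf_{(\mu,\eta)} \int_0^1 dt \int_X |\nabla\eta|^2 \, d\mu , \\
  &\text{formal metric:}
    && \langle \dot\mu_1 , \dot\mu_2 \rangle_\mu = \int_X \nabla\eta_1\cdot\nabla\eta_2 \, d\mu , \\
  &\text{geodesics (Hamilton-Jacobi):}
    && \dot\eta + \tfrac12 |\nabla\eta|^2 = 0 .
\end{align*}
% Particle picture, with velocity field \xi = \nabla\eta. Using
% \xi^n \nabla_n \xi_m = \tfrac12 \nabla_m |\nabla\eta|^2 :
\begin{equation*}
  \partial_t \xi_m + \xi^n\nabla_n\xi_m
  = \nabla_m\!\left( \dot\eta + \tfrac12 |\nabla\eta|^2 \right) = 0 .
\end{equation*}
```

The vanishing of the last expression is the statement that the acceleration of a particle carried by the flow of $\xi$ is zero, i.e. that it follows a geodesic of $X$.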

Derivatives of functionals in Wasserstein space
Equipped with the formal machinery developed in the previous section, we are now ready to compute the derivative of functionals on $X$ along geodesics on $P_2(X)$. From now on, we will perform formal computations specializing to the cases in which $X$ is an $n$-dimensional Riemannian manifold equipped with a metric $g$, which induces the distance $d$ and an invariant measure $\sqrt{g}\,dx^n$. We will return to singular spaces in Sec. 6, where we show that one can rigorously take into account the classical singular backreactions of physical sources. Given a probability distribution $\mu$, we define its density $\rho$ as in (20) and represent a generic functional $\mathcal{F}$ as in (21).⁷ When $\mu$ changes in time, so will $\mathcal{F}[\mu]$, and an explicit computation reveals that along the curve described by the continuity equation (16) its rate of change is given by (22), where we have defined $P(\rho) := \rho F'(\rho) - F(\rho)$. We can go further and compute the second derivative along the curve $\mu(\cdot, t)$, taking into account that this curve is a geodesic on $P_2(X)$ and thus imposing (19). Doing so we get (23), where we defined $P_2(\rho) := \rho P'(\rho) - P(\rho)$ and $\Delta := -\nabla^2$. More details on the derivations of (22) and (23) can be found in App. B. We can simplify the quantity in the brackets by using the Bochner formula (24). In Appendix A we review how (24) is a close relative of the Raychaudhuri equation on $X$, which connects the behavior of families of geodesics to the Ricci curvature. Plugging it in (23) we can finally write (25). Equations (22) and (25) are the main ingredients we need to rewrite the Einstein equations in terms of derivatives of an entropy. We conclude this section by noticing that knowledge of time derivatives of function(al)s along geodesics can be used to extract their spatial derivatives, i.e. gradients and Hessians. Indeed, consider first the finite-dimensional case of a function $F : M \to \mathbb{R}$, with $M$ a smooth manifold.
We can extract the gradient of $F$ at $x_0$ along the direction $\xi_0$ by evaluating the derivative of $F$ along a curve that at $t = 0$ passes through $x_0$ with tangent vector $\xi_0$. Indeed, $\frac{d}{dt} F(x(t)) = \langle \dot x, \mathrm{grad}F \rangle_g$, where $\mathrm{grad}F$ is the gradient vector field of $F$. Evaluating it at $t = 0$ results in the expression $\frac{d}{dt}\big|_{t=0} F(x(t)) = \langle \xi_0, \mathrm{grad}F(x_0) \rangle_g$.
In a similar way we can extract the Hessian. This time we need the second derivative of $F$ along $x(t)$, when the latter describes a geodesic on $M$. This gives $\frac{d^2}{dt^2}\big|_{t=0} F(x(t)) = \mathrm{Hess}F(x_0)(\xi_0, \xi_0)$. Formally, the same relations are true in the Wasserstein space $P_2(X)$, so that (22) and (25) evaluated at $t = 0$ represent the gradient and the Hessian of $\mathcal{F}$ at $\mu$ along the direction $\dot\mu$.
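The first and second derivatives discussed in this section can be derived in a few lines, as sketched in LaTeX below. The sketch assumes $\mathcal{F}[\mu] = \int_X \sqrt{g}\,F(\rho)\,dx$ with $\mu = \rho\sqrt{g}$, a compact $X$ (or sufficient decay) so that boundary terms vanish, and the Hamilton-Jacobi normalization $\dot\eta = -\tfrac12|\nabla\eta|^2$; the resulting expressions are meant to illustrate the structure of (22) and (25), not to reproduce their exact form.

```latex
% Conventions: \Delta := -\nabla^2, P := \rho F' - F, P_2 := \rho P' - P,
% \dot\rho = -\nabla\cdot(\rho\nabla\eta), dvol := \sqrt{g}\,dx.
\begin{align*}
  \frac{d\mathcal F}{dt}
  &= -\int_X F'(\rho)\,\nabla\cdot(\rho\nabla\eta)\, dvol
   = \int_X \rho F''(\rho)\,\nabla\rho\cdot\nabla\eta \, dvol \\
  &= \int_X \nabla P(\rho)\cdot\nabla\eta \, dvol
   = \int_X P(\rho)\,\Delta\eta \, dvol , \\[4pt]
  \frac{d^2\mathcal F}{dt^2}
  &= \int_X \Big\{ P_2(\rho)\,(\Delta\eta)^2
     + P(\rho)\big[ \nabla\eta\cdot\nabla\Delta\eta - \tfrac12 \Delta|\nabla\eta|^2 \big] \Big\}\, dvol \\
  &= \int_X \Big\{ P_2(\rho)\,(\Delta\eta)^2
     + P(\rho)\big[ R_{mn}\nabla^m\eta\,\nabla^n\eta
     + \nabla_m\nabla_n\eta\,\nabla^m\nabla^n\eta \big] \Big\}\, dvol .
\end{align*}
% The last step uses the Bochner formula in the form
%   -\tfrac12\Delta|\nabla\eta|^2 + \nabla\eta\cdot\nabla\Delta\eta
%     = \nabla_m\nabla_n\eta\,\nabla^m\nabla^n\eta + R_{mn}\nabla^m\eta\,\nabla^n\eta .
```

The intermediate line of the second derivative follows from $\dot\rho = -\nabla\rho\cdot\nabla\eta + \rho\,\Delta\eta$ and one integration by parts of the term $-\nabla P\cdot\nabla\eta\,\Delta\eta$.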

Weighted measures
The discussion of the previous section can be generalized to the case in which the Riemannian volume form is weighted by a positive function. In this situation the measure that equips our metric-measure space is a more general $e^f\sqrt{g}$.⁸ Given a probability distribution $\mu$ on $X$ it is then more natural to define its density $\rho$ with respect to the weighted volume form, as in (28). We can then represent a generic functional $\mathcal{F}$ as in (29). When $\mu$ changes in time according to the continuity equation (16), the derivative of $\mathcal{F}$ is given by (30), which differs from the unweighted case (22) by the fact that the integral is weighted and the Laplacian is replaced by the weighted Laplacian (31). Equation (30) and the ones that follow are derived in App. B. Taking another derivative and using (19) to evaluate the resulting expression along geodesics in Wasserstein space, after some manipulations we get (32). To simplify the term in square brackets in (32), we need the weighted analogue (33) of the Bochner equation (24). All in all, in the weighted case, the second derivative of a generic functional of the form (29) along a geodesic in Wasserstein space is given by (34). In the next section we will obtain a physical picture of the Bochner identities by relating them to Raychaudhuri equations and highlighting the effect of the weight function in introducing an effective notion of curvature and dimension.
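The weighted objects mentioned above take the following standard form, shown here as a LaTeX sketch. The conventions are assumptions: the measure is $e^f\sqrt{g}\,dx$, $\Delta := -\nabla^2$, and the symbol $\Delta_f$ is introduced here for the weighted Laplacian; the source equations (31), (33) and (34) may be written differently.

```latex
% Weighted Laplacian for the measure e^f \sqrt{g} dx (\Delta := -\nabla^2):
\begin{equation*}
  \Delta_f \eta := -e^{-f}\,\nabla_m\!\left( e^{f}\,\nabla^m \eta \right)
  = \Delta\eta - \nabla f\cdot\nabla\eta .
\end{equation*}
% Weighted Bochner formula, with (\mathrm{Ric}^\infty_f)_{mn} = R_{mn} - \nabla_m\nabla_n f:
\begin{equation*}
  -\tfrac12\,\Delta_f |\nabla\eta|^2 + \nabla\eta\cdot\nabla\Delta_f\eta
  = \nabla_m\nabla_n\eta\,\nabla^m\nabla^n\eta
  + \left( R_{mn} - \nabla_m\nabla_n f \right)\nabla^m\eta\,\nabla^n\eta .
\end{equation*}
% Resulting second derivative along a Wasserstein geodesic:
\begin{equation*}
  \frac{d^2\mathcal F}{dt^2}
  = \int_X \Big\{ P_2(\rho)\,(\Delta_f\eta)^2
  + P(\rho)\big[ (\mathrm{Ric}^\infty_f)_{mn}\nabla^m\eta\,\nabla^n\eta
  + \nabla_m\nabla_n\eta\,\nabla^m\nabla^n\eta \big] \Big\}\, e^{f}\sqrt{g}\,dx .
\end{equation*}
```

Setting $f = 0$ recovers the unweighted Laplacian, the ordinary Bochner formula and the unweighted second derivative.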

Effective dimension
Let us now focus on the term $\nabla_m \nabla_n \eta\,\nabla^m \nabla^n \eta = \nabla_m \xi_n \nabla^m \xi^n$ appearing in both (25) and (34). As we noticed already, these terms originate from the Bochner or the related Raychaudhuri equations (App. A). We can bound this term by using the inequality $M_{mn}M^{mn} \geqslant \frac{1}{n}(M_m{}^m)^2$ (35), which follows from the Cauchy-Schwarz inequality by considering the inner product of $M$ and the identity $1_n$ in the space of $n$-dimensional matrices. In particular we get $\nabla_m \xi_n \nabla^m \xi^n \geqslant \frac{1}{n}(\nabla_m \xi^m)^2$ (36). Using this, the Raychaudhuri equation becomes the inequality (37), where we recall that $\theta = \nabla_m \xi^m$ is the expansion. In many physics applications, the bound is actually even more stringent. The matrix $M_{mn} = \nabla_m \nabla_n \eta = \nabla_m \xi_n$ can have rank $r < n$; we can apply Cauchy-Schwarz to $M$ and the projector orthogonal to $\ker M$, which results in $n \to r$ in both (35) and (37). For example, in Lorentz signature, for timelike geodesics the matrix $\nabla_m \xi_n$ is orthogonal to $\xi$ itself, so it has rank $r = n - 1$. For lightlike geodesics, $r = n - 2$.
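The Cauchy-Schwarz step and its rank-$r$ refinement can be spelled out as in the LaTeX sketch below. It assumes Riemannian signature and a symmetric matrix $M$; the symbol $\Pi$ for the orthogonal projector onto $(\ker M)^\perp$ is notation introduced here for illustration.

```latex
% Inner product on matrices: <A,B> := A_{mn} B^{mn} (Riemannian signature assumed).
\begin{align*}
  (M_m{}^m)^2 = \langle M, g \rangle^2
  &\leqslant \langle M, M\rangle\,\langle g, g\rangle
   = n\, M_{mn}M^{mn}
  && \Longrightarrow \quad M_{mn}M^{mn} \geqslant \frac{1}{n}\,(M_m{}^m)^2 , \\
  % Rank-r refinement: M\Pi = M, so tr M = <M,\Pi>, and <\Pi,\Pi> = tr \Pi = r.
  (M_m{}^m)^2 = \langle M, \Pi \rangle^2
  &\leqslant \langle M, M\rangle\,\langle \Pi, \Pi\rangle
   = r\, M_{mn}M^{mn}
  && \Longrightarrow \quad M_{mn}M^{mn} \geqslant \frac{1}{r}\,(M_m{}^m)^2 .
\end{align*}
% Applied to M_{mn} = \nabla_m \xi_n with \theta = \nabla_m \xi^m:
\begin{equation*}
  \nabla_m\xi_n\,\nabla^m\xi^n \geqslant \frac{\theta^2}{n} .
\end{equation*}
```

Since $r \leqslant n$, the second bound is at least as strong as the first, which is the sense in which a lower rank gives a more stringent inequality.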
We can achieve an even more dramatic change in dimension by using the identity (38). For $N < 0$ the right-hand side is positive. Combining this information with (36), we can bound from below the expression appearing in the first line of (34) and on the right-hand side of the weighted Bochner identity (33). The first term on the right-hand side of the resulting bound is $(\mathrm{Ric}^N_f)_{mn}$ as defined in (10), thus explaining its relevance in optimal transport. In particular, the weighted Raychaudhuri equation (A.11) now implies an inequality for $\theta_f := -d^\dagger_f \xi$, as defined in (A.10). Comparing with (37), we see that the dimension $n$ has now been replaced by $N$, which can thus be thought of as an effective dimension.
In other words, $N$ plays the role usually played by $d - 1 = 3$ for massive geodesics or $d - 2 = 2$ for massless geodesics in applications of the Raychaudhuri equation to $d = 4$ general relativity.
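One way to see how $N$ replaces $n$ is through the elementary algebraic identity sketched in LaTeX below. The shorthand $a$, $b$ is introduced here for illustration, the identification $\theta_f = a + b$ is an assumption about the convention behind $\theta_f := -d^\dagger_f\xi$, and the identity is not claimed to coincide in form with the one used in the source.

```latex
% Shorthand (assumed): a := \nabla_m \xi^m = \theta, b := \xi^m \nabla_m f,
% so that \theta_f = a + b is the divergence with respect to e^f \sqrt{g}.
\begin{equation*}
  \frac{a^2}{n} + \frac{b^2}{N-n}
  = \frac{(a+b)^2}{N}
  + \frac{N}{n\,(N-n)} \left( \frac{N-n}{N}\, a - \frac{n}{N}\, b \right)^{\!2} .
\end{equation*}
% The coefficient N / (n (N-n)) is positive for N > n and also for N < 0.
% Using (\mathrm{Ric}^\infty_f)_{mn} \xi^m \xi^n
%   = (\mathrm{Ric}^N_f)_{mn} \xi^m \xi^n + b^2/(N-n)
% together with \nabla_m\xi_n \nabla^m\xi^n \geqslant a^2/n:
\begin{equation*}
  \nabla_m\xi_n\,\nabla^m\xi^n + (\mathrm{Ric}^\infty_f)_{mn}\,\xi^m\xi^n
  \;\geqslant\;
  (\mathrm{Ric}^N_f)_{mn}\,\xi^m\xi^n + \frac{\theta_f^2}{N} ,
  \qquad N < 0 \ \text{or}\ N > n .
\end{equation*}
```

Comparing with the unweighted bound $\nabla_m\xi_n\nabla^m\xi^n + R_{mn}\xi^m\xi^n \geqslant R_{mn}\xi^m\xi^n + \theta^2/n$, the pair $(R_{mn}, n)$ has been replaced by $(\mathrm{Ric}^N_f, N)$.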

Shannon entropy and Einstein equations
The Einstein equations can be equivalently rewritten in terms of concavity properties of appropriate entropy functionals $S$ defined on space-time, as in (41), with the inequality saturated in the limit of measures concentrating towards Dirac deltas.
To show the equivalence, we apply the methods in Sec. 4.1. We will first consider 1+n spacetimes and unwarped compactifications, where (41) will take the form of Theorem 4.1. We then review in Sec. 4.2 the more general Lorentzian case (Th. 4.2), before addressing general warped compactifications in Sec. 5.

Time+space and unwarped compactifications
The analysis that follows is a formal re-derivation of the results rigorously proved in [10][11][12] (respectively in [41]) regarding the optimal transport characterization of lower (resp. upper) bounds on the Ricci curvature for smooth Riemannian manifolds. Given a probability distribution $\mu$ with density $\rho$ as defined in (20), we can compute its Shannon entropy (42), where $\gamma$ is a normalization constant. Def. (42) can also be interpreted as the relative entropy between $\mu$ and the uniform distribution on $X$, where "uniform" has to be defined with respect to the volume to have a coordinate-independent meaning. We will use these two denominations for the entropy interchangeably. In any case, (42) measures how spread out $\mu$ is compared to $\sqrt{g}$. Indeed, (42) reaches its maximum for the uniform distribution while approaching $-\infty$ for a very localized $\mu$ approaching a delta distribution.
Specializing (25) to $F = -\rho \ln \rho$, we then have an expression (43) for the time evolution of $S$, which we will use to obtain the Einstein equations from this notion of entropy. The discussion so far has focused on the Riemannian case, where particles are transported along Riemannian geodesics, and thus does not describe general gravitational systems. It is nevertheless sufficient to completely characterize the Einstein equations for $D$-dimensional product space-times of the form $M_D = M_d \times X_n$, with product metrics (44), where $M_d$ is a $d$-dimensional vacuum ($\mathrm{AdS}_d$, $\mathrm{Mink}_d$, $\mathrm{dS}_d$) with cosmological constant $\Lambda$ and $X_n$ is an $n$-dimensional space. Here $d \geqslant 1$, with the $d = 1$ case corresponding to a $1 + n$ decomposition (45) of the $D$-dimensional space-time. In situations like these, the Riemannian geodesics on $X_n$ can immediately be lifted to geodesics on $M_D$, either massive or massless, upon an appropriate identification between the "time" coordinate along the Riemannian geodesic and a local time-coordinate on $M_d$. The Riemannian formalism developed so far thus applies directly to massive and massless particles on product space-times, and we use this simplified scenario as a first illustration of how the Einstein equations follow from entropy concavity, before reviewing the general Lorentzian case in Sec. 4.2 and then extending to general warped products in Sec. 5.
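The specialization to the Shannon entropy can be made explicit as in the LaTeX sketch below. It assumes the normalization constant is set to $\gamma = 1$, so that $S = -\int_X \sqrt{g}\,\rho\ln\rho\,dx$, and uses the generic second-derivative structure with $P := \rho F' - F$ and $P_2 := \rho P' - P$; the exact form of (43) in the source may differ by the factor $\gamma$.

```latex
% Shannon case, F(\rho) = -\rho \ln\rho (normalization \gamma = 1 assumed):
\begin{align*}
  F'(\rho) &= -\ln\rho - 1 , &
  P(\rho) &= \rho F'(\rho) - F(\rho) = -\rho , &
  P_2(\rho) &= \rho P'(\rho) - P(\rho) = 0 .
\end{align*}
% Hence the (\Delta\eta)^2 term drops out and
\begin{equation*}
  \frac{d^2 S}{dt^2}
  = - \int_X \sqrt{g}\,\rho \left[ R_{mn}\nabla^m\eta\,\nabla^n\eta
  + \nabla_m\nabla_n\eta\,\nabla^m\nabla^n\eta \right] dx
  \;\leqslant\; - \int_X \sqrt{g}\,\rho\, R_{mn}\nabla^m\eta\,\nabla^n\eta \; dx .
\end{equation*}
```

In particular, a non-negative Ricci tensor implies $d^2S/dt^2 \leqslant 0$, which is the concavity statement quoted in the introduction.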
The $D$-dimensional Einstein equations specialized to space-times of the form (44) are (7) and (8) with $f = 0$, which read as (46) and (47), where $R_{mn}$ is the Ricci tensor on $X_n$. Equation (46) determines the $d$-dimensional cosmological constant, and our goal is to obtain (47) as a concavity equation for $S$. If we think of a situation like (44) as a compactification on $X_n$ (or a more general reduction of the higher-dimensional gravitational theory), $S$ has a natural interpretation as a quantification of the ignorance of a lower-dimensional observer about the internal degrees of freedom. Indeed, classically, a $d$-dimensional observer can localize a $D$-dimensional particle approximately as a point on $M_d$, but they cannot do the same on $X_n$ if the Kaluza-Klein scale is much smaller than the energies they are able to probe; they will describe such a particle in terms of a probability distribution $\mu$ on $X_n$, with $S$ quantifying their uncertainty about the internal position. Similarly, not being able to measure the masses of the KK excitations beyond the compactification scale, a lower-dimensional observer can reconstruct a higher-dimensional scalar field only up to a probability distribution in the internal space.
Crucially, the lower-dimensional observer need not be aware of the gravitational nature of the sector they cannot probe to be able to characterize it completely. Indeed, the internal Einstein equations can be traded completely for an evolution equation for the information the lower-dimensional observer has about the system, as in the following theorem.
Theorem 4.1. Let $(X, g)$ be a smooth Riemannian manifold. The following statements are equivalent:
1. The metric $g$ on $X$ satisfies the equation of motion (48).
2. i) For any probability distribution $\mu$ on $X$, evolving along a geodesic in the space $P_2(X)$ of probability distributions, with tangent vector $\dot\mu = -\nabla \cdot (\mu \nabla \eta)$, its Shannon entropy (42) (the relative entropy between $\mu$ and the volume form of $g$) satisfies the inequality (49). ii) In addition, the inequality (49) becomes saturated whenever $\mu$ is concentrated at a point, and for a suitably chosen $\eta$. Namely, for any point $x_0 \in X$ and any tangent vector $\xi_0$ at $x_0$, there exists an $\eta$ such that $\nabla\eta|_{x_0} = \xi_0$ and such that (49) becomes an equality asymptotically for distributions $\mu$ very localized at $x_0$. More precisely, for every point $x_0 \in X$ there exists a function $\omega$ with $\lim_{\epsilon \to 0} \omega(\epsilon) = 0$ such that the following holds: for any tangent vector $\xi_0$ at $x_0$ of unit norm $\|\xi_0\| = 1$, there exists a smooth function $\eta$ with $\nabla\eta|_{x_0} = \xi_0$ such that (49) is saturated up to an error controlled by $\omega(\epsilon)$ for every probability measure $\mu$ supported in $B_\epsilon(x_0)$.
Proof. 1 ⇒ 2: We plug (48) in (43), obtaining Then i) follows from the fact that ∇ m ∇ n η∇ m ∇ n η is non-negative. For ii), in the limit where (50) localizes at x = x 0 ; using Lemma B.1 with f = 0 the second term vanishes and we get the result. 2 ⇒ 1: Combining (49) and (43) we obtain where we wrote (48) as (51) localizes at x 0 . For an arbitrary tangent vector ξ 0 at x 0 , using Lemma B.1 we then get E mn ξ m 0 ξ n 0 ⩾ 0. Since x 0 and ξ 0 are arbitrary this implies Then, since by hypothesis for µ localized at any x 0 the inequality can be saturated by a certain η such that ∇η(x 0 ) = ξ 0 , with arbitrary ξ 0 , for such a choice of η (51) implies But from (52) both terms are non-negative, so for the equality to hold they have to vanish independently. Arbitrariness of x 0 and ξ 0 then ensures E mn = 0.
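As a quick numerical illustration of the concavity statement in Theorem 4.1, the following sketch considers the simplest flat example, which is our own and not taken from the text: two centred Gaussians on the real line. The geodesic in $P_2$ joining them is again Gaussian, with standard deviation interpolating linearly between the endpoints. Assuming vanishing curvature and cosmological constant, the Shannon entropy should then be concave along the geodesic. All function names are illustrative.

```python
import math

def gaussian_entropy(sigma):
    """Shannon entropy of a 1D Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def sigma_along_geodesic(t, sigma0, sigma1):
    """Standard deviation along the W2 geodesic between two centred 1D Gaussians."""
    return (1.0 - t) * sigma0 + t * sigma1

def second_derivative(fn, t, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (fn(t + h) - 2.0 * fn(t) + fn(t - h)) / h ** 2

sigma0, sigma1 = 0.5, 2.0

def entropy(t):
    return gaussian_entropy(sigma_along_geodesic(t, sigma0, sigma1))

for t in (0.1, 0.5, 0.9):
    numeric = second_derivative(entropy, t)
    # Closed form: S = ln(sigma) + const, so S'' = -((sigma1 - sigma0) / sigma)^2.
    exact = -((sigma1 - sigma0) / sigma_along_geodesic(t, sigma0, sigma1)) ** 2
    print(f"t={t:.1f}  numeric={numeric:.6f}  exact={exact:.6f}")
```

The second derivative is strictly negative whenever the two widths differ, which is the flat-space instance of entropy concavity.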

Einstein's equations in Lorentzian manifolds
In this section we show how, also in Lorentzian signature, the Einstein equations can be rewritten in terms of concavity properties of entropy functionals, characterizing in this way the whole space-time (and not just the internal part, as in vacuum compactifications). The analysis that follows is a formal re-derivation of the results rigorously proved in [14,15]. On the whole $D$-dimensional Lorentzian space-time, we seek to reproduce the Einstein equations, in the form (5), from a concavity property of an appropriate notion of entropy for test particles. Since there is no analogue of warping, it is natural to guess that the relevant quantity will be the Shannon entropy, similarly to the unwarped Riemannian analysis of Sec. 4.1. This guess will turn out to be correct, but an important difference due to the signature arises from the need to restrict the transport to physical geodesics only. In the following we will focus on the massive (time-like) case. In addition, even in this class, it is not guaranteed that the squares of tensors appearing in the expressions have a definite sign (so that they can be discarded in the derivation of inequalities), and this technical difference will require us to carefully define the transport by switching to a more general non-linear framework.
Let us see in practice how this works. Given a time-like curve $\gamma$, we consider the action $A_p$ defined in (55), where $p \in (0, 1)$. The cost of moving a particle from $x$ to $y$ is then defined as in (56). The minus sign in (55) is introduced so that the cost (56) can still be formulated as a minimization problem. This is just as for the usual particle action in curved space, proportional to $-\int (-g_{MN} \dot\gamma^M \dot\gamma^N)^{1/2}$, which would be recovered for $p \to 1$. Although we restrict here to $p \in (0, 1)$, because for these values of $p$ the actions $A_p$ have good convexity properties that we can exploit in our derivations, this technical choice does not change the physical picture.
Indeed, the extremizers of $A_p$ for $p \in (0, 1)$ coincide with the extremizers of $A_1 = -\tau$, parametrized such that the tangent vector along a geodesic is parallel transported. This is similar to how, in the Riemannian case, extremizers of the energy functional $E[\gamma] := \int_0^1 |\dot\gamma|^2$ coincide with extremizers of the length functional $L[\gamma] := \int_0^1 |\dot\gamma|$ for a preferred parametrization of the coordinate along $\gamma$.
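The Riemannian statement about energy and length can be checked on a toy example of our own: the segment from 0 to 1 on the real line, traversed once at constant speed and once with the reparametrization $t \mapsto t^2$. The length does not depend on the parametrization, while the energy is smallest for constant speed. The helper `length_and_energy` is hypothetical and uses a simple finite-difference discretization.

```python
def length_and_energy(curve, steps=20000):
    """Approximate L = int_0^1 |gamma'| dt and E = int_0^1 |gamma'|^2 dt
    for a real-valued curve on [0, 1], using finite differences."""
    dt = 1.0 / steps
    length = 0.0
    energy = 0.0
    for i in range(steps):
        speed = abs(curve((i + 1) * dt) - curve(i * dt)) / dt
        length += speed * dt
        energy += speed ** 2 * dt
    return length, energy

# Same image, two parametrizations.
length_const, energy_const = length_and_energy(lambda t: t)
length_reparam, energy_reparam = length_and_energy(lambda t: t * t)

print("constant speed :", length_const, energy_const)
print("reparametrized :", length_reparam, energy_reparam)
```

Both curves have length 1, while the energies are 1 and approximately 4/3, consistent with the Cauchy-Schwarz bound $L^2 \leq E$, which is saturated only at constant speed.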
As in the definition of the Wasserstein distance (15), we can lift the notion of cost (56) for moving massive particles in the space-time to a notion of cost for moving distributions of massive particles, by defining the family of functionals $C_p(\mu_0, \mu_1)$ as the infimum of the total cost over couplings, where, as in the Riemannian case (15), a coupling is a probability distribution $\pi$ on $M \times M$ whose marginals are equal to $\mu_0$ and $\mu_1$, which are Borel probability measures with compact support, $\mu_0, \mu_1 \in P_c(M)$ in short. Notice that $(-C_p(\mu_0, \mu_1))^{1/p}$ is non-negative and satisfies a reverse triangle inequality; thus, in a broad sense, it lifts the Lorentzian distance from $M$ to $P_c(M)$. These $p$-Lorentz-Wasserstein distances have been studied in [14,15,42-44].
The non-linearities introduced by the choice $p \neq 1$ enter the various expressions governing the evolution of a generic probability distribution $\mu$ through a non-linear redefinition of the gradient. Specifically, in terms of the exponent $q$ conjugate to $p$ (so that $1/p + 1/q = 1$), we define the $q$-gradient $\nabla^q h$ of a function $h$ with time-like gradient. The continuity (16) and geodesic (19) equations are then modified accordingly. With these tools we can now compute derivatives of functionals along massive geodesics. The derivation is technically more involved as a consequence of the non-linearity, and we quickly sketch the relevant formulas in App. D.
Given a probability distribution of massive particles $\mu$ on a space-time $M$, we define its Shannon entropy as in the Riemannian case (42), now relative to the space-time volume form. This is a measure of how much the distribution $\mu$ is spread in space-time, compared to the uniform distribution $\sqrt{-g}$. Using the formulas in App. D, we obtain for its second derivative along time-like geodesics the expression (62). The main difference compared to the Riemannian formula (43) is the appearance of the non-linear $q$-gradients. Equipped with formula (62) we can now characterize the Einstein equation as in the following theorem.
ii) In addition, the inequality (64) becomes saturated whenever $\mu$ is concentrated at a point, and for a suitably chosen $\eta$. Namely, for any point $x_0 \in M$ and any time-like tangent vector $\xi_0$ at $x_0$, there exists an $\eta$ such that $\nabla^q \eta|_{x_0} = \xi_0$ and such that (64) becomes an equality asymptotically for distributions $\mu$ very localized at $x_0$.
Proof. The proof closely follows that of the Riemannian Theorem 4.1, and we highlight here only the differences. An important fact we used to prove both implications in the Riemannian case is that the quantity $\nabla_m \nabla_n \eta\, \nabla^m \nabla^n \eta$ appearing in the integrand of the second derivative of the entropy is manifestly non-negative. In the Lorentzian expression (62) this is replaced by $\nabla_M \nabla^q_N \eta\, \nabla^M \nabla^N_q \eta$, which is not manifestly so. However, by Lemma D.1 this term is non-negative for $q < 1$, and so in particular for $p \in (0, 1)$. Moreover, a Lorentzian counterpart of (the Riemannian) Lemma B.1 holds (see [15,

Tsallis entropy and warped compactifications
We are now ready to describe one of our main results: the reformulation of the equations (11) for warped compactifications in terms of optimal transport. For this we will need the notion of Tsallis entropy, which as we review in section 5.1 is a natural generalization of the more usual Shannon one. In section 5.2 we show how to reformulate (11a) in terms of a relative entropy, while in section 5.3 we show how (11b) can be reformulated in terms of Tsallis entropy, along the lines of the previous section.

Various definitions of entropy
We have already used the definition (42) of Shannon entropy S associated to a probability density ρ. This is famously related to the Gibbs entropy, to which it reduces when ρ is defined on phase space. However, it is natural to wonder what properties single out S among the possible functionals of the form (21).
A set of such properties was provided by Khinchin [45] and Faddeev [46] for the case of probability distributions p = (p 1 , . . . , p n ) on finite spaces of any cardinality n. Consider a function S(p) = S(p 1 , . . . , p n ). If we demand that 2. (Symmetry) S is a completely symmetric function of its entries (i.e. it remains the same if any two of the p i are exchanged); then it can be shown that S is proportional to the discrete Shannon entropy The constant γ can be fixed by also demanding a normalization, such as S(1/2, 1/2) = 1.
The last property implies the more general
$$S(p_1 q_{11}, \ldots, p_1 q_{1k_1}, p_2 q_{21}, \ldots, p_2 q_{2k_2}, \ldots, p_n q_{n1}, \ldots, p_n q_{nk_n}) = S(p_1, \ldots, p_n) + \sum_{i=1}^n p_i\, S(q_{i1}, \ldots, q_{ik_i}). \tag{66}$$
The particular case $q_{ij} = q_j$ yields the property that the entropy of a direct product of two probability distributions, which describes two independent events, is the sum of the entropies of the two events. This is the usual extensivity property. Symbolically we can write
$$S(p \times q) = S(p) + S(q), \tag{67}$$
where we defined $p \times q = (p_1 q_1, \ldots, p_1 q_k, p_2 q_1, \ldots, p_2 q_k, \ldots, p_n q_1, \ldots, p_n q_k)$.
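The grouping property and extensivity can be checked numerically. The sketch below assumes the standard discrete Shannon entropy $S(p) = -\sum_i p_i \ln p_i$, i.e. the normalization $\gamma = 1$, which is our choice; the helper names and the sample distributions are illustrative.

```python
import math

def shannon(p):
    """Discrete Shannon entropy with natural logarithm (assumed normalization gamma = 1)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def product(p, q):
    """Direct product p x q = (p_1 q_1, ..., p_1 q_k, ..., p_n q_1, ..., p_n q_k)."""
    return [pi * qj for pi in p for qj in q]

def grouped(p, qs):
    """Refinement (p_1 q_11, ..., p_1 q_1k1, ..., p_n q_n1, ..., p_n q_nkn)."""
    return [pi * qij for pi, qi in zip(p, qs) for qij in qi]

p = [0.5, 0.3, 0.2]
q = [0.7, 0.3]
qs = [[0.5, 0.5], [0.2, 0.8], [1.0]]  # one conditional distribution per entry of p

# Extensivity: S(p x q) = S(p) + S(q).
extensivity_gap = shannon(product(p, q)) - (shannon(p) + shannon(q))

# Grouping: S(refinement) = S(p) + sum_i p_i S(q_i).
grouping_gap = shannon(grouped(p, qs)) - (
    shannon(p) + sum(pi * shannon(qi) for pi, qi in zip(p, qs))
)

print(extensivity_gap, grouping_gap)
```

Both gaps vanish up to floating-point rounding, and extensivity is recovered from grouping by taking all conditional distributions equal.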
The idea of the proof that the axioms above lead to (65) is that $S(p_1, \ldots, p_n)$ can be reduced to the $n = 2$ case using the Additivity axiom. Moreover, (66) gives $S(r/s, 1 - r/s) = F(s) - \frac{r}{s} F(r) - \left(1 - \frac{r}{s}\right) F(s - r)$, where $F(n) := S(1/n, \ldots, 1/n)$. Using (66) and the Continuity axiom one finds $F(nm) = F(n) + F(m)$ and $\lim_{n \to \infty} (F(n + 1) - F(n)) = 0$. One can prove that this implies $F(n) \propto \log(n)$ [47, Lemma, Sec. 1]. Collecting all these observations one arrives at the Shannon entropy (65).
The extensivity property (67) is weaker than (66) and than the Additivity axiom; indeed there exist additional entropies that satisfy Continuity, Symmetry and (67), such as the Rényi entropy [47]. If one replaces the Additivity axiom with the Generalized Additivity axiom 3' of [48,49], then instead of the Shannon entropy one gets the Tsallis entropy (69) [16]. The overall constant is chosen such that the limit $\alpha \to 1$ reduces to (65). Notice that (69) was originally introduced in the hope of describing distributions beyond the usual Boltzmann one, for example with longer tails. It is extremized in the equiprobable case $p_i = 1/n$; this extremum is a maximum for $\alpha > 0$, a minimum for $\alpha < 0$. Notice however that Generalized Additivity means that (66) also needs to be modified by $p_i \to p_i^\alpha$ in the second term on the right-hand side, and that in turn means that the extensivity property (67) is no longer satisfied: this is also evident from the relation (70) with the Rényi entropy, which is extensive. Rather, the Tsallis entropy of a product distribution picks up an additional term beyond $S(p) + S(q)$.
A reformulation of these characterizations was suggested in [50]. To any $f$, a probability-preserving map between two sets with probability distributions $p$ and $q$, one associates a number $F(f)$ obeying three axioms called Functoriality, Linearity, Continuity. It can then be proven that $F(f) = S(p) - S(q)$, where $S$ is again proportional to the Shannon entropy. Thus the function $F$ quantifies the loss of information associated to the map $f$. If Linearity is replaced by a different Homogeneity axiom, the Tsallis entropy (69) is recovered.
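The properties of the Tsallis entropy quoted above can be verified numerically. The sketch below assumes the standard textbook normalizations, $S_\alpha(p) = (1 - \sum_i p_i^\alpha)/(\alpha - 1)$ for the Tsallis entropy and $R_\alpha(p) = \ln(\sum_i p_i^\alpha)/(1 - \alpha)$ for the Rényi entropy; these may differ from the conventions of (68)-(70) by constants. With these assumptions, the product rule reads $S_\alpha(p \times q) = S_\alpha(p) + S_\alpha(q) + (1 - \alpha) S_\alpha(p) S_\alpha(q)$.

```python
import math

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def tsallis(p, alpha):
    """Tsallis entropy in the assumed standard normalization (alpha != 1)."""
    return (1.0 - sum(x ** alpha for x in p)) / (alpha - 1.0)

def renyi(p, alpha):
    """Renyi entropy in the assumed standard normalization (alpha != 1)."""
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)

def product(p, q):
    return [pi * qj for pi in p for qj in q]

p = [0.5, 0.3, 0.2]
q = [0.7, 0.3]
alpha = 1.5

# Limit alpha -> 1 reproduces the Shannon entropy.
near_shannon = tsallis(p, 1.0 + 1e-6)

# Non-extensive product rule for the Tsallis entropy.
lhs = tsallis(product(p, q), alpha)
rhs = (tsallis(p, alpha) + tsallis(q, alpha)
       + (1.0 - alpha) * tsallis(p, alpha) * tsallis(q, alpha))

# Relation with the (extensive) Renyi entropy.
renyi_from_tsallis = math.log(1.0 + (1.0 - alpha) * tsallis(p, alpha)) / (1.0 - alpha)

# The equiprobable case is a maximum for alpha > 0 and a minimum for alpha < 0.
uniform = [1.0 / 3.0] * 3
print(near_shannon, shannon(p))
print(lhs, rhs)
print(renyi_from_tsallis, renyi(p, alpha))
print(tsallis(uniform, 1.5) > tsallis(p, 1.5), tsallis(uniform, -0.5) < tsallis(p, -0.5))
```

The last line prints `True True` for this sample distribution, in line with the statement about maxima and minima.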

Warping equation and relative entropy
We begin our reformulation with the equation (11a) for the warping.
We can think of the warping as defining a measure on $(X, g_n)$, with density $e^f$ with respect to the Riemannian volume measure $\sqrt{g}\, dx^n$. We denote by $\mathfrak{f} := e^f \sqrt{g}\, dx^n$ the corresponding measure.
As in (42), we can define its relative entropy compared to g as 11 if f is integrable on (X , f) (we used that ρ = e f ), and +∞ otherwise. Now, assume that f changes in time, with velocityḟ with compact support (or, fast decreasing at infinity). Then, applying equation (22) with Comparing with (18) we see that the right hand side is the scalar product Comparing with (26) we have obtained where ∇ W denotes the gradient in Wasserstein space P 2 (X ). With this relation, we can finally write the warping equation (11a) as To summarize, the warping equation (76) fixes the warping by constraining the gradients of its relative entropy compared to the Riemannian volume form.

Internal Einstein equations as entropy concavity
We now turn to the internal equation (11b). We consider the Tsallis entropy (77) of $\mu = \rho \sqrt{g}\, e^f$ with respect to the reference measure $\sqrt{g}\, e^f$. Since we integrate only along $M_n$, this is measuring our ignorance about the internal position of a particle. If it is massless and moves geodesically, then its internal trajectory will follow an internal geodesic (App. C). We will show now that (8) is equivalent to an equation about the second time derivative of $S_\alpha$ for a probability distribution of such particles.
Theorem 5.1. Let $(X, g)$ be a smooth Riemannian manifold. The following statements are equivalent:
1. The Ricci tensor of $g$ satisfies the equation of motion (11b).
2. i) For any probability distribution $\mu$ on $X$ moving along a geodesic in the space $P_2(X)$ of probability distributions, the Tsallis entropy (77) satisfies the inequality (78). ii) In addition, the inequality in (78) becomes saturated if $\mu$ is concentrated at a point, and for a suitably chosen $\eta$. Namely, for any point $x_0 \in X$ and any tangent vector $\xi_0$ at $x_0$, there exists an $\eta$ with $\nabla\eta|_{x_0} = \xi_0$ such that (78) becomes an equality asymptotically for distributions $\mu$ very localized at $x_0$. More precisely, for every point $x_0 \in X$ there exists a function $\omega$ with $\lim_{\epsilon \to 0} \omega(\epsilon) = 0$ such that the following holds: for any tangent vector $\xi_0$ at $x_0$ of unit norm $\|\xi_0\| = 1$, there exists a smooth function $\eta$ with $\nabla\eta|_{x_0} = \xi_0$ such that (78) is saturated up to an error controlled by $\omega(\epsilon)$ for every probability measure $\mu$ supported in $B_\epsilon(x_0)$.
Proof. 1 ⇒ 2: The Tsallis entropy (77) is a functional of the form (21). From the definitions below (22) and (23), we see that $P = -F/N$, $P_2 = F/N^2$. We need (B.8), the weighted version of (23); substituting in it the weighted Bochner identity (A.9) we obtain (80), where $(\nabla_m \nabla_n - \frac{1}{n} g_{mn} \nabla^2)\eta$ is the traceless part of the Hessian of $\eta$. In the second step we have used the definition (10), and again the identity (40). Recalling that for us $N = 2 - d < 0$, the second and third terms in the parenthesis in (80b) are positive. Using (11b) we arrive at (78). For ii), we use Lemma B.1, which makes the second and third terms in (80b) vanish asymptotically in the limit where $\rho \sim \delta(x - x_0)$.
2 ⇒ 1: Suppose now we know (78). Using (80b), we obtain an inequality involving $E_{mn}$ and $X(\eta)$, where we wrote (11b) as $E_{mn} = 0$, and $X(\eta)$ represents the second and third terms in (80b). It follows that $E_{mn}$ is positive semi-definite everywhere. (If this were not the case, there would exist an $x_0$ and a $\tilde\xi_0$ such that $E_{mn} \tilde\xi_0^m \tilde\xi_0^n < 0$. By Lemma B.1, there would then exist $\tilde\eta$ such that $\nabla\tilde\eta|_{x_0} = \tilde\xi_0$ and $X(\tilde\eta) = 0$; taking $\rho \sim \delta(x - x_0)$ we arrive at a contradiction.) Now, again taking the measure to be concentrated at one point $x_0$, we take $\eta$ as in ii). By hypothesis the inequality becomes saturated, so the sum of the corresponding terms vanishes. We know that $E_{mn}$ is positive semi-definite, so all terms are $\geq 0$; it follows that they are all zero. In particular $E_{mn}(x_0) \xi_0^m \xi_0^n = 0$. Since $x_0$ and $\xi_0$ are arbitrary, $E_{mn} = 0$ everywhere.

Effective negative dimensions and KK bounds
As another application of negative effective dimensions to warped compactifications, we will now obtain new bounds on their spin-two KK masses. Recall [51,52] that these are eigenvalues of the weighted (or Bakry-Émery) Laplacian with f = (D − 2)A. In [1,2] optimal transport techniques were already used to find bounds on these eigenvalues, but using the N = ∞ effective dimension. A lower bound on Ric ∞, f could be obtained, but in terms of σ := sup|dA|, which unfortunately can get quite large in some solutions. The advantage of considering negative effective dimensions is that the bound (14) is in terms of the cosmological constant Λ alone, thus avoiding the dependence on σ.

Dp-branes
The possibility to work with some non-smooth spaces is quite powerful for string theory, as several important compactifications have singularities in their low-energy description due to the back-reaction of extended objects. Recall for example that O-plane singularities (and/or quantum effects) are necessary in order to obtain dS compactifications [53,54]. D-brane singularities also appear often in AdS vacua, where they are holographically dual to flavor symmetries. We review here the singularities associated to D-and M-branes.
In the supergravity approximation, D-branes play the role of localized sources for the gravitational and higher-form electromagnetic fields. The presence of such a localized object produces a singularity in the classical fields it sources; this is in complete analogy to black holes in pure general relativity or to electrons in classical electrodynamics. While such singularities are expected to be resolved in a full quantum theory, they are a general feature of low-energy descriptions. It is thus useful to develop mathematical tools that allow one to handle such non-smooth spaces.
In the setting of ten-dimensional supergravity theories, D-branes are identified with a ten-dimensional Lorentzian metric that, in Einstein frame, has the asymptotics (82). Here $p \in \{0, 1, \ldots, 7\}$, $r$ is a radial coordinate in the directions transverse to the singular object, $dx^2_{p+1}$ denotes the $(p+1)$-dimensional Lorentzian metric (in case $p = 0$, simply $-dx_1^2$) corresponding to the subspace appearing in the singular limit $r \to 0$, and $ds^2_{\mathbb{S}^{8-p}}$ denotes the round metric on the unit $(8-p)$-dimensional sphere $\mathbb{S}^{8-p}$; the function $H$ is harmonic on the transverse space and introduces the singularity.
In order to preserve maximal symmetry in vacuum compactifications, a D$p$-brane has to be extended along all the $d$ vacuum directions; in addition, it can also be extended in some of the internal directions. Comparing (6) and (82), we obtain that the internal metric $ds^2_n$ has the asymptotics (83), where $dx^2_{p+1-d}$ denotes the flat metric of the $(p + 1 - d)$-dimensional Euclidean space. Again from (6), we also obtain the corresponding asymptotics of the weight function $f$. Near the singularity, the harmonic function behaves as $H \sim (r_0/r)^{7-p}$ for $p < 7$, where $r_0^{7-p} = g_s (2\pi l_s)^{7-p} / ((7 - p)\, \mathrm{Vol}(\mathbb{S}^{8-p}))$; as usual $g_s$ is the string coupling (a value for $e^\phi$ at a reference point, often infinity) and $l_s$ is the string length.
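The scale $r_0$ is straightforward to evaluate. The following sketch implements the formula for $r_0^{7-p}$ quoted above for a single brane, using the standard volume of the unit sphere, $\mathrm{Vol}(\mathbb{S}^k) = 2\pi^{(k+1)/2} / \Gamma((k+1)/2)$; the function names are ours and lengths are measured in the same units as $l_s$.

```python
import math

def unit_sphere_volume(k):
    """Volume of the unit k-sphere S^k embedded in R^(k+1)."""
    return 2.0 * math.pi ** ((k + 1) / 2.0) / math.gamma((k + 1) / 2.0)

def r0(p, g_s, l_s):
    """Scale r_0 of a single Dp-brane, from
    r_0^(7-p) = g_s (2 pi l_s)^(7-p) / ((7-p) Vol(S^(8-p))), valid for p < 7."""
    if not 0 <= p < 7:
        raise ValueError("the formula is only used here for 0 <= p < 7")
    r0_power = g_s * (2.0 * math.pi * l_s) ** (7 - p) / ((7 - p) * unit_sphere_volume(8 - p))
    return r0_power ** (1.0 / (7 - p))

g_s, l_s = 0.1, 1.0
for p in range(7):
    print(f"p={p}: r0 = {r0(p, g_s, l_s):.6f} (in units of l_s)")
```

As a consistency check, for $p = 6$ this gives $r_0 = g_s l_s / 2$, and for $p = 3$ it gives $r_0^4 = 4\pi g_s l_s^4$, the familiar values for a single D6 and D3 brane.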
The next definition, where (for some values of $p$) we allow singularities that are asymptotic to D$p$-branes, is slightly more general than the one given in our previous work [1], where we considered exact D-brane singularities.
Definition 6.1 (Asymptotically D-brane metric measure spaces). We define an asymptotically D-brane metric measure space to be a smooth and compact Riemannian manifold $(X, g)$ that is glued (in a smooth way) to a finite number of ends where the metric $g$ is asymptotically of the form (83) in a neighborhood of the closed singular set $\{r = 0\}$, depending on the value of $p$ in the following precise sense:
• Case $p = 0, 1, \ldots, 5$. In the end the metric can be written as
• Case $p = 7$. In a neighborhood $\{r < \varepsilon\}$ of the closed singular set $\{r = 0\}$, the metric is of the form where $\eta$ is a non-negative real-valued function.
• Case p = 8. In a neighborhood {|r| < ε} of the closed singular set {r = 0}, the metric is of the form where h 8 > 0 is a positive constant, and the measure is given by where dvol g is the Riemannian volume measure associated to g.
In all the above cases, we endow X with a weighted measure, and view it as a metric measure space (X , d, m) where: • The distance d between two points p, q ∈ X is given by where Γ (p, q) denotes the set of absolutely continuous curves joining p to q.
• The measure $m$ is a weighted volume measure $m := e^f\, \mathrm{dvol}_g$, where the function $e^f$ is smooth outside the tips of the ends, and $m$ gives zero mass to the singular set.
We say that (X , d, m) is an (exactly) D-brane metric measure space if, for each end, the error ω (resp. η) vanishes on a neighbourhood {r < ε} of the singular set {r = 0}.
Remark 6.2 (Other localized sources). Let us briefly comment on other localized sources. First of all, fundamental strings (F1) and NS five-branes (NS5) have exactly the same expansion as D1 and D5 branes, respectively; this is a consequence of the invariance under type IIB S-duality (or more generally under the $SL(2, \mathbb{Z})$ symmetry) of the asymptotic ten-dimensional Einstein metric (82). For M2 and M5 branes in M-theory, the internal metric has again the asymptotic form (83), now with $H \sim (r/r_0)^{q-8}$, $q = 2, 5$. Notice that these fall into the first case of Def. 6.1; in particular, for both M2s and M5s, the singularity is at infinite distance.

Some basics on metric measure spaces
Motivated by the appearance of singularities discussed in the section above, we enlarge the class of spaces under consideration. We thus leave the framework of smooth weighted Riemannian manifolds and enter the more general setting of metric measure spaces. Let us start with some basics. (For a longer introduction to some of these ideas see also Sec. 2.3 in our earlier work [1].) In the sequel $(X, d)$ will be a complete and separable metric space. By a geodesic over $(X, d)$ we mean a constant speed (length minimizing) geodesic, i.e. a curve $\gamma : [0, 1] \to X$ such that $d(\gamma(s), \gamma(t)) = |t - s|\, d(\gamma(0), \gamma(1))$ for all $s, t \in [0, 1]$. The space of all geodesics over a space $X$ will be denoted by $\Gamma(X)$. The evaluation map $e_t : \Gamma(X) \to X$, $t \in [0, 1]$, is defined as $e_t(\gamma) := \gamma(t)$.
The space $P(X, d)$ is the space of Borel probability measures over $X$. When the distance $d$ is clear from the context, we will simply write $P(X)$. The space $P_2(X) \subset P(X)$ is the subset of probability measures with finite second moment. We endow $P_2(X)$ with the 2-Wasserstein distance $W_2$ defined as in (15).
A measure π realising the infimum in (15) is called an optimal coupling. A measure ν ∈ P(Γ (X )) is called an optimal dynamical plan if the probability measure (e 0 , e 1 ) ♯ (ν) is an optimal coupling between its own marginals, and we denote by OptGeo(µ 0 , µ 1 ) the set of all the optimal dynamical plans between µ 0 and µ 1 .
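To make the notion of optimal coupling concrete, the following sketch treats the simplest case, which is our own illustration: two uniform empirical measures with the same number of atoms on the real line, with squared-distance cost. In this case an optimal coupling can be sought among permutations of the atoms, and on the line the monotone (sorted) matching is optimal; the code compares it against a brute-force search. All names and sample points are illustrative.

```python
import itertools

def transport_cost(xs, ys, perm):
    """Average squared displacement when x_i is sent to y_perm[i]."""
    return sum((x - ys[j]) ** 2 for x, j in zip(xs, perm)) / len(xs)

xs = [0.0, 1.0, 3.0, 4.5]
ys = [2.0, -1.0, 5.0, 0.5]

# Brute force over all permutation couplings between the two empirical measures.
best_perm = min(
    itertools.permutations(range(len(xs))),
    key=lambda perm: transport_cost(xs, ys, perm),
)
w2_squared_brute = transport_cost(xs, ys, best_perm)

# Monotone coupling: match the sorted atoms in order.
w2_squared_sorted = sum(
    (x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))
) / len(xs)

print(w2_squared_brute, w2_squared_sorted)
```

The two values agree, so the sorted matching realises the infimum for this pair of measures, and its square root is their $W_2$ distance.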
A metric measure space is a triple $(X, d, m)$ where $(X, d)$ is a complete and separable metric space and $m$ is a non-negative Borel measure which is finite on balls, i.e. $m(B_r(x)) < +\infty$ for every $x \in X$ and $r > 0$. Denote by $\mathrm{Lip}(X)$ (resp. $\mathrm{Lip}_{bs}(X)$) the space of Lipschitz functions on $(X, d)$ (resp. with bounded support). For a function $f \in \mathrm{Lip}(X)$ the slope at a point $x \in X$ is defined as
$$|\nabla f|(x) := \limsup_{y \to x} \frac{|f(y) - f(x)|}{d(y, x)}$$
if $x$ is an accumulation point, and $|\nabla f|(x) := 0$ if $x$ is isolated.
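The slope is defined without any differentiability assumption. As a minimal sketch on the real line with the usual distance (our own example), the code below approximates the limsup by a supremum over sample points in a small punctured ball; `approximate_slope` is a hypothetical helper, and the sampling is only a crude stand-in for the actual limsup.

```python
def approximate_slope(f, x, radius=1e-4, samples=200):
    """Approximate limsup_{y -> x} |f(y) - f(x)| / |y - x| on the real line
    by the supremum over sample points in a small punctured ball around x."""
    best = 0.0
    for k in range(1, samples + 1):
        step = radius * k / samples
        for y in (x - step, x + step):
            best = max(best, abs(f(y) - f(x)) / abs(y - x))
    return best

# f(x) = |x| is not differentiable at 0, but its slope there is 1.
slope_abs_at_zero = approximate_slope(abs, 0.0)

# For a smooth function the slope agrees with |f'(x)|: here |d/dx x^2| = 2 at x = 1.
slope_square_at_one = approximate_slope(lambda x: x * x, 1.0)

print(slope_abs_at_zero, slope_square_at_one)
```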

Cheeger energy, Laplacian and heat flow
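Given a function $f \in L^2(X, m)$, the Cheeger energy $\mathrm{Ch}(f)$ is defined by relaxation of the squared slope [55] (see also [56]), with finiteness domain given by the vector space $W^{1,2}(X, d, m) := \{f \in L^2(X, m) : \mathrm{Ch}(f) < \infty\}$. We endow $W^{1,2}(X, d, m)$ with the norm built from the $L^2$ norm and the Cheeger energy. The Cheeger energy is a convex, 2-homogeneous and lower semicontinuous functional on $L^2(X, m)$.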
For f ∈ W 1,2 (X , d, m), Ch( f ) can be represented in terms of the minimal relaxed gradient The minimal relaxed gradient is a local object in the sense that |D f | = |D g| m-a.e. (namely, almost everywhere with respect to m) on the set For more details on the minimal relaxed gradient we refer to [56].
where the inequality makes sense thanks to the locality of the minimal relaxed gradient. Thanks to [56,Lemma 4.3], for any function f ∈ L 2 (X , m) with |D f | ∈ L 2 (X , m) it is possible to find a sequence ( f n ) of Lipschitz functions with f n → f and |∇ f n | → |D f | strongly in L 2 (X , m). By a standard cutoff argument, we can further assume that ( f n ) ⊂ Lip bs (X ). In other words, the class Lip bs (X ) is dense in energy in W 1,2 (X , d, m).
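For every $f \in L^2(X, m)$ the heat flow of $f$ is the unique locally Lipschitz curve $t \mapsto H_t(f)$ from $(0, \infty)$ to $L^2(X, m)$ such that $H_t(f) \to f$ in $L^2(X, m)$ as $t \to 0$ and $\frac{d}{dt} H_t(f) \in -\partial^- \mathrm{Ch}(H_t(f))$ for a.e. $t \in (0, \infty)$. Here $\partial^- \mathrm{Ch} \subset L^2(X, m)$ is the subdifferential of the Cheeger energy, i.e. given $f \in L^2(X, m)$, an element $\ell \in L^2(X, m)$ belongs to $\partial^- \mathrm{Ch}(f)$ if and only if $\int_X \ell\, (g - f)\, dm \leq \mathrm{Ch}(g) - \mathrm{Ch}(f)$ for every $g \in L^2(X, m)$. We denote by $\Delta f$ the element of minimal $L^2$-norm in $\partial^- \mathrm{Ch}(f)$ and we refer to it as the Laplacian of $f$.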

Cheeger bounds in infinitesimally Hilbertian metric measure spaces
The goal of this section is to prove some bounds on the spectrum of the Laplacian in the high generality of infinitesimally Hilbertian metric measure spaces, a framework which includes several singularities appearing in gravity compactifications (e.g. D$p$-branes, O-planes, etc.).

Infinitesimally Hilbertian metric measure spaces
Notice that, in general, the Laplacian is 1-homogeneous but may not be linear (for instance in $\mathbb{R}^n$ endowed with a non-Euclidean norm, or more generally on a Finsler manifold). This is equivalent to saying that the heat flow $H_t : L^2(X, m) \to L^2(X, m)$ is in general 1-homogeneous but may not be linear, or, still equivalently, that the Cheeger energy is 2-homogeneous but may not be a quadratic form, or, still equivalently, that $W^{1,2}(X, d, m)$ is in general a Banach space but may not be a Hilbert space. When $W^{1,2}(X, d, m)$ is in fact a Hilbert space, i.e. when we have a "Riemannian" behaviour as opposed to a "Finslerian" one, we say that the space is infinitesimally Hilbertian (see [32,33]).
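Below is the precise definition.
Definition 6.3. A metric measure space $(X, d, m)$ is said to be infinitesimally Hilbertian if the Cheeger energy satisfies the parallelogram identity
$$\mathrm{Ch}(f + g) + \mathrm{Ch}(f - g) = 2\, \mathrm{Ch}(f) + 2\, \mathrm{Ch}(g) \quad \text{for every } f, g \in W^{1,2}(X, d, m). \tag{91}$$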
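The difference between "Riemannian" and "Finslerian" behaviour can be seen already for linear functions on the plane. In a normed plane the slope of $x \mapsto a \cdot x$ is the dual norm of $a$, so the squared slope is a proxy for the energy density. The sketch below, with names of our choosing, evaluates the parallelogram defect for the Euclidean norm and for the $\ell^1$ norm (the dual of the sup norm): it vanishes in the first case and not in the second.

```python
import math

def euclidean(v):
    return math.hypot(v[0], v[1])

def ell1(v):
    """l^1 norm, i.e. the dual norm when the plane carries the sup norm."""
    return abs(v[0]) + abs(v[1])

def parallelogram_defect(norm, a, b):
    """|a+b|^2 + |a-b|^2 - 2|a|^2 - 2|b|^2; zero iff the identity holds for a, b."""
    plus = (a[0] + b[0], a[1] + b[1])
    minus = (a[0] - b[0], a[1] - b[1])
    return norm(plus) ** 2 + norm(minus) ** 2 - 2.0 * norm(a) ** 2 - 2.0 * norm(b) ** 2

a, b = (1.0, 0.0), (0.0, 1.0)
defect_euclidean = parallelogram_defect(euclidean, a, b)
defect_ell1 = parallelogram_defect(ell1, a, b)

print(defect_euclidean, defect_ell1)
```

The non-zero defect for the $\ell^1$ norm is the elementary mechanism behind the failure of infinitesimal Hilbertianity for non-Euclidean normed spaces.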
As mentioned above, if $(X, d, m)$ is infinitesimally Hilbertian, then the heat flow and the Laplacian are linear, $W^{1,2}(X, d, m)$ is a Hilbert space, and, using (90), the quadratic form associated with the Cheeger energy defines a Dirichlet form $E$, i.e. an $L^2(X, m)$-lower semicontinuous quadratic form that satisfies the Markov property $E(\phi(f)) \leq E(f)$ for every 1-Lipschitz function $\phi : \mathbb{R} \to \mathbb{R}$ with $\phi(0) = 0$. By construction, $H_t$ and $-\Delta$ correspond respectively to the (sub-)Markov semigroup and the infinitesimal generator associated to the form (see for instance [57] as a general reference on Dirichlet forms). Moreover, the heat flow is a self-adjoint operator on $L^2(X, m)$, and the Laplacian $\Delta$ becomes a non-negative, densely defined, self-adjoint operator. If the measure of the space is finite (or, more generally, if $m(B_r(x)) \leq A \exp(B r^2)$ for some $A, B > 0$, $x \in X$ and every $r > 0$) the semigroup $H_t$ is also mass preserving, i.e. for every $f \in L^1 \cap L^2(X, m)$ it holds that $\int_X H_t f\, dm = \int_X f\, dm$ for every $t \geq 0$ (see for instance [56, Th. 4.16, Th. 4.20]).
Another important property of this class of spaces is the density of $\mathrm{Lip}_{bs}(X)$ in $W^{1,2}(X, d, m)$, which follows easily from (91) using the $L^2$-lower semicontinuity of the Cheeger energy and the already stated density in energy of the Lipschitz functions. The next proposition will allow us to include several interesting singularities (e.g. both D$p$-branes and O-planes) in the framework of infinitesimally Hilbertian spaces.
Proposition 6.4. Let $(X, d, m)$ be a metric measure space and assume that there exists a closed subset $\Sigma \subset X$ such that
• $m(\Sigma) = 0$;
• there exists a measure-preserving isometry $\Phi$ from $X \setminus \Sigma$ to a smooth weighted Riemannian manifold $(M, g, e^f \sqrt{g}\, dx^n)$, where $\sqrt{g}\, dx^n$ denotes the Riemannian volume measure of $(M, g)$.
Then $(X, d, m)$ is infinitesimally Hilbertian.
Proof. First of all, since the Riemannian scalar product g satisfies the parallelogram rule, it holds that where ∇u ′ denotes the weak gradient of a Sobolev function u ′ ∈ W 1,2 (M , g, e f g) in the classical distributional sense.
Let now $u, v \in W^{1,2}(X, d, m)$. We have to check the validity of the parallelogram identity (91). Using the representation formula (89) and the fact that $m(\Sigma) = 0$, this is equivalent to showing the identity (94). Since by assumption $\Sigma$ is a closed set and $X \setminus \Sigma$ is isomorphic to the smooth weighted Riemannian manifold $(M, g, e^f \sqrt{g})$, the relaxed gradient of a Sobolev function restricted to $X \setminus \Sigma$ coincides with the modulus of the classical weak gradient (in the Sobolev sense). Thus the validity of (94) follows from (93).
Remark 6.5. The framework encompassed by the assumptions of Proposition 6.4 is very general, and includes most (if not all) of the singularities appearing in the low-energy description of string theory as localised sources: for instance, singular metrics which are asymptotic to D-branes, M-branes and O-planes near the singular set fit into this setting, since the closure of the singular set has measure zero.

Spectrum of the Laplacian and Cheeger constants in infinitesimally Hilbertian spaces
In this section we assume (X , d, m) to be infinitesimally Hilbertian. We have seen in the previous section that the Laplacian is a non-negative, densely defined, self-adjoint operator, and thus it enters in the classical framework for spectral theory.
The regular values of $\Delta$ are the values $\lambda \in \mathbb{C}$ such that $(\lambda\, \mathrm{Id} - \Delta)$ has a bounded inverse. Recall that the self-adjointness of the Laplacian implies that eigenfunctions relative to different eigenvalues are orthogonal. For spaces of finite measure, constant functions are eigenfunctions relative to $\lambda_0 = 0$, and thus any other eigenfunction has null mean value.
The infimum of the essential spectrum plays an important role in the sequel, since the set of eigenvalues below inf σ ess (∆) is at most countable and, listing them in an increasing order where V k denotes a k-dimensional subspace of W 1,2 (X , d, m).
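The perimeter $\mathrm{Per}(B)$ of a Borel subset $B \subset X$ with $m(B) < \infty$ is defined by relaxation of the slope integral $\int_X |\nabla f_n|\, dm$ along Lipschitz functions $f_n$ approximating the indicator function $\chi_B$. Using the notion of perimeter, one can define the $k$-Cheeger constant (or $k$-way isoperimetric constant) as
$$h_k(X) := \inf_{B_0, \ldots, B_k}\ \max_{0 \leq i \leq k} \frac{\mathrm{Per}(B_i)}{m(B_i)}, \tag{97}$$
where the infimum runs over all collections of $k + 1$ disjoint Borel sets $B_i \subset X$ such that $0 < m(B_i) < \infty$. Notice that $h_k(X) \leq h_{k+1}(X)$ for every $k \in \mathbb{N}$ and, when $m(X) < \infty$, $h_0(X) = 0$.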
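We also recall the following characterization of $h_1(X)$, valid for spaces of finite measure, which follows easily from the definitions by recalling that the perimeter of a set coincides with the perimeter of its complement:
$$h_1(X) = \inf \left\{ \frac{\mathrm{Per}(B)}{m(B)} \ :\ B \subset X \text{ Borel},\ 0 < m(B) \leq \frac{m(X)}{2} \right\}.$$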
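As a sanity check in the simplest smooth example, which is our own, consider the unit interval with the Lebesgue measure. Among sets of at most half measure, an interval $[0, a]$ has a single interior boundary point, so its ratio of perimeter to measure is $1/a$ and the infimum, attained at $a = 1/2$, is $h_1 = 2$. The first non-trivial (Neumann) eigenfunction is $\cos(\pi x)$, with Rayleigh quotient $\pi^2$. The sketch below compares the two, assuming the classical form of the Cheeger inequality, $\lambda_1 \geq h_1^2 / 4$; the helper names are hypothetical.

```python
import math

def rayleigh_quotient(f, df, steps=10000):
    """Midpoint-rule approximation of int_0^1 |f'|^2 dx / int_0^1 f^2 dx."""
    dx = 1.0 / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        numerator += df(x) ** 2 * dx
        denominator += f(x) ** 2 * dx
    return numerator / denominator

# First non-constant Neumann eigenfunction on [0, 1].
lambda_1 = rayleigh_quotient(
    lambda x: math.cos(math.pi * x),
    lambda x: -math.pi * math.sin(math.pi * x),
)

def cheeger_ratio(a):
    """Per(B) / m(B) for B = [0, a]: one interior boundary point, measure a."""
    return 1.0 / a

# Restrict to intervals [0, a] with a <= 1/2; the minimum is at a = 1/2.
h_1 = min(cheeger_ratio(k / 1000) for k in range(1, 501))

print(lambda_1, h_1, h_1 ** 2 / 4)
```

Here $\lambda_1 \approx 9.87$ while $h_1^2/4 = 1$, so the inequality holds with room to spare in this example.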

Generalization of Cheeger bounds
First of all, the celebrated Cheeger inequality [58] holds in the high generality of infinitesimally Hilbertian spaces. We recall the statement below. For the proof we refer to [35, App. A]; see also [59] where it is shown that the inequality is strict in a large class of singular spaces.
We will now extend some theorems proved in [1] (after [60,61]) from the class of RCD spaces to the more general framework of infinitesimally Hilbertian metric measure spaces (i.e. without any curvature assumption), and during the proofs we will focus on the modifications needed to address this case. In particular, all the results in this section apply to a very large class of singular metrics including D-branes, M-branes, O-planes (see Remark 6.5).
Proof. We consider a non-null function f ∈ Lip bs (X ) and set In [1] the following bound has been proved and we take it for granted since its proof requires only the variational characterization of the eigenvalues (96) and the co-area inequality for Lipschitz functions, results that hold on infinitesimally Hilbertian metric measure spaces without requiring any curvature bound. First, suppose m(X ) = ∞. Since the class Lip bs (X ) is dense in energy, we can find a sequence of functions ( f n ) ∈ Lip bs (X ) such that f n → f and |∇ f n | → |D f | in L 2 (X , m), where f is an eigenfunction of eigenvalue λ 0 . The result thus follows by applying (101) to such a sequence ( f n ), using the trivial fact h 0 (X ) ⩽ φ( f n ) for any n, and passing to the limit.
If $m(X) < \infty$ we argue in a similar way, applying (101) to two sequences of functions $f_n, h_n \in \mathrm{Lip}_{bs}(X)$ that converge in $L^2(X, m)$ respectively to the positive and negative parts $f^+, f^-$ of an eigenfunction $f$ of eigenvalue $\lambda_1$, with $|\nabla f_n| \to |D f^+|$ and $|\nabla h_n| \to |D f^-|$ in $L^2(X, m)$. The existence of such sequences is again a consequence of the density in energy of $\mathrm{Lip}_{bs}(X)$, noticing that $f^+, f^- \in W^{1,2}(X, d, m)$ by (90). Recalling the definition of $h_1(X)$ given in (97) and that $\lambda_1 = R(f^+) = R(f^-)$ (see [62]), the result follows.
We can thus appeal to a result of Miclo [61, page 325] (as we did in [1]) and infer that for an absolute constant $\tilde C > 0$, where and we are defining $\lambda_0(B)$ as Let $B \subset X$ be a Borel set with $m(B) \in (0, m(X))$. We now fix $\epsilon > 0$ and let We fix now a Borel representative of $f$ and we set Reasoning as in the proof of [59, Th. 4.6] (using that $f^2$ is a BV function to which we can apply the co-area formula and noticing that all the arguments involved do not require $(X, d, m)$ to be an RCD space) we can conclude that . By definition of infimum we find a $\bar t \in (0, \operatorname{ess\,sup} f^2)$ such that Putting (106) and (107) together, it follows that for any Borel set $B \subset X$, $m(B) \in (0, m(X))$, there exists a Borel set $B_{\bar t} \subset B$, $m(B_{\bar t}) > 0$, such that Since $B \subset X$ and $\epsilon > 0$ are arbitrary, by (108) we obtain $h_k^2(X) \leq 4 \Lambda_k$ from the definition (97) of $h_k(X)$. Together with (104) this leads to the desired conclusion, where $C := 4/\tilde C$.
• Case m(X) = ∞: For a Borel set E ⊂ X with finite (non-zero) measure, we introduce the notation $H_{t,E}$ for the heat semigroup restricted to E: where $m_E := m(E)^{-1}\, m\llcorner E$ is the conditional expectation of m with respect to E. Using the assumption (102) and arguing by approximation, one can show that We can then follow verbatim the proof of the corresponding case given in [1,Theorem 4.9], noticing that the variational characterization of the eigenvalues $\lambda_0, \ldots, \lambda_k$ as well as the density of $\mathrm{Lip}_{bs}(X)$ in $W^{1,2}(X, d, m)$ still hold under the infinitesimal Hilbertianity assumption.

Curvature-dimension conditions
We have seen in Sec. 2.2 that when the REC is satisfied (and in particular for compactifications of string/M-theory) the weighted Ricci curvature satisfies the simple bound (14), Ric ⩾ Λ. In Sec. 3.4 we saw that the number N = 2 − d could be interpreted as an effective dimension. We will now introduce a class of spaces called RCD(K, N) (for Riemannian Curvature-Dimension) that reduces to (14) on smooth manifolds for K = Λ, N = 2 − d, but includes more general singular spaces. We will show in Corollary 6.15 that the class of allowed singularities includes those induced by Dp-branes. In our previous paper [1, Sec. 3] we treated the case N ∈ (1, +∞] (paying the price of not explicitly controlling the lower Ricci bound K ∈ ℝ); the treatment for N < 0 presents both similarities and differences. The main advantage of considering negative N is that it allows for a very neat control of the Ricci lower bound, since K coincides with the cosmological constant Λ when the REC is satisfied.
We say that (X , d, m) satisfies the RCD(K, N ) condition if it is infinitesimally Hilbertian and satisfies the CD(K, N ) condition.

Remark 6.10.
• The definition given in [5] involves a slightly different expression of the entropy, namely X ρ However the two definitions are equivalent.
• If (X , d, m) is a smooth metric measure space, then it satisfies CD(K, N ) in the sense of Def. 6.9 if and only if its N -Bakry-Émery-Ricci tensor is bounded below by K (cf. (10)), as proved in [5,Th. 4.20].
In the sequel, we will consider metrics in polar coordinates with points denoted as x = (Θ, r) ∈ $\mathbb{S}^n$ × (0, ∞), while the origin will be denoted by O := {r = 0}.
Proof. Assume by contradiction that there exists a geodesic γ : [0, 1] → X such that $\gamma_t = O$ for some t ∈ (0, 1). Up to restricting and reparametrizing γ, we can assume without loss of generality that where ϵ > 0 is a small parameter to be fixed later in the proof. Since γ passes through the origin O, by the triangle inequality we have that Using (114) and assuming ϵ > 0 is small enough, we have that The combination of (116) and (117) contradicts the identity Length(γ) = $d(\gamma_0, \gamma_1)$ given by the assumption that γ is a geodesic.
The next proposition is inspired by [66,Th. 4]. We first describe the physics interpretation. On a space of the form given in the proposition below, geodesics tend to be attracted by the origin and bend towards it. Indeed we will check later in this section that D-brane singularities, which have positive tension, are of this form. Two antipodal points can be connected by one of these bent geodesics rather than by one that goes through the origin. Heuristically, this suggests that a distribution of particles moving towards the origin will spread out before refocusing on the other side; moreover, a single particle belonging to it will hit the origin with probability zero.
Then for every optimal dynamical plan ν such that
Proof. In the proof we will consider points on the end of the manifold, and accordingly we will use the polar coordinates to denote them.
First of all, notice that for every x 0 = (Θ 0 , r 0 ), x 1 = (Θ 1 , r 1 ) we have d(x 0 , O) = r 0 and, as a consequence of the fact that ℓ(r) 2 ⩽ r 2 , we infer that d( is the standard cone distance for C( n ). The result will be a consequence of the following: Claim ( * ): For every r > 0 there exists at most one Θ ∈ n such that γ 0 = (Θ, r) is the starting point of some geodesic γ ∈ supp(ν) ∩ Γ O .
Once the claim is settled, the proposition can be proved by contradiction. Since the restriction of an optimal dynamical plan is still optimal (see for instance [19,Th. 7.30 (ii)]), we can assume that ν is concentrated on $\Gamma_O$. Using the fact that $(e_i)_\sharp \nu \ll m$, i = 0, 1, and since m gives zero mass to O, we can also assume that $\gamma_0 \neq O$ and $\gamma_1 \neq O$ for ν-a.e. γ. Equivalently, the set of geodesics starting or ending in O has measure zero with respect to ν. The claim ( * ) implies that the measure $(e_0)_\sharp \nu$ is concentrated on a set of the form $C_h := \{(h(r), r) : r > 0\}$ for some function h, and thus $m(C_h) = 0$, which contradicts the fact that $(e_0)_\sharp \nu \ll m$.
Thus, it remains to prove the claim ( * ). We split its proof into three steps.

Remark 6.13. From Step 1 in the proof, it follows that geodesics passing through the singular point O do not branch; i.e. if two geodesics coincide for a finite time, they coincide forever. Moreover, since away from O the space is smooth, we conclude that if (X, d) is as in the assumptions of Proposition 6.12, then (X, d) is non-branching.
In the following corollary we specialize the previous results to the case of Dp-branes.
If, more strongly, (X, d, m) is an (exactly) D-brane metric measure space, then it also satisfies the RCD(K′, N′) condition for some K′ ∈ ℝ and N′ ∈ (1, ∞),
Proof. We first observe that, since (by the very definition 6.1) the singular set of an asymptotically D-brane metric measure space (X, d, m) is closed and has zero m-measure, (X, d, m) is infinitesimally Hilbertian thanks to Proposition 6.4. In order to show that (X, d, m) is an RCD(K, N) space it is thus enough to prove that it satisfies the CD(K, N) condition. We discuss it case by case below.
O takes the form $g = d\rho^2 + \ell(\rho)^2\, ds^2_{\mathbb{S}^1}$ with $\ell(\rho)^2 < \rho^2$ for ρ > 0 small enough. The conclusion follows from Proposition 6.12 and Th. 6.14, together with the bound on $\mathrm{Ric}^N_f$ on the smooth part.
• Case p = 8. In the previous work [1] we have already proved that a D8-brane metric measure space is an RCD(0, N) space for some N ∈ (1, ∞). The claim follows from the fact that RCD(0, N) for some N ∈ (1, ∞) implies RCD(0, N′) for all N′ ∈ (−∞, 0). The second part of the statement follows from the first one, once we recall that the REC (13) coupled with the Einstein equations (5) implies the bound (14), Ric ⩾ Λ. The final claim was proved in our previous work [1, Th 3.2].
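On a smooth weighted manifold, the passage from N ∈ (1, ∞) to negative N′ used in the last case can be checked directly on the Bakry-Émery tensor. The following is a minimal sketch, not the non-smooth argument used in the proof. It assumes the convention written in the first display (our reading of the definition recalled in the Introduction, for the measure $\sqrt{g}\,e^f$ and actual dimension n) and it assumes N > n, so that the tensor with parameter N is defined in the usual way.

```latex
% Assumed convention for the N-Bakry-Emery-Ricci tensor (measure \sqrt{g}\,e^{f}):
\[
(\mathrm{Ric}^N_f)_{mn} \;=\; R_{mn} \;-\; \nabla_m \nabla_n f \;+\; \frac{1}{n-N}\,\nabla_m f\,\nabla_n f .
\]
% Compare N > n with N' < 0: both coefficients below are positive,
% since n - N' > 0 and N - n > 0.
\[
\mathrm{Ric}^{N'}_f - \mathrm{Ric}^{N}_f
\;=\; \Big(\frac{1}{n-N'} + \frac{1}{N-n}\Big)\, df \otimes df \;\geqslant\; 0 .
\]
% Hence a lower bound for parameter N is inherited by every negative N':
\[
\mathrm{Ric}^{N}_f \geqslant K\, g
\quad\Longrightarrow\quad
\mathrm{Ric}^{N'}_f \geqslant K\, g
\qquad \text{for all } N' \in (-\infty, 0).
\]
```

In this smooth picture, a negative effective dimension only adds a non-negative term proportional to $df \otimes df$, which is why the condition with negative N′ is the weaker one.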

O-planes are not CD(K, N ), even for negative N
The heuristic reason why O-planes are not CD is in a sense the opposite of that behind Prop. 6.12 (see the informal discussion above it, and Fig. 1). For O-planes, we will see now that in (118) we need to take ℓ(r) > r. One can now check that geodesics tend to be repelled by the origin O, which is intuitively due to O-planes having negative tension. If we send a distribution of particles towards the origin with the aim of making it reform with the same density on the other side, it will actually tend to focus near O (say at time t = 1/2) before spreading out. As a consequence, two antipodal points are only connected by the geodesic going through the origin, in contrast with the positive-tension case in Prop. 6.12. Since the reference measure (123) of small balls centred at the origin goes to zero as the radius of the ball goes to zero and the entropy is super-linear for N < 0, it will follow that the entropy tends to negative infinity at the time t = 1/2 when the distribution of particles is concentrated near the origin. See Fig. 2 for a visualization of this behavior of geodesics in the Op-geometry. Let us now turn such heuristics into a rigorous argument.
Figure 2: (a) Geodesics in an Op geometry get deflected more as they get closer to the origin; the only way to get to the other side is to go straight through it. (b) As a consequence, the optimal plan that connects two antipodal distributions that are localized enough consists of straight geodesics going through the origin.
First of all observe that the CD(K, N) condition (for some K ∈ ℝ and N ∈ (−∞, 0), see Def. 6.9) on a metric measure space (X, d, m) implies that there exists a constant C = C(K, N) > 0 with the following property: for any pair of absolutely continuous probability measures $\mu_0, \mu_1 \in P_2(X)$ there exists a $W_2$-geodesic $(\mu_t)_{t \in [0,1]}$ such that
Recall that the (internal part of the) metric for an O-plane singularity is asymptotic to and the weighted measure is asymptotic to $dm(\Theta, r) = \rho_0\, r^{1/3}\, dr\, d\Theta$, Notice that A

A lower bound in terms of the diameter
The statement below was proved in the Ph.D. thesis of E. Calderon [25,Th. 5.2.1] in the framework of smooth weighted Riemannian manifolds. By using 1-dimensional localization we can extend it to the non-smooth metric measure setting that includes Dp-brane singularities. To this aim, we either assume that the metric measure space is asymptotically D-brane (see Def. 6.1), or we impose two RCD conditions: we assume the validity of both an RCD(K, N) condition for some N ∈ (−∞, −1], K < 0, and an RCD(K′, N′) condition for some K′ ∈ ℝ and N′ ∈ (1, +∞). This should be read as follows: while the former gives the synthetic curvature-dimension condition we are actually interested in (so the bound we obtain will be in terms of K and N), the latter should be read as a qualitative regularity assumption on the metric measure space to make the proof work (thus we do not want K′ and N′ to appear in the conclusion of the theorem).
Then, the smallest eigenvalue λ 1 of the Laplacian satisfies where α(diam(X ) −K) is the minimum of among the functions ψ which are smooth and have vanishing weighted average Proof. The proof in the smooth weighted setting given in [25,Th. 5.2.1] can be summarised in two steps: first show that the desired bound can be reduced to a family of inequalities on weighted intervals (of topological dimension 1), second establish such a family of inequalities on weighted intervals. The first step goes under the name of 1-dimensional localisation; in the setting of smooth weighted manifolds, such a dimensional reduction was obtained in [29]. The second step is the contribution of [25].
This gives the following. Given $u \in L^1(X, m)$ with $\int_X u\, dm = 0$, there exists a partition of X as
$X = N \,\dot\cup\, Z \,\dot\cup\, \dot{\bigcup}_{\alpha \in Q} X_\alpha$,
where
• $\dot\cup$ denotes a disjoint union,
• m(N) = 0,
• for m-a.e. z ∈ Z it holds that u(z) = 0,
• Q is a suitable set of indices and, for all α ∈ Q, $X_\alpha$ is a geodesic in X.
Moreover, associated to the above partition of X, we have a disintegration of the measure m as
where
• q is a suitable measure on the set of indices Q,
• for q-a.e. α ∈ Q, the measure $m_\alpha$ is concentrated on the geodesic $X_\alpha$ and the one-dimensional metric measure space $(X_\alpha, |\cdot|, m_\alpha)$ satisfies CD(K′, N′),
• for q-a.e. α ∈ Q, it holds that $\int_{X_\alpha} u\, dm_\alpha = 0$.
Using now that the ambient space also satisfies the CD(K, N) condition for N < 0 in the sense of Def. 6.9, one can follow verbatim the proof of [30,Th. 4.2] and infer that the one-dimensional metric measure space $(X_\alpha, |\cdot|, m_\alpha)$ satisfies CD(K, N) as well, for q-a.e. α ∈ Q.
Once this information is available, the proof of the spectral bound under the additional assumption that (X, d, m) is also an RCD(K′, N′) space for some K′ ∈ ℝ, N′ ∈ (1, ∞) is obtained verbatim as in [25] (the modifications for the metric measure setting are now completely analogous to the proof of [68,Th. 4.4], setting p = 2).
Let us now briefly sketch the proof in the case when (X , d, m) is an asymptotically D-brane metric measure space. By the proof of Corollary 6.15, the only case when some geodesic can pass through the singular set is for p = 7 or p = 8. In the latter case we already know that the singularity satisfies RCD(0, N ′ ) for some N ′ ∈ (1, ∞) (at least locally in that end) and thus we can argue as above.
The only remaining case is p = 7. From Remark 6.13 we get that the singular space corresponding to an asymptotic D7-brane is non-branching. From [69,Th. 3.3.5], we infer that the transport set (associated to the $L^1$-optimal transport problem used in the 1-d localization) is endowed with an equivalence relation whose equivalence classes are the geodesics $X_\alpha$, α ∈ Q, mentioned above. Moreover, since it is easily seen that the cut locus has measure zero, we get that $m(X \setminus \bigcup_{\alpha \in Q} X_\alpha) = 0$. It follows that, up to a set of measure zero, the whole transport set used in the 1-d localization is contained in the smooth part of the space. The result then follows by the smooth arguments in [25].
• The assumptions of Th. 6.16 are natural thanks to Cor. 6.15: we proved that the validity of the Einstein equations and of the REC implies that an asymptotically D-brane metric measure space satisfies RCD(K, N) for K = Λ (the cosmological constant) and N = 2 − d < 0 (d is the dimension of the extended space-time); if, more strongly, (X, d, m) is an exactly D-brane metric measure space then it also satisfies RCD(K′, N′) for some K′ ∈ ℝ and N′ ∈ (1, ∞).
(14) holds, so we can choose K = Λ. Defining as usual the AdS radius $L_{\rm AdS}$ via the identity $\Lambda = (1-d)/L_{\rm AdS}^2$, the weight function becomes $e^f = \cos^{N-1}\big(\tfrac{\mathrm{diam}}{L_{\rm AdS}}\, z\big)$. Since from (81) the first eigenvalue corresponds to the mass of the first spin-two Kaluza-Klein mode, in this situation we get the bound
While we have not proven this rigorously, in d = 4 the minimum of (128) appears to be attained on ψ = sin(πz); the resulting α is a function of $\mathrm{diam}(X)/L_{\rm AdS} \in (0, \pi)$ such that lim monotone decreasing in between. In particular, the bound is most effective when $\mathrm{diam}(X) \ll L_{\rm AdS}$, and loses efficacy when $\mathrm{diam}(X) \to \pi L_{\rm AdS}$. A numerical analysis shows that a similar conclusion also holds in d ≠ 4.
• When the REC is not satisfied, K is not necessarily equal to Λ, but we can still consider the limit where the internal space is flat (or positively curved). Setting $K = -|\epsilon| \to 0^-$, since in this limit $\alpha \to \pi^2$, we obtain where L, defined by $L^2 = (d-1)/|\Lambda|$, is the curvature radius of the vacuum (with either sign of the cosmological constant). The bound (133) is now valid also for compactifications where the REC is violated, provided they are either smooth or have at most D/M-brane sources, and it proves that such vacua are automatically scale-separated when diam ≪ L. We will construct such an example in Sec. 7.2.
has scale separation at least in its spin-two tower. While this is intuitively expected,  [37]. This construction originates in IIA, with a T 2 -fibration over T 4 as internal space and intersecting O6-planes, which can be made approximately localized in the limit of small Λ.
However, the Romans mass vanishes (unlike the more famous [36]), so an uplift to M-theory is expected to exist, where the O6-planes become purely geometric features, locally described by the smooth Atiyah-Hitchin metric. (The uplift of an O6 intersection is not known, but one would expect it to be geometrized as well, at worst involving a mild singularity allowed in supergravity, such as that of an orbifold.) The sizes of the $T^2$, $T^4$ and M-theory circle can be made to scale with different powers as Λ → 0; the diameter is expected to scale in this limit as the largest of these three, which can still be much smaller than $L_{\rm AdS}$. Thus Theorem 6.16 implies that the M-theory version of this solution should have scale separation. It would thus be very interesting to test further the approximations made in finding it. 12
We next consider a similar application for Theorem 6.6. This now implies that any solution is scale separated. Recall that the inequality (99) was also present in [1], but that here we stated it under the weaker assumption that the space is infinitesimally Hilbertian (Def. 6.3). This framework includes O-plane singularities (see Remark 6.5). 13 While the Cheeger constant $h_1$ looks less familiar than the diameter, just like the diameter it is a non-local quantity that can be at least estimated if not precisely computed. Recall from (98) that we need to find the infimum of Per(B)/m(B). In a solution with singularities, it is natural to first check what happens when B surrounds one of them. In [1, (4.12)] we took B to be a tubular neighborhood of radius R around a Dp singularity, obtaining $\mathrm{Per}(B)/m(B) \sim R^{(5-p)/2}$, which is arbitrarily small for p < 5 and large otherwise. A similar computation for an Op (p < 8) singularity gives, in the same notation of [1], where $R_0 \sim l_s\, (g_s)^{1/(7-p)}$ is the radius below which the Op metric becomes imaginary and loses meaning (see [71,Sec. 2] for a quick review of Op solutions).
We see that Per(B)/m(B) diverges when the neighborhood gets close to the Op singularity, so this is not a good candidate to obtain the infimum that defines $h_1$. (The p = 8 case has to be treated separately, but the same conclusion holds.) This logic can be applied to the famous proposal in [36]. The KK scale was already estimated there using an effective d = 4 theory as $m_{\rm KK} \sim N^{-1/4} \gg 1/L_{\rm AdS} \sim N^{-3/4}$, where N is the $F_4$ flux quantum. The geometry of the internal $M_6$ was given in [72,73], again in an approximation where Λ is small. The overall length scale of $M_6$ is $N^{1/4}$, which confirms the d = 4 estimate, but one might wonder if the backreaction of the O6s might affect this result significantly.
The result (136) indicates that this does not happen. Since taking B near the O6s gives a large result, $h_1$ is more likely to be minimized by taking B away from them. In such a region, the metric is approximately Calabi-Yau. For example, for a torus orbifold such as $T^6/\mathbb{Z}_3$ which by (99) gives $m_{\rm KK} > N^{-1/4}$, in line with the above-mentioned estimates. Of course this result is only relevant at the level of approximation considered in [72,73]; it is in principle still possible that the solution would somehow be destroyed in full string theory, were one able to perform such a computation.

Scale-separated solutions with Casimir energy
In this Section, we construct a new example of a scale-separated AdS solution with energy sources that violate the Reduced Energy Condition (Sec. 2.2). 14 When such sources are present in an AdS compactification (Λ < 0) equation (11a) for Ric where δREC < 0 refers to terms that violate the REC. Compared to a situation where δREC = 0, in which case Ric  16 In the DGKT example analyzed in Sec. 7.1 this is achieved through O6-planes, which violate the REC and stabilize a Ricci-flat internal space. In the following, we construct a new explicit example of an AdS scale-separated solution of the equations of motion with a Ricci flat internal space and with parametrically large ratio between the first Kaluza-Klein modes and |Λ|, by violating the REC through quantum effects.
We work in M-theory through its low energy description in terms of 11-dimensional supergravity, and we aim to construct a semi-classical solution of the equations of motion in which the quantum energy densities generated by the low-energy fields enter as a source in the Einstein equations (5) through the stress-energy tensor
The semi-classical approximation consists in choosing a geometry and topology for the spacetime and computing the quantum effects with this assumption. Self-consistency then requires that the chosen space-time solves the equations of motion with the induced $\langle T^{\rm quantum}_{MN} \rangle$. This approach has been employed to construct various semi-classical gravity solutions such as compactifications of the Standard Model [81], traversable wormholes in four dimensions [82] and dS$_4$ compactifications of M-theory [75].
We consider an AdS$_4$ compactification on a 7-dimensional torus, so that the 11-dimensional space-time metric has the form $ds^2_{11} = R_4^2\, ds^2_{\mathrm{AdS}_4} + R_7^2\, ds^2_{T^7}$, where in this decomposition the metrics on the AdS$_4$ and $T^7$ factors have unit radii.
14 We thank Eva Silverstein and Gonzalo Torroba for discussions about this solution, its properties and related work [75].
15 Another possible mechanism is to use codimension-2 sources, which have the appropriate scaling to cancel the internal curvature and achieve separation of scales [76].
16 It is also known that sources that violate the REC are necessary in order to obtain de Sitter compactifications (Λ > 0) [53,54,80].
The zero point energy of fields in flat space and in curved backgrounds with different topologies can be computed explicitly in many cases (see e.g. [83] for a book-length review). The massless fields of eleven-dimensional supergravity are the metric $g_{11}$, the four-form $F_4$ and the gravitino ψ, and in order to generate a non-trivial zero point energy we break supersymmetry by imposing anti-periodic boundary conditions for fermions on the torus cycles. The contribution of massive states is exponentially suppressed and we need not consider them. Since we are considering an isotropic and homogeneous internal torus, we can constrain the form of the induced effective action by requiring that: i) the energy density obtained from it is an eleven-dimensional energy density, depending only on the circle size $R_7$ and growing when it shrinks; ii) it is homogeneous and isotropic along the internal directions; iii) the overall sign is due to bosons. These requirements are enough to impose that the leading term in the effective action has the form $S_{\rm eff} = (2\pi)^{-8}\, |\rho_c| \int_{M_{11}} \sqrt{-g_{11}}\; R_7^{-11}$, giving the stress-energy tensor where m, n are the internal torus directions, $\ell_{11}$ is the eleven-dimensional Planck length and $|\rho_c|$ is a positive order one numerical coefficient depending on the topology as well as on the number of degrees of freedom. This coefficient can be computed explicitly with a one-loop calculation of the Casimir energy on a torus (see e.g. [84, Sec. 3] for a computation in general higher-dimensional supergravity theories or [81, App. A]), but it is not important for our purposes since we will obtain parametric control. It is an easy check that (141) violates the REC (13): This property makes it promising for stabilizing a flat internal space, through the mechanism in (138). The Casimir energy (141) tends to make the torus expand, and we can stabilize its effect with an energy contribution of the opposite sign, such as a flux.
In M-theory, we can consider a simple homogeneous configuration on AdS$_4$ for the four-form flux:
requires to relate $f_4$ to an integer $N_7$ as:
Plugging these sources in the equations of motion (11)
For $N_7 \to \infty$, the KK modes go to zero very slowly relative to the cosmological constant, as $m \sim |\Lambda|^\alpha$ with α = 1/11. This result can be compared with the statement of the AdS Distance Conjecture of [85] that posits α to be an order one number (and 1/2 for supersymmetric solutions). More recent studies [86,87] that suggest α > 1/d (where d = 4 in the present case) are in tension with the solution (145). In particular this has implications for the Dark Dimension proposal of [88], which is based on the bound α > 1/d.
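The exponent α = 1/11 can be recovered from a rough scaling argument. The sketch below is only an order-of-magnitude check and not the derivation of (145): it drops all numerical factors, and it assumes that at the stabilized point the flux and Casimir contributions are of the same order, so that the AdS$_4$ curvature scale is set by the Casimir energy density read off from $S_{\rm eff}$.

```latex
% Order-of-magnitude sketch; O(1) factors (including |\rho_c| and powers of 2\pi) dropped.
% Casimir energy density on the torus of size R_7, from S_eff:
\[
|\rho_{\rm Casimir}| \;\sim\; R_7^{-11} .
\]
% Einstein equations, with 11d coupling \ell_{11}^9 and AdS_4 radius R_4
% (assumption: the flux term is of the same order as the Casimir term):
\[
|\Lambda| \;\sim\; \frac{1}{R_4^{2}} \;\sim\; \ell_{11}^{9}\, |\rho_{\rm Casimir}|
\;\sim\; \frac{1}{\ell_{11}^{2}} \Big(\frac{\ell_{11}}{R_7}\Big)^{11} .
\]
% Kaluza-Klein scale of the torus:
\[
m_{\rm KK} \;\sim\; \frac{1}{R_7}
\quad\Longrightarrow\quad
m_{\rm KK}\,\ell_{11} \;\sim\; \big(|\Lambda|\,\ell_{11}^{2}\big)^{1/11} .
\]
```

In Planck units this reproduces $m \sim |\Lambda|^{1/11}$: the steep $R_7^{-11}$ dependence of the Casimir energy is what makes the cosmological constant fall off much faster than the KK scale as the torus grows.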
The solution (145) has been computed by exploiting the fact that by tuning the flux integer $N_7 \gg 1$ the background is parametrically close to flat space. Moreover, being non-supersymmetric, it might be unstable under deformations of the torus or other effects. While as a function of the volume modulus this solution is at a minimum (obtained by balancing the negative contribution to the effective potential due to the Casimir energy with the positive potential from the flux), a more general perturbative analysis would require computing the form of the Casimir stress-energy tensor on the seven-dimensional torus away from the symmetric point. Being in AdS, a parametric analysis of this effect might not suffice, since tachyons of the scale of the cosmological constant could still be allowed if above the BF bound [89]. However, a naive probe computation suggests that it is unstable to nucleation of M2 bubbles in AdS$_4$. It would be interesting to assess in detail its stability with a more careful analysis, taking into account possible corrections of the M2 action due to the Casimir effect, and to understand the role of subleading effects.
Finally, for our proofs we also relied on the following lemma [19, p. 402], which is easy to show in normal coordinates.

C Geodesics
We will now show several facts about geodesics that we need in the main text.

C.1 Space-time geodesics from Wasserstein geodesics
We used in the main text that geodesics in the Wasserstein space $P_2(X)$ satisfy (19), where ξ = ∇η is the time-dependent velocity vector field appearing in the continuity equation (16)
$\dot\mu = -\nabla \cdot (\mu\, \xi)$. (C.1)
Thus, the vector field ξ(x, t) describes the motion of the bit of mass at x at time t, and as a consequence of all the bits of mass composing µ moving along ξ, µ changes in time as in (C.1). We will now show that when the motion is geodesic on $P_2(X)$, then the trajectories of the individual bits of mass follow geodesics on (X, g). To see this, consider a bit of mass in µ that at t = 0 starts at $x = x_0$. It will then follow $\xi(x_0, 0)$ and after an amount of time ∆t it will end up in $x_1 = x_0 + \xi(x_0, 0)\Delta t + O(\Delta t^2)$. Once there, it will follow $\xi(x_1, \Delta t)$ and so on. Thus, along the trajectory x(t) its tangent vector will be
ζ(t) := ξ(x(t), t). (C.2)
For each t this is a tangent vector in $T_{x(t)}(X)$. Lowering an index (i.e. considering the one-form g(ζ, ·)), and taking a derivative along the curve we get $\frac{d}{dt}\zeta_m = \partial_t \xi_m + \dot{x}^n \partial_n \xi_m = \partial_t \xi_m + \xi^n \partial_n \xi_m$, where the right hand side is understood at x = x(t), and thus we could substitute $\dot{x}^n$ with $\xi^n$. Since ξ = ∇η, we can take a covariant derivative of (19) to obtain
This shows that when the probability distribution µ follows a geodesic on the probability space $P_2(X)$, the individual particles follow geodesics on (X, g). Tracing back the steps shows also the other implication. A similar analysis can be performed for the Lorentzian case studied in Sec. 4.2. Specifically, a direct computation following the one given above shows that imposing the equations of motion (60) on a distribution of massive particles µ is equivalent to the requirement that each particle in the distribution follows a (time-like) geodesic. This formalism needs to be modified for massless particles since the q-gradient (59) would not be defined for light-like geodesics.
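The Riemannian step of the argument above can be written out explicitly. This is a minimal sketch under two assumptions that are not restated in this appendix: that (19) takes the standard Hamilton-Jacobi form shown in the first comment, and that the metric g on X does not depend on t.

```latex
% Assumption: (19) reads  \partial_t \eta + \tfrac12 |\nabla \eta|^2 = 0,  with  \xi_m = \nabla_m \eta.
% Take a covariant derivative and use the symmetry
% \nabla_m \xi_n = \nabla_m \nabla_n \eta = \nabla_n \xi_m :
\[
0 \;=\; \nabla_m \Big( \partial_t \eta + \tfrac12\, \xi^n \xi_n \Big)
  \;=\; \partial_t \xi_m + \xi^n \nabla_m \xi_n
  \;=\; \partial_t \xi_m + \xi^n \nabla_n \xi_m .
\]
% Covariant derivative of \zeta_m = \xi_m(x(t),t) along the curve, with \dot x^n = \xi^n :
\[
\frac{D \zeta_m}{dt}
  \;=\; \frac{d \zeta_m}{dt} - \Gamma^{p}_{nm}\, \dot{x}^n \zeta_p
  \;=\; \partial_t \xi_m + \xi^n \big( \partial_n \xi_m - \Gamma^{p}_{nm}\, \xi_p \big)
  \;=\; \partial_t \xi_m + \xi^n \nabla_n \xi_m \;=\; 0 .
\]
```

The vanishing of $D\zeta_m/dt$ is the geodesic equation for x(t) on (X, g); the key input is that ξ is a gradient, which allows the two indices in $\nabla_m \xi_n$ to be exchanged.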

C.2 Internal geodesics
In this section, we will show that massless geodesics on the D-dimensional warped product space-time (6), where A is a function depending only on the coordinates on the n-dimensional Riemannian manifold $X_n$, can be projected to Riemannian geodesics for $X_n$, and vice versa. This is a direct consequence of the well-known fact that only massless geodesics are mapped into geodesics upon a conformal transformation [90, App. D], which we quickly review as follows. Take the geodesic equation on $M_D$: where $X^M$ are local coordinates on $M_D$ and σ is the coordinate along the geodesics. Defining $\bar{g} := e^{-2A} g_D$, a new coordinate $\bar\sigma = \bar\sigma(\sigma)$ along geodesics with $\partial \bar\sigma / \partial \sigma = e^{-2A}$, and recalling that $m^2 = -g_{QP}\, \partial_\sigma X^P \partial_\sigma X^Q$, we obtain
This shows that when the warp function A is not constant only massless geodesics on $M_D$ are geodesics on the unwarped product space-time. Since the latter is a simple product, its geodesics then directly split into geodesics on the d-dimensional space-time and geodesics on $X_n$.
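The intermediate computation can be sketched as follows. This is a reconstruction under stated assumptions rather than the formulas of the original text: it takes $g_D = e^{2A}\bar{g}$, an affine parameter σ on $M_D$, and the standard transformation of the Christoffel symbols under a Weyl rescaling; primes denote derivatives with respect to $\bar\sigma$, and signs and factors should be checked against [90, App. D].

```latex
% Assumption: g_D = e^{2A} \bar g, so that
% \Gamma^M_{PQ} = \bar\Gamma^M_{PQ} + \delta^M_P \partial_Q A + \delta^M_Q \partial_P A
%                 - \bar g_{PQ}\, \bar g^{MN} \partial_N A .
% Affinely parametrized geodesic on M_D (dots are d/d\sigma):
\[
\ddot X^M + \bar\Gamma^M_{PQ} \dot X^P \dot X^Q
 + 2\, (\dot X^Q \partial_Q A)\, \dot X^M
 - (\bar g_{PQ} \dot X^P \dot X^Q)\, \bar g^{MN} \partial_N A \;=\; 0 ,
\qquad
\bar g_{PQ} \dot X^P \dot X^Q = - e^{-2A} m^2 .
\]
% Reparametrize with d\bar\sigma / d\sigma = e^{-2A}:
% \dot X = e^{-2A} X' ,  \ddot X = e^{-4A} ( X'' - 2 A' X' ),
% so the terms linear in X'^M cancel and
\[
X''^{M} + \bar\Gamma^M_{PQ}\, X'^{P} X'^{Q}
 \;=\; - m^2\, e^{2A}\, \bar g^{MN} \partial_N A
 \;=\; - \tfrac{1}{2}\, m^2\, \bar g^{MN} \partial_N \big( e^{2A} \big) .
\]
```

The right hand side vanishes either for m = 0 or for constant A, which is the statement used in the text: for a non-trivial warp factor only massless geodesics of $g_D$ are geodesics of $\bar{g}$.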

D Lorentzian transport
In this Appendix we briefly derive some formulas we need in Sec. 4.2 to prove the entropic reformulation of Einstein gravity in the Lorentzian case.
In particular, we want to compute derivatives of functionals on a space-time M along timelike geodesics. Given a probability distribution µ on M , we write a generic functional as We also introduced the non-linear q-box operator as the natural second order operator associated to the q-gradient (59). We can now take another derivative of F and evaluate it at σ = 0 (which is, without loss of generality, a generic point along the geodesic). After a lengthy computation, in which we also make use of the geodesic equation in (60), we get Finally, we also need the q-analogue of the Bochner equation (24). An explicit computation (see for instance where the q-gradient ∇ q is defined as in (59).