Information geometry in quantum field theory: lessons from simple examples

Motivated by the increasing connections between information theory and high-energy physics, particularly in the context of the AdS/CFT correspondence, we explore the information geometry associated to a variety of simple systems. By studying their Fisher metrics, we derive some general lessons that may have important implications for the application of information geometry in holography. We begin by demonstrating that the symmetries of the physical theory under study play a strong role in the resulting geometry, and that the appearance of an AdS metric is a relatively general feature. We then investigate what information the Fisher metric retains about the physics of the underlying theory by studying the geometry for both the classical 2d Ising model and the corresponding 1d free fermion theory, and find that the curvature diverges precisely at the phase transition on both sides. We discuss the differences that result from placing a metric on the space of theories vs. states, using the example of coherent free fermion states. We compare the latter to the metric on the space of coherent free boson states and show that in both cases the metric is determined by the symmetries of the corresponding density matrix. We also clarify some misconceptions in the literature pertaining to different notions of flatness associated to metric and non-metric connections, with implications for how one interprets the curvature of the geometry. Our results indicate that in general, caution is needed when connecting the AdS geometry arising from certain models with the AdS/CFT correspondence, and seek to provide a useful collection of guidelines for future progress in this exciting area.

Copyright J. Erdmenger et al. This work is licensed under the Creative Commons Attribution 4.0 International License. Published by the SciPost Foundation. Received 03-02-2020. Accepted 14-04-2020. Published 06-05-2020. doi:10.21468/SciPostPhys.8.5.073


Introduction
Recent progress in understanding the AdS/CFT correspondence has seen an explosion of effort at the interface of information theory and both quantum field theory and gravity. For example, ideas such as quantum error correction appear to play a key role in bulk reconstruction, and tensor networks have become popular toy models for constructing bulk-boundary maps in this language; see, e.g., [1][2][3][4], or [5] for a recent review. Additionally, key advances in our understanding have relied crucially on entanglement-based probes of the bulk, such as Ryu-Takayanagi / HRT surfaces and their quantum extensions [6][7][8], which represent a fundamental link between (quantum) information-theoretic quantities on the one hand, and bulk geometric objects on the other. A further example is given by holographic distance measures that were considered both for pure states [9] and mixed states [10].
In light of the wealth of developments arising from the application of concepts from information theory to AdS/CFT, one may also go one step further, and attempt to use information theory to understand how the duality itself may arise. That is, instead of taking gauge/gravity duality as a fait accompli and then using information theory to fill out the holographic dictionary, one may ask whether the dual gravity theory itself can be understood as the "information space" naturally associated to the field theory. This is similar in spirit to the "It from Qubit" initiative/paradigm, in which the gravitational theory is viewed as emerging from the entanglement structure of the boundary field theory. Here, the idea is slightly broader insofar as we do not limit ourselves to entanglement-based probes, but instead ask whether there is any sense in which a geometric space can be naturally associated to the field theory based on the information content therein. For similar efforts in the context of string theory, see [11].
In fact, the study of the geometry naturally associated with a space of probability distributions is an old subject that predates AdS/CFT by several decades. Known as information geometry, it was primarily developed by statisticians, based on the pioneering work of Fisher [12]. The application of information geometry to statistical physics and thermodynamics was originally pushed by Ruppeiner [13] (see also [14]), but it enjoys a range of applications from statistics to machine learning [15][16][17][18][19][20]. The canonical reference is [21]; see also [22,23] for some historical works, or [24] for an online review. The basic idea is to endow a statistical model with the structure of a Riemannian manifold, so that methods of differential geometry can be applied to the study of probability theory and statistics. A central object in this study is the Fisher information metric, which provides a metric on the space of distributions representing the model or theory under consideration. In the interest of making this paper self-contained, we start with a brief introduction to information geometry in section 2.
A number of works have sought to understand AdS/CFT in this context, which requires parametrizing the theory in such a way that this statistical framework can be adapted to quantum field theory. In one approach to this task, the bulk spacetime arises as the moduli space of instantons endowed with the Fisher information metric. Building on earlier results that showed Yang-Mills instantons to be good probes of the bulk geometry [25], Blau, Narain, and Thompson [26] evaluated the information metric on the moduli space of SU(2) instantons and showed that it corresponds to AdS$_5$, even away from the large-$N$ limit for $\mathcal{N}=4$ as well as $\mathcal{N}=2$ SU($N$) super Yang-Mills theory. The instanton correction matches the first-order string theory correction to the supergravity action. An approach to capturing the compact space, the analogue of the $S^5$ in the best-known example of AdS/CFT, was taken more recently in [27] using the N nonlinear sigma model as a proxy; see also [28][29][30][31] for related works. Yet a different approach was used in [32] to construct a metric in real Euclidean space by considering an RG gradient flow for large-$N$ $\phi^4$ theory in $d$ dimensions. A $(d+1)$-dimensional asymptotically AdS metric was found both in the UV and in the IR, with different AdS radii. To date, it is unclear how to capture the internal $S^5$ in such an approach, but the appearance of an AdS space has nonetheless generated some excitement. However, a key ingredient in the above is the fact that in the large-$N$ saddle point approximation, the classical supergravity action takes on a Gaussian form. Hence, while we do not know the degrees of freedom in the strongly coupled $\mathcal{N}=4$ gauge theory, the duality implies that they lead to a Gaussian probability distribution. But it has long been known in the information geometry literature, even prior to AdS/CFT, that the information metric associated with Gaussian distributions is hyperbolic space; see section 2 and [21,33] for more details.
In fact, as we shall show in section 3, the AdS metric appears to be merely a reflection of the symmetries of the underlying distribution. This may have important implications in light of recent attempts to connect the appearance of an AdS geometry with holography [34][35][36][37][38]. Furthermore, the map between Gaussian distributions and a hyperbolic Fisher metric is not bijective: other, quite different distributions may also lead to the same hyperbolic space; see section 3 and [39], in which a construction procedure for obtaining probability distributions from a fixed information metric was given. Additionally, we demonstrate in section 7 that the fact that the information geometry metric inherits the symmetries of the underlying probability distribution continues to hold in the quantum case. We do this for the quantum analogue of the Fisher metric on a space of quantum states. In this case, the probability distribution is replaced with the density matrix of the quantum states whose symmetries now take the form of conjugation by an appropriate unitary matrix.
These two issues, symmetry and non-uniqueness, immediately raise two questions: first, how much physics of the field theory is encoded in the information geometry? Second, can we find other meaningful realizations of the information space associated to a given theory, and perhaps even generalize gauge/gravity duality to an "information/geometry" duality applicable to a wider class of theories? The purpose of this work is to present some initial explorations into these questions, as well as to collect some facts about the Fisher metric which, while familiar to experts in information geometry, do not appear to have survived the latter's recruitment into the quantum field theory community.
As mentioned above, we shall begin by reviewing the basic ingredients of information geometry in section 2, and show how AdS appears as the metric on the space of Gaussians as an illustrative example. In section 3, we substantiate the aforementioned claim that the hyperbolic geometry is a reflection of the symmetries of the underlying theory, and discuss some further issues with non-uniqueness in the putative information ↔ geometry map. This motivates us to consider which physical features are faithfully preserved under this relationship.
To that end, we present in section 4 an example of an unstable system (massless $\phi^4$ theory in 3+1 dimensions with an inverted potential) and show that there is an ambiguity in the Hitchin prescription [40] of taking the Lagrangian density evaluated on solutions to the equations of motion as the probability distribution on those solutions. We demonstrate how adding a total derivative to the Lagrangian density, which contributes no boundary terms, allows us to transform from a situation which is unstable from the information geometry standpoint to a stable one.
We then turn to the study of curvature invariants. Insofar as these are fundamental features of the geometry, it is natural to ask which information-theoretic aspects may be encoded therein. Motivated in part by earlier work in condensed matter systems [41][42][43] (see also [44]), we shall examine the 2d classical Ising model in section 5, and show that the Ricci scalar diverges along the critical line. Additionally, using the well-known map between the 2d classical Ising model and a 1d free fermion theory, this example provides us with the opportunity to examine the information geometry on both sides of an existing correspondence between physical theories. We shall find that the geometry of the free fermion theory is one-dimensional: the single component of the metric can be parametrized by the fermion mass, and diverges precisely at the critical temperature in a way that matches the behaviour of the 2d theory. Thus, while the correspondence between the 2d classical Ising model and the 1d free fermion theory is not a true duality, the salient geometrical features are captured on both sides. To our knowledge, this is the first such application of information geometry to both sides of an existing correspondence between different physical theories, and it would be very interesting to consider other examples.
We note that an important difference arises when considering the Fisher metric on the space of theories, spanned by its couplings and masses, as opposed to the metric on the space of states. The former was considered for quantum field theories for instance in [45], where the Zamolodchikov metric is used as an information metric, which then changes under RG flow. The latter was considered for instance in the instanton approach of Blau, Narain, and Thompson already discussed above [26], where the information metric is defined on the moduli space of the gauge theory considered (see also [46]). Our treatment of the Ising model in section 5 falls into the former class. To illustrate the difference however, we also discuss the quantum analogue of the information metric on the space of coherent free fermion states in section 6. We compare this with the case of free bosons in section 7 and further show that the metrics are fully determined by the symmetries of the density matrices in each case.
There has been some confusion in the physics literature as to the curvature of the information space. For example, [42] asserts that the geometry of non-interacting models is flat, while this is clearly false for even the simple Gaussian example mentioned above. We believe this is a confusion of language, stemming from the fact that in information geometry, one typically considers the 1-connection, rather than the 0-connection familiar to physicists; and the associated 1-curvature is indeed zero for a wide class of models, known as exponential families (see section 2.1). The reason for this stems from the fact that the 1-curvature is more naturally associated with information loss along a curve, as quantified by the Kullback-Leibler divergence or relative entropy, whereas the information-theoretic interpretation of the 0-curvature is less clear. A skim of the information geometry literature will hence turn up statements about flat geometries, without making it immediately obvious that this is not flatness in the physicist's familiar sense. We elaborate on this point in section 8, where we also point out that vanishing 1-curvature corresponds to the trivial solution of the equation of motion arising from a Chern-Simons action. This is reminiscent of the map between field theory and supergravity actions in standard realizations of field theory/gravity dualities, at least insofar as there is an obvious action providing the required dynamics on the gravity side. Finally, we close in section 9 with a concise summary of the lessons learned from the examples herein, as well as some speculations on the relationship between information geometry and holographic RG [45], and the potential for this approach to enable us to compute complexity [47] in strongly coupled / interacting field theories.

Information geometry
To make this paper self-contained, let us begin with a brief introduction to information geometry. We shall present only those ingredients required for subsequent sections, and refer the interested reader to [21] for details.
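We are interested in studying the properties of a statistical model $S$, which is essentially a set of probability distribution functions $p : X \to \mathbb{R}$ satisfying
$$p(x) \geq 0 \quad \forall x \in X, \qquad \int dx\, p(x) = 1, \tag{2}$$
where $X$ is the space of stochastic or physical variables (e.g., $\mathbb{R}^m$, or some discrete set). Additionally, each $p$ may be parametrized by $\xi = (\xi^1, \ldots, \xi^n) \in \mathbb{R}^n$, so that the model $S$ is
$$S = \{\, p_\xi = p(x;\xi) \;|\; \xi \in \Xi \,\},$$
where $\Xi \subset \mathbb{R}^n$, and $\xi \to p_\xi$ is injective and provides a map between the parameter space and the points on the manifold. That is, we regard each point $\xi$ as a different distribution within our model, and take the map $\Xi \to \mathbb{R}$ provided by $\xi \mapsto p(x;\xi)$ to be $C^\infty$ so that we may take derivatives with respect to these parameters. Note that it is the parameters $\xi$, which represent points on this statistical manifold, with respect to which we will be computing derivatives, not the stochastic variables $x$. Accordingly, we denote $\partial_i \equiv \partial/\partial\xi^i$.

Now, given a model $S$, the Fisher information metric of $S$ at a point $\xi$ is the $n \times n$ matrix $G(\xi) = [g_{ij}(\xi)]$, with elements
$$g_{ij}(\xi) = \langle \partial_i \ell_\xi \, \partial_j \ell_\xi \rangle_\xi,$$
where
$$\ell_\xi(x) = \ln p(x;\xi),$$
and the expectation $\langle \ldots \rangle_\xi$ with respect to the distribution $p_\xi$ is defined as
$$\langle f \rangle_\xi = \int dx\, p(x;\xi)\, f(x).$$
Note that $G$ is symmetric and positive semi-definite by construction. It will prove convenient to rewrite the metric as
$$g_{ij}(\xi) = \int dx\, \partial_i p \;\partial_j \ell_\xi = \int dx\, \partial_i\partial_j p - \langle \partial_i\partial_j \ell_\xi \rangle_\xi = -\langle \partial_i\partial_j \ell_\xi \rangle_\xi, \tag{7}$$
where we have used $p\,\partial_j \ell_\xi = \partial_j p$, and where the first term in the second expression vanishes by the normalization constraint, i.e.,
$$\int dx\, \partial_i\partial_j p(x;\xi) = \partial_i\partial_j \int dx\, p(x;\xi) = 0.$$
There are numerous equivalent definitions for the Fisher metric. For example, by direct computation, we can show that the Fisher metric is also given by
$$g_{ij}(\xi) = 4 \int dx\, \partial_i \sqrt{p(x;\xi)}\; \partial_j \sqrt{p(x;\xi)}.$$
This form of the Fisher metric will prove useful in section 6, where we discuss generalizations to more complicated scenarios in which the probability distribution is no longer just a real-valued function.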
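As a quick numerical illustration of the equivalence of these definitions, the following minimal Python sketch evaluates the Fisher metric of a one-parameter model in three ways: as the expectation of the product of scores, as minus the expectation of the second derivative of the log-likelihood, and via the square-root form $4\int dx\, (\partial\sqrt{p})^2$. The exponential distribution $p(x;\lambda) = \lambda e^{-\lambda x}$ is our own choice of test case and does not appear elsewhere in this paper; the helper names, finite-difference step sizes and integration cutoff are likewise illustrative assumptions. All three numbers should be close to the exact value $1/\lambda^2$.

```python
import math

def p(x, lam):
    """Exponential distribution on x >= 0 (illustrative one-parameter model)."""
    return lam * math.exp(-lam * x)

def integrate(f, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def d1(f, lam, eps=1e-4):
    """Central first derivative with respect to the parameter."""
    return (f(lam + eps) - f(lam - eps)) / (2 * eps)

def d2(f, lam, eps=1e-3):
    """Central second derivative with respect to the parameter."""
    return (f(lam + eps) - 2 * f(lam) + f(lam - eps)) / eps**2

def fisher_three_ways(lam, x_max=60.0):
    """Fisher metric from three equivalent expressions (x_max truncates the tail)."""
    log_p = lambda x: (lambda l: math.log(p(x, l)))
    sqrt_p = lambda x: (lambda l: math.sqrt(p(x, l)))
    # <d l * d l>
    g_score = integrate(lambda x: p(x, lam) * d1(log_p(x), lam) ** 2, 0.0, x_max)
    # -<d d l>
    g_hess = -integrate(lambda x: p(x, lam) * d2(log_p(x), lam), 0.0, x_max)
    # 4 * int (d sqrt(p))^2
    g_sqrt = 4 * integrate(lambda x: d1(sqrt_p(x), lam) ** 2, 0.0, x_max)
    return g_score, g_hess, g_sqrt

print(fisher_three_ways(2.0))  # each entry should be close to 1 / lam**2
```

The derivatives are taken with respect to the parameter only, never the stochastic variable, mirroring the convention stated above.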

Exponential families
While the machinery of information geometry can be applied to any distribution which satisfies the condition (2), we will be particularly interested in a class of models known as exponential families. Suppose an $n$-dimensional model $S = \{p_\theta \,|\, \theta \in \Theta\}$ can be expressed in terms of $n+1$ functions $\{C, F_1, \ldots, F_n\}$ on $X$ and a function $\psi$ on $\Theta$ as
$$p(x;\theta) = \exp\left[ C(x) + \theta^i F_i(x) - \psi(\theta) \right], \tag{9}$$
where Einstein's summation convention is assumed. Then $S$ is an exponential family, and $\theta$ are the so-called canonical coordinates, not to be confused with the stochastic variable $x$. The function $\psi$ is known as the potential, which is fixed by the normalization to
$$\psi(\theta) = \ln \int dx\, \exp\left[ C(x) + \theta^i F_i(x) \right].$$
Note that the parametrization $\theta \to p_\theta$ is 1:1 if and only if the functions $\{C, F_1, \ldots, F_n\}$ are linearly independent, which we shall assume to be the case. Many important models fall into this class which, in addition to some interesting mathematical properties (see sec. 8), have the technical nicety of admitting an expression for the Fisher metric directly in terms of the potential. Observe that
$$\partial_i \ell_\theta(x) = F_i(x) - \partial_i \psi(\theta), \qquad \partial_i\partial_j \ell_\theta(x) = -\partial_i\partial_j \psi(\theta),$$
where the derivatives are taken with respect to the canonical coordinates, $\partial_i \equiv \partial/\partial\theta^i$. Therefore the metric (7) may be written
$$g_{ij}(\theta) = \partial_i\partial_j \psi(\theta). \tag{12}$$
We emphasize that while (7) is true for general models, the simple form (12) holds only for exponential families. We will rely on this form of the metric extensively below. Another form that is sometimes useful is the expression in terms of the covariance matrix of $F_i$,
$$g_{ij}(\theta) = \left\langle \left( F_i - \langle F_i \rangle_\theta \right)\left( F_j - \langle F_j \rangle_\theta \right) \right\rangle_\theta.$$
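To make the statement that the metric of an exponential family is both the Hessian of the potential and the covariance of the $F_i$ concrete, here is a minimal Python sketch for a one-dimensional example. The Poisson family is our own illustrative choice (it is not discussed in this paper): written in canonical form it has $C(x) = -\ln x!$, $F(x) = x$ and $\psi(\theta) = e^\theta$, so both expressions should return $e^\theta$. Function names, the truncation `x_max` and the step size are assumptions made for the sketch.

```python
import math

def psi(theta):
    """Potential of the Poisson family in canonical coordinates (rate = e^theta)."""
    return math.exp(theta)

def log_p(x, theta):
    """ln p(x; theta) = C(x) + theta * F(x) - psi(theta), with C = -ln x!, F = x."""
    return -math.lgamma(x + 1) + theta * x - psi(theta)

def metric_from_potential(theta, eps=1e-4):
    """Second derivative of the potential, by central differences."""
    return (psi(theta + eps) - 2 * psi(theta) + psi(theta - eps)) / eps**2

def metric_from_covariance(theta, x_max=200):
    """Variance of F(x) = x under p(x; theta), summed over a truncated support."""
    probs = [math.exp(log_p(x, theta)) for x in range(x_max)]
    mean = sum(x * w for x, w in enumerate(probs))
    return sum((x - mean) ** 2 * w for x, w in enumerate(probs))

theta = math.log(3.0)
print(metric_from_potential(theta), metric_from_covariance(theta))  # both close to 3
```

The agreement relies on the family being exponential; for a general model only the expressions in terms of the log-likelihood apply.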

Simple example: AdS 2 from a Gaussian
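As a concrete example, let us show how hyperbolic space arises from the normal distribution
$$p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right]. \tag{14}$$
This clearly falls within the class of exponential families (9), with the identifications:
$$C(x) = 0, \quad F_1(x) = x, \quad F_2(x) = x^2, \quad \theta^1 = \frac{\mu}{\sigma^2}, \quad \theta^2 = -\frac{1}{2\sigma^2}, \quad \psi = \frac{\mu^2}{2\sigma^2} + \ln\left( \sqrt{2\pi}\,\sigma \right). \tag{15}$$
To evaluate the metric (12), we invert these relations in order to express the distribution in terms of the canonical coordinates:
$$\mu = -\frac{\theta^1}{2\theta^2}, \qquad \sigma = \frac{1}{\sqrt{-2\theta^2}}, \qquad \psi(\theta) = -\frac{(\theta^1)^2}{4\theta^2} + \frac{1}{2}\ln\left( -\frac{\pi}{\theta^2} \right). \tag{16}$$
Strictly speaking, the potential is all we need to compute the metric, but we can also write out the distribution (14) as a quick check on the consistency of our identifications:
$$p(x;\theta) = \exp\left[ \theta^1 x + \theta^2 x^2 + \frac{(\theta^1)^2}{4\theta^2} - \frac{1}{2}\ln\left( -\frac{\pi}{\theta^2} \right) \right].$$
Substituting the potential $\psi(\theta)$ in (16) into (12) then gives the metric in canonical coordinates:
$$g_{ij}(\theta) = \begin{pmatrix} -\dfrac{1}{2\theta^2} & \dfrac{\theta^1}{2(\theta^2)^2} \\[2mm] \dfrac{\theta^1}{2(\theta^2)^2} & \dfrac{1}{2(\theta^2)^2} - \dfrac{(\theta^1)^2}{2(\theta^2)^3} \end{pmatrix},$$
and hence the squared line element is
$$ds^2 = -\frac{(d\theta^1)^2}{2\theta^2} + \frac{\theta^1}{(\theta^2)^2}\, d\theta^1 d\theta^2 + \left[ \frac{1}{2(\theta^2)^2} - \frac{(\theta^1)^2}{2(\theta^2)^3} \right] (d\theta^2)^2.$$
The metric in terms of the original (physical) coordinates $\mu, \sigma$ is then obtained by performing a simple change of basis via the identifications (15):
$$d\theta^1 = \frac{d\mu}{\sigma^2} - \frac{2\mu}{\sigma^3}\, d\sigma, \qquad d\theta^2 = \frac{d\sigma}{\sigma^3},$$
whence we at last obtain
$$ds^2 = \frac{d\mu^2 + 2\, d\sigma^2}{\sigma^2},$$
which is none other than Euclidean AdS$_2$, with the standard deviation playing the role of the radial coordinate. Note that the Fisher metric is always of Euclidean signature, or possibly degenerate, since the Fisher metric is positive semi-definite. To get a metric of Lorentzian or any other mixed signature requires Wick rotation a posteriori. We will henceforth take this for granted and drop the "Euclidean" qualifier. Loosely speaking, the intuition is that a large standard deviation implies a large overlap between different distributions. Operationally, this means that they are harder to distinguish (e.g., requiring more measurements), and are accordingly considered to be "close" in an information-theoretic sense.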
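The steps of this computation can be checked numerically. The sketch below assumes the standard canonical form of the normal distribution, with $\theta^1 = \mu/\sigma^2$, $\theta^2 = -1/(2\sigma^2)$ and potential $\psi(\theta) = -(\theta^1)^2/(4\theta^2) + \tfrac{1}{2}\ln(-\pi/\theta^2)$; it takes the Hessian of $\psi$ by finite differences and then changes basis to $(\mu, \sigma)$ with the Jacobian of the coordinate map. The function names and step size are illustrative assumptions. The result should be close to $\mathrm{diag}(1/\sigma^2,\, 2/\sigma^2)$.

```python
import math

def psi(t1, t2):
    """Potential of the normal distribution in canonical coordinates (t2 < 0)."""
    return -t1**2 / (4 * t2) + 0.5 * math.log(-math.pi / t2)

def hessian(f, t1, t2, eps=1e-4):
    """Central-difference Hessian of a function of two variables."""
    f11 = (f(t1 + eps, t2) - 2 * f(t1, t2) + f(t1 - eps, t2)) / eps**2
    f22 = (f(t1, t2 + eps) - 2 * f(t1, t2) + f(t1, t2 - eps)) / eps**2
    f12 = (f(t1 + eps, t2 + eps) - f(t1 + eps, t2 - eps)
           - f(t1 - eps, t2 + eps) + f(t1 - eps, t2 - eps)) / (4 * eps**2)
    return [[f11, f12], [f12, f22]]

def fisher_metric_mu_sigma(mu, sigma):
    """Fisher metric in (mu, sigma) coordinates, obtained from the Hessian of psi."""
    t1, t2 = mu / sigma**2, -1.0 / (2 * sigma**2)
    g = hessian(psi, t1, t2)
    # jac[i][a] = d theta^i / d y^a with y = (mu, sigma)
    jac = [[1 / sigma**2, -2 * mu / sigma**3],
           [0.0, 1 / sigma**3]]
    return [[sum(jac[i][a] * g[i][j] * jac[j][b]
                 for i in range(2) for j in range(2))
             for b in range(2)] for a in range(2)]

print(fisher_metric_mu_sigma(0.7, 1.3))  # close to [[1/s^2, 0], [0, 2/s^2]] with s = 1.3
```

Note that the off-diagonal component vanishes only after the change of basis; in canonical coordinates the metric is not diagonal.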
Two immediate observations are worth remarking upon. First, we see that the geometry corresponding to a free theory is clearly not flat, contrary to some claims in the literature that non-interacting theories have vanishing Ricci scalar [42,48]; we shall return to this issue when we discuss connections in section 8. Second, we note that the map between the Gaussian and the hyperbolic Fisher metric is achieved without any reference to the dynamics leading to this particular metric on the gravity side, and in this sense falls short of the original AdS/CFT correspondence. We shall turn to this second issue in the next section, and investigate the extent to which this geometry merely reflects the underlying symmetries of the distribution.
Before doing so, however, let us mention a simple example of a distribution which is not an exponential family and yet still yields an AdS$_2$ metric, namely the Cauchy-Lorentz distribution with location $x_0$ and width $\gamma$,
$$p(x; x_0, \gamma) = \frac{1}{\pi} \frac{\gamma}{(x - x_0)^2 + \gamma^2}.$$
For other examples of distributions that yield AdS, as well as other geometries (e.g., the sphere), see [33,39].

A hyperbolic red herring
When we say that the information geometry merely reflects the underlying symmetries of the distribution, we mean that a symmetry of the probability distribution will manifest itself as a corresponding symmetry of the Fisher metric. Let us be precise about what is meant by a symmetry of the probability distribution. Consider a map $\xi \to \tilde\xi(\xi)$ on the parameter space. Suppose that there exists a map $\tilde x : X \to X$ such that
$$p(\tilde x; \tilde\xi)\, d\tilde x = p(x; \xi)\, dx. \tag{24}$$
Then, the probability distribution $p(x;\xi)$ is said to be symmetric under the transformation $\xi \to \tilde\xi(\xi)$. In other words, by a suitable redefinition of the stochastic variable $x$, we can "undo" the transformation on $\xi$, thereby putting the transformed probability distribution back into the form of the original probability distribution. Let us take the Gaussian distribution written in terms of the mean $\mu$ and standard deviation $\sigma$ as an example. It is clear that the translation $(\mu, \sigma) \to (\mu + c, \sigma)$, where $c$ is any real constant, can be "undone" by a translation $x \to x + c$. Therefore, we say that the Gaussian distribution is symmetric or invariant under a translation of the mean. A more nontrivial transformation is the scaling transformation $(\mu, \sigma) \to (\lambda\mu, \lambda\sigma)$, for some real $\lambda > 0$. The map $x \to \lambda x$ will undo this scaling transformation. Unlike the translation, the scaling transformation does come with a nontrivial Jacobian in $x$, but this Jacobian is precisely what is needed to maintain the relation (24). Therefore, the Gaussian is invariant under translations of $\mu$ and simultaneous scaling of $\mu$ and $\sigma$. The Fisher metric corresponding to the family of Gaussian distributions inherits these symmetries. But the only metric which enjoys these symmetries is AdS$_2$ and therefore the Fisher metric could not have been anything else! To reiterate, a symmetry of the probability distribution will also be a symmetry of the corresponding Fisher metric. The converse, however, is not necessarily true: the Fisher metric can exhibit more symmetries than are present in any particular probability distribution from which that metric may be derived. Note that we are careful to say "any particular probability distribution" because, as demonstrated in [39], there are in fact infinitely many probability distributions that give the same Fisher metric.
For instance, example 4.8 of [39] derives Euclidean AdS 2 as the Fisher metric of the following three-dimensional probability distribution (in this case, X = 3 ): where , It is clear that this probability distribution exhibits neither the translation invariance in µ nor the scaling symmetry in µ and σ that is enjoyed by the corresponding Fisher metric, which is Euclidean AdS 2 .
Given the above, we believe that care is needed when attempting to identify the appearance of an AdS geometry with some intrinsic aspect of holography. This is not to say that the hyperbolic geometry arising from some theories is unrelated to AdS/CFT, but it is clearly not unique, and may represent a red herring or false conclusion in this context.
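The non-uniqueness can be illustrated numerically with the Cauchy-Lorentz distribution, which shares the translation and scaling symmetries of the Gaussian without being an exponential family. The following minimal sketch assumes the standard parametrization $p(x; x_0, \gamma) = \gamma / \{\pi[(x - x_0)^2 + \gamma^2]\}$ (the symbols $x_0$, $\gamma$ and all function names are our own choices) and computes the Fisher metric directly from the scores. The substitution $x = x_0 + \gamma\tan t$ maps the measure $p\,dx$ to $dt/\pi$, which tames the heavy tails. The result should be close to $\mathrm{diag}(1, 1)/(2\gamma^2)$, i.e., again a hyperbolic metric, and it should rescale by $1/\lambda^2$ under $(x_0, \gamma) \to (\lambda x_0, \lambda\gamma)$.

```python
import math

def scores(x, x0, gamma):
    """Derivatives of ln p with respect to (x0, gamma) for the Cauchy-Lorentz density."""
    u = x - x0
    denom = u * u + gamma * gamma
    return 2 * u / denom, 1 / gamma - 2 * gamma / denom

def fisher_metric(x0, gamma, n=20000):
    """Fisher metric g_ab = <s_a s_b>, integrated with x = x0 + gamma * tan(t)."""
    h = math.pi / n
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        t = -math.pi / 2 + (k + 0.5) * h      # midpoints avoid t = +-pi/2
        s = scores(x0 + gamma * math.tan(t), x0, gamma)
        for a in range(2):
            for b in range(2):
                g[a][b] += s[a] * s[b] * h / math.pi   # p(x) dx = dt / pi
    return g

g = fisher_metric(0.3, 1.5)
print(g)  # close to [[1/(2 gamma^2), 0], [0, 1/(2 gamma^2)]] with gamma = 1.5
```

That two quite different families produce the same hyperbolic geometry is consistent with the view that the metric is fixed by the shared symmetries rather than by the detailed form of the distribution.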

Unstable configurations
Here we turn to a simple example demonstrating how the fact that a system has an unstable potential is encoded in the Fisher metric. Our example is constructed on the space of states, in analogy with the stable examples for the Fisher metric of Yang-Mills instantons in [26] and the Klein-Gordon field in [49]. Both of these make use of the proposal of Hitchin [40] to identify the Lagrangian density, evaluated on a family of field configurations, as the probability distribution on those states. The Fisher metric on the space of 4D Yang-Mills instantons turns out to be Euclidean AdS$_5$ [26]. In that case, the Lagrangian density $F^2$ evaluated on the instantons exhibits translation invariance in the center and scale invariance in both the center and the width of the instantons.
Our example is similar to the Yang-Mills instanton, but with one crucial difference -that the instantons themselves are in an unstable potential. Of course, the same symmetry argument would make one conclude that the Fisher metric ought to still be AdS. However, we will see how the machinery is smart enough to know when the field theory that one is considering is unstable. The system is taken to be in four Euclidean dimensions with a real scalar field and an action The equation of motion reads In Euclidean signature, the potential is V = −g 2 φ 4 , which is unbounded from below. Therefore, at least classically, the theory is perturbatively unstable about the trivial solution φ = 0. However, there do exist exact instanton solutions to the equation of motion, which are parametrized by a four-dimensional center ξ µ and a width ρ: Nevertheless, quadratic perturbations about these instanton solutions are still unstable in that their squared mass, which is equal to −6g 2 φ inst (x; ξ, ρ) 2 , is negative (and depends on x). 4 The Lagrangian density evaluated on the instanton solution is taken to be the probability distribution: Note that this function is perfectly normalizable and is normalized by setting However, this function is not positive semi-definite and is in fact negative for points x such that |x − ξ| < ρ. Therefore, this is not a valid probability distribution and cannot be used to define a Fisher metric. Nevertheless, it is instructive to see exactly where the naive calculation of the Fisher metric fails to give a well-defined result. If we naively proceed to calculate the Fisher metric from (30), the resulting integrals that appear in the Fisher metric are technically ill-defined since the integrands contain singularities. 
For example, the non-vanishing components of the Fisher metric read where y is related to x µ by The remaining integrals have a pole at y = 1 and thus the integrals themselves are ill-defined: one must specify a contour of integration and thus a way of avoiding the pole. There are two ways of doing this: either add or subtract iε to y and then take the ε → 0 limit at the end.
The results for the Fisher metric components are denoted with a superscript ± depending on which sign ±iε prescription is used, Even though the 1 ρ 2 factors in the metric component are expected for AdS, the coefficients are not real numbers and the metric cannot be made purely real simply by multiplying by an overall complex factor. Therefore, this metric cannot be considered real Euclidean AdS 5 . This signals the instability present in the field-theory model, and indicates that the information geometry retains knowledge of whether the original field theory is well-defined.
Incidentally, if we were to just integrate the original Lagrangian (27) by parts, we could instead work with the action Now, when the Lagrangian density is evaluated on the instanton solution we can use the equation of motion to simplify the result and we get a well-defined probability distribution This is normalized with the same value of g as in (31), but this time it is positive semi-definite. By symmetry arguments we already know that the corresponding Fisher metric must be Euclidean AdS 5 . Explicit calculation yields the line element This example drives home what ought to be an obvious fact: the Fisher metric depends crucially on the choice of probability distribution. Furthermore, it is possible to arrive at a function that is not positive semi-definite, and therefore not a well-defined probability distribution, by starting from a perfectly well-defined probability distribution and adding a total derivative whose integral vanishes. This is a crucial fact when implementing the Hitchin prescription of taking the Lagrangian density evaluated on a solution to the equations of motion as the probability distribution. Additionally, we have demonstrated how the ambiguity in the choice of Lagrangian density allows us to transform from a situation which is unstable from the information geometry standpoint to a stable one by adding suitable total-derivative terms.
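The sign structure discussed in this section can be illustrated with a short numerical sketch. Since the explicit expressions are not reproduced here, the following conventions are assumptions on our part, chosen to be consistent with the potential $V = -g^2\phi^4$ and the squared mass $-6g^2\phi_{\rm inst}^2$ quoted above: a Lagrangian density $(\partial\phi)^2 - g^2\phi^4$ in four Euclidean dimensions, the equation of motion $\Box\phi = -2g^2\phi^3$, and the profile $\phi = (2/g)\,\rho/(|x - \xi|^2 + \rho^2)$. The normalization used in the text may differ by overall factors. The sketch checks that the profile solves the assumed equation of motion, that the original density is negative for $|x - \xi| < \rho$, and that the integrated-by-parts density $g^2\phi^4$ is non-negative with the same total integral.

```python
import math

def phi(r, rho, g):
    """Assumed instanton profile (2/g) * rho / (r^2 + rho^2), with r = |x - xi|."""
    return (2.0 / g) * rho / (r * r + rho * rho)

def dphi(r, rho, g):
    """Radial derivative of the assumed profile."""
    return -(4.0 / g) * rho * r / (r * r + rho * rho) ** 2

def eom_residual(r, rho, g, eps=1e-3):
    """Radial 4d Laplacian of phi plus 2 g^2 phi^3; vanishes on a solution."""
    lap = (phi(r + eps, rho, g) - 2 * phi(r, rho, g) + phi(r - eps, rho, g)) / eps**2
    lap += 3.0 * dphi(r, rho, g) / r
    return lap + 2 * g * g * phi(r, rho, g) ** 3

def density_original(r, rho, g):
    """(d phi)^2 - g^2 phi^4 on the solution; changes sign at r = rho."""
    return dphi(r, rho, g) ** 2 - g * g * phi(r, rho, g) ** 4

def density_by_parts(r, rho, g):
    """-phi Box phi - g^2 phi^4, which equals g^2 phi^4 on-shell."""
    return g * g * phi(r, rho, g) ** 4

def integrate_r4(density, rho, g, n=20000):
    """Integrate a radial density over R^4 via r = rho * tan(t), t in (0, pi/2)."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        r = rho * math.tan(t)
        jac = rho / math.cos(t) ** 2               # dr/dt
        total += 2 * math.pi**2 * r**3 * density(r, rho, g) * jac * h
    return total

rho, g = 0.8, 1.7
print(integrate_r4(density_original, rho, g), integrate_r4(density_by_parts, rho, g))
```

Under these assumed conventions both densities integrate to $8\pi^2/(3g^2)$, independently of $\rho$, so the two candidates are normalized by the same value of $g$ even though only the second is a valid probability distribution.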

Information geometry on theory space: the Ising model
Here we perform a further investigation of which aspects of the physical theory are captured by the information geometry by considering the 2d Ising model. Studies of aspects of information geometry for the Ising and related spin models have been performed before in the literature (see, e.g., [42,44]). Here we focus on the 2d Ising model, which has the feature of admitting a map to an ostensibly quite different physical theory, namely a 1d free fermion field theory. In the first subsection, we shall examine the Fisher metric and its Ricci curvature for the 2d Ising model on the theory space spanned by its two couplings, and show that the geometry correctly captures the divergence along the critical line. We will then reproduce this behaviour in the second subsection for the 1d free fermion theory.

2d classical Ising model
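Let us consider the 2d classical Ising model on a square lattice of spins $\sigma_{i,j} = \pm 1$, with vanishing external magnetic field. Denoting the horizontal and vertical couplings by $J$ and $K$, respectively, the Hamiltonian for an $N \times N$ lattice may be written
$$H = -J \sum_{i,j=1}^{N} \sigma_{i,j}\,\sigma_{i+1,j} - K \sum_{i,j=1}^{N} \sigma_{i,j}\,\sigma_{i,j+1},$$
where we have identified both directions to form a torus, i.e., $\sigma_{N+1,j} = \sigma_{1,j}$ and $\sigma_{i,N+1} = \sigma_{i,1}$. Note that the state (i.e., spin configuration) satisfies a Boltzmann distribution at inverse temperature $\beta$, and thus we may write
$$p(\sigma) = \frac{1}{Z}\, e^{-\beta H(\sigma)}, \tag{39}$$
where the partition function is
$$Z = \sum_{\{\sigma\}} e^{-\beta H(\sigma)}. \tag{40}$$
Hence, by exponentiating this normalization factor, we may express the distribution in the form of an exponential family, cf. (9),
$$p(\sigma) = \exp\left[ \beta J \sum_{i,j} \sigma_{i,j}\,\sigma_{i+1,j} + \beta K \sum_{i,j} \sigma_{i,j}\,\sigma_{i,j+1} - \ln Z \right],$$
where the canonical coordinates $\theta \in \{\beta J, \beta K\}$. Note that for distributions in the form (39), $\ln Z$ plays the role of the potential $\psi$, which means that in order to compute the metric (12), all we need is an expression for the free energy. In particular, in the thermodynamic limit $N \to \infty$, the reduced free energy per site is given by Onsager's solution,
$$f \equiv \lim_{N\to\infty} \frac{1}{N^2} \ln Z = \frac{\ln 2}{2} + \frac{1}{2\pi} \int_0^\pi d\phi\, \ln\left[ \cosh(2\beta J)\cosh(2\beta K) + \frac{1}{\kappa}\sqrt{1 + \kappa^2 - 2\kappa\cos 2\phi} \right],$$
where we have defined $\kappa \equiv \mathrm{csch}(2\beta J)\,\mathrm{csch}(2\beta K)$, and $\phi$ is some auxiliary angular parameter. This will play the role of the potential, i.e.,
$$g_{ij} = \partial_i \partial_j f. \tag{43}$$
The integral over the auxiliary parameter $\phi$ cannot be performed analytically, but we can nonetheless proceed to determine the form of the metric, and then evaluate the curvature numerically. To compute the derivatives in (43), we first re-express the free energy in terms of the canonical coordinates. By inspection, we identify
$$\theta^1 = \beta J, \qquad \theta^2 = \beta K, \qquad F_1 = \sum_{i,j} \sigma_{i,j}\,\sigma_{i+1,j}, \qquad F_2 = \sum_{i,j} \sigma_{i,j}\,\sigma_{i,j+1}.$$
Note that $F_1$ and $F_2$ are linearly independent, as required, since they correspond to couplings along different axes. In practice, we will be interested in evaluating the curvature for a range of couplings at fixed temperature, so we may equivalently absorb $\beta$ into the couplings (i.e., $\beta = 1$), whereupon the canonical and physical variables coincide. The metric (43) is rather unwieldy, so we do not write out the full expression here.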
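However, we can still proceed to compute curvature invariants, in particular the Ricci scalar, to see which physical aspects of the 2d Ising model are reflected in the geometry. To evaluate the Ricci scalar, it is convenient to use the following expression in terms of the reduced free energy [42]:
$$R = -\frac{1}{2g^2} \begin{vmatrix} \partial_1^2 f & \partial_1\partial_2 f & \partial_2^2 f \\ \partial_1^3 f & \partial_1^2\partial_2 f & \partial_1\partial_2^2 f \\ \partial_1^2\partial_2 f & \partial_1\partial_2^2 f & \partial_2^3 f \end{vmatrix},$$
where $g = \det g_{ij}$. This provides a (very lengthy) expression for the Ricci curvature, which can be evaluated numerically for a range of couplings $J, K$. Results are shown in figure 1.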
Of course, one of the most important features of the 2d Ising model is the presence of a finite-temperature phase transition along the critical line
$$\sinh(2\beta J)\,\sinh(2\beta K) = 1, \tag{46}$$
which manifests in the information geometry as the discontinuity in the curvature observed in fig. 1. It is encouraging that this important physical feature is captured by this approach, though it remains an open question as to which other aspects of the geometry are faithful representations of the original physics. For example, in the context of our earlier remark that noninteracting theories do not necessarily imply $R = 0$, here we have an example of the converse, namely that the vanishing of the Ricci scalar away from the critical line (and the divergence near infinity) clearly does not imply that the model ceases to be interacting. Thus, while the curvature clearly does capture important physical features (e.g., critical points), a complete mapping between the physics of the underlying model and the curvature of the geometry requires a more careful study. Indeed, as we discuss in section 8, the notion of curvature studied above is not necessarily the most appropriate for capturing certain information-theoretic notions.

[Figure 1: Ricci scalar of the 2d classical Ising model over a range of couplings, with the critical line (46). One sees clearly that the curvature diverges in opposite directions depending on which side one approaches from. We shall see below that this corresponds to the sign of the mass in the corresponding free fermion theory on either side of the phase transition, cf. fig. 2.]
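The growth of the metric near the critical line can be seen in a minimal numerical sketch. We assume here the standard single-integral form of Onsager's solution for the reduced free energy per site, $f = \tfrac{1}{2}\ln 2 + \tfrac{1}{2\pi}\int_0^\pi d\phi\, \ln[\cosh 2a \cosh 2b + \kappa^{-1}\sqrt{1 + \kappa^2 - 2\kappa\cos 2\phi}]$ with $a = \beta J$, $b = \beta K$ and $\kappa = 1/(\sinh 2a \sinh 2b)$, and take the Fisher metric to be its Hessian in $(a, b)$. The function names, quadrature size and finite-difference step are illustrative assumptions, and the sketch evaluates only the metric, not the Ricci scalar.

```python
import math

def reduced_free_energy(a, b, n=2000):
    """ln Z per site in the thermodynamic limit (Onsager), a = beta*J, b = beta*K."""
    kappa = 1.0 / (math.sinh(2 * a) * math.sinh(2 * b))
    cc = math.cosh(2 * a) * math.cosh(2 * b)
    h = math.pi / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * h                     # midpoint rule over one period
        root = math.sqrt(1 + kappa**2 - 2 * kappa * math.cos(2 * phi))
        total += math.log(cc + root / kappa)
    return 0.5 * math.log(2) + total * h / (2 * math.pi)

def fisher_metric(a, b, eps=1e-3):
    """Hessian of the reduced free energy in the canonical coordinates (a, b)."""
    f = reduced_free_energy
    g11 = (f(a + eps, b) - 2 * f(a, b) + f(a - eps, b)) / eps**2
    g22 = (f(a, b + eps) - 2 * f(a, b) + f(a, b - eps)) / eps**2
    g12 = (f(a + eps, b + eps) - f(a + eps, b - eps)
           - f(a - eps, b + eps) + f(a - eps, b - eps)) / (4 * eps**2)
    return [[g11, g12], [g12, g22]]

# Isotropic point on the critical line sinh(2a) sinh(2b) = 1
beta_c = 0.5 * math.log(1 + math.sqrt(2))

def g_beta_beta(beta):
    """Metric along the isotropic line a = b = beta (J = K = 1)."""
    g = fisher_metric(beta, beta)
    return g[0][0] + 2 * g[0][1] + g[1][1]

for delta in (0.2, 0.05, 0.02):
    print(delta, g_beta_beta(beta_c - delta))   # grows as delta -> 0
```

Finite differences become unreliable very close to the critical line, where the free energy is non-analytic, so the sketch deliberately stays a small distance away from it.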

1d free fermion theory
It is a well-known fact that the 2d classical Ising model can be mapped to a theory of noninteracting fermions in 1d. There are, however, many different ways of actually constructing the resulting field theory which, while they reproduce the correct critical behaviour, give slightly different expressions for the fermion mass in terms of the original 2d couplings; for a selection of different results, see [50][51][52][53]. Since we are primarily interested in comparing the critical behaviour, we shall proceed with [50], which, in the notation above, corresponds to setting J = K = 1, i.e., we treat β as the coupling and examine the geometry along the line of J ↔ K symmetry in fig. 1. Accordingly, we expect that the information geometry for the 1d free fermion theory will diverge at the critical (inverse) temperature

$$\beta_c = \tfrac{1}{2}\ln\left(1+\sqrt{2}\right), \tag{47}$$

cf. (46) with J = K = 1 and β = $\beta_c$. We shall now see that the information geometry indeed reproduces this feature. For the isotropic case with J = K absorbed into β, we shall write the partition function (40) as

$$Z = \frac{1}{2^{V}}\sum_{\{\sigma\}}\exp\Big(\beta\sum_{(ij)}\sigma_i\sigma_j\Big),$$

where the sum in the exponential is over nearest-neighbor pairs (ij) and where we have included an explicit normalization by the total Hilbert space dimension $2^V$, where $V = N^2$ is the total number of sites, for consistency with [50]. In this notation, the corresponding reduced free energy is⁵

$$f(\beta) = \lim_{V\to\infty}\frac{1}{V}\ln Z.$$

In this case, we have only a single parameter β, so the information metric is one-dimensional, cf. (43) with i = j = β. The second derivative of the reduced free energy with respect to β is essentially the specific heat, which is given in [50] as the sum of a term proportional to $\ln|\beta-\beta_c|$ and a regular part (50). The regular part consists of terms which are polynomial in $\beta-\beta_c$ and thus vanish as β → $\beta_c$; since we are interested in the dominant behaviour near criticality, we shall discard this piece henceforth. We will also disregard the overall prefactor, since this does not alter the divergence structure of the metric.
Thus, for our purposes, we may summarize the result more compactly as

$$g_{\beta\beta} \simeq -\ln|\beta-\beta_c|, \tag{51}$$

which diverges to +∞ as β → $\beta_c$. We wish to express (50) in terms of the mass parameter m that appears in the corresponding free Majorana fermion field theory. This model is also described in [50], where m is given as a function of t ≡ tanh β and $t_c \equiv \sqrt{2}-1$ (the lattice spacing has been set to 1, so m is dimensionless). Solving this expression for β perturbatively around the critical point yields $\beta-\beta_c$ as a power series in m, which we can then substitute into the metric component (51) to find

$$g_{\beta\beta} \simeq -\ln|m|, \tag{54}$$

where we have again dropped the regular piece and overall numerical prefactor. This result is plotted in fig. 2. We would now like to reproduce this result from a calculation in the low-energy effective field theory, whose action (55), in Euclidean signature, describes a free Majorana fermion of mass m.
5 Note that [50] writes this as F(β), as if this were the free energy. This is simply a matter of nomenclature. However, the standard definition of the free energy is $F = -\frac{1}{\beta}\ln Z$, which explains why we put a relative factor of −β between F and the reduced free energy f. At the end of the day, we want to take two derivatives of ln Z to get the metric. Dividing by the volume before taking derivatives ensures a well-defined continuum limit V → ∞.
[Figure 2 (the metric (54) as a function of the mass m); caption fragment: ... (47), corresponding to the phase transition in the 2d model where one expects to find a conformal field theory. The mass also takes opposite signs on either side of the critical line in fig. 1 that matches the direction of divergence in the 2d curvature, i.e., we find m > 0 on the side nearer the origin and m < 0 on the side nearer βJ = βK → ∞.]
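As a quick consistency check on the critical point used here, one can verify numerically that the isotropic critical inverse temperature lies on the critical line and reproduces the stated value of $t_c$. The snippet assumes the standard forms $\beta_c = \tfrac{1}{2}\ln(1+\sqrt{2})$ and κ = csch(2βJ) csch(2βK) = 1 with J = K = 1; the variable names are chosen here for illustration.

```python
import math

# Critical inverse temperature of the isotropic model (J = K = 1).
beta_c = 0.5 * math.log(1.0 + math.sqrt(2.0))

# kappa = csch(2*beta*J) * csch(2*beta*K) evaluated at J = K = 1, beta = beta_c.
kappa_c = 1.0 / math.sinh(2.0 * beta_c) ** 2

# t = tanh(beta) evaluated at the critical point.
t_c = math.tanh(beta_c)

print("beta_c  =", beta_c)
print("kappa_c =", kappa_c)
print("t_c     =", t_c, " sqrt(2) - 1 =", math.sqrt(2.0) - 1.0)
```

Both κ = 1 and tanh $\beta_c$ = √2 − 1 hold to machine precision, so the point where the mass vanishes coincides with the point where the 2d curvature diverges along the line J = K.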
The lightcone coordinates (z, z̄) are complex combinations of the Cartesian coordinates (x, y), with ∂ and ∂̄ the corresponding lightcone derivatives. To calculate the metric $g_{mm}$ for this theory, we first note that in the field theory analogue of the probability distribution (39), one identifies the probability of a particular field configuration with the integrand of the normalized path integral (in Euclidean signature) evaluated on that field configuration. One can then show (cf. eqs. (11) and (12) in [54]) that the Fisher metric is given by

$$g_{ij} = \frac{1}{V}\Big(\langle\partial_i S\,\partial_j S\rangle - \langle\partial_i S\rangle\langle\partial_j S\rangle\Big),$$

where S is the Euclidean action, V is the spacetime volume, the expectation values are taken with respect to the standard path integral, the derivatives are taken with respect to the coupling constants of the theory, and the volume divergence is explicitly divided out as usual. In the particular case when the coupling constant is a mass parameter, the first expectation value above reduces to a four-point function and the second to a product of two-point functions. In the special case of a free field theory, of which (55) is an example, one can use Wick's theorem to reduce the four-point function to sums (or differences) of products of pairs of two-point functions. For the example at hand, $g_{mm}$ is thereby expressed entirely through the four possible two-point functions of ψ and ψ̄, which are given in [50]; here the subscripts 1, 2 on the fields indicate the coordinate $x_i = (x_i, y_i)$ at which the field or derivative is evaluated. Plugging these into the expression for $g_{mm}$, and focusing just on the divergent part, simplifies the result to a single integral, where we have introduced an ultraviolet cutoff Λ because the integral is logarithmically divergent. Assuming Λ ≫ |m|, we get a result proportional to ln(Λ/|m|). Renormalization will simply replace the cutoff with a renormalization scale, µ, which, in terms of comparing with $g_{\beta\beta}$ in (50) and (54), is just part of the "regular part".
As far as the divergence at criticality is concerned, we get the same behavior,

$$g_{mm} \simeq -\ln|m|.$$

Comparing this result with $g_{\beta\beta}$ in (54), we see that both sides of the mapping retain the salient features, namely the location of the divergence (at m = 0) and the type or degree of the divergence (logarithmic).
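To see how a logarithm of this type arises from a cutoff, it may help to look at a representative two-dimensional momentum integral. This is an illustrative stand-in chosen here, not the specific integrand of the calculation above; its prefactors should not be compared with those of $g_{mm}$.

```latex
\int_{|p|<\Lambda}\frac{d^{2}p}{(2\pi)^{2}}\,\frac{1}{p^{2}+m^{2}}
  = \frac{1}{4\pi}\ln\!\left(1+\frac{\Lambda^{2}}{m^{2}}\right)
  \;\xrightarrow{\;\Lambda\gg|m|\;}\;
  \frac{1}{2\pi}\ln\frac{\Lambda}{|m|}
  = -\frac{1}{2\pi}\ln|m| + \frac{1}{2\pi}\ln\Lambda .
```

The cutoff dependence sits entirely in the additive term ln Λ, which is regular in m, while the −ln|m| term carries the divergence at m = 0. Trading Λ for a renormalization scale µ therefore changes only the regular part.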

Information geometry on state space: coherent fermions
In this section, we will discuss the information geometry metric on a space of states of a quantum system. Our goal is to compute the information geometry metric on the space of coherent states of a quantum field theory of two free Majorana fermions (cf. [55]). It turns out that the definition of the Fisher metric with which we have been working so far does not capture the fully quantum nature of these states. In this case, instead of working with real-valued probability distributions, we must work with density matrices. We will introduce the concept of a Bures distance between two density matrices, which we will use to define the Bures metric, the quantum analogue of the Fisher metric. Then, we will show that the Bures metric on the space of fermionic coherent states is the metric of a 2-sphere. So far, with regard to the 2d Ising model, we have been discussing the Fisher metric on the space of theories parametrized by J and K, or β (in the isotropic case J = K = 1). For the associated free fermion theory there is a single parameter, the fermion mass m, which may be obtained as a function of J and K when performing the map. However, as we did for the moduli space of scalar field instantons, inspired by the work in [26] on Yang-Mills instantons, we can also consider the information metric on a space of states.
In the case of the Yang-Mills and scalar instantons, Hitchin's proposal [40] of using the Lagrangian density as the probability distribution was used, which places those examples squarely within the scope of our introduction to information geometry and the Fisher information metric. However, when we are talking about the states of a system, particularly a quantum mechanical system, we are naturally led to generalize our definition of the Fisher metric. For example, we would like to be able to deal with mixed as well as pure states, which requires us to generalize the discussion of information geometry to include probability distributions that are matrix-valued, rather than simply real-valued.
To this end, we first point out that the Fisher metric can alternatively be defined in terms of so-called divergence functions. These divergence functions are actually bi-functionals D(p||q) taking as their inputs two different probability distributions p and q. As explained in [21], one can introduce a class of such distance-like measures called α-divergences, parametrized by a real parameter α. We will not write down the general formula for the divergence as a function of α here; cf. (103) for the expression when α = ±1. However, for example, when α = 1 one obtains the familiar Kullback-Leibler divergence, i.e., the relative entropy,

$$D(p\|q) = \int dx\; p(x)\,\ln\frac{p(x)}{q(x)}.$$

Due to the asymmetry between p and q, the divergence is not a true distance metric, but can be seen to be intimately related to the Fisher metric by considering the divergence between infinitesimally separated distributions p = p(x; θ) and q = p(x; θ′), where $\Delta\theta^i = \theta'^i - \theta^i$ is some infinitesimal change in the i-th direction. The first Δθ-dependent term in the expression for relative entropy occurs at second order, and reads

$$D(p\|q) \approx \frac{1}{2}\,g_{ij}\,\Delta\theta^i\,\Delta\theta^j,$$

where

$$g_{ij} = \int dx\; p(x;\theta)\,\partial_i\ln p(x;\theta)\,\partial_j\ln p(x;\theta)$$

turns out to be the Fisher metric introduced previously.⁶ We stress that the Fisher metric can be derived in this way from an α-divergence for any real value of α (see appendix A); the Kullback-Leibler divergence is merely one example in this regard.
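The second-order relation between relative entropy and the Fisher metric is easy to check numerically in the simplest possible setting. The sketch below uses a Bernoulli distribution with success probability θ, an example chosen here for illustration (it does not appear in the text), for which the Fisher metric is 1/(θ(1−θ)).

```python
import math


def kl_bernoulli(p, q):
    """Relative entropy D(p||q) between Bernoulli distributions with parameters p and q."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def fisher_bernoulli(p):
    """Fisher metric of the Bernoulli family in the coordinate p."""
    return 1.0 / (p * (1.0 - p))


theta, dtheta = 0.3, 1e-3
exact = kl_bernoulli(theta, theta + dtheta)
quadratic = 0.5 * fisher_bernoulli(theta) * dtheta**2

print("D(p||q)              =", exact)
print("(1/2) g dtheta^2     =", quadratic)
print("ratio                =", exact / quadratic)
print("D(q||p) for contrast =", kl_bernoulli(theta + dtheta, theta))
```

The ratio approaches 1 as `dtheta` shrinks, and swapping the arguments changes the divergence only at third order in `dtheta`, which is why the asymmetry of D does not affect the metric.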
When it comes to generalizing the discussion of information geometry to include, for example, matrix-valued probability distributions, one standard starting point is a divergence of some sort. The inputs of a divergence can be generalized from a standard real-valued probability distribution to a matrix-valued one. A matrix-valued probability distribution must still be positive semi-definite, meaning that its eigenvalues must be real and non-negative. The normalization condition of the probability distribution is replaced with a unit trace condition on the matrix-valued probability distribution, where this trace includes both the matrix trace as well as possible additional integrals (e.g., spacetime integrals). These properties precisely describe a density matrix. Therefore, we simply replace p and q in the divergence with ρ and ρ′, which are two different density matrices. One can also include fermionic statistics by including Grassmann-valued probability distributions, or, as is often more convenient, defining states using creation and annihilation operators which satisfy anti-commutation rather than commutation relations.
For a general quantum field theory, the quantum analogue of the Fisher metric when working with density matrices is the Bures metric, which is defined in the following way [55]. One first defines the Bures distance (68)⁷ between two density matrices $\rho_1$ and $\rho_2$, and then expands this Bures distance to lowest nontrivial order in dρ, where $\rho_2 = \rho_1 + d\rho$. The lowest order is second order, which then defines a line element and a metric, called the Bures metric. For pure states $\rho_1 = |\psi_1\rangle\langle\psi_1|$ and $\rho_2 = |\psi_2\rangle\langle\psi_2|$, the Bures metric reduces to the Fisher metric⁸ (up to an overall factor) and is determined entirely by the overlap $\langle\psi_1|\psi_2\rangle$ of the two states. We now consider this metric on the space of coherent pure states. For a single spin, this was calculated in [56]. Here, the spin state is parametrized in terms of a normalized three-dimensional vector $(x_1, x_2, x_3) = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$, and the Fisher metric is found to be that of a two-dimensional sphere,

$$ds^2 \propto d\theta^2 + \sin^2\theta\, d\phi^2.$$

An equivalent result is obtained for the Fisher metric of coherent pure states of two free Majorana fermions [55]. Let a and b be the annihilation operators for the two Majorana fermions, which satisfy the anticommutation relations⁹

$$\{a, a^\dagger\} = 1 \quad\text{and}\quad \{b, b^\dagger\} = 1, \tag{73}$$

where all other anticommutators vanish. The coherent state is given by

$$|\lambda\rangle \propto \exp\left(\lambda\, a^\dagger b^\dagger\right)|\Omega\rangle, \tag{74}$$

where λ is a complex parameter and |Ω⟩ is the unentangled IR state which is annihilated by a and b, $a|\Omega\rangle = b|\Omega\rangle = 0$. For the states (74), the Fisher metric is found to be that of a two-dimensional sphere as well,

$$ds^2 \propto \frac{d\lambda\, d\bar\lambda}{\left(1+|\lambda|^2\right)^2}.$$

As an interesting fact we note that according to [33], within information geometry, so-called categorical distributions, which are different from the exponential families discussed in section 2.1 above, lead to spherical Fisher metrics. In information theory, categorical distributions are generalized Bernoulli distributions describing a discrete random variable with more than two possible outcomes with fixed probability. However, as again discussed in detail in [39], the map between probability distributions and metrics is not bijective, and also other distributions may lead to spherical Fisher metrics.
6 Note that our definition of D(p||q) is actually related to the one in [21] by switching the order of p and q. This switch generates what is called a "duality". See [21] for further discussion of this duality and so-called dual connections. Suffice it to say that, had we not made this switch, then the Kullback-Leibler divergence would correspond to α = −1, rather than α = 1. The duality between α and −α makes this point moot and largely an aesthetic matter.
7 Note that the definition in [55] does not have the absolute value signs. Also note that an alternative convention has the Bures distance defined to be the square root of (68). This makes no practical difference in the subsequent derivation of the Bures metric for free fermionic coherent states.
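The spherical metric for a single spin can be checked directly from state overlaps. The sketch below assumes the standard spinor (cos(θ/2), e^{iφ} sin(θ/2)) for the spin state and uses 1 − |⟨ψ₁|ψ₂⟩|² as the infinitesimal distance; both are conventions chosen here for illustration, and a different convention for the Bures distance changes only the overall factor.

```python
import cmath
import math


def spin_state(theta, phi):
    """Spin-1/2 coherent state pointing along (sin t cos p, sin t sin p, cos t)."""
    return (math.cos(theta / 2.0), cmath.exp(1j * phi) * math.sin(theta / 2.0))


def overlap_distance(s1, s2):
    """1 - |<s1|s2>|^2 for two normalized two-component states."""
    overlap = sum(a.conjugate() * b for a, b in zip(s1, s2))
    return 1.0 - abs(overlap) ** 2


theta, phi = 1.0, 0.5
dtheta, dphi = 1e-3, 2e-3

measured = overlap_distance(spin_state(theta, phi),
                            spin_state(theta + dtheta, phi + dphi))
sphere = 0.25 * (dtheta**2 + math.sin(theta) ** 2 * dphi**2)

print("1 - |overlap|^2                       =", measured)
print("(1/4)(dtheta^2 + sin^2(theta) dphi^2) =", sphere)
```

With these conventions the line element is that of a sphere of radius 1/2, and the agreement improves as the displacements shrink.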

Symmetries of the Bures metric
In the previous section, we found that the Bures metric on the space of coherent states of a theory of two free Majorana fermions is the metric on a 2-sphere. In contrast, in the theory of one free complex bosonic scalar field, the coherent state is given by

$$|\lambda\rangle \propto \exp\left(\lambda\, a^\dagger b^\dagger\right)|\Omega\rangle,$$

where the anticommutation relations in (73) are replaced with commutation relations, and the Bures metric on these states turns out to be

$$ds^2 \propto \frac{d\lambda\, d\bar\lambda}{\left(1-|\lambda|^2\right)^2},$$

which is the metric on a 2-dimensional hyperbolic space.
In fact, we can show that these results for the Bures metric for the free fermions and bosons actually follow from the symmetries of their corresponding density matrices. This argument is directly analogous to the discussion in section 3. There, we showed that if a transformation on the parameters ξ of the probability distribution can be undone or compensated for by a transformation on the random variable x (cf. eqn. (24)), then the Fisher metric must be invariant under this transformation. In cases with a high degree of symmetry, such as for Gaussian distributions, these symmetries are enough to pin down the Fisher metric without any actual computation.
In the case when the family of real-valued probability distributions $p_\xi$ is replaced with a family of density matrices $\rho_\xi$, the symmetries of these density matrices take the form of conjugation by some special unitary matrix,

$$\rho_{\xi'} = U\rho_\xi U^\dagger, \qquad U^\dagger U = I,$$

where I is the identity matrix. It is important to note that the converse of the above statement does not hold in general: it is possible that a unitary transformation of $\rho_\xi$ takes us out of the family of density matrices that we care about. That is, there can be unitary matrices U such that $U\rho_\xi U^\dagger$ cannot be written as $\rho_{\xi'}$ for some ξ′. In fact, this turns out to be the case for the bosonic coherent states. Interestingly, however, the case of the fermionic coherent states is an exception. This is because the fermionic coherent states are superpositions of just two states, |Ω⟩ and $a^\dagger b^\dagger|\Omega\rangle$. Taking these two states (in this order) as the basis, the density matrix reads

$$\rho_\lambda = \frac{1}{2}\left(I + \hat n_\lambda\cdot\vec\sigma\right),$$

where I is the 2 × 2 identity matrix, $\vec\sigma$ is the vector of Pauli matrices and $\hat n_\lambda$ is the unit vector defined as

$$\hat n_\lambda = \frac{1}{1+|\lambda|^2}\left(2\,\mathrm{Re}\,\lambda,\; 2\,\mathrm{Im}\,\lambda,\; 1-|\lambda|^2\right).$$

Any 2 × 2 special unitary matrix can be written as $U = e^{i\theta\hat n\cdot\vec\sigma}$ for some real number θ and some unit vector $\hat n$. One can show that conjugating $\rho_\lambda$ by this U is equivalent to a rotation of $\hat n_\lambda$ around the axis $\hat n$ (by an angle of magnitude 2θ). In other words, the symmetry group of $\rho_\lambda$ can be equivalently thought of as SU(2) or SO(3), which is enough to argue that the Bures metric must be the metric on a 2-sphere. In fact, we can write the Bures metric in terms of $\hat n_\lambda$ as

$$ds^2 \propto d\hat n_\lambda\cdot d\hat n_\lambda,$$

which makes the SO(3) symmetry manifest. The bosonic case is more complicated because the coherent states are superpositions of infinitely many states, namely $\frac{1}{n!}(a^\dagger b^\dagger)^n|\Omega\rangle$ for n = 0, 1, 2, .... In this basis, the (m, n)-component of the density matrix, which we denote by $(\rho_\lambda)_{mn}$, is given by

$$(\rho_\lambda)_{mn} = \left(1-|\lambda|^2\right)\lambda^m\,\bar\lambda^n.$$

Clearly, a transformation of λ will, in general, transform all of these components, and therefore a unitary transformation that acts, for example, only on the first two rows and columns of $\rho_\lambda$ is not a symmetry of this family of density matrices because the result cannot be written as $\rho_{\lambda'}$ for some λ′. It turns out that only three unitary transformations form bona fide symmetries of this family of density matrices and they form the symmetry group SO(2,1). We can describe the action of these transformations on $\rho_\lambda$ implicitly by defining their action on the vector

$$\hat m_\lambda = \frac{1}{1-|\lambda|^2}\left(2\,\mathrm{Re}\,\lambda,\; 2\,\mathrm{Im}\,\lambda,\; 1+|\lambda|^2\right),$$

which is a unit vector with respect to the SO(2,1) inner product, namely the inner product with signature (+, +, −). The SO(2,1) transformations of $\hat m_\lambda$ are simply rotations in the (1, 2)-plane and boosts in the (1, 3)- and (2, 3)-planes. The corresponding transformations of $\rho_\lambda$ are somewhat more complicated, but one can nevertheless show that they are equivalent to conjugation by appropriately defined unitary matrices.¹⁰ Therefore, the Bures metric must be SO(2,1)-invariant and the only possibility is 2-dimensional hyperbolic space. In fact, we can write the hyperbolic metric in terms of $\hat m_\lambda$ as

$$ds^2 \propto d\hat m_\lambda\cdot d\hat m_\lambda,$$

with the dot denoting the SO(2,1) inner product, which makes the SO(2,1) symmetry manifest. To summarize, the Bures metric on the space of coherent states of two free Majorana fermions is the 2-dimensional sphere, and for one complex bosonic scalar it is 2-dimensional hyperbolic space. In this section, we have shown that these metrics are determined quite elegantly by the SO(3) and SO(2,1) symmetry of the respective density matrices. We therefore observe that the fact that the information metric inherits the symmetries of the probability distribution in the classical case continues to hold in the quantum case with the appropriate quantum analogues.
10 For example, a rotation in the (1,2)-plane by an angle θ simply multiplies λ by a phase $e^{i\theta}$ and thus multiplies $(\rho_\lambda)_{mn}$ by the phase $e^{i(m-n)\theta}$. This is equivalent to conjugating ρ by the unitary matrix whose (m, n)-component is $U_{mn} = e^{im\theta}\delta_{mn}$, which can therefore be undone simply by conjugating by $U^\dagger$. The unitary transformation that undoes the effect of an infinitesimal boost with rapidity parameter ε ≪ 1 in the (1,3)-plane to linear order in ε is $U = I + \frac{i\varepsilon}{2}u$ with $u_{mn} = i n\,\delta_{m+1,n} - i m\,\delta_{m,n+1}$, and for an infinitesimal boost in the (2,3)-plane the matrix u is given by $u_{mn} = n\,\delta_{m+1,n} + m\,\delta_{m,n+1}$.
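The contrast between the fermionic and bosonic cases can be illustrated with the overlaps of normalized coherent states. The sketch below assumes states of the form exp(λ a†b†)|Ω⟩ with normalized basis states, so that the fermionic state has two components and the bosonic overlap is a geometric series (requiring |λ| < 1). It again uses 1 − |⟨λ|λ′⟩|² as the infinitesimal distance, so overall factors are convention-dependent; the function names are chosen here for illustration.

```python
import math


def fermion_overlap(l1, l2):
    """<l1|l2> for normalized states (|0> + l|1>) / sqrt(1 + |l|^2)."""
    norm = math.sqrt((1.0 + abs(l1) ** 2) * (1.0 + abs(l2) ** 2))
    return (1.0 + l1.conjugate() * l2) / norm


def boson_overlap(l1, l2):
    """<l1|l2> for normalized states sqrt(1 - |l|^2) * sum_n l^n |n>, with |l| < 1."""
    norm = math.sqrt((1.0 - abs(l1) ** 2) * (1.0 - abs(l2) ** 2))
    return norm / (1.0 - l1.conjugate() * l2)


lam = 0.3 + 0.2j
dlam = 1e-3 * (1.0 + 1.0j)

d_fermion = 1.0 - abs(fermion_overlap(lam, lam + dlam)) ** 2
d_boson = 1.0 - abs(boson_overlap(lam, lam + dlam)) ** 2

sphere = abs(dlam) ** 2 / (1.0 + abs(lam) ** 2) ** 2
hyperbolic = abs(dlam) ** 2 / (1.0 - abs(lam) ** 2) ** 2

print("fermions:", d_fermion, "vs sphere    ", sphere)
print("bosons:  ", d_boson, "vs hyperbolic", hyperbolic)
```

The only difference between the two results is the sign in front of |λ|², which is the coordinate expression of the SO(3) versus SO(2,1) symmetry discussed above.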

Different notions of curvature
In this section, we begin by clarifying the various notions of curvature that appear in the literature; in particular, the statement that non-interacting theories are flat generally refers to flatness with respect to the 1-curvature, not the 0-curvature leading to a metric connection. The failure to appreciate this distinction seems to have led to some erroneous and potentially confusing claims. The following will draw primarily from [21]; see also [24] for a brief introduction.
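For maximum clarity, let us start by recalling some basic differential-geometric notions. Recall that the covariant derivative ∇ may be expressed in local coordinates as

$$\nabla_X Y = X^i\left(\partial_i Y^k + Y^j\,\Gamma^{k}_{ij}\right)\partial_k,$$

where $X = X^i\partial_i$, $Y = Y^i\partial_i$, and the connection coefficients are defined by $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\partial_k$. The vector Y is said to be parallel with respect to the connection ∇ if ∇Y = 0, i.e., $\nabla_X Y = 0$ ∀X ∈ TS; equivalently, in local coordinates,

$$\partial_i Y^k + Y^j\,\Gamma^{k}_{ij} = 0.$$

If all basis vectors $\partial_i$ are parallel with respect to a coordinate system $[\xi^i]$, then the latter is an affine coordinate system for ∇. A connection ∇ which admits such an affine parametrization is called flat, i.e., the manifold S is flat with respect to ∇. Now, with respect to a Riemannian metric g, one defines

$$\Gamma_{ij,k} \equiv \langle\nabla_{\partial_i}\partial_j, \partial_k\rangle = \Gamma^{h}_{ij}\,g_{hk}, \tag{89}$$

which defines a symmetric connection when $\Gamma_{ij,k} = \Gamma_{ji,k}$. If, in addition, ∇ satisfies

$$Z\langle X, Y\rangle = \langle\nabla_Z X, Y\rangle + \langle X, \nabla_Z Y\rangle$$

or, equivalently,

$$\partial_k g_{ij} = \Gamma_{ki,j} + \Gamma_{kj,i}, \tag{91}$$

where $g_{ij} = \langle\partial_i, \partial_j\rangle$, then ∇ is a metric connection with respect to the Riemannian metric g.
Connections which are both metric and symmetric are Riemannian. The above describes the familiar 0-connection, henceforth denoted $\nabla^{(0)}$, with associated connection coefficients $\Gamma^{(0)}_{ij,k}$. The significance of such connections in physics, and in Riemannian geometry more generally, is due to the fact that under a metric connection, parallel transport of two vectors preserves the inner product. However, the natural connections on statistical manifolds are generically non-metric, as we shall now explain.
If $S = \{p_\xi\}$ is an n-dimensional model as above, we may define the $n^3$ functions $\Gamma^{(\alpha)}_{ij,k}$ by

$$\Gamma^{(\alpha)}_{ij,k} = E_\xi\left[\left(\partial_i\partial_j\ell_\xi + \frac{1-\alpha}{2}\,\partial_i\ell_\xi\,\partial_j\ell_\xi\right)\partial_k\ell_\xi\right],$$

where $\ell_\xi = \ln p_\xi$ and α ∈ ℝ. This defines an affine connection $\nabla^{(\alpha)}$ on S via

$$\langle\nabla^{(\alpha)}_{\partial_i}\partial_j, \partial_k\rangle = \Gamma^{(\alpha)}_{ij,k}, \tag{93}$$

where g = ⟨·, ·⟩ is the Fisher metric (4). $\nabla^{(\alpha)}$ is called the α-connection, and accordingly terms like α-flat, α-affine, α-parallel, etc. denote the corresponding notions with respect to this connection. Note that when α = 0 we recover the familiar metric connection above. Indeed, observe that

$$\partial_k g_{ij} = \Gamma^{(0)}_{ki,j} + \Gamma^{(0)}_{kj,i},$$

cf. (91). Thus, while $\nabla^{(\alpha)}$ is symmetric for any value of α by definition (cf. (93) and (89)), only the special case α = 0 defines a Riemannian connection $\nabla^{(0)}$ with respect to the Fisher metric.
In general, we note that $\Gamma^{(\alpha)}_{ij,k}$ can be obtained from the third-order term in the expansion of the α-divergence; see appendix A.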
Physically, the significance of the 1-connection lies in the fact that it is intimately related to the Kullback-Leibler divergence or relative entropy. That is, one can obtain the Fisher metric for any α (see appendix A), but in the case of α = ±1, it is the Kullback-Leibler divergence which naturally induces the metric and the associated ±1-connections.¹¹ Additionally, the 1-connection is intimately associated with the exponential families introduced in section 2.1, in that the canonical coordinates $[\theta^i]$ provide a 1-affine coordinate system, with respect to which S is 1-flat. To see this, observe that, with $\ell_\theta = \ln p(x;\theta)$,

$$\partial_i\partial_j\ell_\theta = -\partial_i\partial_j\psi(\theta).$$

This implies that

$$\Gamma^{(1)}_{ij,k} = -\partial_i\partial_j\psi\;E_\theta\left[\partial_k\ell_\theta\right] = 0, \tag{96}$$

such that the curvature vanishes identically in this case. We stress that an α-family is not α-flat in general; rather, this property is special to α = ±1 [21]. It is also simple to show that for the exponential family $\Gamma^{(\alpha)}_{ij,k}$ is proportional to the third-order moments of $F_i$:

$$\Gamma^{(\alpha)}_{ij,k} = \frac{1-\alpha}{2}\,E_\theta\left[\left(F_i-\langle F_i\rangle\right)\left(F_j-\langle F_j\rangle\right)\left(F_k-\langle F_k\rangle\right)\right],$$

which indeed vanishes identically for α = 1. As mentioned above, non-interacting theories are only flat in the sense of 1-flatness, cf. the Gaussian example in section 2.2. But this same flatness holds for any model that can be put in the form of an exponential family, including the Ising model on theory space spanned by its couplings that we discussed above. It remains an open question as to precisely what information about the underlying physical theory is encoded in the different curvatures. For example, the divergence in the 0-curvature of the Ising model correctly captures the phase transition; however, note that the 1-curvature remains zero even along this critical line, and is therefore completely insensitive to the critical behaviour. An important question for the future is thus to determine which physical behaviour is captured by the different curvatures.
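The statement that the α-connection coefficients of an exponential family reduce to (1−α)/2 times the third central moment can be verified in a one-parameter example. The sketch below uses a Bernoulli distribution in its canonical coordinate θ, with p(x; θ) = exp(θx − ψ(θ)) and ψ = ln(1 + e^θ), and assumes the convention of [21] for the coefficients, namely the expectation of (∂²ℓ + ½(1−α)(∂ℓ)²) ∂ℓ. The example and function names are chosen here for illustration.

```python
import math


def alpha_connection_bernoulli(theta, alpha):
    """Gamma^(alpha)_{11,1} of the Bernoulli family in the canonical coordinate theta."""
    mu = 1.0 / (1.0 + math.exp(-theta))  # mean of x, equal to psi'(theta)
    total = 0.0
    for x, prob in ((0, 1.0 - mu), (1, mu)):
        d1 = x - mu              # first derivative of ln p with respect to theta
        d2 = -mu * (1.0 - mu)    # second derivative of ln p, equal to -psi''(theta)
        total += prob * (d2 + 0.5 * (1.0 - alpha) * d1 * d1) * d1
    return total


def third_central_moment(theta):
    """Third central moment of x, equal to psi'''(theta)."""
    mu = 1.0 / (1.0 + math.exp(-theta))
    return mu * (1.0 - mu) * (1.0 - 2.0 * mu)


theta = 0.7
for alpha in (-1.0, 0.0, 1.0):
    print(alpha,
          alpha_connection_bernoulli(theta, alpha),
          0.5 * (1.0 - alpha) * third_central_moment(theta))
```

For α = 1 the coefficient vanishes for every θ, which is the one-dimensional version of 1-flatness in canonical coordinates, while for α = 0 it equals half the third central moment and is generically non-zero.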
As a noteworthy fact in view of establishing new field theory/gravity dualities, we point out that for non-metric curvatures, in 2+1 dimensions a gravity action may be obtained from the Chern-Simons action, for which the equation of motion (99) implies that the covariant derivative of the curvature vanishes. Obviously, the case (96), in which the 1-curvature itself vanishes, is a special solution to the more general equation of motion (99). This suggests a possible duality between field theories leading to an exponential family and gravity actions involving non-metric curvatures.

Discussion
In this paper, we have collected and discussed a number of general lessons that we feel are important in the application of information geometry to quantum field theory and to the AdS/CFT correspondence. Our discussion was framed around some simple examples: exponential families of probability distributions (of which the Gaussian is a representative member), scalar field instantons, and the 2d classical Ising model on a square lattice and its mapping to the theory of free massive Majorana fermions. For clarity, we enumerate these general lessons here:
1. Infinitely many different probability distributions give the same Fisher metric.
The Fisher metric inherits the symmetries of the probability distribution, but the probability distribution does not necessarily need to enjoy all the symmetries of the Fisher metric.
In many of the cases studied in the literature, the probability distribution enjoys precisely the translation and scaling symmetries that suffice to force the Fisher metric to be AdS. In the light of our investigations, it is conceivable that there are other dualities relating quantum field theories to geometries. However, the point raised above has to be taken into account when studying these, and of course the most relevant question is what determines the dynamics of the dual gravity theory.
Additionally, we demonstrated in section 7 that this fact continues to hold even in the quantum case of the Bures metric on a space of quantum states. The examples we studied were free fermionic and bosonic coherent states. In this case, the probability distribution is replaced with a density matrix and its symmetries now take the form of conjugation by an appropriate unitary matrix. The symmetry groups turned out to be SO(3) for the fermions and SO(2,1) for the bosons, which implies that the Bures metric must be the 2-sphere and 2-dimensional hyperbolic space for the fermionic and bosonic cases, respectively.
2. There are two basic ways of applying information geometry to quantum field theories: one can compute a metric on the space of theories parametrized by coupling constants or on the space of states of a given theory with a fixed set of coupling constants.
For example, saying that the Fisher metric on a free real massive scalar field is AdS₂ is ambiguous and potentially misleading. This happens to be the Fisher metric on a particular set of coherent states parametrized by one complex coefficient [55], but is unrelated to the Fisher metric on the space of such theories parametrized by the mass. Which prescription one uses depends on what one wishes to study, and it is important to keep the distinction in mind.
3. The Fisher metric on a set of states of a quantum field theory is sensitive to whether or not the theory has a stable potential.
We demonstrated this phenomenon with the example of a massless real scalar field in four Euclidean dimensions with an inverted φ⁴ potential. We considered the moduli space of instantons for this theory, parametrized by the center and width of the instanton. Symmetry arguments along the lines of point 1 above imply that the Fisher metric ought to be AdS₅. However, there is an important ambiguity in the Hitchin prescription [40] of taking the Lagrangian density evaluated on the solutions to the equation of motion as the probability distribution on the space of those solutions. We showed two different Lagrangian densities, which were related by a trivial total derivative, one of which gave a well-defined probability distribution while the other did not.

4. There are many different connections one can define in information geometry, most of which are not metric compatible with respect to the Fisher metric.
This technical distinction is obscured in some of the existing literature, both in recent physics works and older works in statistics while the basics of information geometry were still under development. It is relevant in light of claims that free theories lead to flat geometries, and a finer appreciation of these various curvature notions may be important for determining precisely what information about the underlying physical theory is encoded in the geometry.
With regard to the 2d Ising model, we note that the analysis presented here may be straightforwardly extended to the 3d Ising model as well. As in two dimensions, we expect the Fisher metric to diverge at the critical point determined numerically, for instance in [58], or using the conformal bootstrap as in [59]. The Fisher metric approach may also be of relevance for interpreting the 3d Ising model as a string theory, as proposed by Polyakov [60] and recently considered by Iqbal and McGreevy [61]. Here, the domain walls that separate up from down spins are interpreted as string worldsheets. As argued in [61], the identification of the worldsheet target space with AdS₄ does not lead to a conformally invariant worldsheet sigma model. We expect that an interesting avenue to proceed in this context is to determine the probability distribution of the domain wall worldsheet sigma model and to calculate the associated Fisher metric.
Finally, let us remark on a couple of interesting and potentially fruitful connections between information geometry and holography, namely holographic RG and complexity:

Holographic RG
Intuitively, the flow along the RG can be thought of as a coarse-graining of degrees of freedom from the UV to the IR. Consequently, if one considers two nearby theories in the UV, more and more measurements will be necessary to distinguish them as one flows to the IR, cf. the intuition below (21). That is, the inability to probe fine-grained correlators implies a loss of distinguishability between nearby theories. In [45], this idea was made more precise by calculating the distance between quantum field theories using the Zamolodchikov metric, which is proportional to the Fisher metric studied here. In the context of the emergent spacetime or "it from qubit" paradigm, in which one takes the boundary CFT as ontologically prior and attempts to derive the bulk AdS along with its dynamics, this line of reasoning suggests that the classical spacetime deep in the IR may result from a coarse-graining procedure over suitably parametrized UV theories. A related observation was made in [36], where it was suggested that the expectation value in the Fisher metric may be thought of as a statistical average over quantum fluctuations that gives rise to the classical spacetime. Thus, despite the cautionary lessons above, we regard it as a very interesting open question as to whether the connection between the information content of the field theory and the geometry resulting from the Fisher metric (or its quantum analogues) can be utilized to shed further light on gauge/gravity duality, and perhaps lead to "information/geometry" dualities in a wider context.

Complexity
Another potential application is in generalizing holographic complexity [47] to interacting and ultimately strongly coupled theories. While a great deal of exciting progress has been made in free theories (see [62][63][64][65][66][67][68][69] and related work), and attempts have been made to go beyond this restriction (see in particular [70], as well as [71][72][73][74]), a satisfying prescription for defining and working with complexity in holographic CFTs remains elusive. However, the basic idea underlying existing approaches is to geometrize the problem, and define complexity in terms of the distance between quantum states. Insofar as information geometry already provides what is in some sense the intrinsic geometry for a given distribution, it is therefore natural to ask whether this framework can be used to define complexity in general theories, including holographic CFTs.
In principle at least, one can already do this for any of the models considered above. For example, we could use the results of section 5 to define complexity for the 2d Ising model as the minimum geodesic distance between theories with different couplings. Practically, however, the metric is so unwieldy that we could only proceed with our curvature calculation numerically, and even an approximate analytical expression for the geodesics seems beyond reach. In the case of AdS/CFT, one again encounters the question of how to suitably parametrize the boundary theory in order to apply this framework, which may largely determine the physical meaning of the results. One would also need to contend with the first lesson in the list above, namely that very different distributions (and hence states/theories) may yield the same geometry, and hence the same complexity. Whether this is because the Fisher metric is not a sufficiently refined means of probing these theories, or hints at some deep connection between them, remains to be seen. Nonetheless, in light of the significant efforts to quantify complexity seen in the past couple of years, it seems worth investigating whether methods from information geometry can be fruitfully applied to go beyond the limits of current approaches.

Acknowledgements
We are grateful to Jan de Boer for discussions and hospitality at the University of Amsterdam. We also thank Souvik Banerjee, René Meyer and Alexander Krikun for discussions. K. G. acknowledges funding through a Hallwachs-Röntgen fellowship. J. E. and K. G. are supported by the Würzburg-Dresden Cluster of Excellence on Complexity and Topology in Quantum Matter-ct.qmat (EXC 2147, Project-id No. 39085490). R. J. is a member of the Gravity, Quantum Fields and Information group at AEI, which is generously supported by the Alexander von Humboldt Foundation and the Federal Ministry for Education and Research through the Sofja Kovalevskaja Award.

A Derivation of the Fisher metric and connection coefficients for all α
In this short appendix, we present derivations for two important quantities, namely the Fisher metric and the α-Christoffel symbol, by expanding the α-divergence in a small perturbation of the distribution. We stress that these expressions hold for all α, not just the special case of α = ±1 considered in the main text.