Local optimization on pure Gaussian state manifolds

We exploit insights into the geometry of bosonic and fermionic Gaussian states to develop an efficient local optimization algorithm to extremize arbitrary functions on these families of states. The method is based on notions of gradient descent attuned to the local geometry which also allows for the implementation of local constraints. The natural group action of the symplectic and orthogonal group enables us to compute the geometric gradient efficiently. While our parametrization of states is based on covariance matrices and linear complex structures, we provide compact formulas to easily convert from and to other parametrization of Gaussian states, such as wave functions for pure Gaussian states, quasiprobability distributions and Bogoliubov transformations. We review applications ranging from approximating ground states to computing circuit complexity and the entanglement of purification that have both been employed in the context of holography. Finally, we use the presented methods to collect numerical and analytical evidence for the conjecture that Gaussian purifications are sufficient to compute the entanglement of purification of arbitrary mixed Gaussian states.

We exploit insights into the geometry of bosonic and fermionic Gaussian states to develop an efficient local optimization algorithm to extremize arbitrary functions on these families of states. The method is based on notions of gradient descent attuned to the local geometry which also allows for the implementation of local constraints. The natural group action of the symplectic and orthogonal group enables us to compute the geometric gradient efficiently. While our parametrization of states is based on covariance matrices and linear complex structures, we provide compact formulas to easily convert from and to other parametrization of Gaussian states, such as wave functions for pure Gaussian states, quasiprobability distributions and Bogoliubov transformations. We review applications ranging from approximating ground states to computing circuit complexity and the entanglement of purification that have both been employed in the context of holography. Finally, we use the presented methods to collect numerical and analytical evidence for the conjecture that Gaussian purifications are sufficient to compute the entanglement of purification of arbitrary mixed Gaussian states. Gaussian states form one of the most prominently used and best understood families of quantum states. The standard definition covers bosonic [1][2][3] and fermionic [4] Gaussian states both pure and mixed. They naturally appear as ground and thermal states of quadratic Hamiltonians in physical systems and are hence ubiquitous in non-interacting quantum many-body systems in the condensed matter context and as vacua in free field theories. Bosonic Gaussian states are heavily used in the study of bosonic systems with negligible interactions, such as Bose-Einstein condensates [5], instances of systems of cold atoms in optical lattices [6] and to a very good approximation photonic systems [7]. Their fermionic counterparts are equally important for the study of fermionic quantum many-body systems, including systems captured by the Bardeen-Cooper-Schrieffer (BCS) theory [8] or the Hartree-Fock framework [9] that can be seen as a variational principle over Gaussian fermionic states. Other applications range from field theories [10], continuous variable quantum information [1][2][3], relativistic quantum information [11] and quantum fields in curved spacetime [12].

CONTENTS
Mathematically speaking, pure Gaussian states can be seen as forming Kähler sub-manifolds of the projective Hilbert space, i.e., they have a natural notion of distance (Riemannian manifold with metric) and the structure of a classical phase space (symplectic manifold with symplectic form). This mathematical structure will be heavily relied on in this work, where pure Gaussian bosonic and fermionic states are the focus of attention. For systems constituted of N modes, the manifold of pure bosonic states M b and of pure fermionic states M f can be constructed as a symmetric space, i.e., as a quotient of two Lie groups, namely where Sp(2N, R) is the symplectic group, O(2N ) the orthogonal group and U(N ) the unitary group. When restricting to one superselection sector of the parity of the fermion number, we can restrict to the special orthogonal group SO(2N ). Gaussian manifolds come with a natural group action of the respective groups, which we can exploit when performing local optimization. We optimize over bosonic and fermionic Gaussian manifolds by taking the natural geometry into account, i.e., the notion of distance between different quantum states as measured by the Fubini-Study metric [13,14]. Given a Riemannian manifold M with local coordinates x = (x µ ) and positive-definite metric g = (g µν ), such that v · u = g µν v µ u ν , the gradient descent vector field F µ of a function f is where G µν is the inverse of g µν with G µσ g σν = δ µ ν . Typically, the inverse metric G µν needs to be reevaluated at every point of the manifold, but for Gaussian states we can explicitly construct a basis in which the matrix representation of G µν is constant. This provides a crucial speedup of the underlying algorithm. The goal of this manuscript is two-fold: First, we demonstrate how the rich geometry of pure bosonic and fermionic Gaussian states can be exploited to find extremal points of arbitrary real functions without dealing with redundant directions or parametrizations. Second, we present a unified framework to describe pure bosonic and fermionic states and carefully review how to convert between other representations of Gaussian states. This ensures that a reader can seamlessly apply our methods to their problem of choice.
A crucial motivation for our work stems from the goal to compare entanglement of purification (EoP) and complexity of purification (CoP) in free quantum fields, which recently attracted increasing interest in the context of applying quantum information methods to holography and field theory. We provide the GaussianOptimization.m Mathematica package as a simple implementation of our methods, which is used successfully in [15] to study EoP and CoP in quantum field theory.
This manuscript is structured as follows: In section II, we review a unified formalism to treat pure bosonic and fermionic Gaussian states and compute the resulting Kähler geometry (positive-definite metric, symplectic form) on the resulting state manifold. In section III, we provide a comprehensive treatment of the most commonly used parametrizations of pure Gaussian states and how to convert between them. In section IV, we use the geometry of the pure Gaussian state manifold to develop a gradient descent algorithm with an efficient evaluation of G µν which avoids over-parametrization of tangent directions. The following section V is devoted to applications, including the well-known problem of finding approximate ground states, computing Gaussian entanglement of purification (EoP) and Gaussian complexity of purification (CoP), for which a given function f is optimized over all Gaussian purifications of a given mixed Gaussian state. In section VI, we use our methods to collect numerical and analytical evidence for two conjectures stating that for mixed Gaussian states, the Gaussian EoP is actually optimal (and thus coincides with regular EoP), as well as stating which system decompositions are necessary to reach this optimum. Finally, we conclude with a discussion of our results in section VII.

II. REVIEW OF GAUSSIAN STATES
We introduce bosonic and fermionic, pure and mixed, Gaussian states with a particular emphasis on the geometry of the state manifold. While standard reviews of Gaussian states include ref. [1] based on covariance matrices, we follow the conventions of refs. [16,17] based on linear complex structures that provides a basis independent and unified treatment of bosons and fermions.

A. Quadrature operators and Majorana modes
Bosonic and fermionic quantum systems with N modes can be constructed from N creation or annihilation operatorsξ a,a † ≡ (â 1 , · · · ,â N ,â † 1 , · · · ,â † N ) , which satisfy commutation relations [â i ,â † j ] = δ ij for bosons or anti-commutation relations {â i ,â j } = δ ij for fermions. Instead of (4), we can choose a basis of 2N Hermitian operatorŝ ξ q,p ≡ (q 1 , · · · ,q N ,p 1 , · · · ,p N ) , which are related to the first by the equationŝ with [q i ,p j ] = i δ ij for bosons and {q i ,q j } = {p i ,p j } = δ ij for fermions. Most readers will be familiar with this notation for bosonic systems, where the Hermitian basis from (5) are called quadratures. However, we will use the same naming convention for Hermitian fermionic operators, which often go by the name of Majorana modes, just as our creation and annihilation operators from (4) referred to both bosonic or fermionic variables. The goal of these conventions is to treat bosons and fermions in a unified framework. All of our formulas containing indices a, b, c will be manifestly independent from the chosen basisξ, but when giving concrete examples, we will typically provide the explicit matrix representations for the two bases from (4) and (5). The position of the index indicates if the corresponding matrix row or column refers to the classical phase space (upper index) or its dual (lower index). We use Einstein's summation convention where we implicitly assume to sum over repeated indices, where we are only allowed to pair an upper and a lower index. 1 This formalism is heavily used in general relativity and high energy physics literature, but is particularly suitable for the unified treatment of bosonic and fermionic Gaussian states. We will use the symbols q,p ≡ and a,a † ≡ to indicate that the RHS of the equation gives the explicit matrix representations in the basis (5) and (4), respectively.
As we will see, Gaussian states are uniquely specified by their two-point correlation functions in the fundamental operatorsξ. For any state ρ, we denote the expectation value of an operatorÔ as Ô = tr(ρÔ). We may separately consider the symmetrized and antisymmetrized part of these correlations, given by the two real bilinear forms G ab = ξ aξb +ξ bξa (7) Ω ab = − i ξ aξb −ξ bξa , (Requirement: z a = ξ a = 0) , where we restrict to z a = 0 for all a for the purpose of this manuscript to present bosons and fermions in parallel. 2 For bosons, the symplectic form Ω is fixed by canonical commutation relations (CCR), while the positive-definite metric G contains the physical correlations; for fermions, the situation is reversed, with G fixed by canonical anticommutation relations (CAR) and Ω describing the physical correlations. In summary, we have With respect to our bases, we have the state-independent expressions When having chosen a set of creation and annihilation operators, our Hilbert space H is spanned by the orthonormal basis of number eigenvectors |n 1 , . . . , n N witĥ n i |n 1 , . . . , n N = n i |n 1 , . . . , n N , wheren i =â † iâ i and n i ∈ N ≥0 for bosons and n i ∈ {0, 1} for fermions. We now consider the state-dependent bilinear forms containing the physical correlations. These are contained in the covariance matrix Γ ab , defined as 3 We can combine the state-dependent and stateindependent parts into a single expression. Due to the fact that G ab is always positive-definite, we can invert it to define its inverse g ab = (G −1 ) ab with G ac g cb = δ a b . Similarly, we define ω ab = (Ω −1 ) ab satisfying Ω ac ω cb = δ a b . This enables us to define the linear map

Real basis Complex basis
Bosons Phase space (qj ,p k ) Also: (xj,p k ) Fermions Majorana modesma Also: γa, ca, (cj,c k ) Overview of notations for operator bases. Listed are real (self-adjoint) and complex operator bases for bosons and fermions, as well as a unified notation used throughout this work. For an N -mode quantum system, indices are in the range j, k ∈ {1, . . . , N } or a, b ∈ {1, . . . , 2N }. The creation and annihilation operators in a complex basis satisfy canonical commutation/anti-commutation relations (CCR/CAR). Commonly used alternative notations are also listed, some omitting the hat notation for Hilbert space operators.
which depends on the state under considerations. We will see in (17) that for Gaussian states the two formulas in (15) coincide, completely specifying all correlations for both bosons and fermions.

B. Definition of pure Gaussian states
Up to this point, the quantum state ρ has been assumed to be an arbitrary quantum state in the Hilbert space (with z a = 0). In what follows, we put a specific emphasis on the set of pure Gaussian states. There are many equivalent definitions in the literature: One may define Gaussian states as those satisfying Wick's theorem, as ground states of non-interacting (i.e., quadratic), non-degenerate Hamiltonians, or as states vanishing under a full set of specific annihilation operators. Here, we use yet another equivalent, though very compact definition based on (15), which says for a state ρ that If this holds, both formulas in (15) coincide 4 , such that ρ is pure Gaussian ⇔ −G ac ω cb = Ω ac g cb . (17) As a pure state, we can write ρ = |ψ ψ| for a normalized state vector |ψ . One can show that this state vector |ψ is uniquely determined (up to a complex phase) by either the covariance matrix Γ ab or equivalently by the complex structure J, which we use a label to write |ψ = |J = |J . 5 4 We can prove this by computing (Gω) −1 = ω −1 G −1 = Ωg and vice versa. Moreover, (16) implies J −1 = −J. The two relations together imply (17). 5 If we allow for bosons z a = ξ a = 0, we would need to include this in our label of the state vector to write |ψ = |J, z = |J, z .
An alternative and completely equivalent definition of pure Gaussian states can be phrased in terms of J, where |J is the solution of the equation This definition is based on the observation that the eigenvectorsξ a ± of J with eigenvalues ± i are given by 6 The variablesξ a ± behave in many ways as creation and annihilation operators, but do not require a specific basis in phase space, which enables a compact covariant proof of Wick's theorem. Moreover,ξ a ± spans the Ndimensional complex eigenspaces V ± i of J, which are the spaces of creation or annihilation operators associated with |J , respectively. We refer toξ a ± as phase space covariant creation and annihilation operators, which satisfy the following commutation (bosons) or anti-commutation (fermions) relations: where we introduced the 2-point function For a given state vector |J , we can always choose a basiŝ in which Ω and G simultaneously take the standard forms In contrast to (10) or (11), where only the respective background structure (Ω for bosons or G for fermions) takes this form, while the other may take any allowed form, we have now chosen the basis {ξ a } attuned to |J , so that also Γ (G for bosons, Ω for fermions) takes the above standard form. In this basis, we find 7 6 It is easy to verify J a bξ b Most of the relevant intuition for Gaussian states for N modes comes from considering one bosonic or two fermionic modes, as reviewed in the following examples. We also consider one fermionic mode to understand why such a system is almost trivial. We will further see explicitly that the families of fermionic Gaussian states consist of two disconnected components.
We consider a single bosonic mode withξ a,a † ≡ (â,â † ). With respect to the number eigenvectors |n , the most general Gaussian state vector with z a = 0 is (25) where φ ∈ [0, 2π] and ρ ∈ [0, ∞). With respect to above bases, one finds In summary, Gaussian states of a single bosonic mode form a two-dimensional plane parametrized by polar coordinates (ρ, φ).
Example 2 (Single and two mode pure Gaussian fermionic states). We consider a single fermionic mode withξ q,p ≡ (q,p) a,a † ≡ (â,â † ). There are only two distinct pure Gaussian states, which are characterized by the state vectors whose covariance matrix and complex structures In summary, there are only two distinct Gaussian pure states for a single fermionic mode. Consequently, we consider two fermionic modes withξ , where the most general Gaussian state vectors are with θ ∈ [0, π] and φ ∈ [0, 2π]. Their covariance matrix and complex structure are with δ ± = 1±1 2 , i.e., δ + = 1 and δ − = 0. In summary, Gaussian states of two fermionic modes form two disconnected unit spheres parametrized by angles (θ, φ), where we further distinguish the Gaussian state vectors of type |J + and |J − . The two sets are distinguished by the parity operatorP = exp(i πN ), as the total number operator N = iâ † iâ i is even for |J + and odd for |J −

C. Gaussian transformations
In this section, we will introduce a special set of unitary transformations that map Gaussian states into Gaussian states. They are generated by operators that are quadratic inξ a . We define the Lie group G as linear transformations on the classical phase space V that preserve the symplectic form Ω ab for bosons or the metric G ab for fermions which we represent as matrices M : The associated Lie algebras g are then defined as 8 We can construct a (projective) representation of these Lie groups as unitary operators U(M ) on Hilbert space by exponentiating quadratic operators. For this, we first define an identification between Lie algebra elements K a b ∈ g and anti-Hermitian quadratic operators K with which is uniquely fixed by the requirement For any M = e K , we define the squeezing operator where ∼ = implies equality up to a complex phase. For fermions, products of M = e K for K ∈ so(2N, R) will only generate the subgroup SO(2N, R), whose group elements satisfy det M = 1. To generate other group ele- representing where the equality turns out to hold up to an overall sign, i.e., the complex phase is 0 or π. Furthermore, we can read off the group element M from S(M ) by its action onξ a via the relation Every Gaussian state vector |J has a stabilizer subgroup which preserves Γ and J. Similarly, the associated unitary transformation U (M ) will preserve the quantum state vector |J up to a complex phase, i.e., we have U (M ) |J ∼ = |J for all M ∈ U(N ). This defines the Lie subalgebra Similarly, we have K |J ∝ |J . Given a Gaussian reference state vector |J 0 , we can reach any other Gaussian target state vector |J via The solution of the equation M Γ 0 M ⊺ = Γ is not unique, as we can always multiply by u ∈ U(N ) associated We can fix a special solution T by imposing the condition T Γ 0 = Γ 0 T ⊺ leading to the simpler equation J = T J 0 T −1 = T 2 J 0 , which is solved by T 2 = −JJ 0 . We define this as the relative complex structure 10 It captures the full basis independent information about the relationship of the two Gaussian states J and J 0 . We have the following properties: • Bosons. The spectrum of ∆ consists of pairs (e 2ri , e −2ri ) with r i ∈ [0, ∞), such that T = √ ∆ has eigenvalues (e ri , e −ri ). ∆ is diagonalizable and a symplectic group element.
• Fermions. The spectrum of ∆ consists of quadruples (e i 2ri , e i 2ri , e − i 2ri , e − i 2ri ) with r i ∈ (0, π 2 ) or pairs (1, 1) or (−1, −1), which correspond to r i ∈ {0, π 2 }. If the number of pairs (−1, −1) is even, i.e., the eigenvalue −1 appears with multiplicity divisible by four, J and J 0 lie in the same topological component of fermionic Gaussian states, i.e., they can be continuously deformed into each other. Otherwise, i.e., if the number of eigenvalue pairs (−1, −1) is odd, J and J 0 live in separate components. T = √ ∆ is only well defined in the former case and has quadruple eigenvalues (e i ri , e i ri , e − i ri , e − i ri ) for r ∈ (0, π 2 ). If there are eigenvalue quadruples (−1, −1, −1, −1), there are different, but equivalent ways 11 to define T as a real linear map with T 2 = ∆ in this sub block corresponding to choosing different eigenvectors for the quadruple of eigenvalues (i, i, − i, − i).
We can bring ∆, T and K = log T into block-diagonal form. We find 2 × 2 one-mode squeezing blocks for bosons and 4×4 two-mode squeezing blocks for fermions. The parameters {r i } from above correspond to ρ 2 in our bosonic Example 1 and θ 2 in our fermionic Example 2.
10 Sometimes also referred to as relative covariance matrix [19,20]. 11 In essence, T describes half way on the shortest path between Γ 0 and Γ. The eigenvalues (−1, −1, −1, −1) imply that Γ 0 and Γ are on opposite poles of spheres, in which case all the points on the equator are equivalent choices of being half-way.
From the relative complex structure ∆ = T 2 = −JJ 0 , we compute the generator such that |J ∼ = e K |J 0 . We can always change basis to reach a standard forms φ = π 2 , where we can read off the eigenvalues (e ρ , e −ρ ) of ∆.
Example 4 (Fermions revisited). We reconsider Example 2. For a single fermionic mode, we choose the reference state vector |J 0 with The stabilizer subgroup U(1) consists of the same elements as in (44), which coincides with the group SO(2, R). Consequently, the only group elements that transform |J 0 = |J + into |J − lie in the disconnected component. We also reconsider two fermionic modes with reference state vector |J 0 given by There is a 4-dimensional subspace of these generators also satisfying [K, J 0 ], which generates U(2) ⊂ O(4, R). We can reach the most general complex structure J + by a continuous path generated by for ∆ = −J + J 0 . To reach state vectors of the form |J − , we must also apply an additional transformation We can always change basis to reach a standard forms φ = 0, where we can read off the eigenvalues (e i θ , e i θ , e − i θ , e − i θ ) of ∆.

D. Geometry of pure Gaussian states
The family of pure Gaussian states forms a differentiable manifold M. It provides a versatile tool for analytical and numerical studies of bosonic and fermionic quantum systems with applications ranging from condensed matter [5,[7][8][9] and quantum information [1,3] to quantum optics [7] and field theory [12]. Mathematically, M is a symmetric space (type CI for bosons and DIII for fermions) and has the properties of a so-called Kähler manifold. The latter makes Gaussian states particularly suitable for variational studies, where ground states and time evolution are approximated on a suitable subset of Hilbert space. In the following, we will discuss the rich geometry of this manifold, which will play an important role when one wishes to locally optimize a function on it. We closely follow the conventions of ref. [17], which contains a comprehensive review of the geometry of variational families, which in turn builds upon ideas of the time-dependent variational principle [21,22].
We recall our definition of Gaussian state vectors |J as normalized vectors in Hilbert space, such that their linear complex structure J a b satisfies J 2 = −1. Knowing Γ does not fix the complex phase of the Hilbert space vector, i.e., Γ actually describes elements of a projective Hilbert space P(M) which we could represent as (pure) density operators ρ Γ = |J Γ| rather than Hilbert space vectors |J . However, we often prefer to think of pure quantum states as state vectors |ψ rather than density operators ρ = |ψ ψ| and accept that we need to keep in mind that these vectors are actually only defined up to a complex phase, i.e., |J ∼ = e i ϕ |J .
Given a covariance matrix Γ of a pure Gaussian state vector |J , we are only allowed to change it in such a way that respects symmetry (symmetric for bosons, antisymmetric for fermions) and preserves purity (J 2 = −1). For the infinitesimal change δΓ ab , we thus find the constraints δΓ ab = δΓ ba , δΓJ ⊺ = JδΓ , (bosons) Knowing the change of the covariance matrix Γ does not uniquely fix the change of the state vector |J , as we could also change the complex phase. Such change would be proportional to i |J . To remove such pure change of gauge, we require that the tangent vector |δΓ = δΓ ab |V ab is orthogonal to |J itself, i.e., Γ|V ab = 0. Under this condition, one can derive [17] |V ab = .
This allows us to compute the inner product between two different variations δΓ and δΓ as where g and ω are real linear forms on the tangent space, i.e., the space of allowed variations δΓ ab subject to (54).
Given a Gaussian state vector |J and a Lie algebra element K ∈ g, we compute the induced variation This is the linear map δΓ K : g → T Γ M : K → δΓ K . Its kernel consists of all Lie algebra elements that do not change the covariance matrix Γ and is thus from (45). We define its orthogonal complement 12 which is isomorphic to the tangent space T Γ M. This will allow us to exploit the group structure of Gaussian states to compute gradient descent with respect to g and symplectic evolution with respect to ω without needing to evaluate them at every step.
We reconsider a single bosonic mode from example 4 at the state vector |J 0 with Γ 0 and J 0 defined in (52). The tangent space can be parametrized as The associated Hilbert space vector |δΓ = δΓ ab |V ab is We further find δΓ|δΓ = aã+bb 12 It is the genuine orthogonal complement on the Lie algebra g with respect to the Killing form K(K,K) = 2N tr(KK).
Example 6 (Fermionic tangent spaces). We reconsider Example 4. For a single fermionic mode, he tangent space is trivial, i.e., zero-dimensional, because the set of pure Gaussian states consists of two discrete elements. We therefore directly consider two fermionic modes with reference state vector |J 0 defined in (52). The tangent space is then parametrized as The associated Hilbert space vector |δΓ = δ ab |V ab is We further find δΓ|δΓ = aã+bb

E. Parametrization of Gaussian states
In the previous sections, we saw that a Gaussian state vector |J is uniquely (up to a complex phase) characterized by its complex structure J. For our purpose, it is more efficient to parametrize Gaussian states by first choosing a reference complex structure J 0 and then label the Gaussian state vector |J M by the group transformation M , such that J M = M J 0 M −1 . While M is not unique for a given J M , i.e., the map M → J M is not injective, it suffices for the purpose of optimization if we can efficiently compute gradients on the group manifold.
We will decompose the space of directions on the group into redundant directions (not changing the state) and non-redundant directions (that change the state). This space is called tangent space T JM M and it can be described by the allowed variations δJ M of the complex structure J M . These variations are not completely free, because we only allow variations that respect the conditions on a complex structure J, i.e., J 2 = −1 and that it is compatible with symplectic form Ω or metric G.
For a reference J 0 and a group element M , we can associate to every Lie algebra element K ∈ g the variation This is the change induced from moving along M e tK away from J M .
In some situations, we may not wish to parametrize the full manifold of Gaussian states, but only a subset. This applies in particular when optimizing over Gaussian purifications |J of a mixed Gaussian state ρ A , i.e., we i.e., the set of purifications of ρ A is generated from |J by the subgroup This group only affects the Hilbert space H B , such that the reduction of |J ′ onto H A will not change and thus stay to be ρ A . In summary, we will consider a subalgebra g ′ ⊂ g that generate the allowed transformations, e.g., the ones only changing the subsystem H B . Analogous to the decomposition g = u(N )⊕ u ⊥ (N ), we can then define In this case, a basis of h ′ ⊥ consists of a maximal set of generators Ξ µ ∈ g ′ that lead to linearly independent changes of the state J 0 .
In summary, our parametrization of Gaussian states or subfamilies is based on the following ingredients, which are illustrated in fig. 1.
• Reference state vector |J 0 . We specify a pure Gaussian state vector |J 0 as reference by its complex structure J 0 .
• Subalgebra g ′ of allowed transformations. We specify a subalgebra g ′ ⊂ g that we use to generate any other allowed state.
• Generated subgroup G ′ . The Lie subalgebra g ′ generates the Lie subgroup G ′ ⊂ G. • Stabilizer h ′ of J 0 . We define the subalgebra • Tangent space T JM M. We span the tangent space at a given Gaussian state We therefore choose a basis Ξ ≡ (Ξ 1 , . . . , Ξ m ) of h ′ ⊥ and then compute We can simplify this to find where we have unified the expressions for bosons and fermions.

F. Purification of mixed Gaussian states
An important class of sub-manifolds of pure Gaussian states that are related by the action of some subgroup G ′ ⊂ G are Gaussian purifications of a given mixed Gaussian state ρ. Various measures of quantum correlations, such as entanglement of purification (EoP) or complexity of purification (CoP), are defined as some critical value on such manifolds, which we review in section V. Here, we discuss the properties of the underlying manifold of Gaussian purifications.
In section II B, we only considered pure Gaussian states, which we introduced as those states, for which the complex structure J satisfies J 2 = −1. A mixed Gaussian state ρ is still fully characterized by J as computed in (15), but which now satisfies the condition This implies that the eigenvalues J appear in conjugate pairs ± i c i with c i ∈ [0, 1] for bosons and c i ∈ [1, ∞) for fermions. We do not refer to such J as complex structures (unless J 2 = −1, i.e., all c i = 1, in which case ρ is pure), but rather as a restricted complex structure. As all the eigenvalues are imaginary, we can diagonalize J only over the complex numbers. If we only use real transformations, we can only bring J into a block-diagonal form with antisymmetric 2 × 2 blocks.
In contrast to pure Gaussian states, it is not sufficient to require (77) to ensure that ρ is Gaussian. Instead, we need to require that where q ab is a positive-definite bilinear form, i.e., ρ is the exponential of a quadratic operator (c 0 fixes the normalization Tr(ρ) = 1). We will later see in formula 8 how J, q and c 0 are related.
We now consider purifications of Gaussian states. For this, we refer to the original Hilbert space as H 1 with associated operatorsξ 1 ≡ (q 1 ,p 1 , . . . ,q N1 ,p N1 ) and classical phase space V 1 . We consider a mixed Gaussian state ρ 1 in this system with complex structure J 1 . From our previous discussion of the eigenvalues ± i c i of J 1 , we can find for every fixed basisξ 1 a group transformation T 1 ∈ G 1 (symplectic or orthogonal transformation on V 1 ), such that with respect toξ 1 . We have the mixed state standard (80) with squeezing parameters r i and the 2 × 2 matrix It is well-known that such a mixed Gaussian state ρ 1 can be purified by adding more degrees of freedom. For this, we extend the Hilbert space from . The purification of ρ 1 is then a state vector |J in the larger Hilbert space H ′ , such that This requires H 2 to be sufficiently large, such that all mixed modes can be purified. In light of our previous considerations involving the squeezing parameters r i , system H 2 must contain at least as many modes N 2 as there are non-zero parameters r i associated in the standard form of J 1 to ρ 1 . The Gaussian purification can then be inferred by constructing the complex structure J on the larger phase space V 1 ⊕ V 2 , such that the restriction [J] 1 to V 1 yields J 1 . Put differently, every non-zero squeezing parameter r i corresponds to an individual bosonic or fermionic degree of freedom that can and needs to be purified by correlating it with an additional auxiliary degree of freedom. Of course, we are always free to add even more auxiliary modes that are not correlated. The resulting purified form of J with respect to an enlarged basisξ ′ = (ξ 1 ,ξ 2 ) is given by for arbitrary T 2 ∈ G 2 , i.e., any such T 2 will lead to a valid purification |J . The purified standard form J p sta has been derived in [23] as where we used the 2 × 2 matrices Let us now discuss how unique a chosen purification is. From the perspective of pure states, we can act with arbitrary unitary operators U = 1 1 ⊗ U 2 on the state vector |J , i.e., |ψ U = U |J , while preserving the property ρ 1 = tr H2 |ψ U ψ U |. This is well-known in the context of the Schmidt-decomposition of |J . However, acting with such a general U will generally lead to a non-Gaussian state vector |ψ U . Instead, we restrict to Gaussian unitaries of the form S(M ) = 1 1 ⊗ S 1 (M 1 ) with M 2 ∈ G 2 , where S(M ) has been introduced in Section II C. Here, we used the representation theory of the Lie group G, i.e., the symplectic group for bosonic systems and the orthogonal group for fermionic ones. From the requirement that S(M ) must act as the identity on H 1 , we have We therefore recognize exactly the setup described in Section II E with the subgroup G ′ from (72). We can use the standard form of the purified J find the subalgebra h ′ ⊂ g ′ that preserves J. More precisely, we have If the restriction J 2 = [J] 2 would describe a pure Gaussian state, the resulting algebra h ′ would be isomorphic to u(N 2 ). However, for a mixed state complex structure J B , the algebra h ′ will be smaller, which consequently means that its orthogonal complement h ′ ⊥ of those Lie algebra elements that change the state vector |J J| will be of higher dimension than u ⊥ (N 2 ). It turns out that we have h ′ = 0 if there are no r i = 0. Otherwise, we have h ′ ≃ u(N 0 ) if there are N 0 distinct parameters r i = 0, which correspond to the allowed Gaussian unitaries that change the pure Gaussian state contained in ρ B . Put differently, we have where ρ r is generally a mixed Gaussian state of a single bosonic or fermionic mode with ρ r = 1 sinh r cosh r e −n ln coth r (bosons) sin r cos r e −n ln tan r (fermions) , wheren is the number operator of a single bosonic or fermionic mode. In summary, provided that all r i = 0, we have h ′ = 0, such that the set of orthogonal generators is given by where g 2 are the generators of G 2 . In this specific case, this orthogonal set forms itself a Lie algebra.

III. REPRESENTATIONS OF GAUSSIAN STATES
The literature on quantum physics is for good reason full of Gaussian states for bosonic and fermionic systems Characteristic function (III C) Characteristic function (III C) Quasiprobability distribution (III D) Quasiprobability distribution (III D) Characteristic function (III C) Characteristic function (III C) Quasiprobability distribution (III D) Quasiprobability distribution (III D) Wave function (III I) and they appear under various names and they a multitude of different forms. In this section, we attempt to provide a comprehensive dictionary that collects the commonly used notions of characterizing Gaussian states and explains how to convert between them. Table II provides a summary of these notions, which we review in the following sections including compact conversion formulas.
All conversions are based on standard linear algebra operations, i.e., matrix addition and multiplication, evaluation of eigenvalues and so on. Our formulas will include matrix functions f (M ) which can be either evaluated as power series or by applying f on the eigenvalues of M . Note that we do not require M to be symmetric or Hermitian, as it suffices that M is diagonalizable for f (M ) to be well-defined.
In our formulas, we take great care to make any additional structures explicit. For example, instead of writing a formula where we implicitly assume that the respective basis to be orthonormal with respect to some reference inner product, we will write a basis independent (covariant) formula, which explicitly includes the relevant inner product. If we then express the formula with respect to an orthonormal basis, the matrices representing the inner product are just the identity.

A. Covariance matrix
Given any basis {ξ a } of (possibly complexified) linear observables, we compute the bosonic covariance matrix G ab and the fermionic covariance matrix Ω ab of a Gaussian state vector |J with Γ|ξ a |Γ = 0 as defined in (21) based on the following formula.
Formula 1 (Covariance matrices of pure Gaussian states). A Gaussian state vector |J is fully characterized by its bosonic or fermionic covariance matrix defined as We clearly have G ba = G ab and Ω ba = −Ω ab . The covariance matrix is the (anti-)symmetrised autocorrelator of the quadrature operators and a key characteristic of Gaussian states is that they are unambiguously defined by their first and second moments (the displacement in phase space and covariance matrix) only [1][2][3][4].

B. Linear complex structure
An alternative to parametrizing Gaussian states by their covariance matrix is to use the so called linear complex structure. It is less commonly used in quantum information and condensed matter, but has been extensively studied in the context of quantum field theory in curved spacetime [12].
Formula 2 (Linear complex structure). Given a bosonic Gaussian state vector |G, z with symplectic form Ω ab or a fermionic Gaussian state vector |Ω with metric G ab , the associated linear complex J is We will see that the compatibility of these three structures shown in (92) imbues the manifold of pure Gaussian states with the structure of a Kähler manifold.

C. Characteristic functions
Given a quasiprobability distribution W : V → C on the classical phase space, its characteristic function 93) is defined as the Fourier transform. We therefore note that there are as many characteristic functions, as there are quasiprobability distributions. We recall from Section III D that quasiprobability distributions are labelled by a reference covariance matrix Γ 0 (bosonic or fermionic) and a real parameter s ∈ [−1, 1].
We can compute the expectation value of an arbitrary polynomial in linear observables {ξ a } as where (. . . ) s refers to the respective ordering with respect to a Gaussian reference state vector |0 = |J 0 , i.e., symmetric ordering for s = 0, normal-ordering for s = 1 and anti-normal ordering for s = −1.
Wigner. For s = 0, the quasiprobability distribution is independent of Γ 0 and called Wigner distribution ξ → W 0 (ξ). The Wigner characteristic function χ 0 (w) of an operatorÔ can be either computed via the Wigner function using (93) or directly from the operatorÔ as Glauber. For s = 1, we have the Glauber-Sudarshan P characteristic function, which is the Fourier transform of the Glauber-Sudarshan P quasiprobability distribution ξ → P (ξ) = W −1 . The characteristic function is Husimi. For s = −1, we have the Husimi Q characteristic function, which is the Fourier transform of the Glauber-Sudarshan P quasiprobability distribution ξ → P (ξ) = W −1 . The characteristic function is defined as

Formula 3 (Characteristic functions of Gaussian states).
The general formula for the characteristic function of a pure or mixed Gaussian state with respect to the reference state vector |0 = |J 0 is As can be seen from (93), the characteristic function χ is defined on the dual phase space V * , while the quasiprobability distribution W is directly defined on the phase space V . We can, however, use the isomorphism w a ⇔ ξ a given by w a = ω ab ξ b for bosons and w a = g ab ξ b for fermions to map the characteristic function χ s (w) into the phase space function χ s (ξ), such that One can show that for pure Gaussian states W (ξ) = χ(ξ) for all ξ.

D. Quasiprobability distributions
Bosonic and fermionic quantum states can also be represented as quasiprobability distributions on the classical phase space, i.e., real valued functions W : V → R satisfying d 2N ξ W (ξ) = 1. In contrast to regular probability distributions, W (ξ) is allowed to also take negative values and in fact, this negativity can be directly linked to the non-classicality of the respective quantum state [24,25]. For Gaussian states, all quasiprobability distributions are themselves Gaussian functions and in particular positive, i.e., classical in the sense of ref. [24]. More generally, operators O on Hilbert space can be related to a phase space distributions W : V → C, which may not be normalized. To define W (ξ a ), we need to write the operator O as a power series in terms of linear observablesξ a (or as limit of a sequence of such series). Clearly, the sequence is not unique, because we can use commutation or anti-commutation relations to change the ordering ofξ a ,ξ b and so on in (100), which will create additional terms. For example, we havê qp =pq + i. Given a Gaussian reference state vector |0 = |J 0 , we can express everything in terms ofξ a ± , which are defined with respect to |0 , and then use commutation relations to bring them into some standard ordering. The most common orderings are symmetric ordering (s = 0): where the parameter s ∈ [−1, 1] describes a continuum of intermediate orderings, as introduced in ref. [26]. Let us emphasize that we bring the power series 100 by using the canonical commutation or anti-commutation relations and not by just reordering the terms by force, which would change the resulting operator O. We can then express ξ a ± in terms ofξ a via (19) to find the coefficients (t s n ) a1...an of a series expansion with ordering s. Plugging in the variables ξ q,p ≡ (q 1 , . . . , q N , p 1 , . . . , p N ) a,a † ≡ (a 1 , . . . , a N , a † 1 , . . . , a † N ) rather than operatorsξ then defines the phase space distribution i.e., the trace of the product of two operators can be computed by just integrating over the pointwise product of the respective phase space functions. Note that this formula does not generalize to computing the trace of a product of more than two operators.
Formula 4 (Quasiprobability distributions). For a Gaussian state with covariance matrix Γ, we have the quasiprobability distribution with respect to the reference state vector |0 = |J 0 .

E. Gaussian unitaries
We can parametrize Gaussian states also by the unitary Gaussian transformation U that takes us from a reference state (vacuum |0 ) to the state under consideration, i.e., |G or |Ω . This unitary is not unique, because we can always compose U with some other Gaussian unitary satisfying u |0 = e i ϕ |0 .
We have the reference covariance matrix Γ 0 associated to the vacuum |0 and a target state with covariance matrix Γ. Using the relative covariance matrix With this, we can define the group element T = √ ∆, satisfying J = T J 0 T −1 , from which we can deduce the Lie algebra generator The unitary transformation S satisfying |J = S |0 is If we know the anti-Hermitian quadratic operator K = − i 2 h abξ aξb for bosons or K = 1 2 h abξ aξb for fermions, we can compute the associated generator from which we find the transformed covariance matrix as In summary, we have the following formulas.
Formula 5 (Pure Gaussian state transformations). Given a reference Gaussian state vector |0 with covariance matrix Γ 0 , we can compute for every Gaussian state vector |J the quadratic operator K, such that where ∆ a b = Γ ac (Γ −1 0 ) cb . Vice versa, for the same setup (reference state vector |0 with covariance matrix Γ 0 ), we compute for every Gaussian unitary e K the covariance matrix Note that all equalities of quantum state vectors are only up to a global complex phase.

F. Squeezed vacuum
Given a bosonic or fermionic Gaussian state vector |0 together with a complete set of annihilation operatorsâ i satisfyingâ i |0 = 0, a Gaussian state vector |J can be described by a squeezing matrix γ, which is a complex N × N matrix that is symmetric for bosons and antisymmetric for fermions. For bosonic systems, we can thereby reach any covariance matrix G ab , while for fermionic systems we can reach any covariance matrix Ω ab with the same parity as explained around (40). In the following, we will derive the relations between K, γ, Γ and Γ 0 .
We choose our standard bases such that Γ 0 and Ω for bosons and G for fermions takes their standard forms from (23). We consider |J ∼ = S(T ) |0 , where T 2 = ∆ = ΓΓ −1 0 . By construction, we have S(T ) = e K with K = 1 2 log ∆. Both K and L take the standard forms which both anti-commute with J 0 . Note that the decomposition into K 1 and K 2 still has a U(N ) redundancy, i.e., we would preserve the standard forms (23) of J 0 , Γ 0 and Ω for bosons or G for fermions, while K 1 and K 2 will mix with each other.
A matrix u ∈ U(N ) satisfies [u, J 0 ] = 0 and is Under a change of basis K → uKu −1 , we thus have the change . Mathematically speaking, we have the complex Ndimensional vector space V − i of annihilation operators and V + i of creation operators. The two spaces are embedded in the complexified phase space V C and can be canonically identified using complex conjugation on V C . Our goal is to find a compact expression of |J . We consider K with {K, J 0 } = 0, which satisfies .
We can simplify e K based on the known relations which are derived in ref. [27]. Using them and the definition L = tanh K, we find the covariant expressions where we emphasize that they only apply to algebra elements K ∈ g with {K, J 0 } = 0. When applied to |0 , we find where we used e ± 1 8 tr log(1−L 2 ) = det ± 1 8 (1 − L 2 ). The relevant linear map L = tanh K takes the form which is analogous to (112). We find where we defined in this basis the complex matrix The matrix representations L 1 and L 2 are symmetric for bosons and antisymmetric for fermions. This leads to where the sign changes due to the anti-symmetry of L i for fermions. Using the fact that the spaces of V ± i of creation and annihilation operators are Hilbert spaces with Hermitian inner product, we can use γ as bilinear form γ ij , which satisfies .
This finally proves our final formula, which we will summerize again.
Formula 6 (Parametrizing squeezed state vectors). Given a Gaussian reference vacuum |0 with covariance matrix Γ 0 and creation operatorsâ † i , we can parametrize the squeezed state vector |J by an arbitrary symmetric complex matrix γ, such that We can seamlessly convert between the matrix γ and the covariance Γ. In particular, we have which can be used to compute γ from ∆ a b = Γ ac (Γ −1 0 ) cb or vice versa. Note that the splitting from (126) only requires the standard forms of (23) of the state vector |0 .
For fermions in the real (Majorana) basis, we have In this basis, we find The inverse operation is given by where the fraction A/B denotes AB −1 . Equivalently, for bosons we can write The elements of this bosonic covariance matrix can again be directly expressed in terms of γ: Again, this relationship can be inverted, leading to . (132)

G. Bogoliubov transformation
The transformation from one Gaussian state to the other is sometimes encoded in a Bogoliubov transformation. This is an indirect way to describe the transformation from a reference vacuum |0 annihilated byâ i to the new Gaussian state vector |J annihilated byb i witĥ where we sum over the repeated index j. We will see that the information contained in α ij and β ij is equivalent to the one contained in a group transformation M ∈ G. In fact, a Bogoliubov transformation is nothing else than a symplectic or orthogonal group transformation expressed in a complex basisξ a .
Formula 7 (Bogoliubov transformations). Given a Gaussian reference state vector |0 with covariance matrix Γ 0 and annihilation operatorsâ i , we reach any Gaussian state vector |J by a Bogoliubov transformation such thatb i |J = 0. We compute Γ from α and β via

H. Thermal states
Every mixed Gaussian state can be written as a thermal state ρ = e −βĤ /Z, whereĤ is a quadratic Hamiltonian with a unique ground state and Z = Tr(e −βĤ ). Without loss of generality, we can assume β = 1 and Z = 1 by redefiningĤ. With this choice,Ĥ is also known as the modular Hamiltonian. A general quadratic Hamiltonian can be written aŝ where q ab is symmetric for bosons and anti-symmetric for fermions. Note that there is no factor of 1 2 as in (37), which will simplify later conventions. BecauseĤ has a unique ground state, it follows that there exists a basiŝ ξ = (q 1 ,p 1 , . . . ,q N ,p N ) such that where ω i > 0 andn i with respect toξ a is given bŷ In this specific basis, the density operator ρ decomposes into a tensor product over single modes, from which we can derive the respective standard forms of J a b , G ab , Ω ab and q ab listed in table III. , where q ab and c 0 diverge for J 2 = −1 in such a way that the limit of ρ describes the projector ρ = |J J|. These relations can be easily inverted to compute J and Γ in terms of q ab .

I. Wave functions
Most physicists encounter Gaussian states for the first time when studying the quantum harmonic oscillator. The ground state is a Gaussian state with Gaussian wave function q → ψ(q), where q ∈ Q is a vector in position space Q. In this section, we show how every pure bosonic log (cos r i sin r i ) TABLE III. We show the real standard forms of J, G, Ω, q and c0 for a mixed Gaussian state ρ = exp(−c0 − q abξ aξb ).
Gaussian state can be represented as Gaussian wave function, either as pure wave function q → ψ(q) or as mixed wave function (q,q) → ρ(q,q), and how to convert between wave functions and covariance matrices. In order to write down a wave function, one needs to make a choice by splitting the classical phase space V into the direct sum V = Q ⊕ P with dim Q = dim P = N , such that the symplectic form vanishes on Q, P ⊂ V . More precisely, we find the block form where we have q ∈ Q and p ∈ P . The phase space decomposition V = Q ⊕ P induces a dual decomposition V * = Q * ⊕P * . The off-diagonal blocks in Ω and ω induce isomorphism Q ≃ P * and Q * ≃ P . 13

Pure states
We write the most general pure Gaussian state as Note that the determinant of the bilinear form A implies that the wave function is not a scalar function, but rather a scalar density of weight 1/2. This ensures that the square modulus of the wave function can be integrated over Q to give probabilities. We decompose the bosonic covariance matrix G ab and the symplectic form Ω ab based on our decomposition of the phase space V = Q⊕P , such that Note that the only requirement for the respective decomposition V = Q ⊕ P is that the restrictions Ω αβ and Ωαβ vanish.

Formula 9 (Conversion formulas). Given a bosonic
Gaussian state vector |G and a phase space decomposition V = Q ⊕ P , we can convert between the covariance matrix decomposed in the blocks (143) and the wave function representation from (142) containing the bilinear forms A αβ and B αβ using Vice versa, we can solve these equations for A and B in terms of G to find Note that G −1 αβ is the inverse of the N × N block G αβ satisfying G αβ G −1 βγ = δ α γ over Q ⊂ V which should not be confused with the full 2N ×2N inverse g ab of G ab with G ab g bc = δ a c over V .

Mixed states
Similarly, we can also write out the most general mixed Gaussian state in the position representation as where q = (q 1 , . . . , q N ),q = (q 1 , . . . ,q N ) and the normalization is given by Again, the wave function representation of the mixed state ρ is density of weight 1/2. As before, we would like to relate the bilinear forms A, B, C and D in terms of the covariance matrix We find the following relations.
Formula 10 (Relationship between blocks). The different blocks of the covariance matrix G ab are related to the matrices A, B, C, D via which can be inverted to give (150) In our standard basis, we will have Ω αβ q,p ≡ 1, Ωα β q,p ≡ −1, ω αβ q,p ≡ −1 and Ω αβ q,p ≡ 1, which simplifies above expressions further Note here the signs. Notice also that formula (149) reduces to formula (144) if the state is pure, i.e., C = D = 0.

IV. OPTIMIZATION ALGORITHM
Finding the minimal value of a function f on some large manifold M is in general a hard problem and the main goal in the field of mathematical optimization. One distinguishes between global and local optimization, i.e., if one is able to find the global minimum or if one may get stuck in a local one. We present in this section a schematic overview of our approach to efficient local optimization over the class of pure Gaussian states, based on the geometric considerations in Section II D. We also allude to the flexibility of our optimization algorithm in finding global minima and avoiding the pitfalls of poor convergence. We use a rudimentary gradient descent implementation [28], but use the natural geometry of Gaussian states and exploit the Lie group structure of G.

A. Gradient descent on matrix manifolds
Given a function f : M → R on some manifold M, we can find its minimum from some starting point using gradient descent. At any point x ∈ M in the manifold, the range of possible directions of motion can be expanded in a basis of the vectors in the tangent space to M at x, denoted T x M. Gradient descent is one of the most basic methods of finding a minimum by moving iteratively in directions which locally decrease the function. This means picking out suitable vectors in T x M directed along those directions which minimise the function value. Specifically, these are the components of the gradient descent vector field on the manifold, which is given by i.e., it associates with each point x ∈ M the directional derivative of f . The inverse metric G µν is included in this definition to remove the sensitivity of the gradient to the choice of local basis x µ . The analytical solution to the gradient descent problem is the integral curve associated with the vector field (151). In a numerical realization of gradient descent, we approximate this continuous curve by sufficiently small discrete incremental steps. This notion of moving a certain distance along one of the tangent vectors in T x M while remaining on M, which is trivial when M is a submanifold embedded in Euclidean space, is realised for the general case by a so-called retraction map. This is a map R : T M → M, with restrictions to the domains T x M given by the maps which satisfy the properties where id TxM denotes the identity mapping on the tangent space. It is important to note that the retraction map is not unique and that the most convenient choice of retraction map will be that which minimises the computational effort while remaining a sufficiently accurate approximation to the continuous integral curve.
Example 7 (Retraction map). For M = R n , an intuitive choice of the retraction map is since in this case the manifold and its tangent space are globally isomorphic. This is the familiar notion of moving forward by some step u from a point x.
If we optimize over a Lie group G, a natural choice for the retraction map is the exponential map, which, for a matrix Lie group, is simply given by the matrix exponential, since for Lie groups tangent vectors and Lie algebra elements K are equivalent. When optimizing over a Lie group G, we divide the integral curve into continuous segments, curves γ(t) = M e tK , t ∈ [0, 1] connecting subsequent points M and M ′ = M e K in the group manifold.
Here, K denotes the tangent vector to the group manifold at M corresponding with motion to M ′ and this motion is realised by the exponential (retraction) map. In practice, computing the full power series in (155) is expensive and we would prefer a more computationally viable approximation to the exponential. In principle, we can consider small Lie algebra elements K under the norm ||K|| 2 = Tr(KK ⊺ ) and then perform a reasonable truncation of the power series. The issue here is that a power series approximation of the exponential will not lie in the desired Lie group i.e., it cannot serve its purpose as a retraction map. There is, however, another approximation to the matrix exponential, which does fulfill this criterion: For algebra elements K, we have which will always map into the associated Lie group. It is important to note that evidently, if one of the eigenvalues of ǫ 2 K is 1, then the expression in (156) cannot be inverted. We avoid this by choosing ǫ sufficiently small.

B. Optimization on the Gaussian state manifold
As discussed in Section II, the Gaussian state manifold is equipped with a Riemannian metric g µν with inverse G µν and therefore the vector field (151) can be naturally defined for both the bosonic and fermionic state manifolds (1) and (2). In general, the inverse metric G µν needs to be re-evaluated at every point of the manifold, but as suggested previously, by moving into one of the standard basis choices in (23), the matrix representation of the inverse metric is constant. This is the first of the properties of Gaussian states which allows for a particularly computationally efficient implementation of gradient descent optimization over this class of states. Section II E outlines in some detail the parametrization of the Gaussian state manifold in terms of transformations M of some reference state complex structure J 0 . Based on this parametrization, when optimizing over the manifold M of Gaussian states we may equally say that we are optimizing over the matrix groups (35), which form manifolds of dimensions This means that in practice, the vector field F µ is computed not with respect to the local basis of a tangent space to M at a state J M , but rather with respect to an orthonormal basis Ξ µ of the Lie algebra g or, more precisely, of the subspace h ′ ⊥ ⊂ g introduced in (73) which  (46), with respect to some reference state J0. The blue surface indicates a tangent space to M at state J1. The vector fieldX in this tangent space is also included, as well as the retraction map which is shown as a projection of the vector field onto M to visualise the notion of moving in the direction of a tangent vector but in the manifold. The associated group G is also shown and crucially it should be noted that the tangent spaces at each point in the group are aligned with the manifold tangent spaces (to highlight the isomorphism between the two) but do not reproduce the smooth manifold M, since the equivalent of the curve traced out on M by successive retraction maps simply connects matrix elements which lie along the (light blue) fibers which represent the stabilizers h ′ as indicated illustratively at the identity. In G, the gradient vector field at points Mn is denoted by Kn as defined in (159). The lines connecting points Mn and Mn+1 in the group are defined by e sKn with s ∈ [0, 1] as written in (162).
generates non-zero variations in the complex structure. This idea is what leads to the expression for the variation of a state in terms of a Lie algebra element in (70).
We can relate the gradient vector F µ to the associated Lie algebra element K as using local basis Ξ µ of h ′ ⊥ . A second key point arises from the left-invariance of the Riemannian metric on the Gaussian state manifold: Since this leads to the preservation of the orthonormality of any choice of Ξ µ under transformations in the group, we do not need to compute a new orthonormal basis at different points in the manifold. Instead, we can choose Ξ µ to be the generators of the Lie group, which form the natural orthonormal basis of the tangent space to the identity. This leads to significant computational speedups, particularly when optimizing over high-dimensional manifolds. The setup for gradient descent on the Gaussian state manifold is visualised in fig. 2.

C. Performing gradient descent
Since we work in a parametrization of the state manifold solely in terms of the transformations of a reference state, a computational implementation of the algorithm should be able to evaluate the target function f for any state vector |J M with only J 0 and M as arguments.
Here, we write this as (M, J 0 ) → f (M, J 0 ). To define local derivatives, we introduce a local coordinate system x µ around a point M ∈ G, such that f (x) = f (M e x µ Ξµ , J 0 ) leading to which allows us to define the vector field F µ according to (151) with respect to the basis Ξ µ of h ′ ⊥ . We reemphasize here that (160) lets us naturally express the gradient in terms of the variation of group element only, i.e., at no point are we required to move from the group G to the state manifold M. This approach is shown for various examples in Section V.
We now provide a step-by step explanation of the realization of an iterative gradient descent minimization algorithm based on the considerations above. The steps are summarized graphically in fig. 3, which complements fig. 2. 1. Initialization. Our algorithm is initialized on a Gaussian state vector |J 0 with covariance matrix Γ 0 and complex structure J 0 , such that the action of the subgroup G ′ ⊂ G generates the state manifold under consideration. We then construct an orthonormal basis Ξ µ for h ′ ⊥ , which are both defined with respect to Γ 0 . For this, we compute the metric G µν explicitly. We evaluate g µν in an arbitrary basis Ξ µ , so we can orthogonalize it, such that both g µν and G µν are equal to the identity. The metric g µν is efficiently computed as where we use (75) with respect to the reference state vector |J 0 = |0 . By construction, we identify the tangent spaces at all group elements M n with the ones at M 0 = 1. While |J 0 is usually chosen in some standard form, we can still initialize the algorithm on some M 1 based on the problem at hand. 2. Gradient computation. We now perform successive steps in the group as where K n = F µ Ξ µ with F µ calculated at the point M n and s chosen such that f (M n+1 ; J 0 ) < f (M n ; J 0 ). 3. Sub-routine to determine step-size. To choose an appropriate step-size s, we use a sub-routine at each iteration which should be chosen so as to balance the efficiency gained by needing fewer steps to reach the minimum and the extra computational effort of executing the subroutine. In the examples discussed in later section, we found that the very rudimentary approach of iteratively halving the step size was sufficient to ensure good convergence. However, a simple line search methods like a quasi-Newton routine can also be used.

Stop condition.
This iterative motion is repeated until some pre-determined stopping criterion (e.g., a tolerance on the gradient norm or the difference between subsequent function values) is reached.

D. Practical considerations
While the focus of this work is not on elaborate numerical methods for implementing the algorithm described above, we mention here for the sake of completeness some additional considerations regarding the practical implementation the algorithm. These have been added in our implementation of the algorithm to varying degrees to enhance its efficiency.
Constrained optimization. Our approach lends itself intuitively to constrained optimization, since we can choose to restrict our optimization to a smaller range of states, being some subspace of G, by truncating the Lie algebra basis. We show an example of this is in the section on Complexity of Purification, where those algebra elements which do not generate non-zero variations of the complex structure can be explicitly cut out of the basis.
Extension to global optimization. There is ultimately no fail-safe way to locate global minima using gradient descent. However, we can increase the probability of convergence to the global minimum by performing gradient descent from a number of sufficiently far separated starting points in the manifold and choosing the lowest of the local minima from a large enough sample size.
Identifying suitable starting points. In choosing different starting points, we in some cases it may be possible, given some analytical intuition about the physical system at hand, to identify starting points in the manifold from which the risk of landing in a local minimum is particularly low. Additionally, there may be some starting points in the manifold from which gradient descent will converge the fastest (i.e., from which it will take a much lower number of iterations to reach the minimum).
Parallel optimization. Where the most suitable starting points cannot be found analytically, we must resort to numerical methods: If, rather than minimizing successively from different starting points, we choose to perform the optimizations simultaneously, we can discriminate between trajectories which promise to converge more or less quickly to the minimum. In our algorithm, we implement this feature, and after each set of 5 iterations, only the 10% of trajectories with the lowest function value and the highest gradient, respectively, are pursued further. While this does generally speaking greatly reduce the total number of iterations required, there is of course a trade-off between this improvement and the computational effort of an initially large number of trajectories.

E. The GaussianOptimization.m package
To complement the theory outlined in this paper, we supply the public GaussianOptimization.m Mathematica package with a simple implementation of the optimization algorithm discussed in the previous sections. The package revolves around the function GOOptimize, which performs the gradient descent optimization from some initial complex structure and transformation. The input arguments to this function are divided into three categories: Problem-specific. These are the arguments related to the specific optimization problem at hand, the scalar function and its derivative with respect to some Lie algebra element, expressed in terms of the initial complex structure and an arbitrary transformation, in the spirit of Table IV.
System-specific. These relate to the geometry of the optimization problem.
Fundamentally, this in-cludes the symplectic or orthogonal basis (which can be generated using the built-in functions GOSpBasis and GOOBasis), but also the corresponding metric (generated by GOMetricSp and GOMetricO). It also includes the initial complex structure and the (list of) initial transformations.
Procedure-specific. These are the parameters related to the numerical implementation of the algorithm, including stopping criteria based on step limits and tolerances on the function value and gradient.
The GaussianOptimization.m package is designed to be user-friendly and all functions come with comprehensive documentation. It is accompanied by an example notebook which includes a systematically organized overview over the functions included in the package, as well as implementations of the applications discussed in the next section. The functions and function gradients for these applications are also implemented as part of the package.

V. APPLICATIONS
In this section, we show how our optimization algorithm may be used in several relevant physical contexts: approximating the ground state of Hamiltonians and computing the entanglement and complexity of purification for fermionic and bosonic systems. We indicate how to parametrize the function to be extremized in terms of the complex structure and how to compute the associated local derivatives (160). As discussed in the previous section, this is essential to unlocking the full computational efficiency of the algorithm. We also provide suggestions regarding convenient starting points and parametrizations.
In the examples of this section, the gradient of the function f could be obtained analytically in terms of the complex structure J using the chain rule and properties of matrix calculus. However, this may not be possible in general, e.g., if a function f is the result of some numerical algorithm, it may not be possible to compute its derivative analytically. In this case, automatic differentiation (AD) procedures [29], which provide a numerical algorithm to compute the gradient without the need of an analytical derivative and also without the drawbacks of a purely numerical derivative, present a computationally feasible alternative.

A. Approximate ground states
The energy function E = Ĥ is one of the most prominent function on families of quantum states that should be minimized. This is particularly relevant in the context of finding variational ground states, i.e., finding states within a given variational family of ansatz ground states that approximates the true ground state most accurately (A) Approximate ground states (B) Entanglement of purification (C) Complexity of purification  (with respect to some merit function). The energy expectation value E is often used as merit function.
Pure Gaussian states and certain submanifolds are known to be very suitable variational families to approximate ground states of bosonic and fermionic Hamiltonians with local interactions. The approximation typically improves with the dimension of the system, i.e., Gaussian states often only capture qualitative features in one spatial dimension, but improve when moving to two and and three dimensions, as mean field descriptions become more accurate. Gaussian states are also heavily used as trial states in mathematical physics to find upper bounds to the energy of quantum gases, e.g., when studying the dilute limit of Bose gases [5]. There exists a range of different tool to find the best Gaussian state, i.e., the Gaussian state with the lowest energy expectation value E. A prominent example is the Hartree-Fock method, which is typically applied to fermions, but can also be used to approximate bosonic ground states. Another established method is imaginary time evolution, where the geometric flow of e −τĤ is approximated on the given manifold.
From a purely numerical perspective, many optimization methods are suitable to find the minimum of a function on conveniently parametrized family. However, in the context of Gaussian states many standard parametrization (using squeezing parameters or quadratic Hamiltonians acting on a reference state) tend to converge unreliably or get stuck in local minima. Taking the natural Riemannian geometry of Gaussian states (induced by the Fubini-Study metric) into account can significantly improve the convergence of such numerical methods. In fact, one can show that gradient descent with respect to this natural geometry coincides with projected imaginary time evolution [17], which is known to have favorable convergence properties. Both, the group-theoretic parametrization and the resulting straight-forward gradient descent algorithm are therefore perfectly suitable to find approximate ground states within the Gaussian state families.
Finding the minimum of energy function E = Ĥ requires us to evaluate E and its derivative dE efficiently.
We can always use canonical commutation or anticommutation relations to ensure that (t i ) a1...ai is totally symmetric (bosons) or anti-symmetric (fermions), in which case Wick's theorem only leads to all contractions with 1 2 Γ ab , i.e., either way, we find E = E(Γ) as polynomial in the entries of Γ. A Lie algebra element K perturbs Γ at linear order (tangent vector) as δΓ = KΓ + ΓK ⊺ , such that This allows us a straight-forward implementation of gradient descent on the manifold of all pure Gaussian states (or appropriate submanifolds) based on the algorithm discussed in section IV.
As an interesting observation, let us mention that the described approach can also be used to approximate real time evolution on the manifold of pure Gaussian states. Due to the fact that the manifold of pure Gaussian states is a Kähler manifold the commonly used variational principles (Lagrangian, McLachlan, Dirac-Frankel) agree [17] and can be implemented as Hamiltonian equations of motion. Our gradient descent algorithm implements the vector field which must be replaced by the Hamiltonian time evolution i.e., we only need to adjust our algorithm in step 2. Gradient computation, where we replace K n = F µ Ξ µ by K n = X µ Ξ µ . Here, Ω µν is the inverse of ω µν computed from (76). Just as in the case of G µν our group-theoretic parametrization (left invariance) of the Gaussian state manifold ensures that we only need to evaluate Ω µν once and can use the same matrix for subsequent steps in the algorithm. Note, however, that there are important differences between imaginary time evolution (gradient descent) and real time evolution. For imaginary time evolution, the step size is only used to ensure that the energy decreases, while for real time evolution we need to keep track of it to know the current time parameter. Moreover, for imaginary time evolution, we just need to make sure that the energy function decreases with each step, while other errors due to the finite step size are not a problem.
For real time evolution, we will always make small errors due to the finite step size and can only try to decrease it by enforcing relevant conservation laws. In particular, the energy function should stay exactly constant, so we can try to use the step size to keep the accumulated error under control. We can also use (165) in combination with (166) and (167) to derive the real and imaginary evolution equations of the covariance matrix Γ, namely [17,30] For bosonic systems, it is natural to also allow for a nonzero displacement vector z a = ξa , which can be seamlessly integrated in the presented formalism, as discussed in refs. [17,30]. In summary, our group-theoretic parametrization of Gaussian states and the resulting optimization algorithm are suitable to find approximate ground states and to perform projected real time evolution on the manifold of pure Gaussian state. In practical applications, we can typically reduce the dimension of the manifold by implementing certain symmetries, e.g., translational symmetry, from scratch by reducing the number of Lie algebra generators Ξ µ accordingly and choosing an initial state respecting the chosen symmetry.

B. Gaussian entanglement of purification (EoP)
The entanglement of purification (EoP), first introduced in ref. [31], quantifies the degree of entanglement between subsystems in a composite quantum system. As such, it serves as a valuable correlation measure and has recently become of interest in quantum many body systems [32]. Suppose we are given a mixed state in some Hilbert space H = H A ⊗ H B which can be described by a density operator ρ AB . We now define a new Hilbert space, choosing the ancillary H A ′ ⊗H B ′ in such a way that there exists a purification |ψ ∈ H ′ such that Of course, this purification is not unique, and the EoP is defined in terms of the von Neumann entropy S(ρ) = −tr(ρ log ρ), as the minimum of the entanglement entropy between subsystems A ⊕ A ′ and B ⊕ B ′ . Accordingly, determining the EoP in general requires an optimization over the full Hilbert space H ′ , which is a computationally intensive task that quickly becomes unfeasible. A much more reasonable problem is to focus instead on the Gaussian EoP, obtained by assuming that both the initial mixed state and the purification are Gaussian. The optimization over all purifications then reduces to the familiar problem of optimization on the sub-manifold of Gaussian states composed from purifications of ρ AB . The properties of these states have been discussed in some detail in section II F -in fact, the only difference to note here is that in the context of EoP, we label the original subsystem by A ⊕ B rather than A and the ancillary by A ′ ⊕ B ′ rather than B.
We recall from the previous discussion on Gaussian purifications that the manifold of purifications can be parametrised in terms of complex structures J with restrictions to the subsystems given by the restricted complex structures J AB , J A ′ B ′ , J A ′ A , J BB ′ . In Refs. [16,33], an expression based on this parametrization was derived for the Gaussian entanglement entropy, first defined in [34] for bosons and [35] for fermions. The expression reads and once again makes use of the complex structure formalism to provide a unified expression for both bosons and fermions. This expression can be framed more concisely by defining D = 1 2 (1 + i J AA ′ ) as The derivative of the entanglement entropy can be obtained by a straightforward application of the product rule and the cyclicity of the trace as where we have defined dD = i 2 δJ AA ′ . These results are summarised in Table IV. Equipped with a manifold of pure Gaussian states and a scalar function and its derivative defined on this manifold in terms of the complex structure, we are now in a position to employ our optimization algorithm to efficiently compute the Gaussian EoP.
In practice, we begin with a matrix representation of the mixed state reduced complex structure J AB in a basiŝ ξ = (ξ A ,ξ B ) which decomposes over the two subsystems A and B. We denote the transformation by T which relates J AB to its mixed state standard form J m sta defined in (80), so that The transformation is obtained from the eigenvectors of J AB as discussed in section II F. We can now construct an initial purification of the form in (80). In doing so, we are free to choose any dim(A ′ B ′ ) ≥ dim(AB), and we speak of a minimal purification when the number of modes in AB is the same as that in A ′ B ′ . The convenience of our choice of basis asξ ′ = (ξ A ,ξ B ,ξ A ′ ,ξ B ′ ) now becomes evident: In this basis, the complex structure of the purified state will take the block form where the blocks on the main diagonal are the restricted complex structures defining the mixed states ρ AB and ρ A ′ B ′ . It should be noted that, in contrast, the offdiagonal blocks do not represent complex structures as they map from A⊕B to A ′ ⊕B ′ or vice versa. While J AB is fixed to preserve the restriction to the original subsystem, varying J A ′ B ′ and the off-diagonal blocks in a compatible way corresponds to different purifications of ρ AB . The state manifold of interest is therefore parametrized by the transformations M A ′ B ′ acting on the reduced complex structure J A ′ B ′ , which act on the full complex struc- We initialize the optimization algorithm at the initial purification, which is in the standard form and therefore the very first transformation must be of the form where M 0 denotes starting point in the variational manifold and the leading block in the transformation returns J st AB to its initial form J AB . This ensures that the restriction of J 1 = M 0 J 0 M −1 0 to A ⊕ B returns the initial reduced complex structure J AB . The optimization can then proceed in the way outlined in the previous section, with steps to ensure that the optimization procedure leaves J AB unchanged.
A comprehensive study of Gaussian entanglement of purification in free quantum field theories based on our methods can be found in [15].

C. Gaussian complexity of purification (CoP)
In ref. [36], it has first been suggested that a notion of (circuit) complexity might provide fresh insights and might meaningfully complement notions of entanglement in holography. Motivated by the subsequent interest in holographic complexity as well as a preceding geometric interpretation of complexity in quantum circuits [37], significant attention has been dedicated to extending notions of complexity to quantum field theories [38,39]. Since the thermal and ground states of free quantum fields are Gaussian, a framework for the complexity of Gaussian states has been developed in refs. [19,20], which we draw on here.
A particular area of recent interest has been the study of complexity of purification (CoP) as a correlation measure in composite quantum systems based on the notion of complexity rather than entanglement [40]. In this context, a typical problem would be the following: We are given a mixed state in some Hilbert space H A , which is characterized by a density matrix ρ A . We now define a new Hilbert space, choosing the ancillary system B in such a way there exists a purification |ψ T ∈ H ′ such that We refer to this purification as the target state. The CoP is defined as the minimum of a complexity function C with respect to some reference state ψ R over all purifications of the initial state i.e., There are several distinct proposals for the complexity function C(|ψ T , |ψ R ) in the literature. In the context of this work, we will once again focus only on Gaussian CoP and make the assumption that both the reference and target states are Gaussian in nature. For bosonic and fermionic Gaussian states, there exists a consensus definition 14 associated with the geodesic distance between reference and target states, whose analytical expressions have been derived in ref. [20] for bosons and in ref. [19] for fermions. The most concise formulation of this complexity function unsurprisingly involves the relative complex structure, introduced in (47), of the target and reference states, which captures all the information between the two. In terms of ∆, the complexity is then defined as although for the purposes of a numerical optimization, the square root is irrelevant and can be neglected. Given this parametrization of the complexity in terms of complex structures, we may obtain the CoP by optimization on the manifold of Gaussian purifications of the initial J A , as in the previous section. However, we may also choose a more computationally efficient approach, by noting a somewhat subtle point regarding the complexity function. By the cyclicity of the trace, any transformation J T → M J T M −1 will change the complexity (183) in a way equivalent to the transformation J R → M −1 J R M . This means that we can choose to optimize over the manifold of pure reference states rather than target states. This seems arbitrary until we note that we may assume without loss of generality that the reference state is a product state between A and B i.e., that the matrix representation of J R in our basis is simply where the subscripts A and B denote the restrictions to either subsystem. A comprehensive study of Gaussian complexity of purification in free quantum field theories based on our methods can be found in [15].

VI. OPTIMALITY OF GAUSSIAN EOP
This section focuses on a specific application defined and reviewed in the previous section V B, namely entanglement of purification. We combine our numerical results used from our numerical algorithm with several analytical arguments to support the conjecture that for mixed Gaussian states only Gaussian purifications are required to compute the entanglement of purification.

A. Conjectures on optimality
We will present numerical and analytical arguments for the validity of the following two conjectures. Conjecture 1 (Gaussian optimality conjecture). Given a mixed Gaussian state ρ of a bosonic or fermionic system and a system decomposition V = A ⊕ B with N A and N B degrees of freedom, respectively, it is sufficient to optimize over all Gaussian purifications to compute the entanglement of purification (minimal entanglement entropy S AA ′ over all purifications in H A ⊗ H B ⊗ H A ′ ⊗ H B ′ ), i.e., the global minimum of S AA ′ is reached on the submanifold of Gaussian states.
Conjecture 2 (Minimum purification conjecture). When minimizing the S AA ′ over all Gaussian purifications, the minimum is reached when choosing the numbers of degrees of freedom of the purifying systems A ′ and B ′ to be given by the respective numbers of degrees of freedom in A and B, i.e., At first sight, this conjecture may appear rather ambitious, considering that we assume that the optimization over the generally exponentially small family of Gaussian purification (compared to all non-Gaussian purifications) is sufficient and that the number of purifying degrees of freedom just match the ones of the original system (in contrast to the bounds for finite dimensional non-Gaussian systems from ref. [31]). However, for researchers familiar with typical properties of Gaussian states, our conjecture will likely appear much more realistic, considering that Gaussian states provide in many settings on of the simplest non-trivial realizations of quantum information concepts. This appears in the context analytical formulas for the entanglement entropy and other correlations measures (such as the logarithmic negativity).
Let us emphasize that we have formulated two distinct conjectures that only together provide us with clear instructions on how the full entanglement of purification can be computed numerically from the Gaussian optimization algorithm presented in the previous section. Conjecture 1 ensures that for mixed Gaussian states, we only need to consider Gaussian purifications, but to actually run the algorithm, we need to choose both the total number N of degrees of freedom to purify as well as how we split these purifying degrees of freedom into the auxiliary subsystems A ′ and B ′ .

B. Numerical evidence
We provide numerical support for conjectures 1 and 2 based on two paradigmatic models. For bosons, we consider the Klein-Gordon scalar field with mass m, discretized on a one-dimensional periodic lattice with N sites, equipped with the Hamiltonian where δ > 0 represents the lattice spacing. For fermions, we consider the transverse field Ising model in the critical limit J = h. Here, S x i and S z i represent the local spin-1/2 x-and z-component operators on the i-th site (the conventions match [42][43][44]).
Providing categorical numerical evidence for the first conjecture proves a substantial challenge, since it requires an optimization over the entire Hilbert space, which is the daunting problem that our Gaussian approach is trying to circumvent.
In the fermionic case, the finite-dimensional Hilbert space allows us to adapt our approach to gradient descent using Lie groups and algebras to this problem by optimizing over the (compact) group of unitary transformations U(2 N ) of an N -mode density operator. On the manifold of transformations U ∈ U(2 N ) with respect to some reference state ρ 0 , parametrizing the non-Gaussian purifications according to ρ U = U ρ 0 U † , we can define the entanglement entropy function as with ρ AA ′ = tr BB ′ U ρ 0 U −1 . In line with the previous discussion, we can also define the derivative of S AA ′ as It should be noted that computing the partial trace for a fermionic density operator in this context is non-trivial. In practice, we construct the initial purified density operator in the convenient basisξ = (ξ A ,ξ B ,ξ A ′ ,ξ B ′ ). Tracing out the subsystem BB ′ therefore involves a permutation of the degrees of freedom, in the sense ρ ABA ′ B ′ → ρ AA ′ BB ′ . While such a re-ordering is trivial for commuting bosonic degrees of freedom (or spin degrees of freedom), the permutation of fermionic creation operators do anti-commute, so the computation of the partial trace to find ρ AA ′ will lead to extra sign flips due to the required permutations. This is subtle, but well-understood in various contexts [45][46][47][48] and already taken into account when we computed the entanglement entropy of fermionic Gaussian states in (172).
Evidence for conjecture 1. This approach to non-Gaussian optimization proves efficient at small system sizes, however, the computational effort grows exponentially in the number of degrees of freedom in the system and it soon becomes unfeasible. Table V shows the non-Gaussian EoP for the fermionic (critical) Ising model, within the numerically accessible regime. Evidently, this data supports the conjecture that the optimal purification of a mixed Gaussian state is Gaussian.
Evidence for conjecture 2. We can tackle the second conjecture in a more comprehensive way, since it only requires us to perform Gaussian optimization. Table VI shows the numerical Gaussian EoP for a variety of dimensions, for both the bosonic and fermionic cases. Evidently, here we also see good agreement between the numerical results and our expectations based on the conjecture.

C. Analytical bounds
As alluded in the previous section, it is highly plausible to conjecture that the purification for which the entanglement entropy is minimized belongs to the class of Gaussian states (conjecture 1). We have some further analytical evidence for this: After all, the map from quantum states on H ′ to ones on H A ⊗ H A ′ performing a partial trace over the complement of H A ⊗ H A ′ can be seen as a constrained Gaussian channel, reflecting the constraint that the inputs must be such that the reductions to H A ⊗ H B are precisely the given Gaussian states ρ AB . Captured in this way, the entanglement of purification can be seen as a solution to a minimum output entropy problem of a Gaussian quantum channel [49,50], a problem in which the von-Neumann entropy of the output of a quantum channel is minimized under varying the input of the channel. At least for Gaussian bosonic systems has been settled under rather general conditions (albeit not under the specific constraints considered here). This connection will be made more precise elsewhere. Not referring to this conjecture, the Gaussian entanglement of purification (only allowing for Gaussian purifications), constitutes an upper bound for E P .
That said, the quality of approximation can be bounded by a lower bound to E P that can be computed [31]. This is the entanglement of formation (EoF) E F of ρ AB [51], satisfying and being defined as the infimum The entanglement of formation can be in instances computed and also conveniently bounded [52]. The easiest such lower bounds, valid for arbitrary as well as for Gaussian states, for which it is extremal both in the bosonic and fermionic [53] setting, is the hashing bound Another insight helpful in the numerical optimization of the Gaussian entanglement of purification is a bound to the number of auxiliary modes constituting systems A ′ B ′ that is required without restricting generality. Naively, one might expect that one needed a squared number of bosonic or fermionic modes in the purification. In fact, it is easy to see that one can restrict systems A ′ B ′ to be composed of as many modes N A ′ and N B ′ as A and B consist of, i.e., N A and N B . Given a quantum state ρ AB , it will be associated with some J AB . As discussed in section II F, we can always find a basis, such that J AB takes the standard form of a mixed state given by (80). Note, however, that this will be in general with respect to a basis that mixes the degrees of freedom of A and B. If we start with a basiŝ ξ 1 = (ξ A ,ξ B ), there exists a group transformation T AB ∈ G AB , such that where the mixed state standard form was defined in (80). We can use this M AB , which combines A and B to construct a purification |J ABA ′ B ′ , in which A ′ and B ′ are correlated in the same way. For this, we complete the basisξ = (ξ A ,ξ B ) from (193) toξ ′ = (ξ A ,ξ B ,ξ A ′ ,ξ B ′ ) and choose in this basis i.e., we use the same transformation T AB to combineξ A andξ B as we use to mixξ A ′ andξ B ′ . Here, we have the purified standard form J p sta from (84). From our numerical studies, we know that this is choice is generally not the optimal one, but it provides a meaningful starting point for our optimization algorithm.
We can also move on to arrive at analytical upper bounds, however. For this purpose, we block-diagonalize the submatrices [J] A and [J] B individually (rather than [J] AB as a whole as in (193)), i.e., we write with M A ∈ G A and M B ∈ G B , so that where A 2 has been introduced before and X is some 2N A × 2N B rectangular matrix andc A i andc B i are real numbers in [0, ∞) for bosons and [0, 1] for fermions. As J is just originating from a basis transformation of the purified J and J p sta , we havẽ One can now arrive at analytical upper bounds, acknowledging the following insight. The von-Neumann entropy of AA ′ can be computed from the reduction [J] AA ′ . One finds for the reduction the von-Neumann entropy via (172). This way, the von-Neumann entropy formula can directly be computed on the level of [J] AA . Let P be the pinching that projects the matrixJ into the 2 × 2 block diagonal form both in the main block and the off diagonal block ofJ. Since iJ is Hermitian (with respect to the inner product of g), such a pinching will render the resulting matrix P (iJ) more mixed than iJ in the sense of majorization [54], i.e., if the non-increasingly ordered eigenvalues of iJ AA ′ are ± iλ i and the ones of P (iJ AA ′ ) are ± iλ ′ i , we will have for all j in the first equation. Since the function S AA ′ (J) as a function ofJ is Schur-concave both for bosons and fermions, we have again for both bosons and fermions. In other words, the pinched matrix gives rise to an upper bound to the von-Neumann entropy of the involved quantum states and hence also an upper bound to the entanglement of purification. That said, now the eigenvalues entering the expression can be read off directly, giving rise to an explicit formula of an upper bound of the entanglement of purification. This mindset can be used to avoid costly numerical optimization and to study systems in the thermodynamic limit, while still arriving at reasonable bounds.

D. Proof of local optimality
Some further analytical evidence in support of conjecture 1 is provided by the fact that the entanglement entropy S AA ′ is locally optimal for a Gaussian purification, i.e., we will prove that after finding the optimal Gaussian purification |J ABA ′ B ′ with minimal S AA ′ any infinitesimal non-Gaussian change of |J ABA ′ B ′ will not lower S AA ′ . For simplicity of notation, we write |J for the purification |J ABA ′ B ′ on ABA ′ B ′ . We consider the mixed Gaussian state ρ AB . Let us define |J as the optimal Gaussian purification, i.e., a Gaussian state vector such that and such that the entanglement entropy S AA ′ (|J J|) = S(ρ AA ′ ) is minimal among all Gaussian states. In practice, we would choose here N A = N A ′ and N B = N B ′ as suggested by conjecture 2, but this is not important for the argument. As discussed in section III H, we can write the mixed Gaussian ρ AA ′ = exp(−Ĥ AA ′ )/Z withĤ AA ′ = q abξ aξb and Z = e c0 based on formula 8. If we now perturb our optimal purification in a non-Gaussian way, i.e., by applying a unitary with U A ′ B ′ (0) = 1 A ′ ⊗ 1 B ′ , the first law of entanglement entropy [55][56][57] states that the linear change of δS AA ′ around ǫ = 0 is given by However, we note thatĤ AA ′ is a quadratic Hamiltonian, which implies that the first order change of the entanglement entropy will only feel the change of the two-point function of |ψ ǫ . This means that at linear order, we can replace the change of |ψ ǫ by a Gaussian change of the state. However, by assumption the state vector |J has been the optimal Gaussian purification, such that any Gaussian perturbation will always increase the entanglement entropy S AA ′ . Moreover, as the Gaussian purification |J has been assumed to be optimal among all Gaussian purifications, the variation δS AA ′ will vanish at linear order. In summary, even if we allow for non-Gaussian perturbations of |J , we will have However, this does not exclude the possibility of a finite transformation U ǫ to lower the entanglement entropy, but constitutes a first step towards proving that the Gaussian purification is optimal.

VII. DISCUSSION
We have presented a geometric approach to optimize over arbitrary differentiable functions on the manifolds of pure bosonic or fermionic Gaussian states. Our method is based on the well-known gradient descent algorithm, but exploits the natural action of a Lie group to move between different Gaussian states. This way, we can efficiently perform gradient descent with respect to the Fubini-Study metric associated to the manifold of Gaussian states. In the context of variational families, it is an important question if a given manifold satisfies the socalled Kähler property [17], but for the purpose of gradient descent on Gaussian manifolds this property is not important and we show explicitly how our approach can be applied to suitable Gaussian submanifolds (generated by subgroups of the symplectic or orthogonal group).
For the most part of this manuscript, we used a new formalism for the treatment of Gaussian states that largely unifies the bosonic and fermionic case and emphasizes their similarities. This formalism is based on the geometric Kähler structures consisting of a metric G, a symplectic form Ω and a complex structure J on the classical phase space V of the theory, as reviewed in section II. In order to carefully distinguish if a matrix represents a linear map (such as J), a bilinear form (such as G and Ω) or a dual bilinear form (such as g and ω), we used the index position of a respective matrix entry (such as J a b vs. G ab ). As there are many equivalent ways to describe and parametrize Gaussian states, we provided a comprehensive dictionary section III to allow for a seamless conversion between different formalisms. This dictionary may also be of use to other applications involving Gaussian states.
We have further implemented our optimization algorithm numerically to study three example applications that are relevant for condensed matter physics, quantum information and high energy theory, namely finding approximate ground states, computing the Gaussian entanglement of purification (EoP) and finally finding the so-called Gaussian complexity of purification (CoP). For each of these applications, we have reviewed the key ingredients of our optimization procedure, namely an analytical expression for the function and its gradient in terms of the complex structure J parametrizing our Gaussian state family.
In section VI, we combine numerical and analytical insights to support a conjecture on the optimality of Gaussian entanglement of purification, i.e., we claim that for a mixed Gaussian state it is sufficient to optimize entanglement of purification only over Gaussian states. This claim is supported by numerical evidence from small fermionic systems, where we can also perform the full optimization over all purifications and find that it agrees with one over only Gaussian purifications. Moreover, we show analytically that the Gaussian entanglement of purification is locally optimal even in the larger set of non-Gaussian optimizations. Finally, our conjecture also makes a statement about the required number of degrees of freedom (and their distribution) in the purifying subsystem. This is supported by our numerics, as well.
The key reason why we do not need to re-evaluate the Fubini-Study metric at each step of our optimization algorithm lies in the fact that our optimization manifold (Gaussian states or suitable submanifolds) are generated by the Lie group G (Sp(2N, R) for bosons, O(2N, R) for fermions) or a suitable subgroup G ′ . As we have a unitary representation U(M ) of group elements M ∈ G, the Hilbert space inner product is preserved under the leftaction of this group. It therefore suffices to choose an or-thonormal basis of Lie algebra elements at one point (at a given reference state vector |J 0 in the manifold) and this basis will stay orthonormal when moving to other states via the group action U(M ) |J 0 = |M J 0 M −1 . Another advantage is that we naturally ensure to not overparametrize, i.e., we can remove those Lie algebra elements that do not change the reference state |J 0 which ensures via the natural group action that we also do not have redundant directions at other states. All of these desirable properties also apply to other families of pure states, as long as they are generated by from some unitary representation of a Lie group. A prominent example of such families are the so-called group theoretic coherent states introduced by Gilmore [58,59] and Perelomov [60,61]. The only difference to the Gaussian case is that we may not have equally simple analytical formulas for the functions we would like to optimize, such as expectation values (Wick's theorem) or entanglement entropies. Of course, our method also applies to the family of all pure states (projective Hilbert space) and in fact, we already used an appropriately adjusted version of our algorithm when we computed the full non-Gaussian entanglement of purification in section VI B for small fermionic systems (large or bosonic systems are not feasible due to the large or infinite dimension of the associated Hilbert space).