Tangent-space methods for uniform matrix product states

In these lecture notes we give a technical overview of tangent-space methods for matrix product states in the thermodynamic limit. We introduce the manifold of uniform matrix product states, show how to compute different types of observables, and discuss the concept of a tangent space. We explain how to variationally optimize ground-state approximations, implement real-time evolution and describe elementary excitations for a given model Hamiltonian. Also, we explain how matrix product states approximate fixed points of one-dimensional transfer matrices. We show how all these methods can be translated to the language of continuous matrix product states for one-dimensional field theories. We conclude with some extensions of the tangent-space formalism and with an outlook to new applications.


Introduction
The quantum many-body problem is of central importance in diverse fields of physics such as quantum chemistry, condensed-matter physics and quantum field theory. This is the reason why, for the last 90 years, most research in theoretical quantum physics has focused on this problem. In the last decade, the field of quantum information has opened a new viewpoint into this problem by rephrasing it in terms of entanglement theory. It has become clear that equilibrium states of strongly-correlated quantum systems are very special, in the sense that they exhibit area laws for the entanglement entropy. This has led to the introduction of tensor-network states, which can be understood as the most general variational wavefunctions exhibiting such area laws. 1 The essential property which makes tensor networks appealing is that they allow for an exponential compression of the many-body wavefunction by modeling the entanglement degrees of freedom in the system, rather than the correlation functions as is done in traditional many-body theory. This makes them interesting both from the conceptual and the computational point of view. First, it has made it possible to identify the corner of Hilbert space parameterizing ground states of gapped local Hamiltonians, and this has led to the classification of topological phases of matter for interacting many-body systems. Second, the exponential compression allows us to view tensor networks as variational ansätze for which expectation values can be calculated efficiently, and makes them well suited for ground-state calculations of strongly-interacting systems such as the Hubbard model. The ubiquitous density matrix renormalization group (DMRG) is the prime example of this computational approach, and has been used extensively for modeling recent experiments in condensed-matter and atomic physics. DMRG, however, represents just one aspect of tensor networks.
First of all, the matrix product states (MPS) used in that approach can readily be generalized to other geometries and continuum quantum field theories. Second, post-MPS and tensor network methods can be formulated to access spectral and dynamical information about the systems of interest. The natural way of describing those novel tensor network methods is through the low-dimensional manifold that those states span in the full Hilbert space. This manifold picture provides a unifying framework by which both DMRG and time-dependent and spectral MPS methods can be understood.
The main objective of these lecture notes is to highlight the novel aspects of quantum tensor networks that are made apparent by looking at them through the lens of this manifold picture and, more specifically, by studying the tangent spaces of this manifold. Those tangent spaces play a central role as they parameterize the directions in Hilbert space which are accessible from within the manifold. It will be shown that the time-dependent variational principle, a unifying way of looking at both stationary and time-dependent methods for dealing with tensor networks, amounts to projecting the full Hamiltonian on this tangent space. It turns out that the elementary excitations or quasiparticles in the full many-body system can also be understood by such a projection, and even topological nontrivial excitations such as spinons and anyons can be understood within this framework.
The structure of the lecture notes is as follows. In Sec. 2 we introduce uniform matrix product states (MPS) in the thermodynamic limit, compute expectation values and derive canonical forms. In Sec. 3 we discuss the notion of a tangent space on the MPS manifold, and derive efficient parametrizations and associated expressions for the tangent-space projectors. In Sec. 4 these notions are used to develop variational ground-state optimization algorithms, and the VUMPS algorithm is explained in detail. In Sec. 5 the time-dependent variational principle is derived, and we discuss how the flow equations are integrated. In Sec. 6 we introduce the quasiparticle excitation ansatz, and show how to compute elementary excitation spectra of generic spin chains. Finally, we extend all the previous notions to the case of one-dimensional transfer matrices in Sec. 7, and we look at the continuous version of MPS for one-dimensional field theories in Sec. 8. We close these lecture notes with an outlook to further applications and extensions in Sec. 9.

Matrix product states in the thermodynamic limit
In this first section, we will introduce the class of translation-invariant MPS in the thermodynamic limit [6,7]. As we will see, these uniform matrix product states have some useful properties that enable us to work with them in a computationally efficient way. In the next sections, we will show that, although most state-of-the-art MPS-based methods work on finite systems, working directly in the thermodynamic limit has a number of conceptual and numerical advantages.

Uniform matrix product states, gauge transformations and canonical forms
A uniform MPS in the thermodynamic limit is introduced as
$$ |\Psi(A)\rangle = \sum_{\{s\}} \boldsymbol{v}_L^\dagger \Big[ \prod_{m\in\mathbb{Z}} A^{s_m} \Big] \boldsymbol{v}_R \, |\{s\}\rangle , \qquad (1) $$
where $A^s$ is a D × D matrix for every value of the physical index s. Alternatively, we can interpret the object A as a three-index tensor of dimensions D × d × D, where d is the dimension of the physical Hilbert space at every site in the chain and D is the so-called bond dimension. The latter determines the amount of correlations in the MPS and can be tuned in numerical simulations; it is expected that MPS results for gapped systems become exact in the limit D → ∞, and the complexity of all MPS algorithms scales as $O(D^3)$. In these lecture notes, we will make use of the diagrammatic language of tensor networks. In this language we represent tensors by geometrical shapes with the indices indicated by lines sticking out of them; whenever two indices of two different tensors are contracted (i.e., summed over), the corresponding legs are connected in the diagram. Using this language, a uniform MPS is represented as an infinite row of copies of the same tensor A, connected along their virtual legs. This diagram stands for the coefficient of a given configuration of spins that appears in the superposition in Eq. (1); in the following, we will simply use the notation that the diagram is the state itself. In this diagrammatic representation, the definition of a uniform MPS is obvious: we just repeat the same tensor A on every site in the lattice, giving rise to a state that is translation invariant by construction. 2 In Eq. (1) we have also introduced two boundary vectors $v_L^\dagger$ and $v_R$ but, as we work on an infinite system, the boundary conditions will never have any physical meaning. Indeed, translation-invariant MPS for which the boundary conditions do matter are called non-injective, and correspond to macroscopic superpositions (cat states), where the specific superposition is encoded in the boundary vectors.
Non-injective MPS tensors appear with measure zero in the space of all possible MPS tensors and are not considered throughout these notes 3 . For injective MPS (the generic case), we will show that the physical properties (expectation values) of the state $|\Psi(A)\rangle$ only depend on the tensor A, and therefore the MPS tensor truly describes the bulk properties of the state.

2 We could introduce states that are translation invariant over multiple sites by working with a repeated unit cell of different matrices $A_1, A_2, \dots$, and all methods that we will discuss can be extended to the case of larger unit cells (see Sec. 9).
3 We refer the reader to Ref. [8] for additional details.

The central object in all our calculations is the transfer operator or transfer matrix, defined as
$$ E = \sum_{s=1}^{d} A^s \otimes \bar{A}^s , $$
which is an operator acting on the space of D × D matrices as $x \mapsto \sum_s A^s x (A^s)^\dagger$. From its definition it follows that the transfer matrix is a completely positive map [9], where the MPS matrices $A^s$ play the role of Kraus operators. The transfer matrix has the property that its leading eigenvalue is a positive number η, which should be scaled to one by rescaling the MPS tensor as $A \to A/\sqrt{\eta}$ for a proper normalization of the state in the thermodynamic limit. In the generic (i.e. injective) case, this leading eigenvalue is non-degenerate 4 , and the corresponding left and right fixed points l and r, defined by
$$ \sum_s (A^s)^\dagger \, l \, A^s = l , \qquad \sum_s A^s \, r \, (A^s)^\dagger = r , $$
are positive matrices. They can be normalized such that $\mathrm{Tr}(lr) = 1$.
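As a minimal illustration of these objects, the transfer matrix and its fixed points can be computed with dense linear algebra when the bond dimension is small. The sketch below assumes an index convention `A[s]` of shape `(d, D, D)` and row-major vectorization; the function names are illustrative, not from the notes.

```python
import numpy as np

def transfer_map_right(A, x):
    """Action of the transfer matrix on a right vector: sum_s A^s x (A^s)^dag."""
    return sum(A[s] @ x @ A[s].conj().T for s in range(A.shape[0]))

def transfer_map_left(A, x):
    """Action of the transfer matrix on a left vector: sum_s (A^s)^dag x A^s."""
    return sum(A[s].conj().T @ x @ A[s] for s in range(A.shape[0]))

def normalize_mps(A):
    """Rescale A -> A/sqrt(eta) and return (A, l, r) with Tr(l r) = 1."""
    d, D, _ = A.shape
    # dense transfer matrix, O(D^4) memory: only feasible for small D
    E = sum(np.kron(A[s], A[s].conj()) for s in range(d))
    w, V = np.linalg.eig(E)
    i = np.argmax(np.abs(w))
    A = A / np.sqrt(np.abs(w[i]))   # leading eigenvalue of a CP map is positive
    r = V[:, i].reshape(D, D)
    r = r / np.trace(r)             # fixes the arbitrary phase; the true r is positive
    wl, Vl = np.linalg.eig(E.conj().T)
    l = Vl[:, np.argmax(np.abs(wl))].reshape(D, D)
    l = l / np.trace(l @ r)         # imposes the normalization Tr(l r) = 1
    return A, l, r
```

In practice one never builds the $D^2 \times D^2$ matrix E explicitly; iterative eigensolvers only require the maps `transfer_map_left`/`transfer_map_right`, which cost $O(D^3)$ per application.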
When evaluating the norm of the state, the infinite power of the transfer matrix reduces to the projector onto its fixed points, so that the norm reduces to the overlap between the boundary vectors and the fixed points. We will now choose the boundary vectors such that these overlaps equal unity (the boundary vectors have no effect on the bulk properties of the MPS anyway), so that the MPS is properly normalized as $\langle\Psi(\bar A)|\Psi(A)\rangle = 1$.
Although the state is uniquely defined by the tensor A, the converse is not true, as different tensors can give rise to the same physical state. This can easily be seen by noting that the gauge transform
$$ A^s \to X^{-1} A^s X \qquad (8) $$
leaves the state in Eq. (1) invariant. In fact, it can be shown [8,10] that this is the only freedom in the parametrization 5 , and it can be fixed (partially) by imposing canonical forms on the MPS tensor A.
As is well known from DMRG and other MPS algorithms on finite chains, the use of canonical forms helps to ensure the numerical stability of the resulting algorithms, and this extends to the algorithms for infinite systems discussed below. First, we can always find a representation of $|\Psi(A)\rangle$ in terms of a new MPS tensor $A_L$ that obeys the left-orthonormality condition
$$ \sum_s (A_L^s)^\dagger A_L^s = \mathbb{1} . \qquad (9) $$
The matrix L implementing this gauge transformation, $A_L^s = L A^s L^{-1}$, is found by decomposing the fixed point l of A as $l = L^\dagger L$, because with that choice one indeed verifies the condition above. The representation of an MPS in terms of a tensor $A_L$ is called the left-orthonormal form. This gauge condition still leaves room for unitary gauge transformations, which can be used to bring the right fixed point r into diagonal form. Similarly, a right-orthonormal form $A_R$ can be found such that
$$ \sum_s A_R^s (A_R^s)^\dagger = \mathbb{1} , $$
and where the left fixed point l is diagonal. These left- and right-orthonormal forms now allow us to define a mixed gauge for the uniform MPS. The idea is that we choose one site, the 'center site', bring all tensors to its left into the left-orthonormal form, bring all tensors to its right into the right-orthonormal form, and define a new tensor $A_C$ on the center site. This mixed-gauge form has an intuitive interpretation.
First of all, we introduce a new tensor $C = LR$, which implements the gauge transform that maps the left-orthonormal tensor to the right-orthonormal one and which defines the center-site tensor $A_C$:
$$ A_C^s = A_L^s \, C = C \, A_R^s . \qquad (15) $$
This allows us to rewrite the MPS with only the C tensor on a virtual leg, linking the left- and right-orthonormal tensors. In a next step, the tensor C is brought into diagonal form by performing a singular-value decomposition, $C = USV^\dagger$, and absorbing U and $V^\dagger$ into a new definition of $A_L$ and $A_R$ (remember that we still had the freedom of unitary gauge transformations on the left- and right-orthonormal forms):
$$ A_L \to U^\dagger A_L U , \qquad A_R \to V^\dagger A_R V . \qquad (17) $$
The above form of the MPS, with a diagonal C, now allows us to straightforwardly write down a Schmidt decomposition of the state 6 across an arbitrary bond in the chain,
$$ |\Psi\rangle = \sum_{i=1}^{D} C_i \, |\Psi^i_L\rangle \otimes |\Psi^i_R\rangle , $$
where the states $|\Psi^i_L\rangle$ and $|\Psi^i_R\rangle$ are orthonormal states on the two halves of the lattice. This implies that the diagonal elements $C_i$ of this (diagonal) mixed canonical form are exactly the Schmidt numbers of any bipartition of the MPS, and the bipartite entanglement entropy is given by
$$ S = -\sum_i C_i^2 \log C_i^2 . $$
Next we discuss how to characterize the overlap or fidelity between two uniform MPS. Given two properly normalized MPS $|\Psi(A_1)\rangle$ and $|\Psi(A_2)\rangle$, the overlap $\langle\Psi(\bar A_2)|\Psi(A_1)\rangle$ is evaluated as an infinite power of the mixed transfer matrix $E^1_2 = \sum_s A_1^s \otimes \bar A_2^s$.
This expression is either one (up to a phase factor) or zero, depending on whether $\lambda_{\max}(E^1_2)$, the largest eigenvalue (in magnitude) of this mixed transfer matrix $E^1_2$, lies on the unit circle or not. Supposing that we have fixed the relative phase between the two tensors $A_1$ and $A_2$ such that $\lambda_{\max}(E^1_2)$ is positive, the overlap per site is given by this eigenvalue. This result is known as the orthogonality catastrophe, according to which states in the thermodynamic limit are either equal or orthogonal. The condition $\lambda_{\max}(E^1_2) = 1$ is indeed sufficient to conclude that there exists a gauge transformation between $A_1$ and $A_2$. A more physical quantity expressing whether two MPS in the thermodynamic limit are 'close' is the fidelity per site,
$$ f(A_1, A_2) = \lim_{N\to\infty} \big| \langle\Psi(\bar A_2)|\Psi(A_1)\rangle \big|^{1/N} , $$
which exactly corresponds to $|\lambda_{\max}(E^1_2)|$.

Truncating a uniform MPS
The mixed canonical form enables us to truncate an MPS efficiently [12], which is one of the primitive tasks in any MPS toolbox. Such a problem typically occurs when one multiplies an MPS with a matrix product operator (see Sec. 7) and wants to reduce the bond dimension again. The sum in the above Schmidt decomposition can be truncated, giving rise to a new MPS with a reduced bond dimension on that bond. This truncation is optimal in the sense that the overlap between the original and the truncated MPS is maximized locally, but the resulting MPS is no longer translation invariant, as it has a lower bond dimension on one leg only. We can, however, obtain a translation-invariant MPS with a lower bond dimension by transforming every tensor $A_L$ or $A_R$ as in Eq. 17, but where we have truncated the number of columns in U and V, giving rise to the isometries $\tilde U$ and $\tilde V$. The truncated MPS in the mixed gauge is then given by $\tilde A_L = \tilde U^\dagger A_L \tilde U$ and $\tilde A_R = \tilde V^\dagger A_R \tilde V$, with $\tilde S$ the truncated singular values of C on the center bond. This procedure is not guaranteed to find the MPS with a lower bond dimension that is globally optimal, in the sense of minimizing the error on the global (thermodynamic-limit) state. A variational optimization of the corresponding cost function would find the optimal truncated MPS tensor, but the above approximate algorithm has, of course, the advantage of being numerically cheap. In Sec. 3 we will discuss a variational method for optimizing this cost function.
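The local truncation step above, together with the entropy formula of the previous subsection, can be sketched in a few lines. The function names and return convention are illustrative; the singular values are renormalized so that the truncated state stays normalized.

```python
import numpy as np

def truncate_bond(C, Dmax):
    """Truncate the SVD of the center matrix C = U S V^dag to Dmax values.

    Returns the isometries, the renormalized singular values, and the local
    truncation error (the weight of the discarded Schmidt values)."""
    U, s, Vh = np.linalg.svd(C)
    err = np.sqrt(np.sum(s[Dmax:] ** 2))
    Ut, st, Vth = U[:, :Dmax], s[:Dmax], Vh[:Dmax, :]
    st = st / np.linalg.norm(st)   # renormalize the truncated state
    return Ut, st, Vth, err

def entanglement_entropy(schmidt_values):
    """Bipartite entropy S = -sum_i C_i^2 log C_i^2 of a normalized spectrum."""
    p = schmidt_values ** 2
    p = p[p > 0]
    return -np.sum(p * np.log(p))
```

The isometries `Ut` and `Vth` are then used to transform $A_L$ and $A_R$ as described above.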

Algorithm for finding canonical forms
Above we have seen that the set of uniform MPS can be parametrized in two different ways: (i) the uniform gauge, where we have one tensor A that is repeated on every site in the chain as in Eq. (1), and (ii) the mixed gauge, where we have a set of three matrices {A L , A R , C} obeying the relation (15), specifying the MPS as in Eq. (14).
In the algorithms in these notes we will often need to switch between these two gauges, so that a reliable algorithm is needed for extracting a set {A_L, A_R, C} from a given uniform MPS tensor A. In principle the above relation (9) yields an algorithm for finding a left-orthonormal tensor A_L, and a similar relation yields A_R and C. In practice, however, this algorithm is suboptimal in terms of numerical accuracy. While l and r are theoretically known to be positive hermitian matrices (up to a phase), at least one of them will nevertheless have small eigenvalues, say of order η, if the MPS is to provide a good approximation to an actual state. In practice, l and r are determined using an iterative eigensolver (Arnoldi method) and will only be accurate up to a specified tolerance ε, so that hermiticity and positivity of the smallest eigenvalues might be violated and need to be 'fixed'. Upon taking the 'square roots' L and R, the numerical precision goes down to min(√ε, ε/√η); indeed, computing L and R from l and r is analogous to computing the singular values of a matrix M from the eigenvalues of M†M. Furthermore, gauge transforming A with L or R requires the potentially ill-conditioned inversion of L and R, and will typically yield A_L and A_R which violate the orthonormalization condition at the same order ε/√η. Both problems are resolved by resorting to single-layer algorithms, i.e.
algorithms that only work on the level of the MPS tensors in the ket layer, and never require contractions with the bra layer. Suppose we are given an MPS tensor A, and we want to find the left-orthonormal tensor $A_L$ and the matrix L such that $A_L^s = L A^s L^{-1}$. 7 The idea is to solve the equation $A_L^s L = L A^s$ iteratively, where in every iteration (i) we start from a matrix $L^{(i)}$, (ii) we construct the tensor $L^{(i)} A$, (iii) we take a QR decomposition to obtain
$$ A_L^{(i+1)} L^{(i+1)} = L^{(i)} A , \qquad (28) $$
and (iv) we take $L^{(i+1)}$ to the next iteration. Because the QR decomposition is unique (in fact, it is made unique by the additional condition that the diagonal elements of the triangular matrix be positive), this iterative procedure is bound to converge to a fixed point for which $L^{(i+1)} = L^{(i)} = L$, and for which $A_L$ is left orthonormal by construction. The convergence rate of this approach is the same as that of a power method for finding the left fixed point $l = L^\dagger L$ of the transfer matrix, which is typically insufficient if the transfer matrix has a small gap. We can, however, speed up this QR algorithm by, after having found an updated guess $A_L^{(i+1)}$ according to Eq. (28), further improving the guess $L^{(i+1)}$ by replacing it with the fixed point $\tilde L^{(i+1)}$ of the map
$$ L \mapsto \sum_s \big(A_L^{(i+1),s}\big)^\dagger \, L \, A^s , \qquad (30) $$
which can be found by an Arnoldi eigensolver. Note that we do not need to solve this eigenvalue problem for $\tilde L^{(i+1)}$ to high precision early on in the algorithm. In particular, we do not want to restart the eigensolver (and thus only build the Krylov subspace once), as the outer iteration i of the algorithm acts as the restart loop.

7 We apply a slight abuse of notation here: the expressions AX and XA, with A an MPS tensor and X a matrix, are meant as $A^s X$ and $X A^s$ for all s.

The resulting algorithm for left orthonormalization is presented in Algorithm 1, and a similar algorithm for right orthonormalization follows readily. Algorithm 2 combines both to impose the mixed gauge with diagonal C.

Algorithm 1: Left orthonormalization of a uniform MPS tensor
1: procedure LeftOrthonormalize(A, L, η)
2:   L ← L/‖L‖
3:   Lold ← L
4:   (A_L, L) ← QRPos(L A)                    ▷ unique QR with positive diagonal
5:   λ ← ‖L‖, L ← λ⁻¹ L                        ▷ normalize new L and save norm change
6:   δ ← ‖L − Lold‖                             ▷ compute measure of convergence
7:   while δ > η do                             ▷ repeat until converged to specified tolerance
8:     (∼, L) ← Arnoldi(X → E(X), L, δ/10)     ▷ fixed point of the transfer map in Eq. (30), initial guess L, tolerance depending on δ
9:     Lold ← L
10:    (A_L, L) ← QRPos(L A)
11:    λ ← ‖L‖, L ← λ⁻¹ L
12:    δ ← ‖L − Lold‖
13:  end while
14:  return A_L, L, λ
15: end procedure

Algorithm 2: Mixed canonical form of a uniform MPS tensor
1: procedure MixedCanonical(A, η)
2:   (A_L, L, λ) ← LeftOrthonormalize(A, L₀, η)
3:   (A_R, R, ∼) ← RightOrthonormalize(A, R₀, η)
4:   C ← L R
5:   (U, C, V) ← SVD(C)                         ▷ diagonalize C matrix
6:   A_L ← U† A_L U, A_R ← V† A_R V
7:   return A_L, A_R, C, λ
8: end procedure
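The plain power-method version of this iteration, i.e. Algorithm 1 without the Arnoldi acceleration, can be sketched as follows. The index convention `(d, D, D)` and the function names are assumptions for illustration.

```python
import numpy as np

def qr_positive(M):
    """QR decomposition made unique by a positive diagonal of R."""
    Q, R = np.linalg.qr(M)
    d = np.diag(R).copy()
    d[d == 0] = 1
    ph = d / np.abs(d)
    return Q * ph, R / ph[:, None]

def left_orthonormalize(A, tol=1e-10, maxiter=100000):
    """Iterate A_L^{(i+1)} L^{(i+1)} = L^{(i)} A via QR until L converges.

    Returns (A_L, L, lam) with A_L L = L A / lam at the fixed point, where
    lam is the extracted norm (sqrt of the leading transfer eigenvalue)."""
    d, D, _ = A.shape
    L = np.eye(D, dtype=complex)
    for _ in range(maxiter):
        LA = np.concatenate([L @ A[s] for s in range(d)], axis=0)  # (dD, D)
        Q, Lnew = qr_positive(LA)
        lam = np.linalg.norm(Lnew)
        Lnew = Lnew / lam           # normalize new L and save norm change
        delta = np.linalg.norm(Lnew - L)
        L = Lnew
        if delta < tol:
            break
    return Q.reshape(d, D, D), L, lam
```

Stacking the d blocks `L @ A[s]` vertically and taking a thin QR directly produces the left-orthonormal blocks of $A_L$, since the Q factor of a tall matrix is an isometry.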

Computing expectation values
Suppose we want to compute the expectation value of an extensive operator
$$ O = \frac{1}{|\mathbb{Z}|} \sum_{n\in\mathbb{Z}} O_n , $$
where the extra factor $|\mathbb{Z}|^{-1}$ represents the number of sites, and is introduced to obtain a finite value in the thermodynamic limit; in fact, we are evaluating the density corresponding to the operator O. Because of translation invariance, we only have to evaluate one term where O acts on an arbitrary site. Assuming the MPS is properly normalized, we can use the left and right fixed points of the transfer matrix to contract everything to the left and to the right of the operator, and arrive at the contraction
$$ \langle O \rangle = \sum_{s,s'} \langle s'|O|s\rangle \, \mathrm{Tr}\!\big( l \, A^s \, r \, (A^{s'})^\dagger \big) . $$
An even easier contraction is obtained by going to the mixed gauge and locating the center site where the operator acts: then everything to the left and right contracts to the identity, and the expectation value reduces to a local contraction of $A_C$, $\bar A_C$ and O. A two-site operator, such as a Hamiltonian term h, is evaluated in the same way. Correlation functions are computed similarly. Let us look at
$$ c^{\alpha\beta}(m,n) = \langle\Psi(\bar A)| (O_\alpha)_m (O_\beta)_n |\Psi(A)\rangle , $$
where m and n are arbitrary locations in the chain; because of translation invariance, the correlation function only depends on the difference m − n. Again, we contract everything to the left and right of the operators by inserting the fixed points l and r, while the transfer matrix appears |m − n| − 1 times between the two operators. From this expression, we learn that it is the transfer matrix that determines the correlations in the ground state. Indeed, if we insert its eigendecomposition, the correlation function reduces to a sum of contributions decaying as λ^{|m−n|−1}, one for every transfer-matrix eigenvalue λ. The first part, associated with the leading eigenvalue equal to one, is just the product of the expectation values of $O_\alpha$ and $O_\beta$, called the disconnected part of the correlation function; the rest is an exponentially decaying part. This expression implies that connected correlation functions of an MPS always decay exponentially, which is one of the reasons why MPS are not well suited for capturing critical states. The largest remaining λ, i.e. the second largest eigenvalue of the transfer matrix, determines the correlation length ξ and the pitch vector Q of the correlations as 8
$$ \xi = -\frac{1}{\log|\lambda|} , \qquad Q = \arg(\lambda) . $$
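These contractions are simple to realize numerically. The sketch below assumes the `(d, D, D)` convention and normalized tensors; the helper recomputes the fixed points densely, and the function names are illustrative.

```python
import numpy as np

def fixed_points(A):
    """Left and right fixed points of the transfer matrix, with Tr(l r) = 1."""
    d, D, _ = A.shape
    E = sum(np.kron(A[s], A[s].conj()) for s in range(d))
    w, V = np.linalg.eig(E)
    r = V[:, np.argmax(np.abs(w))].reshape(D, D)
    r = r / np.trace(r)
    wl, Vl = np.linalg.eig(E.conj().T)
    l = Vl[:, np.argmax(np.abs(wl))].reshape(D, D)
    l = l / np.trace(l @ r)
    return l, r

def expectation_value(A, O, l, r):
    """<O> for a one-site operator O (d x d) on a normalized uniform MPS."""
    d = A.shape[0]
    EO = sum(O[sp, s] * A[s] @ r @ A[sp].conj().T
             for s in range(d) for sp in range(d))
    return np.trace(l @ EO)

def correlation_length(A):
    """xi = -1/log|lambda|, with lambda the second-largest transfer eigenvalue."""
    d = A.shape[0]
    E = sum(np.kron(A[s], A[s].conj()) for s in range(d))
    w = np.sort(np.abs(np.linalg.eigvals(E)))[::-1]
    return -1.0 / np.log(w[1] / w[0])
```

For a bond dimension D = 1 the MPS reduces to a product state, so the contraction reproduces the single-site expectation value directly.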

The static structure factor
In experimental set-ups, one typically has access to the Fourier transform of the correlation function, called the (static) structure factor. Since we are working in the thermodynamic limit, we can easily compute this quantity with perfect resolution in the momentum range [0, 2π). The structure factor corresponding to two operators $O_\alpha$ and $O_\beta$ is defined as
$$ S^{\alpha\beta}(q) = \frac{1}{|\mathbb{Z}|} \sum_{m,n\in\mathbb{Z}} e^{iq(m-n)} \, \langle\Psi(\bar A)| (O_\alpha)_m (O_\beta)_n |\Psi(A)\rangle_c , \qquad (41) $$
where $\langle\dots\rangle_c$ denotes that we only take the connected part of the correlation function. This can be implemented by redefining the operators such that their expectation values are zero, $O_{\alpha,\beta} \to O_{\alpha,\beta} - \langle O_{\alpha,\beta} \rangle$. This quantity can be computed directly in momentum space in a number of steps. First, due to translation invariance, every term in Eq. (41) only depends on the difference (m − n), so that we can eliminate one of the two sums together with the factor $|\mathbb{Z}|^{-1}$, leaving a single sum over the relative position of the two operators. Every term in this sum has the form of a connected correlation function, but all these diagrams can be resummed in an efficient way. Indeed, all terms where the operator $O_\beta$ is to the right of $O_\alpha$ contain a string of transfer matrices between the two operators, and the corresponding geometric series can be worked out explicitly. To this end, we define a regularized transfer matrix
$$ \tilde E = E - P , \qquad P = |r)(l| , $$
with P the projector onto the fixed points. Since the spectral radius of $\tilde E$ is strictly smaller than one, the geometric series converges and we can replace it with the inverse $(1 - e^{-iq}\tilde E)^{-1}$. The second term, containing the projector P, could lead to a divergent part, but it does not contribute because the expectation values of $O_\alpha$ and $O_\beta$ have been set to zero in the redefinition above. We define a 'pseudo-inverse' of an operator as
$$ \big(1 - e^{-iq} E\big)^P = \big(1 - e^{-iq}\tilde E\big)^{-1} (1 - P) , $$
implying that we project out the fixed point of E, and take the inverse of the operator on the complement. The part where $O_\beta$ is to the left of $O_\alpha$ is treated similarly, and we also have the term where both act on the same site; collecting all contributions yields a closed final expression for the structure factor. Note that we do not have to compute the pseudo-inverse $(1 - e^{-iq}E)^P$ explicitly, as that would entail a computational complexity of $O(D^6)$. Instead, we will compute, e.g.,
the partial contraction of everything to the left of one of the operators, i.e. the action of the pseudo-inverse $(1 - e^{-iq}E)^P$ on a given left-hand side, as the solution of a linear problem of the form $(1 - e^{-iq}\tilde E) \times x = y$. This linear equation can then be solved using Krylov-based iterative methods (generalized minimal residual or biconjugate gradient methods), where only the action of $(1 - e^{-iq}\tilde E)$ on a given vector needs to be implemented. This reduces the computational complexity to only $O(D^3)$. We can compute the right-side partial contraction in the same way, so that the structure factor is obtained as a simple contraction of these partial results. In the mixed canonical form, this expression reduces to the analogous contraction with the mixed-gauge transfer matrices and fixed points. Here, we have chosen to associate the location of the center site in the ket (bra) layer with the position where $O_\alpha$ ($O_\beta$) acts, as this will generalize naturally when discussing quasiparticle excitations in Sec. 6.
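To make the pseudo-inverse concrete, here is a dense sketch that builds $\tilde E = E - |r)(l|$ explicitly and solves the linear problem; in production code one would instead supply only the matrix-vector product to a GMRES-type solver. The row-major vectorization convention, the `(d, D, D)` shape, the requirement that A is normalized with Tr(l r) = 1, and the function name are all assumptions.

```python
import numpy as np

def regularized_solve(A, l, r, y, q):
    """Solve (1 - e^{-iq} Etilde) x = y, with Etilde = E - |r)(l| the
    regularized transfer matrix; returns x reshaped as a D x D matrix."""
    d, D, _ = A.shape
    E = sum(np.kron(A[s], A[s].conj()) for s in range(d))
    P = np.outer(r.reshape(-1), l.T.reshape(-1))   # projector |r)(l|, row-major vec
    Etilde = E - P
    x = np.linalg.solve(np.eye(D * D) - np.exp(-1j * q) * Etilde, y.reshape(-1))
    return x.reshape(D, D)
```

By construction the solution equals the resummed geometric series $\sum_{n\ge 0} e^{-iqn}\tilde E^n y$, which is what the test below verifies.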

The tangent space and tangent vectors
Let us now introduce the unifying concept of these lecture notes: the MPS tangent space. First, we interpret the set of uniform MPS with a given bond dimension as a manifold [11] within the full Hilbert space of the system we are investigating. The manifold is defined by the map between the set of D × d × D tensors A and physical states $|\Psi(A)\rangle$ in Hilbert space. The resulting manifold is not a linear subspace: the sum of two MPS with bond dimension D is in general an MPS with bond dimension 2D, and thus does not remain within the manifold. Therefore, it makes sense to associate a tangent space [15] to every point $|\Psi(A)\rangle$. By differentiating with respect to the parameters in A, an (overcomplete) basis of this tangent space is obtained. The MPS manifold, with a tangent space associated to every point, is represented graphically in Fig. 1.
A tangent vector is defined as
$$ |\Phi(B;A)\rangle = B^i \, |\partial_i \Psi(A)\rangle , $$
where i is a collective index (combining physical and virtual indices) for all entries of the tensor A, and repeated indices are summed over as in the summation convention. The new tensor B describes a general linear combination of the partial derivatives and parametrizes the full tangent space; obviously, the tangent space is a linear subspace of the Hilbert space. The overlap between two tangent vectors can be written as
$$ \langle\Phi(\bar B';\bar A)|\Phi(B;A)\rangle = \bar B'^i \, G_{ij}(\bar A, A) \, B^j , \qquad (57) $$
where $G_{ij}(\bar A, A) = \langle\partial_i\Psi(\bar A)|\partial_j\Psi(A)\rangle$ is the Gram matrix, or the metric on the tangent space as parametrized by the tensor B. As we will see later on, this Gram matrix is singular because of the over-completeness of the basis of partial derivatives, which can be traced back to the gauge redundancy in the MPS description. In the following sections, we often need an expression for the projector that implements an orthogonal projection of a given vector $|\chi\rangle$ in Hilbert space onto the tangent space. This expression is found by realizing that, due to the Euclidean inner product in Hilbert space, we are in fact looking for the tangent vector $|\Phi(B,A)\rangle$ that minimizes
$$ \big\| \, |\chi\rangle - |\Phi(B,A)\rangle \, \big\|^2 . $$
In the following we will see that this condition implies that the tangent-space projector formally looks like
$$ P_A = |\partial_i\Psi(A)\rangle \, (G^{-1})^{ij} \, \langle\partial_j\Psi(\bar A)| . $$
As the Gram matrix is singular in general, this expression is not well-defined as it stands, and we first have to find a good parametrization of the tangent space that eliminates all singular parts. In the following two subsections we work out two different parametrizations, describe their properties, and derive the expressions for the corresponding tangent-space projectors.

Tangent vectors in the uniform gauge
If we work in the uniform gauge, the MPS is parametrized by a single tensor A, and a general tangent vector has the form
$$ |\Phi(B;A)\rangle = \sum_{n\in\mathbb{Z}} \sum_{\{s\}} \boldsymbol{v}_L^\dagger \Big[ \prod_{m<n} A^{s_m} \Big] B^{s_n} \Big[ \prod_{m>n} A^{s_m} \Big] \boldsymbol{v}_R \, |\{s\}\rangle . \qquad (60) $$
The over-completeness in the parametrization of tangent vectors follows from studying the infinitesimal gauge transform $G = e^X$ of $|\Psi(A)\rangle$. To first order, this gauge transform takes the form of a tangent vector with $B^s = A^s X - X A^s$. But, since this is a gauge transform in the MPS manifold, the corresponding tangent vector $|\Phi(B;A)\rangle$ should be zero. This implies that every transformation of the form
$$ B^s \to B^s + A^s X - X A^s , \qquad (63) $$
with X an arbitrary D × D matrix, leaves the tangent vector $|\Phi(B;A)\rangle$ invariant. This gauge freedom can easily be checked by substituting this form in the state (60) and observing that all terms cancel. The gauge degrees of freedom can be eliminated by imposing a gauge-fixing condition, which can again be chosen so as to be useful from an algorithmic perspective. The easiest choice is the so-called left gauge-fixing condition (there is, of course, a right one, too), given by
$$ \sum_s (A^s)^\dagger \, l \, B^s = 0 . \qquad (64) $$
Note that this corresponds to $D^2$ scalar complex-valued equations, whereas we have $D^2$ complex parameters in the gauge transform X in Eq. (63). However, the component $X \propto \mathbb{1}$ does not actually modify B and should therefore not be counted. If we try to explicitly transform a given B according to Eq. (63) so that it satisfies the left gauge-fixing condition [Eq. (64)], we obtain a linear equation for X whose left-hand side is annihilated when contracting with r, so that a solution for X can only exist if the right-hand side satisfies the same property. This is, however, a natural condition, because it is precisely the statement that the tangent vector is orthogonal to the original MPS. Indeed, one can easily see that the overlap between an MPS and a tangent vector is given by
$$ \langle\Psi(\bar A)|\Phi(B;A)\rangle = 2\pi\delta(0) \sum_s \mathrm{Tr}\!\big( l \, B^s \, r \, (A^s)^\dagger \big) . $$
The factor $2\pi\delta(0)$ corresponds to the system size, which diverges in the thermodynamic limit; this notation is inspired by the representation of the δ function as $\sum_{n\in\mathbb{Z}} e^{ipn} = 2\pi\delta(p)$.
For a tangent vector $|\Phi(B,A)\rangle$ that is orthogonal to $|\Psi(A)\rangle$, we can then always find a parametrization in terms of a tensor B satisfying Eq. (64), by solving the linear system for the gauge transform X using $(1 - \tilde E)^{-1} = (1 - E)^P$ as regularized inverse, exactly as we have seen in Sec. 2.5. The restriction to tangent vectors that are orthogonal to the original MPS is crucial in several of the algorithms that follow. In fact, we can implement the left gauge-fixing condition [Eq. (64)] explicitly by constructing an effective parametrization for the B tensor that automatically fixes all gauge degrees of freedom, and which has some nice advantages for all later calculations. First, we construct the tensor $V_L$ corresponding to the $D(d-1)$-dimensional null space of the $D \times Dd$ matrix obtained by reshaping $(l^{1/2} A)^\dagger$, i.e.
$$ \sum_s (V_L^s)^\dagger \, l^{1/2} A^s = 0 , $$
and we orthonormalize $V_L$ as
$$ \sum_s (V_L^s)^\dagger V_L^s = \mathbb{1} . $$
Next, the B tensor is expressed in terms of a new matrix X as
$$ B^s(X) = l^{-1/2} \, V_L^s \, X \, r^{-1/2} , \qquad (73) $$
where X is a $D(d-1) \times D$ matrix. This parametrization of the B tensor constitutes an effective parametrization of the tangent space that automatically enforces the left gauge-fixing condition and eliminates all gauge degrees of freedom. Yet the great advantage of this particular choice of effective parametrization concerns the overlap between two tangent vectors. The overlap between $|\Phi(B;A)\rangle$ and $|\Phi(B';A)\rangle$ is computed similarly to the structure factor in Sec. 2.5: we have two infinite sums, but we can eliminate one because of the translation invariance of the MPS; this sum will again result in a factor $2\pi\delta(0)$. There still remains a sum over all relative positions of B and B'. Now the power of the left gauge-fixing condition is revealed: all terms vanish except the one where B and B' are on the same site, because every term with the two tensors on different sites contains a contraction that is automatically zero by Eq. 64.
Consequently, the overlap reduces to a single local term or, in terms of the effective parameters X and X',
$$ \langle\Phi(\bar B';\bar A)|\Phi(B;A)\rangle = 2\pi\delta(0) \, \mathrm{Tr}\big( (X')^\dagger X \big) . \qquad (76) $$
The fact that the overlap of tangent vectors reduces to the Euclidean inner product on the effective parameters X and X' will prove to be a very useful property in all tangent-space algorithms. More formally, this implies that the Gram matrix (see Eq. (57)) of the tangent space, as parametrized by the matrix X, reduces to the unit matrix.
With this effective parametrization of the tangent space in place, we can now derive the tangent-space projector $P_A$, i.e. the operator that orthogonally projects a given state $|\chi\rangle$ onto the tangent space associated to a given MPS $|\Psi(A)\rangle$. The orthogonal projection onto a linear subspace of Hilbert space is equivalent to minimizing $\| |\chi\rangle - |\Phi(X;A)\rangle \|^2$. As this minimization problem is quadratic in X and $\bar X$, with a quadratic term $\mathrm{Tr}(X^\dagger X)$, the solution is obtained by setting the gradient with respect to $\bar X$ to zero. Since the overlap between two tangent vectors is given by Eq. (76), the solution of the minimization problem follows directly; if $|\chi\rangle$ is translation invariant, we can moreover cancel the $2\pi\delta(0)$ factors. The vector that results from the tangent-space projector should again be of the form of Eq. 60, so we transform the resulting X tensor back into a B tensor according to Eq. 73, yielding the final form of the tangent-space projector.
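The construction of $V_L$ and of the effective parametrization can be sketched with dense linear algebra. The matrix square root, the null-space construction via a full SVD, and the function names are illustrative choices; l and r are assumed hermitian and positive.

```python
import numpy as np

def sqrt_h(M):
    """Square root of a hermitian positive matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.abs(w))) @ U.conj().T

def null_space_VL(A, l):
    """V_L spanning the null space of (l^{1/2} A)^dag, orthonormalized."""
    d, D, _ = A.shape
    lh = sqrt_h(l)
    M = np.concatenate([lh @ A[s] for s in range(d)], axis=0)   # (dD, D)
    U, _, _ = np.linalg.svd(M, full_matrices=True)
    return U[:, D:].reshape(d, D, D * (d - 1))   # columns orthogonal to col(M)

def tangent_B(A, l, r, X):
    """Effective parametrization B^s = l^{-1/2} V_L^s X r^{-1/2}."""
    d, D, _ = A.shape
    VL = null_space_VL(A, l)
    li, ri = np.linalg.inv(sqrt_h(l)), np.linalg.inv(sqrt_h(r))
    return np.stack([li @ VL[s] @ X @ ri for s in range(d)])
```

Any X then yields a B tensor that satisfies the left gauge-fixing condition by construction, which is exactly the property the test checks.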

Tangent vectors in the mixed gauge
The above tangent-space projector contains inverses of l and r, which are potentially ill-conditioned. Therefore, we also derive the expression for the projector in the mixed gauge. We first write down a tangent vector in the mixed gauge, with the tensor B replacing the center-site tensor. The crucial difference with the standard form of the tangent vector is that the elements of B are now not directly related to derivatives with respect to the parameters in the MPS tensors $A_L$ and $A_R$, and that we need to derive the projector onto the tangent space in a slightly more involved way. As we still have the gauge freedom $B^s \to B^s + A_L^s X - X A_R^s$, we again start by imposing the left gauge-fixing condition, which now takes the simpler form
$$ \sum_s (A_L^s)^\dagger B^s = 0 . $$
We define the effective parametrization of the tangent vector in terms of the matrix X as
$$ B^s(X) = V_L^s \, X , $$
where the tensor $V_L$ obeys the usual conditions
$$ \sum_s (V_L^s)^\dagger A_L^s = 0 , \qquad \sum_s (V_L^s)^\dagger V_L^s = \mathbb{1} . $$
Interpreting $A_L$ as the first D columns of a $(Dd) \times (Dd)$ unitary matrix, $V_L$ corresponds to the remaining $D(d-1)$ columns thereof. This effective parametrization has the same useful properties as before, but now does not require taking any inverses of l or r. Suppose now again that we have an arbitrary state $|\chi\rangle$, which we want to project onto the tangent space spanned by the vectors $|\Phi(B; A_L, A_R)\rangle$.
This is equivalent to minimizing We have a quadratic minimization problem as before and, since the Gram matrix with respect to the effective tangent-space parametrization X is again the unit matrix, the solution of the minimization problem is found as which yields The corresponding tangent vector is This form of the projector can be cast into an even more useful form for the tangent-space algorithms below, by rewriting the projector onto V L as so that the final form of the tangent-space projector is given by In contrast to the simpler form of the previous section, in the mixed canonical representation the tangent-space projector has two different terms, but, precisely because we work with a mixed canonical form, the projector does not require the inversion of potentially ill-conditioned matrices l −1/2 and r −1/2 .
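The construction of V L from A L can be sketched numerically: completing the (Dd) × D matrix A L to a full unitary, the remaining D(d − 1) columns form V L . A minimal Python sketch (the leg ordering (left, physical, right) and the function name are our own conventions):

```python
import numpy as np

def left_null_space(AL):
    """Given a left-isometric MPS tensor AL with legs (left, physical, right),
    return VL with VL^dagger AL = 0 and VL^dagger VL = identity."""
    D, d, _ = AL.shape
    AL_mat = AL.reshape(D * d, D)            # group (left, physical) legs
    # a full QR factorization completes AL to a (Dd) x (Dd) unitary;
    # the last D(d-1) columns are orthogonal to the columns of AL
    Q, _ = np.linalg.qr(AL_mat, mode='complete')
    return Q[:, D:].reshape(D, d, D * (d - 1))
```

A gauge-fixed tangent tensor is then obtained as B = V L X, contracting the third leg of V L with a D(d − 1) × D matrix X.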

Variational optimization of the overlap
The concept of a tangent space and its associated projector now allow us to formulate variational MPS methods that implement energy optimization for ground-state approximations, real-time evolution within the MPS manifold, or low-energy excitations on top of an MPS ground state.
In the following sections we will develop these algorithms in full detail, but here we can already explain a simple variational algorithm for maximizing the overlap of an MPS |Ψ(A) with a given reference MPS |Ψ(Ã) . Typically, the latter has a larger bond dimension and the following algorithm is a variational method for truncating an MPS, which is one of the primitive tasks in any MPS toolbox (recall Sec. 2.2 for a non-variational method for truncating a uniform MPS). The optimization problem can be written down as a maximization of the overlap between the two states. Because of the orthogonality catastrophe, this objective function might seem ill-defined in the thermodynamic limit, as it evaluates to either zero or one. Nevertheless, the resulting extremal condition (where |Ψ(A) is assumed to be normalized) is valid and meaningful in the thermodynamic limit, and states that |Ψ(Ã) , after subtracting the contribution parallel to |Ψ(A) , should be orthogonal to the tangent space of the MPS manifold at the point |Ψ(A) . This condition serves as a variational optimality condition, in the sense that there are no infinitesimal directions on the manifold that improve the overlap to first order. Geometrically, we can write this condition as P A |Ψ(Ã) = 0, with P A the projector onto the tangent space (or at least that part of the tangent space which is itself orthogonal to |Ψ(A) ).
Using the above derivation of the tangent-space projector, we can work out this expression as and Together with the consistency equations for the mixed gauge, an optimal MPS is therefore characterized by the equations It is clear that these equations are satisfied if an MPS is found in the mixed gauge {A L , A R , C, A C } for which the updated A C and C coincide with the current ones. A straightforward algorithm for finding this fixed-point solution is a simple power method: start from a random MPS {A 0 L , A 0 R , C 0 , A 0 C }; in every iteration (i) compute a new A C and C from the above equations, (ii) extract a new A i L and A i R , and repeat until convergence. Each iteration requires that we can represent the infinite strips in Eqs. (95) and (96) or, equivalently, that we find the fixed points of the corresponding maps. Indeed, this allows us to rewrite the above equations for A C and C as A more involved step requires us to extract a new A i L and A i R from given A C and C . In Sec. 4.4 we show how to efficiently do this. These steps are summarized in Algorithm 3.
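The fixed points of the mixed transfer matrices appearing here can be found with an iterative eigensolver. A minimal power-iteration sketch in Python (dense contractions, legs ordered (left, physical, right), and the function name are our own choices; production codes would use an Arnoldi method):

```python
import numpy as np

def mixed_left_fixed_point(ALtilde, AL, tol=1e-12, maxiter=5000):
    """Leading left fixed point F and eigenvalue lam of the mixed transfer
    matrix with ALtilde in the ket layer and conj(AL) in the bra layer,
    found by simple power iteration."""
    Dt, D = ALtilde.shape[0], AL.shape[0]
    F = np.ones((Dt, D), dtype=complex)
    lam = 0.0
    for _ in range(maxiter):
        # F'_{cd} = sum_{a,b,s} F_{ab} ALtilde[a,s,c] conj(AL[b,s,d])
        Fn = np.einsum('ab,asc,bsd->cd', F, ALtilde, AL.conj())
        lam = np.linalg.norm(Fn)
        Fn = Fn / lam
        if np.linalg.norm(Fn - F) < tol:
            return lam, Fn
        F = Fn
    return lam, F
```

For ALtilde = AL (the overlap of an MPS with itself) the leading eigenvalue is 1 and the fixed point is proportional to the identity, which provides a basic sanity check.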

Finding ground states
Now that we have introduced the manifold of matrix product states and the concept of the tangent space, we should explain how to find the point in the manifold that provides the best approximation for the ground state of a given hamiltonian H. In these notes, we only consider nearest-neighbour interactions, so that the hamiltonian is of the form H = ∑ n h n,n+1 , where h n,n+1 is a hermitian operator acting non-trivially on the sites n and n + 1. We refer the reader to Ref. [16] for the generalization to arbitrary long-range hamiltonians.

Algorithm 3 Variational algorithm for maximizing overlap with MPS |Ψ(Ã)
As in any variational approach, the variational principle serves as a guide for finding ground-state approximations, viz. we want to minimize the expectation value of the energy. In the thermodynamic limit the energy diverges with system size, but, since we are working with translation-invariant states only, we should rather minimize the energy density. Also, we will restrict ourselves to properly normalized states. Diagrammatically, the minimization problem can be recast accordingly. Traditionally, this minimization problem is not treated directly, but recourse is taken to imaginary-time evolution using the time-evolving block decimation algorithm [6,17], or to infinite DMRG methods [18]. In this section, we will rather treat this problem in a more straightforward way, in the sense that we will use numerical optimization strategies for minimizing the energy density directly. This approach has the advantage that it is, by construction, optimal in a global way, because we never take recourse to local updates of the tensors - we always use routines that are optimal for the MPS wavefunction directly in the thermodynamic limit. As a result, we have a convergence criterion on the energy density for the infinite system.

The gradient
Any optimization problem relies on an efficient evaluation of the gradient, so let us first compute this quantity. The objective function f that we want to minimize is a real function of the complex-valued A or, equivalently, of the independent variables A and Ā. The gradient g is then obtained by differentiating f (Ā, A) with respect to Ā, 9 where we have clearly indicated A and Ā as independent variables and e is the current energy density given by In the implementation we will always make sure the MPS is properly normalized, such that the numerators drop out. Furthermore, we subtract from every term in the hamiltonian its current expectation value, so that the gradient takes on the simple form The gradient is obtained by differentiating the expression with respect to Ā. It is given by a sum over all sites, where in every term we differentiate with one tensor Ā in the bra layer. Differentiating with respect to one Ā tensor amounts to leaving out that tensor, and interpreting the open legs as outgoing ones, i.e. each term looks like For summing the infinite number of terms, we will use the same techniques as we did for evaluating the structure factor [Sec. 2.5]. Instead of varying the open spot in the diagram, we will vary the location of the hamiltonian operator h. Then, we first treat all terms where h is either completely to the left or to the right of the open spot, by defining the partial contractions
9 Numerical optimization schemes are typically developed for functions over real parameters. In order to translate these algorithms to complex parameters, we take x = x r + ix i , and take the gradient g = g r + ig i with g r = ∂ x r f and g i = ∂ x i f , which is equal to g = 2∂ x̄ f (x, x̄).
As we have seen, taking these pseudo-inverses is equivalent to summing the infinite number of terms. Note that, because the expectation value of h is by definition subtracted in the gradient of the normalized energy expectation value, we indeed only need to take the connected part into account and no diverging δ-contribution is present. The partial contractions above are combined with the two contributions where h acts on the open spot, so that we have the final expression for the gradient As such, the gradient is an object that lives in the space of MPS tensors. However, we can further exploit the manifold structure of MPS by interpreting the gradient as a tangent vector - the gradient is, indeed, supposed to indicate a direction on the manifold along which we can lower the energy. In order to consistently interpret the gradient as a tangent vector, we need some additional steps. First, let us note the meaning of the gradient as defined above in terms of the first-order approximation of a change of the tensor A → A + εB, where vectorized versions of tensors are denoted in bold. Now we realize that g is a vector in the space of tensors, not a state in the full Hilbert space. So how do we lift the notion of the gradient to the level of a state? Note that an infinitesimal change of the tensor A → A + εB corresponds to a tangent vector, so that we would like to write the first-order change in the energy through an overlap between this tangent vector and a 'gradient vector' |Φ(G; A) . We know, however, that building |Φ(G; A) using the tensor g is not correct, because the overlap Φ(G; A)|Φ(B; A) does not reduce to G † B. Instead, we will have to determine the tangent-space gradient by its reduced parametrization, where, as usual so that the tangent-space gradient is given by the usual expression for a tangent vector, with the tensor G given by The difference between these two notions of a gradient can be elucidated by looking at the variational manifold from the perspective of differential geometry.
Whereas the gradient is a covariant vector living in the cotangent bundle, we can use the (non-trivial) metric of the MPS manifold to define a corresponding (contravariant) vector living in the tangent bundle [11,15].
The latter is what we have defined as the tangent-space gradient. Note that we can also derive the expression for the tangent-space gradient from the tangent-space projector in Eq. (81). Indeed, we can readily check that this is the case. This expression for the gradient will be the starting point for the vumps algorithm in Sec. 4.4.
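The pseudo-inverses used above to resum the infinite number of terms can be made concrete for small bond dimension by treating the transfer matrix as a dense D² × D² array. The following Python sketch (our own dense stand-in; an actual implementation would instead solve the corresponding linear system iteratively) builds (1 − E)^P for a left-canonical MPS, where E has leading left eigenvector 1 and right eigenvector r:

```python
import numpy as np

def pseudo_inverse(AL, r):
    """(1 - E)^P for a left-canonical MPS tensor AL (legs: left, physical,
    right). E has leading left eigenvector 1 (the identity) and right
    eigenvector r, normalized such that Tr(r) = 1. The rank-one shift
    |r)(1| moves the unit eigenvalue away so that 1 - E + |r)(1| is
    invertible; projecting with Q = 1 - |r)(1| on both sides then yields
    the pseudo-inverse on the complement of the leading eigenspace."""
    D = AL.shape[0]
    E = np.einsum('asb,AsB->aAbB', AL, AL.conj()).reshape(D * D, D * D)
    one = np.eye(D).reshape(-1)
    shift = np.outer(r.reshape(-1), one)
    Q = np.eye(D * D) - shift
    return Q @ np.linalg.inv(np.eye(D * D) - E + shift) @ Q
```

On the complement of the eigenvalue-1 subspace this matrix inverts 1 − E exactly, while it annihilates the leading eigenvectors, which is precisely the regularization needed for the partial contractions.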

Optimizing the tensors
Using these expressions for the different types of gradient, we can easily implement a gradient-search method for minimizing the energy expectation value. The first obvious option is a steepest-descent method, where in every iteration the tensor A is updated in the direction of the parameter-space gradient: The size of α is determined by doing a line search: we find a value for which the energy density has decreased. In principle, we could try to find the optimal value of α, for which we can no longer decrease the energy by taking the direction −g in parameter space. In practice, we will be satisfied with an approximate value of α, for which certain conditions [19] are fulfilled. Other optimization schemes based on an evaluation of the gradient, such as conjugate-gradient or quasi-Newton methods, are more efficient. Even more efficient would be an algorithm that requires an evaluation of the Hessian, which in principle we can also do with the techniques above.
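The line-search logic can be illustrated on a generic objective; the quadratic test function below is a toy stand-in for the energy density, and only the backtracking search itself is the point:

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, maxiter=1000):
    """Steepest descent with a crude backtracking line search: the trial
    step is halved until the objective decreases, a simple stand-in for
    the approximate line-search conditions mentioned in the text."""
    x = x0.copy()
    for _ in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) < tol:     # convergence: small gradient norm
            break
        alpha = 1.0
        while f(x - alpha * g) >= f(x) and alpha > 1e-16:
            alpha *= 0.5
        x = x - alpha * g
    return x

# toy quadratic objective standing in for the energy density
a = np.array([1.0, -2.0, 3.0])
f = lambda x: np.sum((x - a) ** 2)
grad = lambda x: 2 * (x - a)
xmin = steepest_descent(f, grad, np.zeros(3))
```

In an MPS code, f and grad would evaluate the energy density and the gradient tensor derived above, with x the vectorized MPS tensor A.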
As another road to more efficient optimization schemes we could take the tangent-space gradient a bit more seriously. A first option amounts to computing the tangent-space gradient, and then updating the A tensor by simply adding it in parameter space, i.e. we do a line search of the form This scheme can, again, be improved by implementing conjugate-gradient or quasi-Newton optimization methods. It is expected that the use of the tangent-space gradient yields a more efficient energy optimization, because it takes the structure of the manifold (embedded in Hilbert space) into account. Taking a step further in this approach, instead of just adding G in parameter space, we would like to do a line search along geodesic paths through the manifold, which would involve integrating the geodesic equation.
It remains an open question, however, whether this could lead to more efficient optimization schemes. Crucially, this way of variationally optimizing an MPS has a clear convergence criterion: we say that we have reached a -possibly local -optimum if the norm of the gradient is sufficiently small.

The energy variance
In any variational approach, finding an optimal set of parameters does not guarantee that the state provides a good approximation to the true ground state of the hamiltonian. We do have access, however, to an unbiased measure of how well the MPS approximates any eigenstate of the system, called the variance. It is defined by where we have subtracted the ground-state energy density from the local nearest-neighbour term in the hamiltonian, i.e. h n,n+1 → h n,n+1 − Ψ(Ā)| h n,n+1 |Ψ(A) . This quantity can be readily interpreted as a zero-momentum structure factor, so we can apply the formulas from Sec. 2.5. The formulas are a bit more complicated, since we have a two-site operator. In the end, the variance is given by This quantity is supposed to be a small number obtained by a cancellation of large terms, and therefore this expression can give rise to errors. In Ref. [20] a slightly different error was proposed that avoids this numerical inaccuracy.

The vumps algorithm
We can now combine the notion of a tangent-space gradient with the use of the mixed gauge in order to develop an efficient variational ground-state optimization algorithm known as vumps 10 [16]. We have seen above that a variational optimum is characterized by the condition that the gradient be zero. The error measure is therefore given by In the mixed canonical form this vector is given by using the tangent-space projector where A C and C are given by These equations define effective hamiltonians H A C (·) and H C (·) such that Since the gradient should be zero in the variational optimum, we can characterize this point by a set of eigenvalue equations. In turn this implies that in the optimum the MPS should obey the following set of equations The vumps algorithm consists now of an iterative method for finding an {A L , A R , A C , C} that satisfies these equations simultaneously. In every step of the algorithm we first solve the two eigenvalue equations, yielding two new tensors Ã C and C̃. An obvious choice for the global updates now seems to be given by Ã L = Ã C C̃ −1 and Ã R = C̃ −1 Ã C , but then we face additional problems. Away from the converged solution, the resulting Ã L and Ã R will in general not be isometric. Furthermore, we would like to avoid taking (possibly ill-conditioned) inverses of C̃ altogether, as this ruins the advantage of working with the center site. As an alternative strategy, we determine global updates Ã L and Ã R as the left and right isometric tensors that minimize the distances to the new Ã C and C̃. In exact arithmetic, the solution of these minimization problems is known: Ã L will be the isometry in the polar decomposition of Ã C C̃ † . Computing the corresponding singular-value decompositions yields the updates. Notice that close to (or at) an exact solution A s C = A s L C = C A s R , the singular values contained in Σ l/r are the squares of the singular values of C, and might well fall below machine precision. Consequently, in finite-precision arithmetic, the corresponding singular vectors will not be accurately computed.
An alternative that has proven to be robust and still close to optimal is given by directly using the following left and right polar decompositions to obtain the updated tensors. This concludes one step of the iteration procedure, yielding a new set of tensors {Ã L , Ã R , Ã C , C̃}. This process is repeated until the norm of the gradient is sufficiently small. Another error measure is the value of either ε L or ε R , which can be proven to be proportional to the gradient norm close to convergence.
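A minimal sketch of this extraction step in Python (using SciPy's polar routine; the leg ordering (left, physical, right) and the function name are our own conventions):

```python
import numpy as np
from scipy.linalg import polar

def extract_AL_AR(AC, C):
    """Determine the isometric AL and AR closest to a given center-site
    tensor AC and center matrix C, using the unitary factors of polar
    decompositions instead of (possibly ill-conditioned) inverses of C."""
    D, d, _ = AC.shape
    # left update: group AC as (D d) x D; polar(A) returns U, P with A = U P
    UlAC, _ = polar(AC.reshape(D * d, D))
    UlC, _ = polar(C)
    AL = (UlAC @ UlC.conj().T).reshape(D, d, D)
    # right update: group AC as D x (d D); side='left' returns U, P with A = P U
    UrAC, _ = polar(AC.reshape(D, d * D), side='left')
    UrC, _ = polar(C, side='left')
    AR = (UrC.conj().T @ UrAC).reshape(D, d, D)
    return AL, AR
```

At an exact solution A C = A L C = C A R with full-rank C, this reproduces A L and A R exactly, and away from it the updates remain isometric by construction.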

The time-dependent variational principle
Although DMRG was originally developed for finding the ground state, and, possibly, the first low-lying states of a given hamiltonian, the scope of DMRG simulations has since been extended to dynamical properties as well. One of the many new applications has been the simulation of time evolution, where the MPS formalism has been of crucial value for coming up with algorithms such as the time-evolving block decimation [21][22][23]. In this section, we discuss another algorithm for simulating time evolution in the thermodynamic limit, based on the time-dependent variational principle (TDVP) [24][25][26]. This approach has the advantage of being variational - in a sense that we will explain below - and, therefore, more controlled, and it leads to a set of symplectic differential equations. The MPS version of the TDVP [7,15,27] has been applied to spin chains [28], gauge theories [29,30], and spin systems with long-range interactions [27,[31][32][33][34].
In these notes we are exclusively interested in the case where the initial state is a uniform MPS and the time evolution is governed by a translation-invariant hamiltonian (global quench). However, the framework can be extended to so-called local quenches as well.

Time evolution on a manifold
The algorithm relies on the manifold interpretation of uniform matrix product states, and, in particular, the concept of a tangent space. We start from the Schrödinger equation, which dictates how a quantum state evolves in time. The problem with this equation is the fact that, for a generic hamiltonian, an initial MPS |Ψ(A) is immediately taken out of the MPS manifold. Nonetheless, we would like to find a path inside the manifold |Ψ(A(t)) , which approximates the time evolution in an optimal way. The time derivative of this time-evolved MPS is a tangent vector, but, again, the right-hand side is not. Indeed, the vector H |Ψ(A(t)) points out of the manifold, so that an exact integration of the Schrödinger equation is out of the question. Finding the Ȧ for which the corresponding tangent vector provides the best approximation to H |Ψ(A(t)) amounts to a minimization problem. Note that the solution of this minimization problem is equivalent to projecting the time evolution orthogonally onto the tangent space. This projection transforms the linear Schrödinger equation into a highly non-linear differential equation on a manifold, and is illustrated graphically in Fig. 2. We can work out the formal properties of the TDVP in a bit more detail. The TDVP equation can be rewritten as where we have defined If we note that we obtain the TDVP in the explicit form where the non-linearity follows from the non-linear dependence of both h and G on A (and Ā).
For arbitrary functions f and g of A and Ā, we can now introduce a Poisson bracket (henceforth suppressing the explicit A dependence) as which is clearly anti-symmetric and bilinear, and can be shown to obey the Jacobi conditions. Here, f (and g) would typically represent expectation values f (A, Ā) = Ψ(Ā)|F |Ψ(A) . It follows that the equations of motion for such expectation values are given by This equation, in the end, shows that the TDVP for the MPS manifold gives rise to an effective classical hamiltonian system with corresponding Poisson bracket. These equations are, however, highly non-linear because of the non-linearity of the tangent-space projector. Colloquially, we can state that the TDVP approach maps the linear quantum dynamics in an exponentially large Hilbert space to a set of non-linear semi-classical equations of motion for the expectation values, in terms of a smaller number of effective degrees of freedom (the variational parameters). This observation has important consequences with respect to conservation laws. Indeed, by the antisymmetry of the Poisson bracket it is trivial to see that the energy expectation value is exactly conserved under time evolution with the TDVP. Also, other quantities are exactly conserved, under the condition that they commute with the tangent-space projector. Indeed, if this condition is obeyed for the generator of the symmetry, one can show that its expectation value is conserved. For the MPS manifold, this is the case for all symmetries which act as a tensor product of one-site gates, i.e. when the generator is a sum of one-site operators.

TDVP in the uniform gauge
Let us now work out the TDVP equation in the uniform gauge. In Sec. 3 we have written down the tangent-space projector in the uniform gauge. Both the original Schrödinger equation and the TDVP evolution are norm preserving (for real time evolution), but introduce a global phase proportional to the total energy, which is divergent with the system size. By imposing that Ψ(A)|Φ(B; A) = 0, this phase is eliminated and norm preservation is explicitly enforced (now even in the case of imaginary time evolution as introduced below). By now, we know very well that an effective parametrization of the tangent vector in terms of the matrix X can be introduced which automatically enforces orthogonality. In order to implement this projector, we compute the matrix element Φ(B; A)| H |Ψ(A) for general B. Again, we have two infinite sums, but one is eliminated because of translation invariance and gives rise to a 2πδ(0) factor. Then we need all terms where the hamiltonian term acts fully to the left and to the right of the B tensor, but this can again be resummed efficiently by introducing pseudo-inverses of the transfer matrix in the following partial contractions: and In addition, we also have the terms where the hamiltonian operator acts directly on the site where the B tensor is located. Putting all contributions together, we obtain The tangent-space projected time evolution is then obtained by taking the tensor F and computing the time evolution of the MPS tensor according to the TDVP. This procedure for computing the time derivative of the MPS tensor A according to the TDVP is summarized in Algorithm 6. The simplest option for integrating this differential equation for A(t) consists of a simple Euler scheme, where A(t + δt) = A(t) + δtȦ(t), but a numerical integrator that does not destroy the symplectic properties of the TDVP equation is often preferred.
At this point, the attentive reader might already have noticed that these formulas are very similar to the ones that we obtained for the gradient of the energy that appears in a ground-state optimization algorithm -the tangent-space gradient is the same as the right hand side of the TDVP equation up to an imaginary factor. The connection is laid bare by noting that another road to a ground-state optimization algorithm is found by implementing an imaginary-time evolution (t → −iτ ) on a random starting point |Ψ(A 0 ) , confined within the MPS manifold. Indeed, in the full Hilbert space, imaginary time evolution results in a projection onto the ground state in the infinite-time limit for almost any initial state |Ψ . When using the TDVP to restrict imaginary time evolution to the manifold of MPS and integrating the resulting equations using a simple Euler scheme, we are effectively performing a steepest-descent optimization with the tangent-space gradient, where in every iteration the line search is replaced by taking a fixed step size α = dτ .

TDVP in the mixed gauge
We can now use this tangent-space projector to write down the TDVP equation for an MPS in the mixed canonical form. We have explained that the optimal way for implementing real-time evolution within the MPS manifold is by projecting the exact time derivative onto the tangent space at every point, i.e.
In the mixed canonical gauge, the tangent-space projector decomposes into two different parts corresponding to the two different types of terms in Eq. (91). For each of the terms individually, the corresponding differential equation can be integrated straightforwardly. Let us take the first part, for which we first define the partial contractions which capture the contributions where the hamiltonian is completely to the left and to the right of the open spot in the projector. Again, we have used pseudo-inverses for resumming the infinite number of terms. These are combined into such that In order to obtain the second part, we need to contract the above G 1 tensor another time with A L , in order to arrive at The two parts of the differential equation can be solved separately, but in different representations of the MPS. Indeed, if we write the MPS in the mixed canonical form with center site, the first equation is simply Ȧ C = −iG 1 A C , where G 1 (A) is interpreted as a linear operator working on A C according to Eq. (170); the solution is simply A C (t) = e −iG 1 t A C (0). Alternatively, if the MPS is written in the mixed canonical form without center site, the second equation is Ċ = +iG 2 C, where the sign difference in the right-hand side comes from having a minus sign in the second part of the tangent-space projector. Again, G 2 (A) is seen as a linear operator acting on C according to Eqs. (170) and (172), with corresponding solution given by C(t) = e +iG 2 t C(0). These exponentials can be evaluated efficiently by using Lanczos-based iterative methods.
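Such a Lanczos-based evaluation of the exponentials can be sketched as follows; apply_H is a black-box routine applying the Hermitian effective hamiltonian to a vectorized tensor, and the full reorthogonalization is our own addition for numerical stability:

```python
import numpy as np

def lanczos_expm(apply_H, v, dt, m=30):
    """Approximate e^{-i dt H} v for Hermitian H in an m-step Krylov space,
    built with the Lanczos recursion and full reorthogonalization."""
    n = v.size
    m = min(m, n)
    V = np.zeros((n, m), dtype=complex)
    T = np.zeros((m, m))
    nrm = np.linalg.norm(v)
    V[:, 0] = v / nrm
    k = m
    for j in range(m):
        w = apply_H(V[:, j])
        T[j, j] = np.real(np.vdot(V[:, j], w))
        # reorthogonalize against all previous Krylov vectors
        w = w - V[:, :j + 1] @ (V[:, :j + 1].conj().T @ w)
        if j < m - 1:
            b = np.linalg.norm(w)
            if b < 1e-12:        # invariant subspace found: result is exact
                k = j + 1
                break
            T[j, j + 1] = T[j + 1, j] = b
            V[:, j + 1] = w / b
    # exponentiate the small tridiagonal matrix exactly
    evals, S = np.linalg.eigh(T[:k, :k])
    coeff = S @ (np.exp(-1j * dt * evals) * S[0].conj())
    return nrm * (V[:, :k] @ coeff)
```

Only matrix-vector products with the effective hamiltonian are needed, so G 1 and G 2 never have to be constructed as dense matrices.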

Integrating the TDVP equations
The meaning of the TDVP equations is slightly different in this mixed canonical form, and a correct interpretation starts from considering the case of a finite lattice. There the meaning is clear: every site in the lattice has a different MPS tensor attached to it, and performing one time step amounts to doing one sweep through the chain. For every step in the sweep at site n, we
• start from a mixed canonical form with center-site tensor Â C (n); all tensors Ã L (n − 2), Ã L (n − 1), etc., have already been updated, while the tensors A R (n + 1), A R (n + 2), etc., are still waiting for their update,
• update the center-site tensor as Ã C (n) = e −iG 1 (n)δt Â C (n),
• do a QR decomposition, Ã C (n) = Ã L (n)C̃(n),
• update the center matrix as Ĉ(n) = e +iG 2 (n)δt C̃(n),
• absorb this center matrix into the tensor on the right to define a new center-site tensor Â C (n + 1) = Ĉ(n)A R (n + 1).
The version for the infinite system can be derived by starting this procedure at an arbitrary site n in the chain - say n → −∞, so that we will never notice the effect of this abrupt operation in the bulk of the system - and applying exactly the same procedure until it converges. In this context, convergence at site n would mean that the center-site tensor that we obtain for the next site, Â C (n + 1), is the same as the one we started from, Â C (n) = Â C (n + 1). Our real interest, however, is in the converged value of Ã L , because this allows us to obtain Ã R , Ã C and C̃. Only after we have obtained convergence in this sense have we concluded the integration of one time step δt. This procedure, where each time step requires solving a consistency relation, is very costly in practice, and therefore we propose a simpler integration procedure. The idea is that consistency of the above scheme requires that, after one iteration of the scheme, we find the same C matrix as the one we started from. This allows us to turn things around: we assume that we retrieve the same C matrix after an iteration of the above scheme, and evolve it with the time-reversed operator to find C̃ = e −iG 2 δt C. Then, we can find an updated Ã L and Ã R from Ã C and C̃. This scheme for time evolving an MPS {A L , A R , A C , C} to {Ã L , Ã R , Ã C , C̃} after a time step δt then boils down to
• time evolve the center-site tensor forward in time, Ã C = e −iG 1 δt A C ,
• time evolve the center matrix backward in time, C̃ = e −iG 2 δt C.
The last step can be done using Algorithm 5.
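One step of this integration scheme can be sketched with dense stand-ins for the effective hamiltonians; H_AC and H_C below are hypothetical dense matrices playing the role of the maps G 1 and G 2 , whereas a real implementation applies them matrix-free, uses a Krylov exponential, and extracts Ã R analogously:

```python
import numpy as np
from scipy.linalg import expm, polar

def tdvp_step(AC, C, H_AC, H_C, dt):
    """One TDVP step in the mixed gauge: evolve the center-site tensor
    forward and the center matrix backward in time, then extract an
    isometric AL from the evolved pair by polar decompositions."""
    D, d, _ = AC.shape
    AC_new = (expm(-1j * dt * H_AC) @ AC.reshape(-1)).reshape(D, d, D)
    C_new = (expm(-1j * dt * H_C) @ C.reshape(-1)).reshape(D, D)
    # extract AL from AC_new and C_new (the minimization step of Sec. 4.4)
    Ul, _ = polar(AC_new.reshape(D * d, D))
    Uc, _ = polar(C_new)
    AL_new = (Ul @ Uc.conj().T).reshape(D, d, D)
    return AL_new, AC_new, C_new
```

Because both exponentials are unitary for Hermitian effective hamiltonians, the norms of A C and C are preserved, and the extracted A L is isometric by construction.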
Note that the imaginary-time version of this last scheme is very close to the vumps algorithm [Sec. 4.4]. Indeed, if we implement the above scheme with imaginary-time steps where the size of the step is taken to infinity, the evolution operators reduce to projectors on the leading eigenvectors, and therefore we recover the eigenvalue equations that are used in vumps. It remains a matter of further investigation to assess whether we could gain efficiency in ground-state optimization by working with finite imaginary-time steps.

Elementary excitations
We have seen that working directly in the thermodynamic limit has a number of conceptual and numerical advantages over finite-size algorithms, but the real power of the formalism is shown when we want to describe elementary excitations. These show up in dynamical correlation functions [see Sec. 6.4] that can be directly measured in e.g. neutron-scattering experiments. Typically, these experiments allow one to probe the spectrum within a certain momentum sector, giving rise to excitation spectra that look like the one in Fig. 3. The isolated branches in such a spectrum - these will correspond to δ peaks in the spectral functions, and are seen as very strong resonances in experimental measurements - can be interpreted as quasiparticles, which can be thought of as local perturbations on the ground state, in a plane-wave superposition with well-defined momentum [35]. The rest of the low-energy spectrum can be reconstructed by summing up the energies and momenta of the isolated quasiparticles - in the thermodynamic limit these quasiparticles will never see each other, so these energies and momenta can be simply superposed. This picture implies that all the low-energy properties should in the end be brought back to the properties of these quasiparticles! Crucially, this approach differs from standard approaches for describing quasiparticles in interacting quantum systems. Indeed, typically a quasiparticle is thought of as being defined by starting from a non-interacting limit, and acquires a finite lifetime as interactions are turned on - think of Fermi liquid theory as the best example of this perturbative approach. In contrast, our approach will be variational, as we will approximate exact eigenstates with a variational ansatz. This means that our quasiparticles have an infinite lifetime, and correspond to stationary eigenstates of the fully interacting hamiltonian.
This quasiparticle approach is only guaranteed to be applicable to gapped systems, where isolated excitation branches are expected to arise. Still, the approach continues to work for critical systems as well, where it leads to excellent estimates for gapless dispersion relations. The variational excited states that are obtained are no longer expected to be the elementary excitations in these critical systems - it is not even clear if these can be unambiguously defined for a generic interacting (non-integrable) spin chain - but rather correspond to a superposition of many gapless excitations around the Fermi point, leading to a particle with a localized nature in a momentum superposition.

The quasiparticle ansatz
It is in fact very natural to construct quasiparticle excitations on top of an MPS ground state in the thermodynamic limit. The variational ansatz that we will introduce is a generalization of the single-mode approximation [36], which appeared earlier in the context of spin chains, and the Feynman-Bijl ansatz [37], which was used to describe the spectrum of liquid helium or quantum Hall systems [38]. In the context of MPS, a reduced version of the ansatz appeared earlier in Refs. [39,40], but it was explored in full generality in Refs. [15,41]. In recent years, the ansatz has been successfully applied to spin chains [13,42,43], spin ladders [44], spin chains with long-range interactions [45], field theories [46], and local gauge theories [29].
The quasiparticle ansatz is given by a plane-wave superposition over all lattice sites, i.e. we change one A tensor of the ground state at site n and make a momentum superposition.
In this whole section we work with tangent vectors in the mixed gauge. The newly introduced tensor B contains all the variational parameters of the ansatz, and perturbs the ground state over a finite region around site n in every term of the superposition - it uses the correlations in the ground state, carried by the virtual degrees of freedom of the MPS, to create a lump on the background state. Clearly, these excitations have a well-defined momentum and, as we will see, a finite energy above the extensive energy (and thus, finite energy density) of the ground state. Before we start optimizing the tensor B, we will investigate the variational space in a bit more detail. First note that the excitation ansatz is, in fact, just a boosted version of a tangent vector, so we will be able to apply all the tricks and manipulations of the previous sections. For example, the B tensor has gauge degrees of freedom: the state is invariant under an additive gauge transformation of the form B → B + e ip A L Y − Y A R , with Y an arbitrary D × D matrix. This gauge freedom can be easily checked by substituting this form into the state (176), and observing that all terms cancel, leaving the state invariant.
The gauge degrees of freedom can be eliminated - they correspond to zero modes in the variational subspace, which would make the variational optimization ill-conditioned - by imposing a gauge-fixing condition. Again, we can impose the left gauge-fixing condition

Σ_s (A_L^s)† B^s = 0 .

We can reuse the method for parametrizing the B tensor such that it automatically obeys this gauge condition: we write B^s = V_L^s X, with V_L spanning the null space of A_L† and X a (D(d − 1) × D) matrix containing the variational parameters. As before, this fixing of the gauge freedom entails that the excitation is orthogonal to the ground state, because ⟨Ψ(A)|Φ_p(B)⟩ = 0 follows directly from the gauge condition. This effective parametrization has reduced the number of variational parameters in the quasiparticle ansatz to D²(d − 1). Moreover, the overlap between two excitations |Φ_p(B)⟩ and |Φ_{p′}(B′)⟩ is computed similarly as before: we have two infinite sums, but we can eliminate one because of the translation invariance of the ground state. This results in a 2πδ(p − p′) function, so excitations at different momenta are always orthogonal. Again, the physical norm on the excited states reduces to the Euclidean norm on the effective parameters,

⟨Φ_{p′}(B′)|Φ_p(B)⟩ = 2πδ(p − p′) Tr((X′)† X) .

This will prove to be a useful property for optimizing the variational parameters. The presence of the δ indicates that these plane-wave states cannot be normalized to one, as is well known from single-particle quantum mechanics.
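To make the effective parametrization concrete, here is a minimal numpy sketch (all tensor names, shapes and the random test state are our own illustrative choices, not fixed by the text): the null space V_L of A_L† is obtained from an SVD, and B = V_L X then satisfies the left gauge-fixing condition automatically, with ||B|| = ||X|| reflecting the Euclidean-norm property.

```python
import numpy as np

def left_gauge_parametrization(A_L, X):
    """Build a B tensor obeying sum_s A_L[s]^dag B[s] = 0 via B[s] = V_L[s] X."""
    d, D, _ = A_L.shape
    L = A_L.reshape(d * D, D)           # stack A_L into a (dD x D) matrix
    # full SVD: the last D(d-1) left singular vectors span the null space of L^dag
    U, _, _ = np.linalg.svd(L, full_matrices=True)
    V_L = U[:, D:]                      # (dD, D(d-1)) isometry with V_L^dag L = 0
    return (V_L @ X).reshape(d, D, D)   # X holds the D(d-1) x D variational parameters

# hypothetical example: a random left-isometric uMPS tensor
rng = np.random.default_rng(0)
d, D = 3, 4
M = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
Q, _ = np.linalg.qr(M)
A_L = Q.reshape(d, D, D)                # satisfies sum_s A_L[s]^dag A_L[s] = 1
X = rng.normal(size=(D * (d - 1), D))
B = left_gauge_parametrization(A_L, X)

# the gauge condition holds, and the norm of B equals the norm of X
gauge = sum(A_L[s].conj().T @ B[s] for s in range(d))
print(np.linalg.norm(gauge), np.linalg.norm(B) - np.linalg.norm(X))
```

Because V_L is an isometry, the physical norm of the tangent vector is just the Frobenius norm of X, which is exactly the property used below when the effective normalization matrix reduces to the identity.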

Computing expectation values
Let us first write down the expressions for evaluating expectation values, or more generally, matrix elements of the form ⟨Φ_{p′}(B′)| O_n |Φ_p(B)⟩, where the ground-state expectation value has already been subtracted, i.e. ⟨Ψ(A)| O_i |Ψ(A)⟩ = 0. This implies that we will look at expectation values of O relative to the ground-state density. As we will see, this gives rise to finite quantities in the thermodynamic limit. First we notice that the above matrix element is, in fact, a triple infinite sum. Again, one of the sums can be eliminated and yields the factor 2πδ(p − p′) that also appears in the norm ⟨Φ_{p′}(B′)|Φ_p(B)⟩, so that henceforth we are only interested in all different relative positions of the operator O, the B tensor in the ket layer, and the B′ tensor in the bra layer. Let us first define two partial contractions, corresponding to the orientations where O is to the left and to the right of both B and B′. Here, we can again use the pseudo-inverse because O was defined with zero ground-state expectation value, so that there is no diverging 'disconnected' contribution. Similarly, we define the partial contractions where B travels to the outer left or right of the chain. In these expressions, (1 − e^{±ip} E) (where E stands for the relevant transfer matrix E_L^R or E_R^L) is not singular for p ≠ 0, and (1 − e^{±ip} E)^P is not really a pseudo-inverse: it is still defined as subtracting the contribution in the subspace corresponding to eigenvalue 1 of E. This arises because the geometric sum Σ_{n=0}^∞ e^{ipn} is, strictly speaking, not defined. Thanks to the gauge-fixing condition there is no contribution in this subspace anyway, so we could also have used the regular inverse. In the following, this should always be kept in mind whenever a (···)^P appears.
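The regularized geometric sum can be illustrated numerically. In this toy sketch (our own construction, with a random left-canonical tensor standing in for the actual ground state), the eigenvalue-1 subspace |r)(l| of the transfer matrix E is projected out before inverting (1 − e^{ip}E); for a right-hand side with (l|b) = 0 the result also solves the unregularized equation, as noted in the text:

```python
import numpy as np

def transfer_matrix(A):
    """E acting on vectorized D x D matrices: x -> sum_s A[s] x A[s]^dag."""
    d, D, _ = A.shape
    return np.einsum('sik,sjl->ijkl', A, A.conj()).reshape(D * D, D * D)

def regularized_solve(E, b, p, l, r):
    """Evaluate (1 - e^{ip} E)^P b, with the eigenvalue-1 subspace of E removed."""
    n = E.shape[0]
    P = np.eye(n) - np.outer(r, l.conj()) / (l.conj() @ r)  # project out |r)(l|
    return np.linalg.solve(np.eye(n) - np.exp(1j * p) * (E @ P), b)

rng = np.random.default_rng(0)
d, D = 2, 4
Q, _ = np.linalg.qr(rng.normal(size=(d * D, D)))
A_L = Q.reshape(d, D, D)                 # hypothetical left-canonical uMPS tensor
E = transfer_matrix(A_L)

l = np.eye(D).reshape(-1)                # left fixed point of E is the identity
w, v = np.linalg.eig(E)
r = v[:, np.argmax(abs(w))]              # right fixed point, eigenvalue 1

b = rng.normal(size=D * D)
b = b - r * (l.conj() @ b) / (l.conj() @ r)   # the gauge condition removes this part anyway
x = regularized_solve(E, b, p=0.3, l=l, r=r)
# when (l|b) = 0, x also solves the plain equation (1 - e^{ip} E) x = b
print(np.linalg.norm((np.eye(D * D) - np.exp(0.3j) * E) @ x - b))
```

This is exactly the statement in the text that, thanks to the gauge fixing, the regular inverse would have given the same answer.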
We use the above expressions to define all partial contractions where B and O are both either to the left or to the right of B′. The e^{±ip} factors originate from the extra shift in the relative position of B in these terms. Putting all these contributions together yields the final expression for the matrix element.

Solving the eigenvalue problem
At this point, we still need to find the algorithm for the variational optimization of the B tensor in the excitation ansatz. We have seen that the effective parametrization in terms of an X matrix (i) fixes all gauge degrees of freedom, (ii) removes all zero modes in the variational subspace, (iii) makes the computation of the norm of an excited state particularly easy, and (iv) makes sure the excitation is orthogonal to the ground state, even at momentum zero. The variational optimization boils down to minimizing the energy function

min_X ⟨Φ_p(X)| H |Φ_p(X)⟩ / ⟨Φ_p(X)|Φ_p(X)⟩ ,

where we have made sure to shift the hamiltonian such that the ground state has energy density zero. Because both numerator and denominator are quadratic functions of the variational parameters X, this optimization problem reduces to solving the generalized eigenvalue problem

H_eff(p) X = ω N_eff(p) X ,

where the effective energy and normalization matrices are defined by the matrix elements of H and the identity between two excited states, and X is a vectorized version of the matrix X. Since the overlap between two excited states is of the simple Euclidean form, the effective normalization matrix reduces to the unit matrix, and we are left with an ordinary eigenvalue problem. Solving the eigenvalue problem requires an expression for H_eff, or, rather, for the action of H_eff on a trial vector. Indeed, since we are typically only interested in finding its lowest eigenvalues, we can plug the action of H_eff (which is hermitian) into a Lanczos-based iterative eigensolver. This has great implications for the computational complexity: the full computation and diagonalization of the effective energy matrix would entail a computational complexity of O(D⁶), while the action on an input vector can be implemented with O(D³) operations.
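The Lanczos strategy only requires a routine implementing the map X ↦ H_eff(p)X. As a schematic illustration (with a dense random hermitian matrix standing in for the actual effective hamiltonian, whose matvec would instead perform the O(D³) partial contractions), scipy's iterative eigensolver accepts exactly such a routine:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(1)
n = 200                          # plays the role of D^2 (d - 1)
H = rng.normal(size=(n, n))
H = (H + H.T) / 2                # H_eff is hermitian

def apply_Heff(x):
    # placeholder matvec; in a real implementation this performs the
    # O(D^3) contractions and the matrix H is never formed explicitly
    return H @ x

op = LinearOperator((n, n), matvec=apply_Heff, dtype=np.float64)
w, v = eigsh(op, k=3, which='SA')    # few lowest eigenvalues, Lanczos-style
print(np.allclose(np.sort(w), np.sort(np.linalg.eigvalsh(H))[:3]))
```

Only matrix-vector products enter the iteration, which is what brings the cost down from O(D⁶) to O(D³) per step in the MPS setting.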
So we need the action of H_eff on an arbitrary vector Y. We first transform the matrix Y to a tensor B in the usual way. Then we need all the different contributions that pop up in a matrix element of the form ⟨Φ_p(B′)| H |Φ_p(B)⟩, i.e., similarly to the expression (188), we need all different orientations of the nearest-neighbour operators of the hamiltonian, the input B tensor and the output. Because we are confronted with a two-site operator here, the expressions are a bit more cumbersome, but they are built from the same type of partial contractions as before. These partial contractions allow us to implement the action of the effective energy matrix on a given input vector B; the resulting routine is then passed to an Arnoldi-type iterative eigensolver that targets the eigenvalues with smallest real part. In the last step, we need the action of H_eff(p) (without the tilde), so we need to perform one last contraction with the effective parametrization. All contractions above have a computational complexity of O(D³). By solving this eigenvalue equation for all momenta, we obtain direct access to the full excitation spectrum of the system. Note that the eigenvalue equation has D²(d − 1) solutions, but only the few lowest-lying ones have a physical meaning. Indeed, for a given value of the momentum, one typically finds a limited number of excitations living on an isolated branch in the spectrum, whereas all the other solutions fall within the continuous bands. It is not expected that these states are approximated well by the quasiparticle ansatz. The accuracy of the approximation can be assessed by computing the energy variance - just as we did for the ground state in Sec. 4.3 - but for an excitation this is an involved calculation [44].

Dynamical correlations
As we have mentioned before, the excitation spectrum determines the dynamical correlation functions or spectral functions. We will use the following definition of the spectral function,

S^α(q, ω) = ∫ dt e^{iωt} Σ_n e^{−iqn} ⟨Ψ(A)| O^α_n(t) O^α_0(0) |Ψ(A)⟩ ,

where the time-evolved operator O^α_n(t) = e^{iHt} O^α_n(0) e^{−iHt} is introduced. By projecting the time evolution on all excited states of H, we obtain a spectral representation in which γ labels all excited states of the system with excitation energies E_γ − E_0. Let us now take only the one-particle excitations into account (the excitations corresponding to isolated branches in the excitation spectrum), for which we know that they can be described by the excitation ansatz. For these states, which have a well-defined momentum, the sum over γ is rewritten as a sum over the set Γ_1 of all isolated branches in the spectrum and an integral over the momentum region R_γ where every branch γ exists. Because of translation invariance, the matrix elements carry a momentum δ function, so that we obtain for the one-particle part of the spectral function

S^α_1(q, ω) = Σ_{γ∈Γ_1(q)} 2π δ(ω − ω_γ(q)) |⟨Φ_q(B_γ)| O^α_0 |Ψ(A)⟩|² ,

where Γ_1(q) denotes the set of one-particle states in the momentum sector q and ω_γ(·) is the excitation energy of that mode. The spectral weights are easily computed from partial contractions of the same type as before.
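Numerically, the one-particle part is a sum of δ peaks sitting on the dispersion branches; for plotting one usually broadens them into Lorentzians. A small sketch with a made-up cosine dispersion and constant spectral weight (both purely illustrative, not taken from any model in the text):

```python
import numpy as np

def one_particle_spectral_function(omega, q, branches, eta=0.05):
    """One-particle part of S(q, w): a sum of peaks 2*pi |weight|^2
    delta(w - w_gamma(q)), broadened into Lorentzians of width eta."""
    S = np.zeros_like(omega)
    for dispersion, weight in branches:
        w0 = dispersion(q)
        S += 2 * np.pi * abs(weight(q))**2 * (eta / np.pi) / ((omega - w0)**2 + eta**2)
    return S

# hypothetical single-magnon branch with a cosine dispersion and unit weight
branches = [(lambda q: 2.0 - np.cos(q), lambda q: 1.0)]
omega = np.linspace(0, 4, 801)
S = one_particle_spectral_function(omega, q=np.pi / 2, branches=branches)
print(omega[np.argmax(S)])   # peak sits at w = 2 - cos(pi/2) = 2
```

In a real calculation the dispersion ω_γ(q) and the weights come from the quasiparticle ansatz solved at each momentum; only the bookkeeping above is generic.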

Topological excitations
The elementary excitations in one-dimensional spin systems are not always of the simple form that we have introduced earlier. In the case of symmetry breaking, where the ground state is degenerate, the elementary excitations are typically kinks or domain walls, i.e. particles that interpolate between the different ground states. These excitations are called topological, as they cannot be created by a local operator acting on one of the ground states. It is not a priori clear that they are local (i.e. that the interpolation between the two ground states happens over a small region), and, in fact, this is not at all obvious from other approaches such as the Bethe ansatz [47]. One expects, however, that the proof for excitations in the trivial sector in Ref. [35] can be extended to topological excitations, so we can apply the quasiparticle ansatz here as well.
Because it is formulated directly in the thermodynamic limit, our framework can be easily extended to target these topological sectors. Suppose we have a twofold-degenerate ground state, approximated by two uMPS |Ψ(A)⟩ and |Ψ(Ã)⟩. The obvious ansatz for a domain-wall excitation is

|Φ_p(B)⟩ = Σ_n e^{ipn} Σ_{{s}} ( · · · A_L^{s_{n−1}} B^{s_n} Ã_R^{s_{n+1}} · · · ) |{s}⟩ , (206)

i.e. the domain wall interpolates between the two ground states [41]. All the calculations of the previous sections can be repeated in order to determine gauge-fixing conditions, compute expectation values and solve the eigenvalue problem. The only difference concerns the appearance of mixed transfer matrices such as Ẽ, which determine the correlation functions corresponding to string-like operators that interpolate between the two ground states. This matrix has spectral radius smaller than one - otherwise the two ground states would not be orthogonal - such that the geometric sums involving these transfer matrices can be computed with the full inverse. Yet there is one problem with considering topological excitations. Strictly speaking, the momentum of the ansatz [Eq. (206)] is not well defined: multiplying the tensor Ã_R with an arbitrary phase factor, Ã_R ← Ã_R e^{iφ}, shifts the momentum as p ← p + φ. The origin of this ambiguity is the fact that a single domain wall cannot be properly defined with periodic boundary conditions. Physically, domain walls should come in pairs; for such states the total momentum is well defined, although the individual momenta can be arbitrarily transferred between the two domain walls. A heuristic way to fix the kink momentum unambiguously is related to the above mixed transfer matrix: one can impose that its spectrum be symmetric with respect to the real axis, which gives rise to a kink spectrum that is symmetric in the momentum. This problem disappears, as we will see, when considering excitations with two topological particles.
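The claim about the spectral radius of the mixed transfer matrix is easy to check numerically. A toy example (our own construction, with two independent random left-canonical uMPS tensors standing in for the two degenerate ground states):

```python
import numpy as np

def mixed_transfer_matrix(A, Atilde):
    """E~ = sum_s A[s] (x) conj(Atilde[s]): the mixed transfer matrix between
    two uMPS; its spectral radius is < 1 when the two states are orthogonal
    in the thermodynamic limit."""
    d, D, _ = A.shape
    return np.einsum('sik,sjl->ijkl', A, Atilde.conj()).reshape(D * D, D * D)

def random_umps(d, D, seed):
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(d * D, D)))
    return Q.reshape(d, D, D)        # left-canonical tensor

A, Atilde = random_umps(2, 6, seed=0), random_umps(2, 6, seed=1)
rho_mixed = max(abs(np.linalg.eigvals(mixed_transfer_matrix(A, Atilde))))
rho_self = max(abs(np.linalg.eigvals(mixed_transfer_matrix(A, A))))
print(rho_self, rho_mixed)    # self transfer matrix has radius 1; mixed one is < 1
```

Because the mixed spectral radius stays strictly below one, the geometric sums over Ẽ converge and can be evaluated with an ordinary inverse, as stated above.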

Larger blocks
There is no guarantee that the variational energies converge to the exact excitation energies of the full hamiltonian, even for a clearly isolated excitation branch. The reason is that the effect of physical operators of growing size cannot always be reproduced by the excitation ansatz, even upon growing the bond dimension. This can pose a problem for one-particle excitations that are very wide, e.g. because they are very close to a scattering continuum in the spectrum.
The excitation ansatz can be systematically extended, however, in order to capture larger and larger regions. Instead of inserting a one-site tensor, one can introduce a block B^{s_1...s_M} spanning M sites, which leads to the ansatz [15,35]

|Φ_p(B)⟩ = Σ_n e^{ipn} Σ_{{s}} ( · · · A_L^{s_{n−1}} B^{s_n ... s_{n+M−1}} A_R^{s_{n+M}} · · · ) |{s}⟩ .

In principle this approach is guaranteed to converge to the exact excitation energy - assuming the ground-state energy is converged - but the number of variational parameters in the big B tensor grows exponentially with the number of sites, so that, in practice, this quickly becomes infeasible.
The same gauge freedom is present for these larger blocks, and the same gauge conditions can be imposed. The left gauge condition again demands that the block vanishes upon contraction with A_L† on the leftmost site, and can be enforced by going to the effective parametrization of the B tensor

B^{s_1,...,s_M} = V_L^{s_1} X^{s_2,...,s_M} ,

where X^{s_2,...,s_M} is a (D(d − 1) × d × · · · × d × D) tensor containing all variational parameters. With this effective parametrization, the overlap of states again reduces to the Euclidean norm on the tensor X, and the variational optimization reduces to an eigenvalue problem. For further details of this implementation we refer to Ref. [49].

Two-particle states
The excitations that were introduced in the previous section can be naturally interpreted as particles living on a strongly-correlated background state, and we can ask the question as to how to describe the interactions between these effective particles [42,44,50]. As an answer to that question, in this section we show how to construct two-particle states and how to compute the two-particle S matrix. We will start from a one-particle spectrum consisting of a number of different types of particles, labelled by α, with dispersion relations Δ_α(p). In the thermodynamic limit, constructing the two-particle spectrum is trivial: the momentum and energy are the sums of the individual momenta and energies of the two particles. The two-particle wavefunction, however, depends on the particle interactions. These depend on both the hamiltonian and the ground-state correlations, and are reflected in the wavefunction in two ways: (i) the asymptotic wavefunction has different terms, with the S-matrix elements as the relative coefficients, and (ii) the local part of the wavefunction is modified.

Variational ansatz
In order to capture both effects of the interactions on the wavefunction, we introduce the following ansatz for describing states with two localized, particle-like excitations with total momentum P,

|Υ_P(c)⟩ = Σ_{n≥0} Σ_j c_j(n) |χ_{P,j}(n)⟩ , (211)

where the basis states |χ_{P,j}(n)⟩ contain two B-type tensors at relative distance n: for n = 0 a single block captures the two particles on top of each other [Eq. (212)], while for n > 0 two one-site tensors are separated by n ground-state tensors [Eq. (213)]. We collect the variational coefficients either in one half-infinite vector C with C_{j,n} = c_j(n), or using the finite vectors c(n) with entries {c_j(n), j = 1, . . . , L_n} for every n = 0, 1, . . .. Note that the sum in Eq. (211) only runs over values n ≥ 0, because a sum over all integers would result in an overcomplete basis. At this point, we will reduce the number of variational parameters to keep the problem tractable. The terms with n = 0 (corresponding to the basis vectors in Eq. (212)) are designed to capture the situation where the two particles are close together. No information on what this part should look like is available a priori, so we keep all variational parameters c_j(0), j = 1, . . . , L_0 = D²(d − 1). The terms with n > 0, corresponding to the basis vectors in Eq. (213), represent the situation where the particles are separated. We know that, as n → ∞, the particles decouple and we should obtain a combination of one-particle solutions. With this in mind, we restrict the range of j_1 and j_2 to the first ℓ basis tensors {B^{(i)}, i = 1, . . . , ℓ}, which can be chosen so as to capture the momentum-dependent solutions of the one-particle problem. Consequently, the number of basis states of Eq. (213) for n > 0 satisfies L_n = ℓ², which we will henceforth denote as just L.
This might seem like a crude approximation for small n: when the two particles approach each other, their wavefunctions might begin to deform, so that the B tensors that were obtained as solutions of the one-particle problem no longer apply. Note, however, that the local (n = 0) and non-local (n > 0) parts are not orthogonal, so that the local part is able to correct for the part of the non-local wavefunction where the one-particle description is no longer valid.
As the state (211) is again linear in its variational parameters C, optimizing the energy amounts to solving the generalized eigenvalue problem

H_eff,2p(P) C = ω N_eff,2p(P) C , (214)

with ω the total energy of the state and

(H_eff,2p(P))_{n′j′,nj} = ⟨χ_{P,j′}(n′)| H |χ_{P,j}(n)⟩ , (215)
(N_eff,2p(P))_{n′j′,nj} = ⟨χ_{P,j′}(n′)|χ_{P,j}(n)⟩ , (216)

two half-infinite matrices. They have a block structure, where the submatrices are labelled by (n′, n) and are of size L_{n′} × L_n. The computation of the matrix elements is quite involved and technical - each element contains three infinite sums, with each term containing two B tensors in both the ket and bra layer - so we refer to Ref. [49] for the explicit formulas.
Since the eigenvalue problem is still infinite, it cannot be diagonalized directly. However, as we actually know the possible energies ω of a scattering state with total momentum P (they follow from the one-particle energies), we can also interpret Eq. (214) as an overdetermined system of linear equations for the coefficients C_{j,n} = c_j(n). In the next two sections we will show how to reduce this problem to a finite linear equation.

Asymptotic regime
First we solve the problem in the asymptotic regime, where the two particles are completely decoupled. This regime corresponds to the limit n, n′ → ∞, where the effective norm and hamiltonian matrices, consisting of blocks of size L × L, take on a simple form. Indeed, if we properly normalize the basis states, the asymptotic form of the effective norm matrix reduces to the identity, while the effective hamiltonian matrix becomes a repeating row of block matrices centred around the diagonal,

(H_eff,2p(P))_{n′,n} → A_{n′−n} , as n, n′ → ∞. (217)
The blocks decrease exponentially as we move away from the diagonal, so we can, in order to solve the problem, consider them to be zero if |n′ − n| > M for a sufficiently large M. In this approximation, the coefficients c(n) obey the recurrence relation

Σ_{m=−M}^{M} A_m c(n + m) = ω c(n) . (219)

Substituting a plane-wave form c(n) = μ^n v turns this recurrence into a polynomial eigenvalue problem for μ. From the special structure of the blocks A_m [49] and their relation to the effective one-particle hamiltonian H_eff(p), we already know a number of solutions to Eq. (219). Indeed, if we can find Γ combinations of two types of particles (α, β) with individual momenta (p_1, p_2) such that P = p_1 + p_2 and ω = Δ_α(p_1) + Δ_β(p_2), then the polynomial eigenvalue problem will have 2Γ solutions μ on the unit circle. These solutions take the form μ = e^{ip_2}, and the corresponding eigenvectors are given by

v = u_α(p_1) ⊗ u_β(p_2) ,

where u_α(p) is the vector corresponding to the one-particle solution of type α with momentum p with respect to the reduced basis {B^{(i)}} (in the case of degenerate eigenvalues we can take linear combinations of these eigenvectors that no longer have this product structure). Every combination is counted twice, because we can have the particle with momentum p_1 on the left and momentum p_2 on the right, and vice versa. Moreover, since A_m† = A_{−m}, the number of eigenvalues within and outside the unit circle must be equal. This allows for a classification of the eigenvalues μ: LM − Γ eigenvalues lie within the unit circle, 2Γ lie on it, and LM − Γ lie outside. The eigenvalues with modulus bigger than one are not physical (because the corresponding c(n) ∼ μ_i^n v_i yields a non-normalizable state) and should be discarded. The 2Γ eigenvalues with modulus 1 are the oscillating modes discussed above; we will henceforth label them with γ = 1, . . . , 2Γ such that μ = e^{ip_γ} (p_γ being the momentum of the particle on the right), and the corresponding eigenvector is denoted v_γ. Finally, the eigenvalues within the unit circle are exponentially decreasing and represent corrections when the excitations are close to each other. We henceforth denote them as e^{−λ_i} with Re(λ_i) > 0 for i = 1, . . . , LM − Γ, and denote the corresponding eigenvectors as w_i. With these solutions, we can represent the general asymptotic solution as

c(n) = Σ_i q_i e^{−λ_i n} w_i + Σ_γ r_γ e^{ip_γ n} v_γ . (224)

Of course, we still have to determine the coefficients {q_i, r_γ} by solving the local problem.
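The classification of asymptotic modes by their modulus can be illustrated on the simplest M = 1 recurrence, linearized into a companion eigenvalue problem. Everything below (block size, random blocks, the value of ω) is a toy stand-in for the actual A_m blocks:

```python
import numpy as np

def asymptotic_modes(A_minus1, A0, A1, omega, tol=1e-8):
    """Solve A1 mu^2 v + (A0 - omega) mu v + A_minus1 v = 0 (the M = 1 case
    of the asymptotic recurrence) by companion linearization, and classify
    the modes mu by their modulus."""
    L = A0.shape[0]
    Ainv = np.linalg.inv(A1)
    comp = np.block([
        [np.zeros((L, L)), np.eye(L)],
        [-Ainv @ A_minus1, -Ainv @ (A0 - omega * np.eye(L))],
    ])
    mu = np.linalg.eigvals(comp)
    inside = mu[abs(mu) < 1 - tol]          # damped corrections e^{-lambda n}
    oscill = mu[abs(abs(mu) - 1) <= tol]    # plane-wave modes e^{i p n}
    outside = mu[abs(mu) > 1 + tol]         # non-normalizable, to be discarded
    return inside, oscill, outside

rng = np.random.default_rng(2)
L = 4
A1 = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
A_minus1 = A1.conj().T                      # hermiticity: A_{-m} = A_m^dagger
A0 = rng.normal(size=(L, L)); A0 = A0 + A0.conj().T
inside, oscill, outside = asymptotic_modes(A_minus1, A0, A1, omega=0.5)
print(len(inside), len(oscill), len(outside))
```

Because A_{−1} = A_1† and A_0 − ω is hermitian, the roots come in pairs (μ, 1/μ̄), so the counts inside and outside the unit circle always match, exactly as argued in the text.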
Solving the full eigenvalue equation

The corrections to the asymptotic form decay exponentially with n. This implies that we can safely insert the asymptotic form for n > N in the wavefunction, for a sufficiently large N, which we can implement by rewriting the wavefunction as c = Z x, where the columns of Z contain the vectors {e^{−λ_i n} w_i} and {e^{ip_γ n} v_γ} corresponding to the damped, resp. oscillating modes, while an identity block is inserted to leave open all parameters in c(n) for n ≤ N. The number of parameters in x is thus reduced to the finite value D²(d − 1) + NL + LM + Γ. Since the eigenvalue equation is automatically fulfilled beyond the first M + N rows, we can reduce H_eff,2p and N_eff,2p to these first rows, so we end up with a finite linear equation of the form

(H_eff,2p(P) − ω N_eff,2p(P)) Z x = 0 . (228)

This 'effective scattering matrix' consists of the first (M + N) × (M + N) blocks of the exact effective hamiltonian and norm matrices and the A matrices of the asymptotic part [Eq. (217)], so as to make sure that these matrices remain the truncated versions of a hermitian problem. This matrix has D²(d − 1) + (N + M)L rows, which implies that the linear equation (228) has Γ exact solutions, which is precisely the number of scattering states we expect to find. Every solution consists of a local part (D²(d − 1) + NL elements), the LM − Γ coefficients q of the decaying modes and the 2Γ coefficients r of the asymptotic modes.

S matrix and normalization
After having shown how to find the solutions of the scattering problem, we can now elaborate on the structure of the asymptotic wavefunction and define the S matrix. We start from Γ linearly independent scattering eigenstates |Υ_i(P, ω)⟩ (i = 1, . . . , Γ) at total momentum P and energy ω with asymptotic coefficients r_i(P, ω). The asymptotic form of these eigenstates is thus a linear combination of all possible non-decaying solutions of the asymptotic problem,

c_i(n) → Σ_{γ=1}^{2Γ} r^γ_i(P, ω) e^{ip_γ n} v_γ , (230)

where the coefficients are obtained from solving the local problem. The number of eigenstates equals half the number of oscillating modes that appear in the linear combination. With every oscillating mode γ we can associate a function ω_γ(p) giving the energy of this mode as a function of the momentum p_γ of the second particle at a fixed total momentum P. If γ corresponds to the two-particle mode with particles α_γ and β_γ, this function is given by ω_γ(p) = Δ_{α_γ}(P − p) + Δ_{β_γ}(p). The derivative of this function, which will prove of crucial importance, is ω′_γ(p) = Δ′_{β_γ}(p) − Δ′_{α_γ}(P − p). It can be interpreted as the difference in group velocity between the two particles, i.e. the relative group velocity in the center-of-mass frame. Much like the proof of conservation of particle current in one-particle quantum mechanics, it can be shown [49] that, if (230) is to be the asymptotic form of an eigenstate, the coefficients r^γ_i(P, ω) should obey

Σ_γ |r^γ_i(P, ω)|² (dω_γ/dp)(p_γ) = 0 .
This equation can indeed be read as a form of conservation of particle current, with ω′_γ(p_γ) playing the role of the (relative) group velocity of the asymptotic mode γ. As any linear combination of eigenstates with the same energy ω is again an eigenstate, this relation can be extended to

Σ_γ r̄^γ_j(P, ω) r^γ_i(P, ω) (dω_γ/dp)(p_γ) = 0 .
With this equation satisfied, we can define the two-particle S matrix S(P, ω). First, the different modes are classified according to the sign of the derivative: the incoming modes have dω_γ/dp > 0 (two particles moving towards each other), the outgoing modes have dω_γ/dp < 0 (two particles moving away from each other), so that we have

Σ_{γ∈Γ_in} r̄^γ_j(P, ω) r^γ_i(P, ω) |dω_γ/dp(p_γ)| = Σ_{γ∈Γ_out} r̄^γ_j(P, ω) r^γ_i(P, ω) |dω_γ/dp(p_γ)| .
If we group the coefficients of all solutions in (square) matrices R_in(P, ω) and R_out(P, ω), such that the i'th column contains the coefficients r^γ_i of the in- and outgoing modes of the i'th solution, we can rewrite this equation as

R_in† V_in² R_in = R_out† V_out² R_out ,

with (V_in,out(P, ω))_{ij} = δ_ij |dω_γ/dp(p_γ)|^{1/2} diagonal matrices. As R_in(P, ω) and R_out(P, ω) should be connected linearly, we can define a unitary matrix S(P, ω) through

V_out(P, ω) R_out(P, ω) = S(P, ω) V_in(P, ω) R_in(P, ω) .
This definition corresponds to the S matrix known from standard scattering theory. Note, however, that S(P, ω) is only defined up to a set of phases: since the vectors v_γ [Eq. (224)] can only be determined up to a phase, the coefficient matrices R_in and R_out are only defined up to a diagonal matrix of phase factors, and these arbitrary phase factors show up in the S matrix as well. In the case of elastic scattering of two identical particles the phase can be fixed heuristically; in the case of different outgoing channels only the squared magnitude of the S-matrix elements is physically well-defined. This formalism allows us to calculate the norm of the scattering states in an easy way. Indeed, the general overlap between two scattering states is given by

⟨Υ_j(P′, ω′)|Υ_i(P, ω)⟩ = 2πδ(P − P′) Σ_{γ,γ′} (v_{γ′}† v_γ) r̄^{γ′}_j(P, ω′) r^γ_i(P, ω) 2πδ(p_γ(ω) − p_{γ′}(ω′)) + finite parts.

The δ factor for the momenta p_γ can obviously only be satisfied if ω = ω′, so we can transform it into a δ(ω − ω′). Moreover, if p_γ(ω) = p_{γ′}(ω′) for γ ≠ γ′, then necessarily v_{γ′}† v_γ = 0, so we can reduce the double sum over γ, γ′ to a single one. If we omit all finite parts, we have

⟨Υ_j(P′, ω′)|Υ_i(P, ω)⟩ = 4π² δ(P − P′) δ(ω − ω′) Σ_γ r̄^γ_j(P, ω) r^γ_i(P, ω) |dω_γ/dp(p_γ)| .

With the R_in/out as defined above, the overlap reduces to

⟨Υ_j(P′, ω′)|Υ_i(P, ω)⟩ = 8π² δ(P − P′) δ(ω − ω′) [ (V_in R_in)† (V_in R_in) ]_{ji} .
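The defining relation V_out R_out = S V_in R_in and the associated current-conservation condition can be checked with synthetic data; the unitary S, the velocity matrices and the coefficients below are randomly chosen for illustration and do not come from an actual scattering problem:

```python
import numpy as np

rng = np.random.default_rng(3)
G = 3   # Gamma: number of independent scattering solutions / channels

# hypothetical diagonal 'velocity' matrices |dw/dp|^(1/2) and a unitary S
V_in = np.diag(np.sqrt(rng.uniform(0.5, 2.0, size=G)))
V_out = np.diag(np.sqrt(rng.uniform(0.5, 2.0, size=G)))
Q, _ = np.linalg.qr(rng.normal(size=(G, G)) + 1j * rng.normal(size=(G, G)))
S = Q                                     # any unitary matrix

# coefficients of the incoming modes for the Gamma independent solutions
R_in = rng.normal(size=(G, G)) + 1j * rng.normal(size=(G, G))
# definition of the S matrix: V_out R_out = S V_in R_in
R_out = np.linalg.solve(V_out, S @ V_in @ R_in)

# current conservation: R_in^dag V_in^2 R_in = R_out^dag V_out^2 R_out
lhs = R_in.conj().T @ V_in**2 @ R_in
rhs = R_out.conj().T @ V_out**2 @ R_out
print(np.allclose(lhs, rhs))              # holds precisely because S is unitary

# conversely, S is recovered from the coefficient matrices
S_rec = V_out @ R_out @ np.linalg.inv(V_in @ R_in)
print(np.allclose(S_rec, S))
```

The check makes the logic of the text explicit: current conservation between in- and outgoing channels is equivalent to unitarity of the matrix connecting V_in R_in to V_out R_out.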

Two-particle contribution to spectral functions
Similarly to the one-particle contributions to the spectral functions, we can now compute the two-particle contribution as well. The projector onto the two-particle subspace can be written as

𝟙_2p = Σ_{γ∈Γ_2} ∫ dP/2π ∫ dω/2π |Υ_γ(P, ω)⟩⟨Υ_γ(P, ω)| ,

where Γ_2 is the set of all types of two-particle states at a given momentum and energy. Here we have orthonormalized the two-particle states as

⟨Υ_{γ′}(P′, ω′)|Υ_γ(P, ω)⟩ = 4π² δ(P − P′) δ(ω − ω′) δ_{γγ′} .
The two-particle contribution to the spectral function is then given by

S^α_2(q, ω) = Σ_{γ∈Γ_2} |⟨Υ_γ(q, ω)| O^α_0 |Ψ(A)⟩|² .

As compared to the one-particle contribution, this function is continuous (no δ peaks) in the momentum-energy region where two-particle states exist. The expressions for the spectral weights can be found in Ref. [44].

Bound states
Above we have seen how the one-particle ansatz can be extended to larger blocks in order to describe very broad excitations, a situation that arises when a bound state forms out of a two-particle continuum. We can, however, study these bound states with the two-particle ansatz as well. Specifically, the formation of a bound state out of a two-particle continuum should correspond to a continuous deformation of a two-particle wavefunction into a very broad, yet localized, one-particle wavefunction. As the asymptotic part of the two-particle wavefunction is supposed to vanish in this process, we expect a non-analyticity in the S matrix - in particular, the scattering length diverges as the bound state forms [44].
In contrast to a scattering state, the energy of a bound state is not known from the one-particle dispersions, so we have to scan a certain energy range in search of bound-state solutions - of course, with the one-particle ansatz we can get a pretty good idea of where to look. A bound state corresponds to a solution of the eigenvalue equation with only decaying modes in the asymptotic regime. In principle we should even be able to find bound-state solutions within a continuum of scattering states (i.e. a stationary bound state, not a resonance within the continuum) through the presence of additional localized solutions of the scattering problem.

Transfer matrices and fixed points
Matrix product states have been used extensively as a variational ansatz for ground states of local hamiltonians, but in recent years it has been observed that they can also provide accurate approximations for fixed points of transfer matrices. One-dimensional transfer matrices pop up whenever we want to contract two-dimensional tensor networks, which occur naturally in the context of two-dimensional classical many-body systems as representations of partition functions, and which can represent ground states and real-time evolution of one-dimensional quantum systems, e.g. for systems with local interactions in terms of Trotter-Suzuki decompositions. Additionally, they occur in the context of projected entangled-pair states (PEPS), the two-dimensional version of matrix product states [51]. The contraction of a two-dimensional tensor network using MPS methods goes back to the corner transfer matrix of Baxter [52,53] and the work of Nishino and Okunishi on classical partition functions in two dimensions [54,55]. Ten years later these works led to contraction algorithms based on the time-evolving block decimation [17] or the corner transfer matrix renormalization group [56]. Complementary to these approaches, in this section we formulate tangent-space methods for one-dimensional transfer matrices [57].
A one-dimensional transfer matrix in the form of a matrix product operator (MPO) [12,58] is written as

T(O) = Σ_{{i},{j}} Tr( · · · O^{i_{n−1} j_{n−1}} O^{i_n j_n} O^{i_{n+1} j_{n+1}} · · · ) · · · |i_{n−1}⟩⟨j_{n−1}| ⊗ |i_n⟩⟨j_n| ⊗ |i_{n+1}⟩⟨j_{n+1}| · · · . (244)

Whenever we contract an infinite two-dimensional tensor network, we want to find the fixed point of this operator, i.e. we want to solve the fixed-point equation T(O)|Ψ⟩ = Λ|Ψ⟩ for the leading eigenvector. We now make the ansatz that the fixed point (leading eigenvector) of this operator is an MPS, such that it obeys the eigenvalue equation T(O)|Ψ(A)⟩ ≈ Λ|Ψ(A)⟩. Let us first try to find a way to properly define this eigenvalue equation. Suppose we have indeed found an MPS representation |Ψ(A)⟩ of the fixed point of T(O); then the eigenvalue is given by

Λ = ⟨Ψ(A)| T(O) |Ψ(A)⟩ / ⟨Ψ(A)|Ψ(A)⟩ .

In order to determine Λ, we bring |Ψ(A)⟩ into mixed canonical form. Contracting the resulting infinite network requires that we find F_L and F_R, the fixed points of the left and right channel operators T_L and T_R, with leading eigenvalues λ_L and λ_R, and normalize F_L and F_R appropriately. The eigenvalues λ_L and λ_R are necessarily the same value λ, so that Λ is given by

Λ = λ^N ,

where N is the (diverging) number of sites. From a physical point of view, it is the 'free-energy density' f = −(1/N) log Λ = −log λ that is the most important quantity. In case we want to normalize the MPO such that the leading eigenvalue is equal to one (or f = 0), we can simply divide by λ: O → O/λ.
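The channel-operator eigenvalue λ and the free-energy density f = −log λ can be computed with a simple power iteration. The tensors below have random positive entries (so a Perron-Frobenius argument guarantees a positive leading eigenvalue and convergence), and the proper gauge fixing of the MPS is skipped for brevity; this is a toy sketch, not the full algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)
d, D, chi = 2, 3, 2   # physical, MPS and MPO bond dimensions (toy values)
A = rng.uniform(0.1, 1.0, size=(d, D, D))          # uMPS tensor (real positive)
O = rng.uniform(0.1, 1.0, size=(chi, chi, d, d))   # MPO tensor O[b, b', s, t]

def T_L(F):
    """Left channel operator acting on F[bra, mpo, ket]; for a real tensor A
    the bra layer needs no complex conjugation."""
    return np.einsum('abc,tax,byst,scz->xyz', F, A, O, A)

# power iteration for the leading eigenvalue lambda and its fixed point F_L
F = rng.uniform(size=(D, chi, D))
for _ in range(500):
    F_new = T_L(F)
    lam = np.linalg.norm(F_new)
    F = F_new / lam

f = -np.log(lam)      # free-energy density f = -log(lambda)

# cross-check against the dense spectrum of the channel operator
M = np.einsum('tax,byst,scz->xyzabc', A, O, A).reshape((D * chi * D,) * 2)
print(np.isclose(lam, max(abs(np.linalg.eigvals(M)))))
```

In a production algorithm F_L would instead be obtained with an iterative eigensolver at O(D³) cost per matvec, but the structure of the channel-operator contraction is the same.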

The vumps algorithm for MPOs
The next step towards an algorithm [57,59] is stating an optimality condition for |Ψ(A)⟩ such that it can serve as an approximate eigenvector of T(O). Inspired by all the above tangent-space algorithms, we require that the projection of the residual onto the tangent space is zero,

P_{T(A)} ( T(O)|Ψ(A)⟩ − Λ|Ψ(A)⟩ ) = 0 .

In the mixed canonical form, the tangent-space projector consists of two parts, which can be expressed using the left and right fixed points F_L and F_R, or in terms of the effective maps O_{A_C} and O_C acting on the tensors A_C and C. The condition for having an optimal MPS representation is then equivalent to having a vanishing tangent-space gradient G = 0. Together with the consistency conditions, a fixed point is thus characterized by the set of equations

O_{A_C}(A_C) = λ_{A_C} A_C , O_C(C) = λ_C C , A_C = A_L C = C A_R .

In Sec. 4.4 we have seen how the vumps algorithm finds the fixed point iteratively. In every iteration of the algorithm, we (i) start from a given MPS {A_L^{(i)}, A_R^{(i)}, A_C^{(i)}, C^{(i)}}, (ii) determine F_L and F_R, (iii) solve the two eigenvalue equations, obtaining A_C′ and C′, and (iv) determine the updated tensors A_L^{(i+1)} and A_R^{(i+1)} from A_C′ and C′. The vumps algorithm for MPOs is summarized in Alg. 8.

Excited states of an MPO
We can also apply the excitation ansatz to compute 'excitations' of a transfer matrix [51,60,61]. The algorithms for computing dispersion relations are quite similar to the case of hamiltonians, which we have studied extensively. In a first step, we renormalize the MPO such that the eigenvalue λ of the fixed-point equation equals one. Then we use the excitation ansatz to find the subleading eigenvectors. Again, we take recourse to the effective parametrization, such that optimizing the variational parameters boils down to solving the eigenvalue equation

T_eff(p) X = λ N_eff(p) X ,

with the effective transfer and normalization matrices defined in analogy to the hamiltonian case. In order to solve this eigenvalue equation iteratively, we need the action of T_eff(p) on a general input vector X. First we compute the tensor B(X), and define the partial contractions involving the channel operators. Again - if everything is properly normalized - these operators have a leading eigenvalue equal to one (with F_L and F_R as fixed points), so they should be regularized in order to define the inverses at momentum p = 0. This yields the action of T̃_eff(p) on the tensor B. In the last step, we need the action of T_eff(p) (without the tilde), so we need to perform one last contraction with the effective parametrization. Upon solving this eigenvalue equation for all momenta, we find the dispersion relation of the transfer matrix. The largest subleading eigenvalue defines the gap of the transfer matrix, and is related to the correlation length of the two-dimensional tensor network [61]. The momentum at which this gap is reached defines the pitch vector of the correlations, and possibly indicates incommensurate correlations in the two-dimensional tensor network.

Continuous matrix product states
In this section we show that the tangent-space framework for uniform MPS can be extended to the case of continuous field theories. For the sake of simplicity, we explain this in detail for one-component Bose gases, and we work out the explicit equations for the Lieb-Liniger hamiltonian. This set-up can be extended (albeit with a large notational overhead) to multi-component gases, fermions [62] and hamiltonians with superconducting terms [63] and exponentially-decaying interactions [64]. Continuous matrix product states were originally introduced [65] as the continuum limit of a particular subset of MPS, chosen so as to obtain a limiting state with valid physical properties. We approximate the one-dimensional continuum by a chain with lattice spacing a and send a → 0. For simplicity, we restrict to a system containing a single flavor of bosonic particles, i.e. spinless bosons (we refer to e.g. [62] for the more general case). The local basis on each site n of the chain consists of |0⟩_n (empty site) and |k⟩_n = (1/√(k!)) (b̂†_n)^k |0⟩_n (k ≥ 1 bosons). In order to obtain a state with a finite density of particles in the continuum limit, the probability of detecting k particles on a site has to scale as a^k. This leads to a parameterization in which the states corresponding to k > 1 are for most purposes irrelevant; the corresponding MPS matrices are completely determined in terms of A^1, i.e. in terms of the matrix R. If we now take the continuum limit a → 0 and identify the bosonic creation operator on the site with a bosonic field operator as ψ̂†(na) = b̂†_n/√a, we obtain the cMPS form of the state through a Taylor expansion of the path-ordered exponential. An alternative approach obtains the cMPS from a continuous measurement process [66], whereby the physical degrees of freedom correspond to the field that leaks out of a zero-dimensional cavity; the cavity plays the role of the ancilla system and thus has D internal levels.
This construction also has a clear holographic interpretation, which provides one possible avenue towards higher-dimensional generalizations.
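The continuum-limit construction can be checked directly with dense linear algebra for small bond dimension D. The following sketch is our own illustration (not part of the original derivation); it uses row-major vectorization, so that (A ⊗ B̄) vec(X) = vec(A X B†), and verifies that the single-site MPS transfer matrix built from A^0 = 1 + aQ and A^1 = √a R reproduces 1 + aT up to O(a²), with T the cMPS transfer matrix:

```python
import numpy as np

# With A^0 = 1 + aQ and A^1 = sqrt(a) R, the MPS transfer matrix
# E = A^0 ⊗ conj(A^0) + A^1 ⊗ conj(A^1) equals 1 + aT + O(a^2),
# where T = Q ⊗ 1 + 1 ⊗ conj(Q) + R ⊗ conj(R).
rng = np.random.default_rng(0)
D, a = 3, 1e-4
Q = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
R = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
I = np.eye(D)

A0 = I + a * Q           # empty-site amplitude
A1 = np.sqrt(a) * R      # single-boson amplitude
E = np.kron(A0, A0.conj()) + np.kron(A1, A1.conj())
T = np.kron(Q, I) + np.kron(I, Q.conj()) + np.kron(R, R.conj())

# the O(a^2) remainder is exactly a^2 (Q ⊗ conj(Q))
remainder = E - np.eye(D * D) - a * T
assert np.allclose(remainder, a**2 * np.kron(Q, Q.conj()))
```

The remainder is exactly a² Q ⊗ Q̄, confirming that the lattice transfer matrix reduces to the cMPS transfer matrix T to first order in the lattice spacing.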

Gauge transformations and canonical forms
Just like for MPS, we focus on the case of translation-invariant cMPS in the thermodynamic limit throughout these lecture notes, which are described by position-independent (i.e. uniform) matrices Q and R. We start by computing the norm of a uniform cMPS, which is determined by the transfer matrix T. This expression is related to the MPS transfer matrix, and the properties of T can be obtained from this correspondence. In the generic (injective) case, the eigenvalue λ_1 with largest real part is non-degenerate and purely real; its corresponding left and right eigenvectors correspond to positive definite matrices l and r. The norm of the cMPS then follows from these fixed points, which implies that, in order to have a properly normalized cMPS in the thermodynamic limit, the unique eigenvalue λ_1 of T with largest real part should be zero. If this is not the case, the cMPS needs to be rescaled, which amounts to shifting Q with the identity as Q → Q − (λ_1/2) 1. The corresponding left and right eigenvectors then obey the fixed-point equations, and all other eigenvalues of T have a strictly negative real part. Under these conditions, the (path-ordered) exponential of the transfer matrix reduces to a projector on the fixed points. If we also make sure that the overlaps of the boundary vectors v†_L and v_R with these fixed points are unity, then the uniform cMPS is properly normalized, ⟨Ψ(Q̄,R̄)|Ψ(Q,R)⟩ = 1.
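For small D, this normalization procedure is easy to implement with dense linear algebra. The sketch below (an illustration in our own conventions, not an efficient implementation; large-D codes would use iterative eigensolvers instead) builds T as a D² × D² matrix and shifts Q so that the leading eigenvalue vanishes:

```python
import numpy as np

def transfer_matrix(Q, R):
    """Dense D^2 x D^2 form of T = Q ⊗ 1 + 1 ⊗ conj(Q) + R ⊗ conj(R);
    with row-major vectorization it acts on a matrix r as Qr + rQ† + RrR†."""
    D = Q.shape[0]
    I = np.eye(D)
    return np.kron(Q, I) + np.kron(I, Q.conj()) + np.kron(R, R.conj())

def normalize(Q, R):
    """Shift Q -> Q - (lambda_1/2) 1 so that the leading eigenvalue of T is zero."""
    lam = np.linalg.eigvals(transfer_matrix(Q, R))
    lam1 = lam[np.argmax(lam.real)]      # generically real and non-degenerate
    return Q - 0.5 * lam1 * np.eye(Q.shape[0]), R

# usage: after the shift, all eigenvalues of T have non-positive real part
rng = np.random.default_rng(1)
D = 4
Q = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
R = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
Q, R = normalize(Q, R)
w = np.linalg.eigvals(transfer_matrix(Q, R))
assert abs(max(w.real)) < 1e-8
```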
The parametrization of the cMPS in terms of the matrices Q and R is not unique, because gauge transformations leave the cMPS invariant. This gauge freedom in Q and R can be used to find canonical forms. Choosing g⁻¹ = C_L, where l = C_L C_L†, brings the cMPS matrices into left-canonical form, for which the new left fixed point is the unit matrix.
Similarly, using g = C_R, where r = C_R C_R†, we obtain the right-canonical form, for which the right fixed point is the identity matrix. Again, we can combine both canonical forms in order to arrive at a mixed canonical form, where an extra matrix C is introduced linking the two. By diagonalizing the matrix C = U S V† we arrive at a Schmidt decomposition of the state. The fidelity between two different normalized cMPS |Ψ(Q_1, R_1)⟩ and |Ψ(Q_2, R_2)⟩ is computed similarly as before. Indeed, the overlap is determined by the mixed transfer matrix T_12. The fidelity is determined by the eigenvalue λ of T_12 with the largest real part, which should have a real part smaller than or equal to zero if both individual cMPS are properly normalized. The total fidelity then corresponds to zero or one respectively, so that it makes more sense to define Re λ itself as a logarithmic fidelity density, or to define f = e^{Re λ} such that the overlap on a segment of length l scales as f^l.
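For small D, the canonical forms can be constructed explicitly from the fixed points l and r, e.g. with gauge transformations g = l^{1/2} and h = r^{1/2}. The sketch below is our own dense-matrix illustration (the phase fixing and symmetrization are purely numerical conveniences); it verifies the left- and right-canonical conditions and that the Schmidt coefficients obtained from C sum to one:

```python
import numpy as np

def fixed_points(Q, R):
    """Hermitian positive-definite left/right fixed points l, r of the
    (normalized) cMPS transfer map T(x) = Qx + xQ† + RxR†, with Tr(l r) = 1."""
    D = Q.shape[0]
    I = np.eye(D)
    T = np.kron(Q, I) + np.kron(I, Q.conj()) + np.kron(R, R.conj())
    def leading(M):
        w, v = np.linalg.eig(M)
        x = v[:, np.argmax(w.real)].reshape(D, D)
        x /= np.exp(1j * np.angle(np.trace(x)))   # fix the arbitrary phase
        return (x + x.conj().T) / 2               # clean up Hermiticity
    r, l = leading(T), leading(T.conj().T)
    return l / np.trace(l @ r).real, r

def sqrtm_psd(M):
    """Square root of a Hermitian positive (semi)definite matrix."""
    w, v = np.linalg.eigh(M)
    return (v * np.sqrt(np.abs(w))) @ v.conj().T

rng = np.random.default_rng(2)
D = 3
Q = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
R = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))

# normalize: shift Q so that the leading eigenvalue of T is zero
T = np.kron(Q, np.eye(D)) + np.kron(np.eye(D), Q.conj()) + np.kron(R, R.conj())
Q -= 0.5 * np.linalg.eigvals(T).real.max() * np.eye(D)

l, r = fixed_points(Q, R)
g, h = sqrtm_psd(l), sqrtm_psd(r)
QL, RL = g @ Q @ np.linalg.inv(g), g @ R @ np.linalg.inv(g)   # left canonical
QR, RR = np.linalg.inv(h) @ Q @ h, np.linalg.inv(h) @ R @ h   # right canonical
C = g @ h                                                     # links both gauges

assert np.linalg.norm(QL + QL.conj().T + RL.conj().T @ RL) < 1e-6
assert np.linalg.norm(QR + QR.conj().T + RR @ RR.conj().T) < 1e-6
# the Schmidt coefficients (squared singular values of C) sum to one
assert abs(np.sum(np.linalg.svd(C, compute_uv=False) ** 2) - 1) < 1e-6
```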

Evaluating expectation values
After having introduced the class of uniform cMPS, we show how to use them in actual calculations.
All expectation values involve field operators, so the first step consists of finding an expression for the action of a field operator on a cMPS. All of the results below are obtained by using an identity for the commutator between a general operator and a path-ordered exponential. Applying this identity to the bosonic field operator Ô = ψ̂(x) yields the action of ψ̂(x) on the cMPS, and therefore the expectation value of the field operator. Similarly, we find the expectation value of the density operator. Acting with a second field operator at the same location just brings down a second matrix R, so that we obtain the expectation value of a contact interaction. By acting with field operators at different locations, we can compute correlation functions, such as the field-field correlation function (where we assume x < y) and the density-density correlation function. These expressions clearly show that a cMPS necessarily exhibits exponential decay of correlations. Indeed, if we split off the fixed-point projector from the exponentiated transfer matrix (assuming a properly normalized cMPS with λ_1 = 0), the second eigenvalue λ_2 (when sorting by real part) of the transfer matrix determines the correlation length as ξ = −1/Re(λ_2); the imaginary parts of the subleading eigenvalues again determine the oscillations in the correlation function [13].
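For small D, these expectation values reduce to small traces over the fixed points: the particle density is ⟨ψ̂†ψ̂⟩ = (l|(R ⊗ R̄)|r) = Tr(l R r R†), and the correlation length follows from the spectral gap of T. The following dense sketch is our own illustration (a real implementation would use iterative methods):

```python
import numpy as np

def density_and_xi(Q, R):
    """Particle density Tr(l R r R†) and correlation length -1/Re(lambda_2)
    of a uniform cMPS, via dense diagonalization of the transfer matrix."""
    D = Q.shape[0]
    I = np.eye(D)
    T = np.kron(Q, I) + np.kron(I, Q.conj()) + np.kron(R, R.conj())
    w = np.linalg.eigvals(T)
    w = w[np.argsort(-w.real)]
    xi = 1.0 / (w[0].real - w[1].real)     # correlation length from the gap
    def leading(M):
        ww, vv = np.linalg.eig(M)
        x = vv[:, np.argmax(ww.real)].reshape(D, D)
        x /= np.exp(1j * np.angle(np.trace(x)))   # fix arbitrary phase
        return (x + x.conj().T) / 2
    r, l = leading(T), leading(T.conj().T)
    l /= np.trace(l @ r).real                     # normalization (l|r) = 1
    n = np.trace(l @ R @ r @ R.conj().T).real     # <psi† psi>
    return n, xi

rng = np.random.default_rng(3)
D = 3
Q = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
R = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
n, xi = density_and_xi(Q, R)
assert n > 0 and xi > 0
```

Since Tr(l R r R†) = ‖l^{1/2} R r^{1/2}‖², the density is manifestly non-negative, as it should be.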
The Fourier transform of the field-field correlation function yields the momentum distribution function n(p). In order to work out this expression, we define a regularized transfer matrix by splitting off the fixed-point projector of the exponentiated transfer matrix (see Eq. (296)); this allows us to compute the integral and to evaluate the momentum distribution function, including a δ-function contribution 2πδ(p) (l|(1 ⊗ R̄)|r)(l|(R ⊗ 1)|r). Here, the δ-function contribution signals long-range order, in particular, associated with the condensation of the bosonic particles in the ground state. More advanced expectation values involve derivatives of field operators. Therefore, we differentiate the above expression for the action of ψ̂(x) on a cMPS [Eq. (289)] with respect to x. Computing the action of the pseudo-inverse (T ± ip)^P on a vector (to the left or right) efficiently requires an iterative linear solver. When p = 0, nothing needs to be done in principle to eliminate the contribution of the zero eigenvalue, as any contribution that would be generated is immediately killed by acting with the operator upon constructing the Krylov subspace. In the more general case, it is useful to explicitly project out any contribution in the zero-eigenvalue subspace, using 1 − |r)(l|. In this expression the cancellation of the term with a creation operator is automatically obeyed, but this is not the case for cMPS with multiple species of bosons and/or fermions; in the more general case, additional regularity conditions on the R matrices have to be imposed in order to obtain a finite kinetic energy (see Ref. [62]). The expectation value of the kinetic-energy density then follows.
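The prescription for applying the regularized inverse can be sketched with an iterative solver such as SciPy's GMRES. The sketch below is an illustration under our own conventions (dense T for simplicity; vl and vr are the vectorized left and right zero modes, normalized so that (l|r) = 1):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def regularized_solve(T, vl, vr, p, x):
    """Apply the pseudo-inverse of (T - i p): solve (T - i p) y = P x within
    the image of the projector P = 1 - |r)(l| that removes the zero mode."""
    P = lambda v: v - vr * (vl.conj() @ v)
    n = T.shape[0]
    A = LinearOperator((n, n), matvec=lambda v: P(T @ v - 1j * p * v),
                       dtype=complex)
    y, info = gmres(A, P(x), atol=1e-12)
    assert info == 0        # converged
    return y

# demo on a normalized random cMPS transfer matrix
rng = np.random.default_rng(4)
D = 3
I = np.eye(D)
Q = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
R = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
T = np.kron(Q, I) + np.kron(I, Q.conj()) + np.kron(R, R.conj())
T -= np.linalg.eigvals(T).real.max() * np.eye(D * D)  # leading eigenvalue -> 0

w, v = np.linalg.eig(T)
vr = v[:, np.argmax(w.real)]          # right zero mode |r)
wl, u = np.linalg.eig(T.conj().T)
vl = u[:, np.argmax(wl.real)]         # left zero mode (l|
vl /= (vl.conj() @ vr)                # normalization (l|r) = 1

p = 0.7
x = rng.standard_normal(D * D) + 1j * rng.standard_normal(D * D)
y = regularized_solve(T, vl, vr, p, x)
P = lambda v: v - vr * (vl.conj() @ v)
assert np.linalg.norm(P(T @ y - 1j * p * y) - P(x)) < 1e-3 * np.linalg.norm(P(x))
```

Because the Krylov vectors are built from the projected right-hand side, the solution automatically stays in the image of P, exactly as described in the footnote above.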

Tangent vectors
We introduce a tangent vector in the uniform gauge, again using the notation established above. For finding a proper parametrization of the tangent space, we first compute the overlap between two tangent vectors. This expression is further worked out using the above inversion of the transfer matrix.
The overlap takes the form ⟨Φ(V̄,W̄)|Φ(V,W)⟩ = 2πδ(0) (l|(W ⊗ W̄)|r) + … . The diverging prefactor corresponds to the infinite system size and originates from the fact that the tangent vectors represent momentum-zero plane waves, which cannot be normalized to one. The additional divergence in the square brackets can be traced back to the possible overlap with the ground state, and vanishes if orthogonality to the ground state is enforced. The gauge freedom in the cMPS parametrization induces a redundancy in the parametrization of the states |Φ_p(V, W)⟩, i.e. these states are invariant under the additive gauge transformation V ← V + Q_L X − X Q_R + ipX and W ← W + R_L X − X R_R for an arbitrary matrix X. This gauge freedom can be used to choose a parametrization that allows us to omit the non-local terms in the expressions above, e.g. by restricting to representations (V, W) that satisfy l V + R† l W = 0. This condition is henceforth referred to as the left gauge condition; it is typically used in combination with a left-canonical choice for Q and R, such that l = 1 and we simply have V = −R†W. Similarly, one can instead choose a right gauge condition V = −W r R† r⁻¹, which simplifies in the case of a right-canonical cMPS with r = 1.
Before proceeding, we generalize the definition of tangent vectors so as to contain a boost, representing a momentum eigenstate with momentum p, where Û_1 and Û_2 are defined in terms of two different pairs of matrices (Q_1, R_1) and (Q_2, R_2), respectively. We can work in a mixed gauge by using Q_1 = Q_L, R_1 = R_L and Q_2 = Q_R, R_2 = R_R, or even Q_2 = Q̃_R and R_2 = R̃_R when there is a second ground state available and we want to target a non-trivial topological sector. Still using the parameterization V = −l_1⁻¹ R_1† l_1 W, with l_1 the left fixed point of the transfer matrix of Q_1 and R_1, we obtain a local expression involving r_2, the right fixed point of the transfer matrix of Q_2 and R_2. For both the time-dependent variational principle and the quasiparticle ansatz, it is useful to know how an annihilation operator acts on a tangent vector. The same can be done for two annihilation operators ψ̂(z), ψ̂(y), where we assume z ≤ y. For the kinetic energy we need to know how dψ̂(y)/dy acts on a tangent vector, i.e. we have to take the derivative of the above equation. One can check that a number of potentially problematic (infinite-norm) terms, which have a creation operator ψ̂†(y) at the fixed position y, all nicely cancel.
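The orthogonality to the ground state implied by the left gauge condition is a purely algebraic identity, Tr(l V r) + Tr(l W r R†) = Tr((l V + R† l W) r), and can be verified directly. The following sketch is our own check (the 2πδ(p) prefactor of the full overlap is left out, and the fixed points are obtained by dense diagonalization):

```python
import numpy as np

# With V = -l^{-1} R† l W (left gauge condition), the momentum-zero overlap
# with the ground state, proportional to Tr(l V r) + Tr(l W r R†), vanishes.
rng = np.random.default_rng(5)
D = 3
I = np.eye(D)
Q = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
R = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
T = np.kron(Q, I) + np.kron(I, Q.conj()) + np.kron(R, R.conj())

def leading(M):
    w, v = np.linalg.eig(M)
    x = v[:, np.argmax(w.real)].reshape(D, D)
    x /= np.exp(1j * np.angle(np.trace(x)))   # fix arbitrary phase
    return (x + x.conj().T) / 2

r, l = leading(T), leading(T.conj().T)

W = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
V = -np.linalg.inv(l) @ R.conj().T @ l @ W    # left gauge condition
overlap = np.trace(l @ V @ r) + np.trace(l @ W @ r @ R.conj().T)
assert abs(overlap) < 1e-6
```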

Ground-state optimization and time-dependent variational principle
The time-dependent variational principle for cMPS can be obtained in a similar way as for MPS. Again, we restrict to uniform cMPS and translation-invariant hamiltonians; we refer to Ref. [67] for the more general case. Starting from a uniform cMPS with time-dependent matrices Q(t) and R(t), the left-hand side of the Schrödinger equation is a tangent vector (at momentum zero) with V = iQ̇ and W = iṘ. The TDVP prescribes to choose Q̇ and Ṙ such that ‖ |Φ(iQ̇, iṘ; Q, R)⟩ − Ĥ|Ψ(Q, R)⟩ ‖² is minimized. Let us first compute the general overlap ⟨Φ(V, W; Q, R)|Ĥ|Ψ(Q, R)⟩, where we take the Lieb-Liniger hamiltonian Ĥ as an example. To solve the optimization problem we also need ⟨Φ(V, W)|Φ(V, W)⟩. We now exploit the gauge invariance in the cMPS manifold, which enables us to choose V = −l⁻¹ R† l W; this simplifies the latter expression (and also makes the first term on the second line of Eq. (324) vanish). The minimum is then obtained by setting the derivative with respect to W̄ equal to zero and using the defining equations of l and r. Note that, when the cMPS is itself in the left-canonical gauge, with Q_L + Q_L† + R_L† R_L = 0 and l = 1, we can parameterize Q_L = iK_L − (1/2) R_L† R_L, with K_L a Hermitian matrix. The time derivative Q̇_L = −R_L† Ṙ_L is compatible with preserving this canonical form at all times, and can be cast into a direct equation for the time derivative of K_L. These first-order coupled differential equations can then be solved using standard ODE solvers. By replacing t → −iτ, we can evolve in imaginary time and obtain an algorithm that converges a random cMPS to the ground state, as was first used in Ref. [68]. Indeed, as in the MPS case, the right-hand side of the TDVP equation is essentially the tangent-space gradient, and as such imaginary-time evolution effectively amounts to a continuous gradient descent.
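The compatibility of this parameterization with the left-canonical constraint is immediate to verify numerically (a two-line check with a random Hermitian K):

```python
import numpy as np

# With K Hermitian, Q_L = i K - (1/2) R_L† R_L satisfies the left-canonical
# condition Q_L + Q_L† + R_L† R_L = 0 identically, so evolving K and R_L
# automatically preserves the canonical form.
rng = np.random.default_rng(6)
D = 4
A = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
K = (A + A.conj().T) / 2                      # Hermitian generator
RL = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
QL = 1j * K - 0.5 * RL.conj().T @ RL
assert np.linalg.norm(QL + QL.conj().T + RL.conj().T @ RL) < 1e-12
```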
Let us now, in the spirit of established MPS algorithms, try to formulate a mixed-gauge approach. The starting point is to approximate H|Ψ⟩ using the more general formulation of tangent vectors, where the left-canonical matrices Q_L, R_L and the right-canonical matrices Q_R, R_R are related by a gauge transform C. Let us also define Q_C = Q_L C = C Q_R and R_C = R_L C = C R_R. Furthermore, we redefine the transfer matrices, with T_L^L = Q_L ⊗ 1 + 1 ⊗ Q̄_L + R_L ⊗ R̄_L and similarly for T_R^R. We then define F(V, W) and work out its explicit form. Because of the gauge transformation, there are many equivalent ways of writing this expression. We have chosen to position C (or R_C or Q_C) in the cMPS ket state in such a way that it coincides with the position of V and W in the bra state. Gauge invariance still enables us to choose a gauge condition for V and W, which could now be (1|(C ⊗ V + R_C ⊗ W) = 0 or the analogous condition on the other side. Now we need to relate V and W to an update of the cMPS matrices. In the mixed-gauge representation of tangent vectors, we can identify the updates of C, R_C and Q_C with specific combinations of V and W; this identification is natural because of a final identity, which can easily be verified using the definitions of the various quantities involved. The final evolution equations then follow, where, in the last equation, we used some more algebra (substituting definitions). These equations can be integrated for a small time step, after which a new Q_L and R_L (and corresponding Q_R and R_R) need to be extracted from the updated C, R_C and Q_C. For finding the best cMPS ground-state approximation of a given Hamiltonian, we can evolve according to these equations in imaginary time, i.e. setting t → −iτ. Indeed, a variational optimum is characterized by the right-hand side of the above equations becoming zero. Nonetheless, imaginary-time evolution is not necessarily the fastest way to approach the variational optimum, in particular for systems near or at criticality.
In the case of MPS, the VUMPS algorithm can be understood as being obtained from imaginary-time TDVP by promoting the evolution equations to eigenvalue equations for the center site, and then taking bigger steps, corresponding to replacing the center site by the lowest eigenvector of the effective hamiltonian. In the case of cMPS, this is less clear. Indeed, because the cMPS ansatz is not simply a multilinear functional of the different Q(x) and R(x), it cannot be expected that such an interpretation as an eigenvalue problem exists. An alternative approach that starts from the center-site point of view was proposed and investigated in Ref. [69], and was found to work quite well.

Quasiparticle ansatz
Finally, we can also apply the MPS quasiparticle ansatz in the continuous field-theory setting, as was first explored in Ref. [63]. Indeed, we have already provided a generalized definition of a "boosted" tangent vector |Φ_p(V, W; Q_1, R_1, Q_2, R_2)⟩ with good momentum quantum number p in Eq. (315). For a topologically trivial excitation, we use the mixed gauge by setting Q_1 = Q_L, R_1 = R_L and Q_2 = Q_R, R_2 = R_R. In the case of symmetry breaking, we can construct domain-wall excitations by using the (right-canonical) cMPS matrices of a different ground state for Q_2 and R_2. For simplicity, we restrict to the topologically trivial case below; the topologically non-trivial case is completely analogous.
We still have the gauge freedom V → V + Q_L X − X Q_R + ipX and W → W + R_L X − X R_R (upon integration, the contribution of X at +∞ or −∞ is irrelevant, so both boundary terms cancel), which we use to parameterize V = −R_L† W (or alternatively V = −W R_R†). The overlap between two ansatz wavefunctions then shows that the physical norm reduces to the Euclidean norm on the parameters in W. Furthermore, the gauge condition ensures that the excited state is always orthogonal to the ground state. This implies that the variational optimization of the ansatz wavefunction reduces to an ordinary eigenvalue problem.
In order to implement this eigenvalue problem, we need an expression for the expectation value of the hamiltonian, for which we can use the expressions derived in Sec. 8.3. Again, we restrict ourselves to the terms in the Lieb-Liniger hamiltonian: the expectation value of the density operator, the interaction energy, and the kinetic-energy term. Here, one always needs to insert V = −R_L† W. Furthermore, we have defined the mixed transfer matrices T_L^R = Q_L ⊗ 1 + 1 ⊗ Q̄_R + R_L ⊗ R̄_R and vice versa for T_R^L, on top of the transfer matrices T_L^L and T_R^R that we defined in the previous section. Note that for all of these, the left and right eigenvectors with eigenvalue zero are simple combinations of 1, C and C†: T_L^L has left eigenvector 1 and right eigenvector CC†, while T_R^R has C†C and 1 as left and right eigenvectors. T_R^L has C and C† as left and right eigenvectors, whereas T_L^R has C† and C as left and right eigenvectors. These are needed to compute the pseudo-inverses using an iterative linear solver, as explained above.

Outlook
In these lecture notes we have explained the most important tangent-space methods for uniform matrix product states in full detail. Yet, this picture is far from complete, and many new applications are still to be expected in the near future; these lecture notes should in the first place be read as an invitation to further develop the framework. In this last section, we give a short overview of some of the topics that we have omitted in the main text, as well as the most exciting open directions.

Symmetries, fermions, larger unit cells and finite size
In the above we have exclusively dealt with translation-invariant matrix product states in the thermodynamic limit without any constraints.
First of all, in many spin chains translation invariance is spontaneously broken, and the ground state is invariant only under translations over a larger number of sites. The correct variational MPS is then constructed by periodically repeating the same multi-site unit cell of tensors {A_1, A_2, ..., A_N}; an MPS with a three-site unit cell, for example, repeats three different tensors. Just as for uniform MPS, we can find canonical forms, develop ground-state optimization algorithms, and implement a quasiparticle excitation ansatz. For all details, we point the reader to Refs. [16,43]. Another extension of the above framework consists of implementing global symmetries of the ground state on the level of the MPS tensor. This strategy is made possible by the fundamental theorem of MPS, according to which we know that if an MPS is symmetric under a global symmetry operation, the MPS tensor itself transforms under this operation. This implies that the virtual degrees of freedom in the MPS transform under a projective representation of the symmetry group, a property that has led to the classification of symmetry-protected topological phases in one dimension using MPS [70,71]. On the numerical level this property can be exploited to great advantage. Indeed, the symmetry imposes a sparseness on the MPS tensor (if it is written in the correct basis), and therefore a more efficient representation of the symmetric state itself. Moreover, symmetries in the MPS tensor can also be used to label the quasiparticle ansatz with specific quantum numbers. For all details, we point the reader to Ref. [43].
The traditional method to simulate fermionic chains using MPS consists of first mapping the system to a bosonic spin chain using a Jordan-Wigner transformation. However, the uniform MPS framework also allows one to parametrize fermionic states on the chain directly, using the formalism of super vector spaces [72]. In this way, all of the above methods carry over to the description of interacting fermions on a chain as well.
Finally, many of the above methods have a counterpart for finite systems with open boundary conditions and without translation invariance; in that case, each tensor in the MPS is different. In particular, the time-dependent variational principle can be nicely formulated on a finite chain, allowing for the simulation of time evolution with arbitrary hamiltonians [27]. On a finite system momentum is no longer a good quantum number, so the quasiparticle ansatz does not have an analog on a finite chain. Tangent-space methods can also be formulated for systems with periodic boundary conditions [73], but, just like all MPS methods in that setting, they suffer from a higher computational cost.

Real-time evolution with conserved quantities
In Sec. 5 we have seen that the TDVP respects the conservation of energy during the time evolution, as well as other conserved quantities that commute with the tangent-space projector. This property, which is not shared by other time-dependent MPS algorithms, has been exploited recently [74] to capture the long-time dynamics of thermalizing spin chains, despite huge truncation errors. Also, we have seen that the TDVP gives rise to an effective classical hamiltonian system with a Poisson bracket, which has recently allowed to relate the dynamics of spin chains to classical chaotic systems [75].
It remains a matter of further research to what extent conserved quantities that are not contained within the tangent space (e.g. the higher conserved quantities in integrable systems) are respected in the time evolution according to the TDVP, and whether a more generalized version can be formulated that takes more conserved quantities into account. Also, the relation to hydrodynamic approaches for quantum dynamics remains an important open question.

Many-particle physics on top of an MPS
In Sec. 6 we have shown that the tangent-space framework yields a natural language for describing elementary excitations as interacting quasiparticles on a strongly-correlated MPS background. This result suggests that these quasiparticle excitations are the relevant degrees of freedom for describing the low-energy dynamics in strongly-correlated spin chains. Therefore, we expect that the extension of the framework towards real-time and finite-temperature properties of these spin chains will prove very interesting.
In Refs. [50] and [44] it was shown that the information on the one-particle dispersion relation and the two-particle S matrix makes it possible to apply the formalism of the Bethe ansatz in an approximate way to describe the condensation of magnons in a magnetic field. This approach can be extended to out-of-equilibrium situations as well. Ideally, it would be extremely interesting to develop an interacting many-particle theory (possibly in second quantization) that describes these quasiparticles.

cMPS
As with MPS, cMPS are not restricted to translation-invariant systems and can easily be formulated for inhomogeneous systems, e.g. finite systems with open boundary conditions, systems with periodic boundary conditions [76] or infinite systems with a finite-length unit cell, by making the cMPS matrices Q and R spatially dependent. The differential equation that follows from the TDVP equation in this non-uniform setting can be interpreted as a non-commuting version of the Gross-Pitaevskii equation [67]. However, because Q and R will then depend on a continuous coordinate x, any representation in terms of a finite number of parameters will have to resort to some sort of discretization or exploit a family of basis functions. For example, the use of splines was investigated in Ref. [77].
More pressing, however, is the fact that a good optimization algorithm for such generic cMPS is lacking. This is in sharp contrast to the case of MPS, where DMRG (in one of its modern flavors) is still the de facto most robust and efficient way of optimizing a generic MPS. Ref. [78] has investigated how to leverage the robustness of DMRG by constructing the continuum limit numerically.
More generally, many of the well-known methods from the MPS toolbox, such as simulations of local quenches, finite temperature, or non-equilibrium situations with dissipation, have no counterpart yet in terms of cMPS. Finally, in the context of tangent-space methods, we have explained how to describe single-particle excitations. Extracting scattering information by constructing variational approximations to two-particle states has so far not been addressed with cMPS, but should be a straightforward generalization of the MPS case. Any further extension that is discussed here in the context of MPS, equally applies to cMPS.

Projected entangled-pair states
The class of uniform matrix product states can be straightforwardly generalized to two dimensions. These states are known as projected entangled-pair states (PEPS) [79]; in the uniform case, the state is fully described by a single five-leg tensor A.
The variational optimization of the PEPS ground-state approximation for a given model hamiltonian is generally considered a hard problem. Traditionally, this is done by performing imaginary-time evolution: a trial PEPS is evolved with the operator e^{−τH}, which should result in a ground-state projection for very long times τ. This imaginary-time evolution is integrated by applying small time steps δτ with a Trotter-Suzuki decomposition and, after each time step, truncating the PEPS bond dimension in an approximate way. This truncation can be done by a purely local singular-value decomposition (the so-called simple-update algorithm [80]) or by taking the full PEPS wavefunction into account (the full-update [81] or fast full-update [82] algorithms). The simple-update scheme is computationally very cheap, but ignoring the environment is often a bad approximation for systems with large correlations. The full-update scheme takes the full wavefunction into account for the truncation, but requires the inversion of an effective environment, which is potentially ill-conditioned.
Recently, important steps were taken towards the formulation of tangent-space methods in two dimensions. Variational optimization schemes were introduced in Refs. [83,84] that aim to optimize the energy density in the thermodynamic limit directly. In both approaches, an efficient summation of an infinite number of terms was needed in order to compute the gradient, similar to what we have seen in Sec. 4. A generic method for contracting overlaps of tangent vectors, a crucial ingredient in any tangent-space method, was introduced in Ref. [61], and a benchmark of the quasiparticle excitation ansatz was performed [85].