A beginner's guide to non-abelian iPEPS for correlated fermions

Infinite projected entangled pair states (iPEPS) have emerged as a powerful tool for studying interacting two-dimensional fermionic systems. In this review, we discuss the iPEPS construction and some basic properties of this tensor network (TN) ansatz. Special focus is put on (i) a gentle introduction of the diagrammatic TN representations forming the basis for deriving the complex numerical algorithm, and (ii) the technical advance of fully exploiting non-abelian symmetries for fermionic iPEPS treatments of multi-band lattice models. The exploitation of non-abelian symmetries substantially increases the performance of the algorithm, enabling the treatment of fermionic systems up to a bond dimension $D=24$ on a square lattice. A variety of complex two-dimensional (2D) models thus become numerically accessible. Here, we present first promising results for two types of multi-band Hubbard models, one with $2$ bands of spinful fermions of $\mathrm{SU}(2)_\mathrm{spin} \otimes \mathrm{SU}(2)_\mathrm{orb}$ symmetry, the other with $3$ flavors of spinless fermions of $\mathrm{SU}(3)_\mathrm{flavor}$ symmetry.

perform tensor optimization via imaginary-time evolution [Sec. 3.5], including the gauge fixing for iPEPS [46,47]. Additionally, we go beyond the scope of Corboz' work by explaining how arbitrary non-abelian symmetries can explicitly incorporated in the fermionic iPEPS ansatz in a generic manner, based on the QSpace [30] tensor library. A pedagogical discussion of SU (2) iPEPS was recently given in Ref. [48], with benchmarking computations on spin systems reported in Ref. [49]. Our treatment of symmetries represents a fully alternative approach to theirs, which permits us to deal with non-trivial outer multiplicities (OM) on a general footing. While OM is not present for SU (2) for rank-3 tensors, it already also occurs for SU (2) for tensors of rank r > 3. For larger symmetries, such as SU(N > 2), OM already occurs generically even at the elementary level of rank-3 tensors.
A first application of our non-abelian fermionic iPEPS code, published concurrently with this tutorial review, is a study of the 2D fermionic t-J model [50] -by exploiting either U (1) or SU(2) symmetry to allow or forbid spontaneous spin symmetry breaking, we elucidate the interplay between antiferromagnetic order, stripe formation and pairing correlations. In the present work, we further illustrate the power of non-abelian iPEPS by presenting some exemplary results for two 2D fermionic Hubbard models of higher complexity: a model with two degenerate bands of spinful fermions, featuring SU(2) spin ⊗ SU(2) orb symmetry, and a model with three degenerate bands of spinless fermions, featuring SU(3) flavor symmetry.

Tensor network diagrams and convention
As implied by their name, tensor network techniques typically involve a large number of tensors of various rank that are iteratively manipulated. These manipulation steps may vary in their complexity and, for example, include matrix multiplication, or decomposition techniques such as singular value or eigenvalue decompoitions. In order to simplify the lengthy mathematical expressions which describe these steps and typically involve large sums over multiple indices, we heavily rely on using a diagrammatic representation for tensor network states. Analogous to the role of Feynman diagrams in quantum field theories, these tensor network diagrams are pictorial representation of mathematical expressions and help a great deal grasping the essence a TN algorithm. Since we extensively employ this pictorial language in this review, we here give a brief summary of our conventions together with an explanation on how to understand these diagrams in the following.
Each TN diagram consists of one or multiple extended objects (squares, circles, ...), which are connected by lines. Objects and lines represent tensors and indices, respectively. In the following, we give a few simple examples. For instance, a matrix or rank-2 tensor A has two indices α, β, The number of values that an index can take is called its dimension. The next expression, illustrating a matrix multiplication involves the sum over the common index β of A and B. This contraction is indicated by a line connecting A and B.
In addition to the simple expressions shown above, we often have to deal with diagrams containing multiple sums and open indices, such as It holds generally true, that the diagrammatic representation becomes more beneficial, the more complex the expression and the larger the number of tensors involved since the logic of reading and understanding these diagrams remains the same. For more evolved topics, such as fermionic TN descriptions and symmetric TNs, the diagrams will contain extra features. We will introduce these features in detail at the appropriate parts of this review.

Infinite projected entangled pair states
Projected entangled pair states (PEPS) present the natural generalization of the well-known MPS ansatz to higher spatial dimensions [10]. Analogously to their 1D counterpart, a PEPS consists of a set of high-ranked tensors which are connected by virtual bonds along the physical directions of the corresponding lattice system. In addition, PEPS satisfy the area law of the entanglement entropy in two dimensions [51], thus being able to faithfully represent physical states in gapped lattice models. In other words, the computational cost to simulate a lowentangled state using a PEPS scales only polynomially with system size.
In this section, we give a pragmatic introduction to the PEPS construction from the point of view of numerical practitioners. To this end, we only briefly elaborate the ansatz and its properties before discussing numerical details of contraction, optimization, fermionic systems, and the implementation of symmetries.

PEPS ansatz and properties
To give a practical example, we consider a generic many-body wavefunction |ψ on a 3 × 3 cluster. In its most general form, the wavefunction can be expressed in terms of the rank-9 coefficient tensor Ψ σ where the integer indices x and y enumerate sites in the horizontal and vertical direction. The local or physical index σ x y ∈ 1, ..., d labels states in the local Hilbert space at site r = (x, y). Obviously, this generic representation suffers from an exponential system-size scaling, which is reflected in the fact that the number of elements of Ψ is equal to the total Hilbert space d N = d 9 . Here N denotes the total number of sites and the local dimension d describes the total number of quantum states per site. Typical values are d = 2 for a spin- 1 2 system or spinless fermions, d = 3 for spin-1, and d = 4 for spinful fermions.
The key idea of the PEPS construction is to circumvent the exponential scaling in system size by decomposing Ψ into a set of high-ranked tensors (in the following denoted M tensors). A PEPS representation for the wavefunction in Eq. (4) requires a total of nine M tensors, |ψ = α 6 γ 6 |σ 1 1 σ 1 2 ... σ 3 3 .
Each tensor M has a set of virtual indices, α i for horizontal bonds and γ i for vertical bonds, connecting each M to its counterparts on up to four neighboring sites, according to the lattice geometry. Following Sec. 2, the diagrammatic representation can be easily derived by introducing the diagram for a rank-4 "bulk" tensor The boundary tensors of a finite-size PEPS contain fewer legs. Since we focus on the translationally invariant formulation of PEPS in the following, we refrain from a detailed discussion of various boundary conditions and the corresponding tensors [47]. In general, the number of M tensors in the PEPS representation is equal to the number of sites in the system, e.g., N = L × L tensors for a square lattice of L × L sites. Starting from Eq. (6), the diagrammatic representation of the full wavefunction |ψ in Eqs. (4) and (5) follows immediately, = .
In principle, one can perform such a decomposition exactly for any arbitrary many-body wavefunction. For larger systems, however, the dimension of the virtual indices has to be increased exponential which, for numerical purposes, is not practicable. Therefore, one limits the dimension of the virtual bonds of each PEPS tensor to some upper cutoff D [52]. Thus adding an additional site (or row/column of sites) only leads to a polynomial increase in the number of coefficients of the wavefunction. In numerical practice, D is used as a control parameter for the numerical accuracy. It is typically restricted to D ≤ 8-16, depending on the model and lattice geometry, because for larger values the numerical costs become unfeasibly high.
Restricting the bond dimension of the M tensors comes at a price: only a subset of states can efficiently be represented by a PEPS, since D also limits the maximum amount of entanglement that can be captured by the construction. Fortunately, this is perfectly in line with the area law of the entanglement entropy in 2D, which is fully satisfied by a PEPS representation. Hence, PEPS are ideally suited to approximate low-energy states, including the ground state of local gapped Hamiltonians in two dimensions. Although this statement cannot yet be put on such a mathematically rigorous foundation as 1D, it is strongly supported by numerical evidence [53].
Moreover, the PEPS representation has the remarkable property that, in contrast to MPS, it is capable of faithfully representing algebraically decaying correlation functions, which are characteristic for gapless models. This can easily be shown for the example of the partition function of the 2D Ising model [51]. Therefore, the PEPS ansatz is in principle able to also treat critical ground state wavefunctions. In practice, however, this does not help substantially in the context of 2D quantum criticality (the above mentioned example deals with classical and not quantum criticality). Based on the quantum-to-classical correspondence, one would require a 3D PEPS construction to faithfully approximate a critical 2D quantum system. Thus, in reality PEPS faces the same challenges in the context of gapless 2D systems as MPS treating critical 1D models: Both TN frameworks may yield results ranging from excellent to moderate quality depending on the "severeness" of the area-law violation in a particular system [52].

iPEPS
For finite-size PEPS simulations, each M tensor is typically chosen to be different (similar to MPS applications for finite systems). Alternatively, it is possible to exploit the translational invariance of a system and directly work in the thermodynamic limit (of course, this approach also works for MPS [54]). In this way, finite-size and boundary effects can be completely eliminated.
In order to construct an infinite PEPS (iPEPS) [42], we first choose a fixed unit cell of a certain size, and repeat it periodically to cover the entire infinitely large lattice. The size of the fundamental unit cell directly translates into the number of different M tensors required for the iPEPS representation. For instance, one can impose strict translational invariance and choose a unit cell of size 1 × 1, The resulting iPEPS representation of |ψ then requires only a single M tensor. However, ordered ground states often break translational invariance to some degree. An iPEPS ansatz of type (8) cannot capture this behavior. Therefore, it is advisable to relax the translational invariance to some extent by choosing a larger unit cell. For example, the following ansatz is fully compatible with a antiferromagnetic ground-state order using two different M tensors in a 2 × 2 unit cell: In principle, unit-cells of arbitrary size can be considered, e.g., The numerical costs scale linearly with the number of tensors in the unit cell, meaning that large unit cells become numerically expensive. A natural guideline to evaluate which unitcell sizes should be considered in a simulation is to remember that the unit cell should be compatible with the actual ground-state order. Otherwise, one does not obtain the actual ground state from an iPEPS calculation. Instead, one ends up with the lowest-energy state for the system constrained to the corresponding unit-cell geometry and, therefore, is restricted to specific orders. When studying systems with competing low-energy orders, the flexible unit-cell setup of the iPEPS algorithm actually becomes a big advantage. By probing different unit cells, it is possible to stabilize wavefunctions with competing orders independently. Comparing the energies obtained from the corresponding simulations, one may then determine which order survives in the ground state of the system [5,6].

Contractions
To extract local observables, perform overlaps, or to actually optimize the tensors, the (i)PEPS framework requires contracting an (infinitely) large tensor network. This turns out to be much more challenging than in context of MPS where, for example, overlaps can be evaluated exactly with only polynomial costs in system size. For a PEPS tensor network, however, the calculation of an exact overlap represents an exponentially hard problem [55] and cannot be performed efficiently. Fortunately, there exist a variety of approximate schemes to deal with this issue.
In this review, we focus on the corner transfer matrix method (CTM) [56,57], which is particularly well suited for iPEPS applications on square-lattice geometries. Alternatively, it is also possible to rely on an infinite MPS technique for the purpose of this work [42,58,59]. Other contraction schemes based on renormalization ideas, such as the tensor renormalization group [15,16], or tensor network renormalization [19,20], do have some technical disadvantages (e.g., environmental recycling [46,60] is not possible, and difficulties arise when calculating longer-ranged correlators, ect.), rendering them unsuitable for our purposes.
Before discussing the details of the CTM scheme for evaluating the scalar product ψ|ψ , we first introduce the corresponding diagram of ψ| for the 3 × 3 square-lattice toy example, which is a mirror image of Eq. (7). The contraction of ψ|ψ can be done site by site using so-called reduced tensor m = M x † y M x y , which is obtained by tracing over the joint physical where the double indices (e.g., (αα )) have dimension D 2 , as indicated by their increased line thickness. In the second line, we redrew the lines representing indices γ and ρ in such a way that pairs of corresponding primed and unprimed indices match up. This diagrammatically performed "index bending" exploits the non-uniqueness of the graphical representation for a tensor network [44]. This modification is completely trivial for bosonic iPEPS but will add additional complications in the context of fermions [see Sec . 4].
To reduce the complexity of the TN diagrams appearing in the following, we introduce a modified version of the conjugate tensor that automatically accounts for the index bending discussed in Eq. (12): This distinction may seem unnecessary at this point, sinceM x † y and M x † y are mathematically equivalent objects in the context of bosons. However, this is not the case for fermionic systems [c.f. Eq. (61)]. Therefore, we emphasize the importance of this modification already here.
The scalar product ψ|ψ for this simple example is obtained by contracting all physical and virtual index of the nine m tensors, Note that the second step ( * =) also exploits the non-uniqueness of the diagrammatic representation by employing a number of so-called "jump-moves" [44]. In these operations, it is possible to drag a line over a tensor without changing the corresponding TN. For example, the line connecting M 2 3 and M 3 3 was dragged downward acroos M 3 2 † . Again, this modification is trivial in context of bosonic PEPS, but nontrivial for fermionic PEPS [see Sec . 4].
Studying the small tensor networks in Eq. (14), it becomes obvious that the exact contraction of the expression scales exponential with system sizes. No matter in which order one decides to contract the tensors, i.e., which "contraction pattern" one uses, one always generates an object with a number of open indices scaling with L (here L = 3).

Corner transfer matrix scheme
Since it is not possible to perform the exact calculation of a scalar product efficiently in the PEPS nor in the iPEPS framework, one has to rely on approximate approaches. A particularly powerful contraction scheme is based on ideas of the corner transfer matrix (CTM) renormalization group proposed by Nishino and Okunishi [56]. Their idea was later adapted by Orús and Vidal [57] in the context of quantum systems to efficiently evaluate an iPEPS tensor network contraction.
The key insight of the approach is to represent the infinitely large tensor network by a small number of tensors, zooming into a 1 × 1 or 2 × 2 window of sites (in general, this might be only a subset of the full unit cell, which in general has the size L x × L y ). The rest of the system, the so-called "environment", is represented by a set of corner matrices C and transfer tensors T . For the 2 × 2 subset embedded in the environment, this takes the form where the environmental tensor network is represented by a set of four corner matrices (C lu , C ld , C ru , C rd with subscripts denoting the spatial location, i.e., l, r, u, d stand for left, right, up, down, respectively), and eight transfer tensors (two tensors for each direction, T l , T r , T u , T d , respectively). In this representation, a new set of virtual indices is introduced connecting tensors of the environment only. As we discuss below, the dimension χ of these indices acts as additional parameter controlling the accuracy of the environmental approximation (reasonable choices are χ D 2 ).
CTM protocol.-The environmental tensors are obtained by performing directional coarse graining moves in each direction of the lattice. Each coarse graining move consists of three different steps: (i) insertion of an extra unit cell; (ii) absorption of a single row or column of the unit-cell tensors into the set of environmental tensors in one lattice direction, leading to an enlarged environmental bond dimension χD 2 ; (iii) renormalization (or truncation/compression) of the enlarged environmental tensors to their original size. Steps (ii) and (iii) are repeated until the inserted unit cell has been fully absorbed into the set of environmental tensors in the one particular direction. Next, an additional unit cell is inserted next to the original unit cell in one of the other directions, and the move is carried out with respect to another direction of the lattice. A full coarse graining step is completed after one move in each of the four lattice directions (left, right, top, bottom) has been performed.
In the following, we illustrate this procedure for an iPEPS representation with a 2 × 2 unit cell, using four M tensors that all have the property M x y = M x+2 y = M x y+2 = M x+2 y+2 (as in Figure 1: CTM coarse graining move to the left lattice direction: (i) extra unit cell is first inserted, and then column-wise integrated into the left part of the environment by performing two subsequent (ii) absorption and (iii) renormalization steps.
Eq.9). A directional move to the left then includes the steps illustrated in Fig. 1. Note that the extra unit cell has been inserted horizontally on the left (this is also the case for a move to the right). Moreover, two absorption and renormalization steps are carried out, at the end of which the inserted unit cell has been fully integrated into the left part of the environment. This set of operations yields an updated set of environmental tensors for the direction of the coarse graining step. We also sketch in Fig. 2 a coarse graining move towards the top of the lattice. In this case, the unit cell is inserted vertically. Then we follow the same protocol as for the left move. Only the direction of the absorption and renormalization steps differs. After also carrying out these coarse graining moves with respect to the other two lattice directions, a full coarse graining step has been completed. The full cycle is typically repeated multiple times depending on the correlation length in the system. In gapped systems already a few full steps may be sufficient to obtain converged results. Especially for critical wavefunctions, however, the number of cycles required to reach convergence in local observables can significantly increase. Renormalization.-In addition to the number of steps performed, the convergence of the results also strongly depends on the implementation of the renormalization step, which truncates the environmental tensors after the absorption step. The renormalization is crucial for the performance of the CTM scheme. However, its implementation details are not very straightforward, and currently there seems to be ample room for future improvement. The ambiguity of implementation details is mostly caused by the lack of an exact canonical representation for a PEPS TN, which implies that there is no obvious optimal way of performing the truncation (in contrast to an MPS tensor network, which can be truncated optimally even in the context of translationally invariant systems [59]).
We list and comment on a number of different renormalization schemes. One corresponds to the directional updated scheme proposed by Orús and Vidal in Ref. [57], which we found to work well only in the context of very homogenous wavefunctions. This method takes only small subsets of the environment into account and implicitly assumes full translational invariance when generating the projectors (or isometries) to perform the truncation. This ultimately yields a very biased truncation pattern for inhomogeneous systems, where this method is bound to fail. The second approach is based on the original CTMRG of Nishino and Okunishi [56] and was first employed by Corboz, Jordan and Vidal Ref. [24] in the context of iPEPS. In this case, the full environment is taken into account in each truncation step, which presents a crucial advantage for simulating inhomogeneous states. On the other hand, it is severely limited by machine precision, making it unstable for large values of environmental bond dimension χ. This is far from ideal since it is desirable to use χ as additional control parameter. To overcome these shortcomings, Corboz, Rice and Troyer Ref. [6] introduced a third CTM variant that shows strongly improved convergence properties in comparison to the original CTMRG scheme and, at the same time, overcomes the inhomogeneity issues of the directional updated scheme. In the following, we sketch how to obtain the projectors used to reduce the sizes of the environmental tensors after an absorption step in the left direction, following Ref. [6]. The protocol works similarly for the other spatial directions of the lattice.
In the first step, we enforce two cuts in the tensor network consisting of the 2 × 2 unit-cell subset embedded in the effective environment as follows Our goal is to obtain projectors (or isometries) that are inserted after an absorption step at a specific bond to "project" (or truncate/compress) the enlarged environmental Hilbert space D 2 χ back to its original size χ. In this example, we specifically aim for the projectors to be inserted into the two bonds split by the left cut. 1 To this end, we contract the two upper and lower parts of the tensor network, leading to rank-4 tensors Q u and Q d . By applying a singular value (or QR) decomposition to both of these tensors, we obtain The product R u R d is then subjected to an additional SVD where only the χ largest singular values are kept, Using the inverse matrices R −1 u and R −1 d , we generate the projectors P x y ,P x y that are inserted at the left cut of the tensor network (15): The protocol is repeated for the entire row of the unit cell to be absorbed into the environment during this particular coarse graining step (i.e., L y times). In our example of an 2×2 unit cell, we therefore also obtain P x y+1 andP x y+1 (or alternatively P x y−1 andP x y−1 due to translational invariance) by considering the tensor network and repeating the procedure sketched above, Now we are fully equipped to renormalize the entire set of environmental tensor which are subject to truncation during an absorption step to the left, What has been achieved is a scheme that compressed the bond dimensions of the environmental tensors along the left row in a way that encodes information from the full environment.
Thus we can appropriately deal with translational symmetry breaking in the iPEPS wavefunction. At the same time, this procedure leads to numerically stable results since we can eliminate spurious parts of the SVD spectrum during the intermediate SVD decompositions in Eq. (16) by discarding very small singular values (e.g., < 10 −7 ). This helps to reduce the influence of numerical noise in the subsequent steps. Larger unit cells.-The CTM scheme can also deal with rectangular unit cells of arbitrary sizes containing L x × L y = N different M tensors, where the relative position of each tensor in the unit cell is labeled by its coordinate r = (x, y). To this end, we assign one set of corner matrices and transfer tensors to each coordinate, requiring a total number of 4N corner matrices and 4N transfer tensors to be stored independently. We illustrate this approach for a 3 × 2 unit cell in Fig. 3. After initialization (see below), the environmental tensors are then iteratively updated by performing coarse graining moves in all four lattice directions, as outlined above. However, an entire CTM cycle now includes L x coarse graining steps to the left and right, respectively, as well as L y coarse graining steps to the top and L y to the bottom of the lattice. Note that using a larger zooming window is not an option, since the numerical costs quickly become unfeasible.
Initialization.-While covering the coarse graining procedure to obtain the converged environmental tensors, we have not yet discussed the initialization of the CTM scheme. In principle, one could start from an arbitrary set of corner matrices and transfer tensors. However, choosing a completely random set can significantly increase the number of coarse graining steps required for obtaining a stable environment TN and sometimes even cause numerical instabilities. In practice, we found that optimal convergence is achieved by starting from an environmental tensor set formed by the corresponding M x y tensors and their conjugates, which previously have been generated by means of ground-state optimization [see Sec. 3.5]. We illustrate this initialization procedure for two examples, Effective contraction pattern.-The numerical costs of implementing the square-lattice CTM scheme presented above scales as O(D 6 χ 3 ), with iPEPS bond dimension D and environmental bond dimension χ. Note that these costs are equivalent to those of the infinite MPS method from Ref. [42]. Assuming that χ = O(D 2 ), we end up with a total cost scaling of O(D 12 ) for the iPEPS algorithm. The underlying assumption behind this cost scaling is that all contractions are carried out as efficiently as possible, which forces us to pay some attention to the contraction patterns. In particular, we cannot directly work with the reduced tensors m x y , but rather need to perform contractions involving M x y and its conjugate M x † y sequentially. This is illustrated below for contracting a part of the diagram in Eq. (15). First consider the case explicitly using the reduced tensor m x y , Counting the involved indices in the dashed box, it becomes clear that the last contraction step scales rather unfavorably as O(D 8 χ 2 ). If we want to achieve the optimal scaling O(D 6 χ 2 d) in this step, we have to contract over M x y and M x † y sequentially, The same applies to contraction orders of other TN such as, for example, the one shown in Eq. (20) and many others. It pays off to constantly pay attention and ensuring that the optimal contraction pattern is used when implementing an iPEPS algorithm. Otherwise, the backlash of an inefficient iPEPS implementation will quickly become apparent, since simulations with moderate to large D will not be feasible. Note that the most expensive steps of the CTM algorithm occur when generating the projectors. To obtain the tensor Q u in Eq. (16), for instance, one has to perform the contraction, .
This always yields a cost scaling of O(χ 3 D 6 ) which cannot be reduced further.

Expectation value
The CTM scheme enables us to evaluate observables within the iPEPS framework. For this case, we consider a simple two-site observableÔ (x+1,y) (x,y) which, for example, represents a spinspin correlation function involving two neighboring sites. To compute an approximation for the expectation value Ô (x+1,y) (x,y) = ψ|Ô (x+1,y) (x,y) |ψ / ψ|ψ , we represent the environment of the two contiguous sites r = (x, y) and r = (x, y + 1) in terms of the corner matrices and transfer tensors encountered in the last section, The contraction of the final tensor network, consisting of the six environmental tensors E 1 , ... , E 6 , the two M tensors, their conjugates, and the operatorÔ can be carried out efficiently. It produces an approximation of ψ|Ô |ψ χ which is generally expected to deviate from the exact value due to the non-exact representation of the full tensor network. The correct value of ψ|Ô (x+1,y) (x,y) |ψ / ψ|ψ ≈ ψ|Ô (x+1,y) (x,y) |ψ χ / ψ|ψ χ should be recovered in the limit χ → ∞. In practice, one evaluates Eq. (25) for a number of different values of χ = 10, 20, ..., 100, 150, ... until the observable shows no more significant dependence on χ. The required value for χ to obtain converged results strongly varies depending on the physical properties of the corresponding system and the employed iPEPS bond dimension D.

Ground state search
An iPEPS is an approximate representation for the ground-state wavefunction of a local Hamiltonian on a two-dimensional lattice. Having addressed the contraction issue by means of the CTM scheme [see previous Sec. 3 .3], the remaining open question concerns finding the ground-state iPEPS representation, given some HamiltonianĤ with only nearest-neighbor interactions. (Albeit technical more complicated, iPEPS can also treat longer-ranged interactions, for more details see Ref. [24,37].) Here we follow the strategy proposed in the original iPEPS formulation by Jordan, Orús, Vidal, Verstraete and Cirac [42], and use the imaginary time evolution to target the ground state, The time-evolution operator e −τĤ is further decomposed by Suzuki-Trotter decomposition, whereĥ x,x y,y describes the local interaction terms acting on a pair of nearest-neighbor sites in the unit cell, andĤ = (x,y),(x ,y ) ĥ x,x y,y . The two-site gates, e −ĥ x,x y,y τ , are subsequently applied to the corresponding pairs of M tensors, M x y and M x y . As in the case of MPS, the resulting tensor has to be truncated accordingly to restore the original form of the iPEPS representation.
In the MPS framework, the truncation can be implemented in an optimal way using the canonical form of the MPS and employing a single singular value decomposition. In the context of iPEPS, this step turns out to be more evolved. Due to the lack of an exact canonical form for the iPEPS, one has to rely on approximate techniques to account for the effects of the environment when employing the truncation. This can be done using several different optimization schemes, such as the simple update [61] and the full update [42]. We discuss both of these approaches extensively in the rest of this section.
Although not employed in the context of this review, we also note that two groups recently introduced alternative optimization schemes, which do not rely on imaginary time evolution [21,22]. Instead, they implement a variational update method, The major technical challenge of these newly developed schemes is to find an approximate, yet accurate, representation for the full HamiltonianĤ. Corboz [21] achieves this based on a modified CTM scheme, while Vanderstraeten, Haegeman, Corboz and Verstraete [22] build on MPS techniques. In addition, it is still unclear how to optimally translate the local update performed on a pair of tensors to the iPEPS representation in the infinite system. Despite these issues, both variational optimization techniques already obtain very impressive results, illustrating that the iPEPS formalism will continuously improve and become more competitive in the near future.

Bond projection
In this work, we only consider the optimization via imaginary-time evolution based on two-site Trotter gates, which implies that we constantly have to update two neighboring M tensors at once (i.e., there is no one-site version of this algorithm). Hence, it is essential to perform the tensor updates as efficiently as possible. Treating the full M tensors in this context turns out to be numerically very inefficient (i.e., numerical costs of O(D 12 ) in the context of the full update). Instead, it is always advisable to perform the tensor update on two subtensors with lower rank which are easily obtained by a bond projection [62], leading to a significant cost reduction (i.e., O(D 6 d 3 ) [46]. Note that this scheme does not introduce further approximations since the two-site Trotter gate only changes properties of the corresponding bond but leaves the remaining bonds of the iPEPS tensors unchanged. The bond projection is obtained by performing two exact SVD (or QR) decompositions: The tensor optimization now only affects the subtensors v x y and w x+1 y , whereas the remaining bonds are shifted into the subtensors X x y and Y x y , which can be treated as parts of the environment tensor network during the optimization.
Each tensor update is initialized by applying the corresponding Trotter gate in the bond projection, The Trotter gate increases the initial bond dimension D of the subtensors v x y and w x+1 y . Restoring the original representation exactly yields a pair of enlarged subtensorsṽ x y andw x+1 y with bond dimension dD (illustrated by the increased line thickness in Eq. (30)). In a next step, we have to find an appropriate truncation scheme to obtain a pair of subtensors v x y and w x+1 y with the original bond dimension D to prevent an exponential blowup of the iPEPS tensors.
In the following, we present two different truncation methods: (i) the simple update [61], a numerically very efficient and fast approach which, however, relies on a strong simplification of the environmental tensor network and thus carries out the truncation in a suboptimal way; (ii) the full update scheme [42] which leads to an optimal truncation by incorporating the effects of the entire wavefunction appropriately. However, the full update comes at the price of requiring significantly more numerical resources.

Simple update
The simple update, introduced by Jiang, Weng and Xiang Ref. [61] is formulated in a slightly modified iPEPS representation. So far, we only dealt with M tensors located directly at sites of the lattice. For the simple update we put an extra set of tensors on the bonds of the iPEPS tensor network. These tensors, here labeled λ x y for horizontal andλ x y for vertical bonds, are diagonal matrices similar to those used in Vidal's TEBD and iTEBD formulation for time-evolving matrix product states [58,63].
Starting from the standard iPEPS representation that has been adopted in this review, so far, it requires only a minor adaption to translate into this modified representation, where Γ x y in combination with the roots of all four bond tensors yields the original M x y tensor. The key idea of the simplified update is to approximate the full environment of two neighboring sites, r = (x, y) and r = (x + 1, y), by only the diagonal tensors surrounding this pair of sites. This procedure is adopted from MPS-based time evolution via the iTEBD algorithm.
To perform the simple update explicitly, we switch first into the bond projection to carry out the optimization more efficiently. We illustrate the projection here explicitly since different tensors are involved in the modified iPEPS representation, Now the Trotter gate is applied to the subtensors on the bond, adding entanglement and potentially increasing the bond dimension to dD. To obtain the pair of subtensors v x y and w x+1 y with the original bond dimension D, the simple update relies on a simple SVD, No extra iteration or optimization is required to complete the update (hence, the name "simple" update). The updated diagonal bond matrix λ x y contains the D largest singular values, the optimized subtensors are obtained from v x y = U and w x+1 y = V † . To restore the form of the iPEPS tensors from Q x y and Q x+1 y , we apply the inverse of the additional bond tensors, which have not been altered by this optimization step, The simple update is particular appealing due to its low complexity and high numerical efficiency; the truncation based on a plain SVD in Eq. (33) only scales with O(D 3 d 6 ) operations. Yet, the truncation itself cannot be considered optimal in the context of iPEPS. It would have been optimal if we had gauged the surrounding bonds in such a way that they exclusively contain orthonormal basis sets. Unfortunately, this is only possible if the environment is separable, as in the case of MPS or other tensor networks without loops. In fact, one can show that a tensor optimization performed in this way presents an optimal update for an infinite tensor network on a Bethe lattice [62]. Any iPEPS representation on a standard 2D lattice, however, does feature loops, which means that we cannot separate the environment into two blocks and find a gauge with orthonormal basis sets on all surrounding bonds. Hence, the simple update introduces a systematic error, as it does not properly account for the full environment of the bond during the optimization. The magnitude of this error turns out to be less severe than one might expect. Especially for systems in gapped phases, the simple update leads to excellent results [44]. Moreover, its numerical efficiency often allows simulations with larger bond dimensions compared to the full update; thus it can give access to complex systems which remain out of reach for full-update calculations.
We conclude this section with a few practical comments concerning the implementation of the simple update: • For a generic unit cell, the simple update is employed sequentially on all bonds in the system. One can easily work with a second-order Trotter decomposition by reversing the application order of the gates in every second step.
• The normalization of the tensor network can be conveniently achieved on the fly by normalizing the trace of each updated diagonal bond matrix λ x y to unity. This procedure leads to a numerically fully stable algorithm.
• To obtain a meaningful representation of the ground state by means of imaginarytime evolution, we start from a random set of tensors and use a fairly large time step τ = O(10 −1 ). A large initial time step is important since it minimizes the risk of getting stuck in a local energy minimum and, in case of symmetric iPEPS implementation, it enables us to dynamically adapt the symmetry sectors on the bonds (starting from a very small time step, one can get stuck in the initial symmetry configuration and not reach all relevant sectors). To decrease the effect of the Trotter error, we then gradually reduce τ as soon as we observe convergence with respect to the SVD spectra (typically after a few hundred or thousand time steps). After reaching a time step of the order O(10 −5 ), the ground-state wavefunction is typically converged.
• Measurements of observables are performed with the converged iPEPS representation, obtained from the simple update, as input for the CTM scheme. Relying on CTM, this leads to a total numerical cost scaling of O(χ 3 D 6 ), which is, in principle, equivalent to the cost scaling of the full update. In the latter, however, the full environment has to be calculated in every step and not just at the end to perform measurements.

Full update
The full update introduced by Jordan, Orús, Vidal, Verstraete and Cirac [42] represents a clean and accurate protocol for performing the tensor update during imaginary-time evolution. Its name is derived from the fact that the effects of the entire wavefunction on the bond tensors are considered, including the full environmental TN. The only approximation stems from the non-exact contraction of the environmental TN, which we carry out based on the CTM scheme After the application of the Trotter gate in Eq. (30), the full update generates the optimized pair of subtensors v x y and w x+1 y with bond dimension D by minimizing the squared norm between |ψ(v , w ) and the wavefunction |ψ(ṽ,w) containing the exact subtensorsṽ x y and w x+1 y with enlarged bond dimension dD, To minimize Eq. (35) with respect to v x y and w x+1 y , we first have to obtain an effective representation of the environment with respect to the bond to be updated (marked red): .
This is achieved via the CTM scheme, leading to an approximate representation of the envi-ronment in terms of corner matrices and transfer tensors, = As in the case of the simple update, we carry out the tensor update for efficiency reasons in the bond projection, as discussed above. In order to generate the full environment in this representation, we have to account for the subtensors X x y and Y x+1 y as well as their conjugates, and multiply them to the effective environment shown in Eq. (37), obtaining In this way, it is possible to represent the cost function (35) diagrammatically, d(ṽ,w, v , w ) is a quadratic function of the tensors v x y and w x+1 y . Thus, the optimized subtensors can be found using an alternating least-square algorithm [42].
To this end, we can first optimize v x y while keeping w x+1 y fixed. Analogous to the MPS compression, we form the partial derivative of Eq. (39) with respect to v †,x y , The solution for v x y in Eq. (40) is found by inverting R. Using the bond projection, the inversion can be computed exactly with moderate numerical effort O(d 3 D 6 ). The full M tensor representation, on the other hand, leads to an unfeasible costs of O(D 12 ) for the exact inversion, and O(D 8 ) employing approximation methods.
After obtaining the optimized subtensor v x y , we next update w x+1 y while keeping v x y fixed by forming the partial derivative of Eq. (39) with respect to w †,x+1 The solution for w x+1 y is again computed by matrix inversion of R. This alternation process is repeated until the subtensors v x y and w x+1 y converge. Monitoring the cost function d(ṽ,w, v , w ) after every iteration step i, the convergence is detected by means of a fidelity measure which, following Phien, Bengua, Tuan, Corboz and Orús [46], can be defined as The alternating optimization is stopped in case f d drops below some small threshold d = O(10 −10 ) while showing no sign of large fluctuations. Equipped with the converged subtensors v x y and w x+1 y , the original iPEPS form is then restored, so that we can apply the next Trotter gate and repeat the full update optimization. By accounting for the entire many-body wavefunction of the infinite system, the full update provides an optimization scheme that is free from the systematic error plaguing the simple update. Only the CTM representation of the effective environment induces some approximate character to the algorithm. The high accuracy of the method, however, comes at the price of drastically enhanced numerical costs since the full effective environment, in principle, has to be calculated after the application of every single Trotter gate (i.e., typically thousands of times). The fast-full update [46], where one updates the effective environment and site tensors simultaneously, offers an immediate improvement to this problem. Another possibility is the cluster update [64,65], a hybrid version of the simple and the full update, which takes into account an improved, yet not complete version of the effective environment. Also, we note that it may be possible to achieve improvements in accuracy when computing the environment by properly removing the short-range entanglement residing in loops. To this end, it may be fruitful to combine the CTM method with other entanglement filtering algorithms, such as the Loop-TNR algorithm [66], graph-independent local truncation [67], full environment truncation [68], or entanglement branching [69].

Gauge fixing
A well-known technical fact in the context of MPS is that the gauge degree of freedom on the bond indices can be efficiently exploited to generate a canonical representation [70]. Through the correct gauge, the effective environment of a specific bond, or rather its tensor network representation, can be replaced by identity matrices, ensuring numerical precision and stability of the MPS framework. The success of this scheme is closely linked to the fact that the environmental tensor network of an MPS is separable, such that the left and right block can be gauged independently. In the case of PEPS and iPEPS, the environment no longer factorizes into different blocks, due to the presence of loops in the tensor network. In other words, cutting the TN at a single bond does not yield a bipartition of the system (as in the case of MPS), and therefore no full canonical PEPS or iPEPS representation exists. Nevertheless, it is still possible to exploit the gauge degree of freedom on the bonds to improve the stability of the algorithm. Inspired by the 1D gauging protocol, Lubasch, Cirac, and Bañuls [47] recently introduced a gauge-fixing prescription for finite PEPS calculations that was later adapted in the context of iPEPS by Ref. [46]. It yields a significantly better conditioned effective environment and thus strongly improves the stability of the tensor optimization during the full update.
The gauge protocol [47] starts from the effective environment in the bond projection (38) which, after symmetrization, is subject to an eigenvalue decomposition, During this process, we remove the contributions from small negative eigenvalues to restore the positivity of E full . Next we independently apply a QR and LQ decomposition to the tensor Z, and insert two identities LL −1 and R −1 R, into the left and right bond indices of the effective environment, respectively. This yields a renormalized pair of subtensorsv x y andw x+1 y and a modified environmentĒ full : Moreover, one also has to apply the inverse L −1 , R −1 to the subtensors X x y and Y x+1 y , respectively, so that the full M tensors can be restored properly after the tensor update [c.f. Eq. (43)

Fermionic tensor networks
For the tensor network representations discussed so far, we implicitly restricted our discussion to bosonic quantum many-body models. However, some of the most challenging and interesting open questions with respect to the physics of strongly correlated systems involve fermions. Especially in two dimensions, the t-J model, the Hubbard model, and its multi-band extensions continuously attract much attention, since they are believed to play an important role for understanding of high-Tc superconductivity and quantum criticality. Due to the lack of alternative approaches (QMC is particularly limited by the sign problem in this context), much hope is set on tensor network techniques to treat these complex fermionic models under controlled conditions. TN representations can incorporate fermionic statistics in any spatial dimension, and several different approaches have been developed for its efficient implementation, being mathematically all equivalent [44,45,[71][72][73][74][75]. The most useful point of view for practitioners is that taken by Corboz and Vidal [71], adapted to the iPEPS by Corboz, Orús, Bauer and Vidal [44]. It fully implements the fermionic exchange rules in terms of modifications to the tensor network diagrams. In the following, we briefly review the main ingredients for fermionic tensor networks, mostly following [44], although not with the same formal rigor, to keep the presentation compact. We refer to Sec. 5.4 for technical details on the fermionic iPEPS implementation in combination with non-abelian symmetries.
For simplicity, we focus on a lattice of spinless fermions with a local Hilbert space dimension d = 2 on every site (though everything can easily be generalized to fermions with d > 2 [44]). The fermionic statistic of this model is typically treated at the level of operators, specifically by the anticommutation relations of the fermionic annihilation and creation operators,ĉ j andĉ † j , In addition, one always imposes some fermionic ordering of the sites, such that a fully occupied state on the lattice containing N sites can be expressed by means of second quantization using the vacuum state |0 1 |0 2 ...|0 N and an ordered sequence of creation operators, Starting from the techniques discussed in the context of bosonic systems, how can we incorporate the fermionic statistic into the framework of tensor networks? One possibility is to employ a Jordan-Wigner transformation to represent the fermionic operators in terms of Pauli matrices. In this way, the fermionic operatorĉ j is expressed in terms of bosonic operators in a non-local form, which can be described by a so-called Jordan-Wigner string acting on all sites j < j that appear "earlier" in the fermionic order of Eq. (49) [76]. These strings can be treated efficiently in the MPS framework, where it is always possible to choose the fermionic order j equivalent to the position of a site in the MPS chain mapping. However, it leads to severe complications in the context of PEPS, where two nearest-neighbor sites r = (x, y) and r = (x + 1, y) on the lattice might appear far apart in terms of their fermionic order j and j [44]. To retain the "locality" of the iPEPS algorithm as well, we here adopt a different approach for the treatment of fermionic statistic in the tensor network language. This formulation builds on two simple "fermionization" rules discussed below, that were pioneered in the context of fermionic MERA by Refs. [71] and [45], and later adapted to the PEPS and iPEPS framework [44].

Parity conservation
A Fermionic Hamiltonian typically preserves the parity of the particle number of the state it acts on, defined to be p = 1 or −1 for an even or an odd number of particles, respectively. This Z 2 parity symmetry enables us to define wavefunctions and operators in terms of a welldefined parity quantum number p, resulting in a block structure in the tensor network. In particular, every index of a tensor can be assigned a well-defined parity.
The first fermionization rule enforces parity conservation in a TN representation. To this end, all tensors have to be chosen to be parity preserving. Taking a generic element of some M tensor as example, it means that with p(α) ∈ {−1, 1} describing the parity of the state labeled by the index α [44]. This immediately has the consequence that operators changing the parity number of a state, such asĉ j have to be encoded with an additional index (see below). Parity conservation does not directly capture the fermionic statistic. However, it is crucial in order to track the fermionic signs, since we are able to distinguish states containing an even or odd number of fermions.

Fermionic swap gates
The second fermionization rule of [71] incorporates the fermionic statistics into the tensor network formalism. It implies that each line crossing in the TN is replaced by a fermionic swap gate,Ŝ αβ β α = δ αβ δ βα S(α, β) = , with S(α, β) = −1 if p(α) = p(β) = −1 and S(α, β) = 1 otherwise. Why do the swap gates mimic the anticommutation relations of the fermions? Each line of the TN diagrams corresponds to a fermionic degree of freedom representing either physical (site indices) or virtual particles (bond indices). Any line crossing then corresponds to a particle exchange [71]. The implication of such an exchange depends on the nature of the particles. In the case of bosons such a swap is a trivial operation without any consequence. In the context of other particles, such as fermions, the underlying particle statistic does yield non-trivial consequences. For instance, additional factors of −1 have to be multiplied to the tensor network when swapping two states with odd fermionic parity number. Thus, the fermionic statistic of any tensor network can be captured by adding swap gates of type (51) to the diagrammatic representation. As a prerequisite, one has to be able to read out the parity of every index in the TN (hence, the first rule).
We emphasize that the fermionization rules can be readily implemented into any standard bosonic TN algorithm without altering the leading numerical costs, since the swap gates can typically be absorbed into a single tensor [71]. All steps can be performed completely analogously. In our iPEPS implementation we were able to recycle most parts of our code for bosonic systems by simply adding swap gates at the appropriate lines.

Fermionic operators
Another prerequisite to capture the fermionic statistic in a TN representation relates to the proper definition of local fermionic operators. Consider a generic two-site operatorÔ ij acting on sites i and j, with j > i not necessarily labeling contiguous sites in terms of the imposed fermionic order. Applied to a generic wavefunction, the resulting TN diagram contains a number of fermionic swap gates (illustrated in detail for MPS and iPEPS below). The impact of these gates on the wavefunction can be interpreted as swapping the physical index of site i such that it becomes contiguous to j with respect to the fermionic order. But this alone does not fully account for the fermionic statistics. In addition, the fermionic order of the local two-site Hilbert space generated by sites i and j has to be properly incorporated on the level of the operators, which leads to factors of −1 for some matrix elements.
While easily generalizable to arbitrary systems [44], we illustrate this briefly for the simple example of spinless fermions, where the operator is expanded in the two-site basis The coefficients O σ i σ j σ i σ j are given by If the operator describes a pairing term,Ô =ĉ iĉj , the only non-vanishing coefficient is A standard hopping termÔ =ĉ † iĉj also has only a single nonzero element, We conclude this part with an additional comment on operators that change the parity of a state, such asÔ =ĉ j . The first fermionization rule restricts our TN description to parity preserving tensors, as defined in Eq. (50). Naively, this would imply that simple annihilation or creation operators could not be properly described by fermionic TNs, since their tensor representation does not conserve fermionic parity. However, any parity changing tensor can be represented by a parity conserving tensor just by adding an additional single-valued index δ with p(δ) = −1 [44]. For instance, the diagrammatic formĉ j is then given by where the red line indicates that δ only takes a single value, i.e., represents a singleton dimension in a rank-3 tensor. This representation ensures that the only nonzero element,

Fermionic PEPS implementation
To enter this discussion, we return to our finite-size PEPS example on a 3 × 3 square-lattice cluster used in the beginning of Sec. 3.1. Recall that each site is labeled according to its coordinate in space, r = (x, y), so that the local basis states are denoted by |σ x y . In addition, we now have to decide on a specific fermionic order and use an additional label j, running from 1 to 9, to enumerate all sites of the system, |σ x y,j (the red color of the fermionic index acts as guide for the eyes). Thus, a specific state in the Fock space can be expressed as Diagrammatically, this ordering always corresponds to the order in which the open indices of the wavefunction |ψ are drawn, and directly affects the specific appearance of the PEPS TN, = .
We emphasize that a different fermionic order automatically leads to a different diagrammatic representation, where the swap gates (black diamonds) potentially act on a different set of bonds. In this work, we only consider the fermionic 'zig-zag' order of Eq. (59) which (i) can also easily be applied to an infinite lattice system and (ii) enables us to recycle all bosonic iPEPS diagrams depicted in Sec. 3.1. For an explicit example of imposing another fermionic order, see Ref. [44]. After obtaining the proper diagrammatic form of the PEPS, all subsequent operations follow in complete analogy from the bosonic case. The only additional feature are the swap gates, which are put on every line crossing. For instance, an overlap calculation ψ|ψ , derived in Eq. (14) for the bosonic PEPS by performing a number of jump moves, is carried out similarly for a fermionic system, To reduce the complexity of the diagram, we again introduced a modified representationM x † y of the conjugate tensors in the second step of Eq. (60). In contrast to the bosonic case, wherē M x † y and M x † y are mathematically equivalent objects [see Eq. (13)], we emphasize thatM x † y here includes two fermionic swap gates that are absorbed into the tensor, according to = . (61)

Fermionic iPEPS implementation
Considering fermions in an infinite lattice system, the protocol of imposing a zig-zag fermionic order on the lattice can be adopted in a very straightforward manner [44]. In hindsight, we already implied this kind of ordering when drawing the iPEPS diagrams in Sec simplify the algorithm by a great deal. For instance, the calculation of an overlap ψ|ψ can even be represented diagrammatically without any swap gates present, = .
In principle, this would also enable us to carry out the coarse graining steps in the CTM calculation exactly in the same way as in bosonic iPEPS in terms of the reduced m tensors.
To perform the algorithm with an efficient cost scaling, however, the M tensors and their conjugates have to be kept separated [see Sec. 3.3]. This typically leads to the presence of four additional swap gates for each site (only two when usingM x † y ). The strategy of incorporating the swap gates appearing in a TN is to absorb them into one single tensor [71]. Depending on the TN, this is not always possible in the very first contraction step. Nevertheless, every swap gate can typically be absorbed at some intermediate contraction step. We illustrate this procedure for the contraction of parts of the CTM environment, Swap gates also appear in the context of tensor optimization and the evaluation of a two-site operator, such as, We conclude this section by pointing out the modifications to the full-update protocol in the context of fermions. Again, most of the steps are exactly the same as in the bosonic version of the algorithm. In particular, the actual tensor optimization does not contain any swap gates due to the absence of line crossings in Eq. (39). However, the initialization slightly differs since one has to account for the presence of swap gates when performing the bond projection, Importantly, the swap gate acts differently on the conjugate tensors, so that the conjugate subtensors have to be generated by two independent SVD or QR decompositions, The tensor network representation of the effective environment also contains an additional set of swap gates, Whereas the tensor optimization does not differ from the bosonic formulation, the restoration of the actual iPEPS representation after the update works in a slightly modified way, Compared to the bosonic case in Eq. (43), we have to account for the additional swap gate.

Fermionic iPEPS with non-abelian symmetries
For a lattice model with abelian symmetries, quantum states can be labeled |ql , where q is an abelian "charge" quantum number, and l distinguishes different states with the same charge. Consider the simplest non-trivial example of a rank-3 tensor A, which fuses the tensor product of two state spaces with abelian symmetry, |ql and |q m , into a third, |q n . This operation can be expressed as To reflect the system's abelian symmetry, the A tensor carries a q-label for each of the indices l, m, n. From a numerical perspective this introduces additional bookkeeping effort. At the same time, symmetry-specific selection rules enforce a large number of elements of A to be exactly zero [for the example of U(1) particle conservation, the selection rule takes the form q = q + q ]. Keeping only the nonzero elements leads to sparse tensor structures and, hence, results in significant computational speed-up and reduced memory requirements. Let us now consider the same example in the context of non-abelian symmetries. Then quantum states can be organized into irreducible symmetry multiplets carrying an additional internal multiplet index, |ql; q z , |q m; q z , and |q n; q z . The states within a specific multiplet are fully defined by the generalized Clebsch-Gordan coefficients of the specific symmetry. In this description, the elements of the A tensor factorize into products of reduced matrix elements and Clebsch-Gordan coefficients, so that Eq. (70) generalizes to q n; q z = qlqz q mq z q m; q z |ql; q z A Here mn denote reduced matrix elements, and C [qz] q z q z = qq ; q z q z | q ; q z Clebsch-Gordan coefficients [30]. This allows one to further compress the nonzero data blocks of the tensors, further reducing the numerical requirements (although adding significant bookkeeping effort). Analogously, the elements of an operatorÔq qz transforming as an irreducible tensor operator and acting in a state space of specified symmetry, {|ql; q z }, can be expressed in a factorized form exploiting the Wigner-Eckart theorem, with reduced matrix elements (O

[q]
qq ) [1] ll = q l |Ôq|ql and Clebsch-Gordan coefficients C [qz] qzq z [30]. With the definition of symmetric tensors and operators in Eqs. (71) and (72), respectively, When exploiting symmetries, every individual index (i.e., leg of a tensor or line) represents a state space that must be expressed in terms of symmetry subspaces, throughout. For nonabelian symmetries, a given index describes a state space s that is organized as |s ≡ |qn; q z , where q specifies a symmetry sector, n a specific multiplet within the symmetry sector q, whereas q z indexes the internal multiplet structure which can be split off as a tensor product with a generalized Clebsch-Gordan tensor [30].
we are ready to set up a tensor network built from A tensors respecting the underlying symmetry of the system. By construction, the same is true for the corresponding quantum states, which transform according to well-defined representations of the symmetry group. Building on the fusion rules for different state spaces in Eq. (71), one can generate symmetric tensor networks consisting of higher-rank tensors. This can be easily understood from the perspective of contracting multiple A tensors to some larger-ranked object. The resulting tensor then represents a tensor product of several state spaces. Setting up a symmetric PEPS tensor network, for example, follows exactly this pattern leading to the diagrammatic representations in Fig. (4) for a single tensor (left) and a contraction of several such tensors (right): The symmetrized M tensors contain additional arrows on the index lines to indicate which state spaces are incoming and outgoing (i.e., which (group of) state spaces are fused into which, according to Eq. (71)). We have some freedom in fixing the direction of these arrows and some choices might be more convenient to implement than others. Note that the extra index of M 3 3 determines the global symmetry state of a specific PEPS representation. Of course, the symmetric PEPS also guarantees that the corresponding quantum state is symmetric, i.e., forms a well-defined symmetry multiplet.

Implementation of non-abelian symmetries
Symmetry-induced selection rules cause a large number of matrix elements to be exactly zero, thus bringing the Hamiltonian into a block-diagonal structure and subdividing tensors into well-defined symmetry sectors. Keeping only the nonzero elements, we can achieve tremendous improvement in speed and accuracy in numerical simulations by the incorporation of symmetries. In the context of non-abelian symmetries, the nonzero data blocks are not independent of each other and can be further compressed using reduced matrix elements together with the Clebsch-Gordan algebra for multiplet spaces.
The special ingredient of our fermionic iPEPS implementation, that sets our work apart from that of other iPEPS practitioners, concerns the explicit incorporation of non-abelian symmetries, such as SU(2) spin ⊗ SU(N ) orb with the fermionic Z 2 parity symmetry in the particle sector. The non-abelian symmetries are fully encoded in the QSpace [30] tensor library, which automatically handles the symmetry-induced fusion rules of both the reduced matrix elements and the Clebsch-Gordan space.
Non-abelian iPEPS was pioneered by Liu, Li, Weichselbaum, von Delft and Su [36] for the case of the spin-1 Kagome Heisenberg antiferromagnet, which illustrated an SU(2) spin symmetric iPEPS representation in terms of a "projection" picture. Following ideas of SU(2) invariant iPEPS representations for the spin-1 2 resonating valence-bond state [77,78] and the spin-1 resonating AKLT state [79], the symmetric iPEPS tensors can be understood as emerging from sets of 'virtual particles' associated with each site that are pairwise maximally entangled along each virtual bond with their nearest neighbor sites, and then projected into the local degrees of freedom of the corresponding site. Starting from such an SU(2) invariant iPEPS, eventually one only specifies the effective bond dimension D * , and lets the tensor optimization dynamically determine the relevant symmetry sectors on each bond. The number of multiplets D * translates into a significantly larger actual number of states, D, associated with each bond (note that D may vary for the same D * depending on the actual multiplets being used). In practice, the maximal feasible values for D * correspond to retaining an actual number of states D which typically lies out of reach of standard iPEPS calculations incorporating abelian symmetries only.
In the remainder of this section we briefly point out some important technicalities when implementing non-abelian iPEPS.

Global symmetry sector
Ref. [36] states that the projection picture is dense, in the sense that it can cover the full Hilbert space and generate any symmetry eigenstate. Whereas this is true for finite-size PEPS, we emphasize that for translational invariant systems where the iPEPS is tiled with the same M tensor, by construction, there cannot be a "drift" in average value of a quantum number along any line of M tensors. In the case of non-abelian symmetries this implies that the global symmetry label of the iPEPS is always constrained to the singlet sector. This is conceptually similar to the case of U(1) symmetries in iPEPS, where states are restricted to a global symmetry sector corresponding to the quantum number zero, i.e., q = 0 (see Ref. [80], referred as 'identity charge' therein).
We note, however, that for abelian U(1) symmetries such as charge, any local filling can be realized based on the simple observation that U(1) symmetry labels are additive. Hence one is free to shift them locally and scale them globally at will. Specifically, one may shift the charge labels associated with the local state space of each site relative to the targeted mean local occupationq, i.e., q → q −q. By this simple relabeling trick, average charges associated with the virtual bonds can fluctuate around q = 0. For non-abelian symmetries, however, such a relabeling scheme appears ill-suited, so that, by construction, our iPEPS implementation represents a global singlet. For our results below at finite doping, we still also only use Z 2 charge parity even though charge itself is conserved. = = = Figure 5: Arrow inversion. An identity, I = U † U , is inserted on a bond (here the center bond) where up to normalization the unitary U represents a 1j symbol [81], i.e., a (degenerate) rank-3 tensor which combines two state spaces, q and its dualq into a scalar singlet state. Upon absorbing U and U † on opposite sides with the neighboring tensors, effectively, the arrow on the center bond has been reverted. The singlet index (dashed line) can be omitted in the end.

Arrow convention
When exploiting symmetries, every index represents a state space with a particular symmetry multiplet. Now when fusing state spaces across tensors, this naturally introduces the concept of state spaces that 'enter' a given tensor, and state spaces that 'emerge' from it. For tensors this implies in a graphical depiction that one has to distinguish ingoing and outgoing legs, i.e., every leg acquires a direction, specified by an arrow [e.g., see Fig. 4]. Mathematically, this is equivalent to distinguishing between co-and contravariant indices (a notational convention not used here) [81]. The action of raising or lowering indices then corresponds to reverting arrows, as schematically depicted in Fig. 5. This is an operation that represents gauge-transformations of tensor network states, leaving the physical properties of the individual states unaffected [82]. Importantly, within a tensor network state, a summed over, i.e., contracted index connecting a pair of tensors, where it is outgoing from one tensor, and incoming to the other. When setting up a symmetric iPEPS representation, we therefore have to choose an "arrow convention" for all iPEPS tensors. On a square lattice, when a single M tensor with four virtual bond indices tiles an entire 2D iPEPS, this necessarily implies that two virtual bond indices must be ingoing and two outgoing [cf. Fig. 4(b)].
For compactness and readability of the code, we want to minimize the number of steps in the algorithm that involve reverting arrows as in Fig. 5. To this end, we establish the arrow convention for M tensors as well as the corner transfer matrices as shown in Fig. 6. Thus the quantum labels on all virtual bonds always "flow" from the upper left to the lower right corner of the tensor network.

Efficient contractions
The standard procedure when contracting tensors in the absence of any symmetries is to reshape a contraction into an effective matrix product [83] where efficient libraries can be utilized. That is, for any tensor in a contraction, the indices that are contracted as well as the ones that are kept, are grouped, i.e., permuted into order, and then fused into hyperindices. This strategy also carries over when implementing symmetries, abelian and non-abelian alike. In principle, one has the option of matching symmetry sectors first, and then do the contractions for every match in the above spirit. However, for abelian and non-abelian symmetries alike, this would cause a significant proliferation of symmetry sectors with increasing rank already for an individual tensor, yet also for matching symmetry combinations when (a) (b) Figure 6: Arrow convention for M tensor (panel a) as they enter inside the corner transfer matrix (CTM) setup in (panel b). The latter combines bra and ket state as required for the minimization of the total energy E = ψ|H|ψ when truncating [57]. From the perspective of an individual site, this "double layer" tensor network translates into M | . . . |M . For this, note that we have reverted the bond indices of the 'bra-tensor' M →M such that they point in the same direction as the corresponding indices of M . Only then one can fuse the 'double bond index' into a single fat index. This greatly simplifies many fusion steps during the CTM procedure. The black diamonds in (a) indicate fermionic swap gates [44,71].
contracting a pair of tensors. Roughly, if there are on average m symmetry sectors associated with each of the r legs of a given rank-r tensor, one may expect up to m r possible symmetry combinations. The situation is worse still for non-abelian symmetries, where the tensor products of two multiplets can give rise to many different multiplets. Therefore a computation of a contraction is slowed down by (exponentially) many combinations with increasing rank of the involved tensors. Yet the individual contractions of matching symmetry sectors often involve only small effective block matrices. As a consequence, the above strategy becomes prohibitively inefficient strongly with increasing rank of the tensors. For an efficient way to proceed, one therefore first needs to merge indices into hyperindices (respecting fusion rules in the presence of non-abelian symmetries), and then do the contraction. An efficient non-abelian iPEPS implementation therefore must fuse indices in contractions prior to the actual contraction, while being aware that only legs that point in the same direction can be fused [e.g. see Fig. 6]. After the contraction, the remaining open indices must be given back their original structure. In the presence of non-abelian symmetries, the fusion is effectively taken care of by an additional contraction with unitary tensors, which need to be reapplied on the open indices. This is an extra layer of complication that concerns each and every contraction that involve tensors with rank r 4.
To be specific, we consider the the two-band Hubbard model (discussed in Sec. 7.1 below) with Z 2 ⊗ SU(2) spin ⊗ SU(2) orb , retaining D * = 6 multiplets on each bond. Already the M tensors of rank 5 are complicated objects. However, the numerically most demanding tensors appear during the CTM coarse graining. Here, we typically have to deal with rank-6 and rank-7 tensors, and it depends strongly on the implementation details whether the CTM procedure is still feasible. Let us focus on a typical rank-6 tensor appearing several times in a CTM step, obtained by contracting the following TN diagram, To reduce the rank of this tensor, it is possible to fuse the three indices pointing to the left and to the bottom, respectively. This yields the rank-2 matrix representation, with size 28, 000 × 28, 000 on the multiplet level. Being a rank-2 object, it must be blockdiagonal. The matrix only contains 37 symmetry blocks of larger size (on average, each block consists of 750 2 coefficients). Remarkably, the reduced matrix elements of the latter matrix require slightly less memory (350 MB) than those of the original rank-6 tensor. To a very minor extent, this may be attributed to overhead for organizing the long lists of symmetry blocks in the tensor. More importantly, the rank-6 tensor has significant outer multiplicity [30], which is absent in the rank-2 tensor. Most importantly, however, this simple comparison strongly suggests that the symmetry blocks in the rank-2 matrix representation are densely populated by the entries in the rank-6 tensor. Now how do the two different representations perform in terms of contraction speed? To compare them, we consider the next step of the CTM scheme, which requires forming the upper part of the environment, by contraction the following tensor network, both in the rank-6 and rank-2 representation The speed of the contraction vastly differs. Contracting two rank-2 objects results in 37 contractions of the block-diagonal rank-2 objects, which is performed with QSpace [30] in about one second of CPU time. In contrast, we had to terminate the contraction of the rank-6 tensors after four hours (!) of calculation time. In the latter case, 10 9 individual contractions allowed by symmetry. Although the effort for each of these contractions is minimal, having to process their vast number step by step leads to a significant overhead, and thus to a drastic decrease in numerical efficiency.

Examples
Our main goal here is to illustrate the potential of non-abelian iPEPS, discussing both the benefits and limitations of exploiting non-abelian symmetries, by showing exemplary results for symmetric two and three band Hubbard models. A full analysis of the intricate physics of each of these systems goes beyond the scope of this work and is left for future studies. Whereas the one-band Hubbard model already features important aspects of strongly correlated materials, such as the Mott insulator transition or the emergence of d-wave superconducting pairing, for a multi-band Hubbard model a number of fascinating phenomena emerge from the interplay of different electron orbitals which cannot be captured by an effective model with a single band. Both intra-atomic Coulomb exchange or the presence of crystal field splitting can give rise to a number of intriguing effects, such as the existence of an orbital-selective Mott insulating phase, where only one orbital becomes insulating while the other retains its metallic properties [84][85][86][87][88]. In order to understand this physics from a theoretical perspective, it is clearly necessary to go beyond a single-band system and study multi-band generalizations of the Hubbard model.
In addition to perspectives in strongly correlated materials, multi-band high-symmetry models, such as SU(N ) Hubbard models or related Heisenberg models give rise to fascinating new types of quantum states including exotic magnetically ordered phases. These are not only of general academic interest but recently have also become experimentally accessible in the context of cold atoms [89,90].
The exponentially large quantum many-body Hilbert space and the ensuing strong electronic correlations pose an extreme challenge to numerical approaches. Besides, one also has to deal with an enlarged parameter space that substantially adds to the complexity of these systems. For instance, the spinful symmetric two-band Hubbard model with only onsite interactions already contains additional parameters such as Hund's interaction energy in comparison to its single-band version. Therefore, wide regions of the phase diagram of these models remain blank and there is a compelling need for developing numerical methods that can reliably deal with such systems in an unbiased way.

Spinful two-band Hubbard model
In this section, we demonstrate that fermionic iPEPS enhanced with non-abelian symmetries is a valuable ansatz to deal with symmetric complex multi-band systems in 2D. As a first example, we consider the repulsive Hubbard model with M = 2 bands and spin and orbital degeneracy on the square lattice. Specifically, we consider the Hamiltonian [91], with hopping amplitude t between nearest-neighbor sites ij , spin index σ ∈ {↑, ↓}, orbital index m = 1, . . . , M , and site occupationn i ≡ mσn imσ . We take t := 1 as unit of energy, throughout. We tune the average occupation via the chemical potential µ in Eq. (76b). But whem computing the ground state energies, we compute the expectation values of the Hamiltonian in Eq. (76a), otherwise. The chemical potential in Eq. (76b) was offset by µ 0 such that µ = 0 corresponds to half-filling in the presence of a finite onsite Coulomb energy U . Overall, the Hamiltonian in (76) features both an SU(2) spin and SU(2) orbital symmetry, which we exploit in our iPEPS implementation. We ignore local Hund's coupling. Therefore spin and orbital index become interchangeable, resulting in 4 equivalent flavors. Overall, this actually leads to an enlarged SU(4) symmetry of 4 spinless flavors (not exploited here). For the ground state of a given average filling n = n(µ), set via Eq. (76b), we define the ground state energy per site e 0 , the bond energy e ij 0 , and the generalized spin-singlet pairing amplitude ∆ ij as the expectation values We study the Hamiltonian (76) for finite hole hoping by tuning µ ≤ 0 (which is equivalent by particle-hole transformation to particle doping µ ≥ 0). To our knowledge, the phase diagram of this system is largely unknown away from integer filling. However, some interesting results are available for certain points in parameter space.
At half-filling n = 2, several studies based on a sign-problem-free determinant quantum Monte-Carlo method addressed the magnetic properties of the model [92][93][94]. Their findings support the existence of long-ranged antiferromagnetic (AF) order for larger interactions U ≥ 2 [93]. Interestingly, the AF order does not show a monotonic behavior with respect to U ; instead, it exhibits a maximum around U ≈ 8 and then decreases again towards larger interactions strengths. Whether or not the long-ranged AF order persists in the limit U → ∞ remains an open question. A previous QMC study of the corresponding Heisenberg model found no AF order but potentially a gapless spin-liquid phase in this regime [95]. Another recent work based on variational QMC [96] addressed the Mott transition of the half-filled Hubbard model, finding a critical coupling U c ≈ 11 for the case of degenerate bands (their ansatz is rather biased, however, as it only accounts for a non-magnetic solutions).
In this section, we present a first step towards a systematic iPEPS study of the symmetric two-band Hubbard model (76) that, in addition to half-and quarter filling, also investigates arbitrary doping regimes. The main challenge for iPEPS in the context of such a two-band model is the strongly enlarged local Hilbert space. In total, we need to deal with four different flavors of fermions (2 spins × 2 orbitals) resulting in a local state space dimension d = 16 per site, larger by a factor of four relative to the d = 4 in the one-band version.
To treat systems with a large local state space within iPEPS (or other TN approaches) one can follow two different strategies, as illustrated in Fig. 7: (a) Either one keeps a lattice as a single unit with a large local state space (and hence preserves its symmetry), or (b) artificially splits it, for the sake of the iPEPS simulation, into smaller sublattices. Strategy (a) is hardly feasible for standard iPEPS techniques, even when incorporating all abelian symmetries of the system. For (b), a natural choice is to split the lattice into two interleaved sublattices, one for each orbital. The drawback, besides an artificially broken lattice symmetry, is that iPEPS then has to handle longer-ranged interactions and correlations in its ansatz. This necessitates swap gates in the implementation of imaginary time evolution, which generates an additional source of error.
Here we follow strategy (a) because this preserves the orbital SU(2) symmetry, where we can fully exploit all available non-abelian symmetry. Specifically, with finite doping in mind, we incorporate Z 2 ⊗ SU(2) spin ⊗ SU(2) orb symmetry. This way, the local state space with d = 16 is reduced to an effective multiplet dimension d * = 6. At the same time this enables us to retain up to D * = 6 multiplets on each virtual bond, which corresponds to an effective bond dimension of D = 24 states [cf. Table. 1]. This enables us to run simple-update simulations for a wide regime of parameters, the results of which are presented in the following.
We start with the half-filled case n = 2, i.e., µ = 0 in (76b), to benchmark against existing determinant projector QMC data [98]. The results of this analysis are summarized in Fig. 8. Panels (a,b) show the normalized ground-state energies per site versus the interaction strength U obtained from a simple-update iPEPS simulation on a 2 × 2 unit cell. The various bond dimensions D * = 3, 4, 5, 6 in Fig. 8(a,b) are made up of dominant multiplets which emerge dynamically from the iPEPS simulations for each D * . They are listed in Table 1, for completeness. The extrapolated energies for 1/(D * ) 2 → 0 are determined by polynomial fits as depicted in Fig. 8(c). The convergence of our data with respect to the environmental  bond dimension χ * are shown in Fig. 8(d). Note that QMC simulates finite-size systems with periodic BC, hence its ground state energy, specifically so in Fig. 8(a), is expected to still increase with increasing system size, as it converges from below. Nevertheless, we find good agreement, to within 1%, of our extrapolated energies with the QMC results, confirming the reliability of our approach.
At half filling, following the work of Ref. [93], we expect the presence of long-ranged AF order for all values of U considered in Fig. 8. This is also supported by the Mott plateau seen in Fig. 9(b,d) at half-filling. Since by construction our iPEPS is SU(2) spin invariant, however, a direct measurement of the local magnetization is not possible. Nevertheless, we expect that the symmetry-breaking AF order still to be present, yet symmetrized and hence only accessible via static spin-spin correlations over longer distances. In the context of symmetric iPEPS simulations for a spin-1 2 Heisenberg model, we have observed (not shown) that the two-fold degeneracy in the AF ground state manifests itself as a spontaneous formation of row or column stripes which, in agreement with the AF state itself, breaks translational symmetry within the unit cell. Interestingly, we here also observe such an effect in the iPEPS wavefunctions in the 2-band Hubbard model as shown in Figs. 8(e,f). For U = 4 [ Fig. 8(e)], we clearly observe that two out of eight independent bonds in the unit cell carry a substantially reduced energy. This suggests (at least) a 4-fold degenerate ground state. Based on this loose connection, we will refer to the symmetry-broken regime as the AF regime where the strength of the spatial symmetry breaking in our simulations may roughly correlate with the AF magnetic moment. For U = 10 [ Fig. 8(f)] the "AF order" is weaker than at U = 4, consistent with the finding of Ref. [93] that the strength of AF order decrease for U → ∞. Ultimately, of course, the precise AF nature needs to be studied via long-ranged spin-spin correlations. This is left for future work.
Next we turn to the case of arbitrary filling away from half-filling, which is equally accessible to iPEPS, but not to QMC. We focus on small to intermediate interactions, U = 4 and U = 8. By symmetry, it is sufficient to consider only the case of finite hole doping, δ ≡ 2 − n > 0, i.e., n < 2. For the 2-band case, this regime has not been explored in detail by other methods so far. Figure 9 summarizes our iPEPS results as a function of filling, tuned by means of a chemical potential [cf. Eq. (76b)]. Figures 9(a,c) show the ground-state energy per site, e 0 /U , as a function of δ for D * = 5 and 6.
The dependence of the filling n = 2−δ on the chemical potential is shown in Figs. 9(b,d). For either U , the systems are in the AF regime for zero or small doping δ, as inferred from the symmetry-broken states depicted in Figs. 9(1,4). For U = 4 we find an energy minimum around δ 1.2. In this regime, we still observe a significant dependence of the energy on bond dimension D * , hinting at a strongly entangled ground state. For U = 8, for the same range in chemical potential [ Fig. 9(b,d)], we reach a smaller range in doping [ Fig. 9(c)]. Since here the interaction strength is comparable to the non-interacting bandwidth is W = 8, we also see a Mott plateau at n = 1 [ Fig. 9(d)] that is absent for U = 4 [ Fig. 9(b)] [99]. At zero filling, i.e, δ = 2, the ground state energy is zero, i.e. with Eq. (77a), e 0 (n = 0) = 0 [similar as in Fig. 9(a)] irrespective of the strength of U . Therefore the data in Fig. 9(c), already turning negative, will necessarily also reach a minimum somewhere in the regime for 1 < δ < 2. In addition to antiferromagnetism, we also expect superconducting order to play an important role in the two-band Hubbard model at finite hole doping. To check for the presence of d-wave superconductivity, we measure a generalized singlet-pairing amplitude ∆ ij [Eq. (77c)]. The results for different values of U and δ are displayed in Fig. 10. We find that, indeed, superconducting order is present for the entire doping range 0 < δ < 1 for all considered interaction strengths. Two effects that will require further attention in the future, are the suppression of superconductivity at δ = 1, and the fact that ∆ decreases with increasing interaction strength. Both appear justified on intuitive grounds, however: Charge fluctuations are suppressed with increasing interaction strength, specifically so at integer filling. Moreover, for filling n 1, local double occupancy is strongly suppressed for sizable U , yet double occupancy is required for finite ∆ to start with. We also observe strong inhomogeneity of ∆ ij across different bonds. This may indicate a tendency toward spontaneous symmetry breaking of the orbital symmetry that is conserved by construction in our iPEPS implementation, or to the fact that the actual ground state breaks translational symmetry in a different way. Simulations on different unit-cell geometries are needed to shed light on this issue.
In conclusion, we have presented first fermionic iPEPS simulations of the two-band Hubbard model, which incorporates spin-and orbital SU(2) symmetry explicitly in the TN ansatz. The excellent agreement of our results found at half-filling with QMC data encouraged us to explore also the hole-doped regime, where our initial results uncover a number of intriguing features. Going forward, much work remains to be done to fully understand the guiding mechanisms and phases in this regime. This includes the study of longer-ranged spin-spin correlators, the comparison to simulations on different unit cells and unveiling the dependencies of various quantities such as energy and d-wave pairing as a function of interaction strength and doping more carefully. Since in the present model spin and orbital flavors are equivalent (e.g., there is no onsite Hund's coupling J), the efficiency of iPEPS could be further enhanced by exploiting the full SU(4) flavor symmetry present in the Hamiltonian within QSpace [30]. fter fully understanding the phase diagram in this parameter regime, it will be highly interesting to study the effects of finite Hund's coupling J on the emergence of superconductivity and other competing orders. Moreover, it would also be worthwhile to analyze whether abelian iPEPS simulations are numerically feasible in a modified setup involving separate sublattice for the two bands (c.f. Fig. 7). This would yield a different perspective on the ground-state properties of the model, especially in the context of spontaneous symmetry breaking.

Three-flavor Hubbard model
In addition to basic SU(2) symmetries, QSpace [30] also provides a convenient framework for the incorporation of more complex non-abelian symmetries such as SU(N > 2). To explore the potential of this feature within fermionic iPEPS, we consider a symmetric spinless threeflavor Hubbard model where we fully exploit the SU(3) flavor symmetry. Its Hamiltonian has the same form as in (76), except that the composite index (m, σ) is replaced by a single flavor index, m = 1, 2, 3. Choosing µ 0 = U here, this again also ensures that µ = 0 corresponds to half-filling. In contrast to the spinful case, however, the fact that N = 3 is odd implies that half-filling is metallic, unless symmetry broken (see below). Only integer filling results in Mott or Heisenberg physics for larger U [99].
Although systems with a total of three symmetric flavors are not naturally realized by the atomic configuration of any real electronic material, SU(N > 2) realizations of the fermionic Hubbard model currently attract a lot of attention in the context of cold-atom experiments based on alkaline earth-like atoms such as ytterbium [89,90], where such systems have become directly accessible in highly controlled setups. SU(N ) symmetric systems feature a number of exotic phases and magnetic properties, which are of interest from a condensed matter perspective. In addition, they are also relevant for other fields, for example in the context of studying lattice gauge theories for quantum chromodynamics [100].
So far, little is known for the spinless SU(3) Hubbard model on the 2D square lattice. Some work has been done for the weak to intermediate coupling limit, where one expects the emergence of a flavor density wave breaking the translational symmetry of the lattice [101]. At half filling in particular, it is expected that two flavors occupy the same lattice site whereas neighboring sites exclusively host the third flavor, such that a bipartite twosublattice structure emerges. This is motivated by the following consideration: a site with single occupancy transforms in the defining three-dimensional representation 3 of SU(3), whereas a doubly filled site is a fully filled site with one hole, which transforms in the conjugate representation3. Within the symmetry broken setting above then, neighboring sites could, in principle, bind into a singlet configuration.
At integer filling n = 1 and in the strong coupling limit, the model can be mapped onto an SU(3) Heisenberg model in the defining 3 representation (physically equivalent, for n = 2, this becomes the dual3). This is believed to favor a three-sublattice order with finite magnetic moments [102]. On intuitive grounds, note that for an SU(3) Heisenberg model in the 3 representation, a multiple of three sites is required to form a singlet. This is not naturally suited to the square lattice, and hence results in frustration, eventually giving rise to a three-sublattice order.
We have again reduced the numerical complexity of our model system by fully incorporating the non-abelian SU(3) symmetry in the fermionic iPEPS ansatz. To this end, the full local fermionic state space, d = 8, can be reduced to d * = 4 multiplets. We then performed simple-update calculations with a multiplet bond dimensions up to D * = 6. Again, the symmetry sectors are dynamically adapted during the optimization. We illustrate examples of the relevant multiplet contributions encountered in iPEPS simulations with varying D * at half filling in Table 2.
We performed iPEPS simulations on both 2 × 2 and 3 × 3 unit cells with two and three different tensors, respectively, to slightly bias the emergence of the two-and three-sublattice order expected from the considerations discussed above. Any tendency towards spontaneous symmetry breaking of SU(3), are, however, symmetrized by our setup. Figures 11(a,b) summarize our results for the ground-state energy per site, e 0 /U , as a function of filling, n , both at weak coupling U = 1 and stronger coupling U = 6. In either case, the simulations on both unit-cell geometries surprisingly yield very compatible ground-state energies.
Only in the half-filled case at U = 1 [ Fig. 11(a)] the 2 × 2 cluster gives a slightly lower ground-state energy in comparison to its counterpart on the 3 × 3 unit cell by about 3%. Interestingly, in both cases (wavefunction 1 and 2) we observe a strong translational symmetry breaking in the form of modulation of the occupancy on different sites. This is in qualitative agreement with Ref. [101], which predicts a phase with two-sublattice order with single and double occupancy on neighboring sites. This is almost realized by wavefunction 1 shown in Fig. 11, with occupancies N ≈ 1.19 and N ≈ 1.81 on neighboring sites. For the 2 × 2 cluster, this also goes hand in hand with a pinning of the occupation at average n = 1.5 [ Fig. 11(b,d)], suggesting that the system energetically prefers a translationlly symmetry broken state. The density modulation are substantially suppressed on the 3 × 3 unit cell, where we find two sites having the same occupancy N ≈ 1.58 while slightly fewer particles occupy the third site  Table 2: Typical multiplet configurations on the auxiliary bonds obtained from SU(3) symmetric iPEPS simulations on the square lattice Hubbard model with three equivalent flavors. The different rows show the results for increasing multiplet bond dimension D * (left column) at half filling. The SU(3) multiplet labels are in Dynkin form, where we adopt the compact QSpace [30] notation. For the center column we use the same notation as in Table 1.
N ≈ 1.32 at essentially no pinning of the average occupation when changing the chemical potential. The density-wave modulation disappears both in the case when the occupation significantly deviates from the half-filled case, and also for stronger interaction, as illustrated by the wave functions 3, 4, 5, and 6 in Fig. 11. As already pointed out with Figs. 11(b,d), the occupancy is clearly not a smooth increasing function of the chemical potential, which drives the filling. While the 2 × 2 unit cell shows plateaus -and hence favors -half-filling, this is not the case for the 3 × 3 unit cells. The situation is completely reverse, however, at integer filling n = 2 as seen in Figs. 11(b,d). At this filling, a 2 × 2 unit cell cannot be in a singlet configuration, but has residual spin. Hence there is a certain degree of frustration in this setup. By contrast, the 3 × 3 unit cell can host a singlet configuration at n = 2. Interestingly, the 3 × 3 unit cell already shows charge locking for the case of rather smaller U = 1, which may be due to frustration in the present case. Eventually, however, this will require a more thorough analysis based on an extrapolation of D * → ∞.
Locking of charge at integer filling is typically a signature of Mott physics, which is to some extent also expected in the three-flavor model at n = 2 [99,103]. However, locking may also occur if the occupation inside an enlarged unit cell changes by integers. This effect may be physical, e.g., as suggested above, in that 3 and3 bind into singlets, which occurs at half-filling. The effect may also be artificial, in which case it depends on numerical details and should become less pronounced with increasing D * . This can be observed for the plateau at filling n = 1.5 (data not shown).
In summary, nevertheless, based on the earlier arguments we do expect that in the present case the 2 × 2 unit cell is more suitable for the half-filled case, whereas the 3 × 3 unit cell is a better fit for integer filling. Furthermore, it should be possible to reveal additional information about the flavor order by studying (i) longer-ranged correlators and (ii) switching off the SU(3) in favor of two abelian U(1) symmetries and explicitly allowing spontaneous breaking of the flavor symmetry.

Conclusion
In this review, we attempt to give an overview of the rapid developments of iPEPS, which has reached a remarkable sophistication over the last few years. A large part of the review, addressed to newcomers to the field, is dedicated to to two widely used ground state search methods: simple-update and full-update. Simple-update is very competitive in run-time, while full-update yields highly accurate results that are important to characterize ground states of correlated electrons. Besides that, we present a comprehensive technical detail about using non-abelian symmetry in iPEPS, where a seemly formidable computational overhead can be avoided by careful implementation. Two non-trivial examples, the two-band Hubbard model and the three-flavor Hubbard model, are included to show how exploiting symmetry can be useful. All in all, we hope that this review will motivate more efforts to the development of 2D tensor network algorithms, which have the potential for achieving crucial for advances in computational studies of correlated electrons.

Funding information
The Deutsche Forschungsgemeinschaft supported BB, JWL and JvD through the Excellence Cluster "Nanosystems Initiative Munich" and the Excellence Strategy-