The Tensor Networks Anthology: Simulation techniques for many-body quantum lattice systems

We present a compendium of numerical simulation techniques, based on tensor network methods, aiming to address problems of many-body quantum mechanics on a classical computer. The core setting of this anthology are lattice problems in low spatial dimension at finite size, a physical scenario where tensor network methods, both Density Matrix Renormalization Group and beyond, have long proven to be winning strategies. Here we explore in detail the numerical frameworks and methods employed to deal with low-dimension physical setups, from a computational physics perspective. We focus on symmetries and closed-system simulations in arbitrary boundary conditions, while discussing the numerical data structures and linear algebra manipulation routines involved, which form the core libraries of any tensor network code. At a higher level, we put the spotlight on loop-free network geometries, discussing their advantages, and presenting in detail algorithms to simulate low-energy equilibrium states. Accompanied by discussions of data structures, numerical techniques and performance, this anthology serves as a programmer's companion, as well as a self-contained introduction and review of the basic and selected advanced concepts in tensor networks, including examples of their applications.


Introduction
The understanding of quantum many-body (QMB) mechanics [1] is undoubtedly one of the main scientific targets pursued by modern physics. Discovering, characterizing, and finally engineering novel, exotic phases of quantum matter [2] will be the key for the development of quantum technologies [3], quantum computation and information platforms [4], and even lead to an enhanced, non-perturbative understanding of the building blocks of reality, including high energy settings [5,6]. The research development is hindered by the inherent difficulty of such problems, by the extremely small amount of exactly solvable models [7] and the limitations posed by the various approximate approaches that have been proposed so far [8].
From the computational point of view, the obstacle for addressing quantum manybody problems is the exponential growth in the dimension the configurations space with the number of elementary constituents of the system (number of quantum particles, lattice sites, and so forth). Exact diagonalization techniques [9] are thus restricted to small system sizes, and inapt to capture the thermodynamical limit of QMB systems, which characterizes the macroscopic phases of matter.
Approaches relying on a semi-classical treatment of the quantum degrees of freedom [10], such as mean field techniques e.g. in the form of the Hartree-Fock method [11], have been applied with various degrees of success. It is well known, however, that while they are an accurate treatment in high spatial dimensions, they suffer in low dimensions [12], especially in 1D where entanglement and quantum fluctuations are so important that it is not even possible to establish true long-range order through continuous symmetry breaking [13].
On the other hand, stochastic methods for simulating QMB systems at equilibrium, such as Monte Carlo techniques, have been extremely successful [14]. They have the advantage of capturing both quantum and statistical content of the system by reconstructing the partition functions of a model, and have been adopted on a wide scale. Difficulties arise when the stochastic quasi-probability distribution to be reconstructed is non-positive, or even complex (sign problem) [15,16].
An alternative approach to address these intrinsic quantum many-body problems is the construction of a computationally manageable variational ansatz, which is capable of capturing the many-body properties of a system while being as unbiased as possible. Tensor Network (TN) states fulfill this criterion [17][18][19][20][21]: By definition, their computational complexity is directly controlled by (or controls) their entanglement. In turn, TN states can span the whole quantum state manifold constrained by such entanglement bounds [22,23]. Consequently, TNs are an efficient representation in those cases where the target QMB state exhibits sufficiently low entanglement. Indeed, tight constraints on the entanglement content of the low-energy states, the so-called area laws of entanglement, have been proven for many physical systems of interest, making them addressable with this ansatz [24][25][26][27][28]. TN states have the advantage of being extremely flexible and numerically efficient. Several classes of tensor networks, displaying various geometries and topologies, have been introduced over the years, and despite their differences, they share a vast common ground. Above all, the fact that any type of QMB simulation relies heavily on linear algebra operations and manipulation at the level of tensors [29].
Accordingly, in this anthology we will provide a detailed insight into the numerical algebraic operations shared by most tensor network methods, highlighting useful data structures as well as common techniques for computational speed-up and information storage, manipulation and compression. As the focus of the manuscript is on numerical encoding and computational aspects of TN structures, it does not contain an extended review of physical results or a pedagogical introduction to the field of TNs in general. For those two aspects, we refer the reader to the excellent reviews in Refs. [17,30].
The anthology is thus intended both as a review, as it collects known methods and strategies for many-body simulations with TNs, and as a programmer's manual, since it discusses the necessary constructs and routines in a programming languageindependent fashion. The anthology can be also used as a basis for teaching a technical introductory course on TNs. We focus mostly on those tensor network techniques which have been previously published by some of the authors. However, we also introduce some novel strategies for improved simulation: As a highlight, we discuss the single-tensor update with subspace expansion, a special version of the ground state search algorithm for loop-free tensor networks, which exhibits high performance and is full compatible with symmetric TN architectures.

Structure of the Anthology
The anthology is organized as follows: In Sec. 1 we review shortly the historical background and some general theoretical concepts related to tensor network states. In Sec. 2 we characterize the data structures which build the TN state, and we list the standard linear algebra tools employed for common tasks. A brief review on embedding symmetries into the TN design is provided in Sec. 3, alongside numerical techniques to exploit the (Abelian) symmetry content for computational speed-up and canonical targeting. In Sec. 4 we introduce tensor network structures without loops (closed cycles in the network geometry) and explain how we can explore possible gauges to give these states favorable features. A generalization of the Density Matrix Renormalization Group (DMRG) applying to tensor networks of this type is detailed in Sec. 5, as well as instructions for practical realizations of the algorithm. We draw our conclusions and sketch the outlook for further research in Sec. 6. A diagrammatic representation of the arrangement of the anthology contents is shown in Fig. 1, where the logical dependencies among the topics are highlighted, and depicted as embedded boxes. This diagram, in turn, reflects the hierarchy among the modules of an ideal TN programming library: Using this hierarchy, the outmost component (ground state search) can be easily replaced with several other physically relevant tasks, such as real-time dynamics (see also Sec. 1.5).  Figure 1: Organization of the anthology. In an embedded-boxes structure, the introduction is followed by a self-consistent treatment of elementary tensors and tensoroperations plus their extension in presence of symmetries. On this foundation we build up and investigate (loop-free) tensor networks and algorithms. The parts about symmetries are optional and can be skipped. As this manuscript is intended as a practical anthology, the structure is chosen such that it can be directly mapped into numerical libraries, organized in the same hierarchy, starting from a kernel of numerical linear-algebra routines which are widely available and thus not covered in this anthology.

Short history of Tensor Networks for Quantum Many-Body Problems
The Density Matrix Renormalization Group (DMRG), introduced in the early '90s [31,32], was developed as a reformulation of the numerical renormalization group [33,34], where now the renormalized, effective, degrees of freedom were identified in the density matrix eigenspaces instead of in the lowest energy manifold of real space subsystems [35]. Eventually it was realized that DMRG can be recast as a minimization algorithm over the class of finitely-correlated states, which admit a formulation as Matrix Product States (MPS) [23,36,37]. In time it became clear that he reason for the success of DMRG is the limited entanglement present in ground states of shortrange interacting Hamiltonians, as it obeys the area laws of entanglement [26,[38][39][40][41][42], and such scaling is well captured by MPS for one-dimensional systems. On the one hand, such reformulation into MPS language [22,30] opened a clear path for substantial improvements and expansions of the DMRG technique. Prominent examples are: exploration of finite temperatures [43], direct addressing of the thermodynamic limit [44,45], and alternate paths to out-of-equilibrium evolution [46,47], including direct access to the density of states and spectral properties [48,49]. On the other hand, the discovery encouraged the engineering of new classes of tailored variational wave functions with the common desirable feature of storing the QMB information in compact tensors, i.e. easy to handle numerical objects, and controlling the entanglement through a network of virtual connections [17]. Noteworthy classes of tensor networks that have been proposed towards the quantum many-body problem are: Projected Entangled Pair States (PEPS) [50,51], weighted graph states [52], the Multiscale Entanglement Renormalization Ansatz (MERA) [53][54][55], Branching MERA [56], Tree Tensor Networks (TTN) [57][58][59][60], entangled-plaquette states [61], string-bond states [62], tensor networks with graph enhancement [63,64] and continuous MPS [65]. The numerical tools we will introduce in the next two sections indeed apply to most of these TN classes.

Tensor Network States in a Nutshell
A tensor network state is a tailored quantum many-body wave function ansatz. For a discrete system with N components, the QMB state can be written where {|i s } s is a canonical basis of the subsystem s. The complex coefficients of this Hilbert space wave function, T i 1 ...i N , determine the state uniquely. The TN prescription expresses the amplitudes tensor as the contraction of a set of smaller tensors T [q] over auxiliary indices (sometimes also called virtual indices), namely for a network with Q nodes and L links. Each link connects and is shared by two tensors only, Fig. 2(b). Here a tensor T [q] can possess any number of physical (a) (b) T [1] T [4] T [3] T [2] i 1 i 2 i 3 , which are connected through L = 5 auxiliary links. The original tensor object is obtained once again by contracting tensors over shared indices, which are represented by the links connecting them. For a detailed introduction to the graphical notation, see Sec. 2.
indices {i} q = {i s : s is physical link at node q} as well as any number of auxiliary indices {α} q = {α η : η is auxiliary link at node q}. However, to make the ansatz computationally efficient for QMB states, the number of links (indices) connected to it should not scale with the system size N . Similarly, the auxiliary dimensions D η = #{α η } (where the symbol # denotes the cardinality of a set) should be non-scaling quantities. Conversely, the number of tensors Q will scale with N . Throughout this anthology, we use a graphical, compact notation to express tensors and tensor networks, and the typical operations performed on them. These diagrams allow us to recast most of the equations in a form which is less verbose, often clearer, and mnemonically convenient, such as in Fig. 2 which represents Eq. (2). More details about the diagrammatic representation of TNs will be provided in Sec. 2 and Sec. 3.
It is easy to show [66] that the entanglement entropy E of the TN state |Ψ QMB under system bipartitions satisfies rigorous bounds, which depend on the TN geometry. In fact, the TN state is a wave function ansatz whose only constraint is a direct control of its entanglement properties, and is thus considered an only slightly biased ansatz. For the same reason, it is typical to express TN states in real space, where the N components are the lattice sites: it is within this setup that one can exploit the area laws and successfully work with TN states having low entanglement content [26,[38][39][40][41][42]. Tensor network state descriptions in momentum space representation have also been proposed and implemented in simulations [67][68][69][70][71].

Many-Body Statistics
One of the major features of the tensor network ansatz is its flexibility: In fact, it can be adapted for several types of local degrees of freedom and quantum statistics. The typical scenarios dealt with are the following: (i) Spin systems -Spin lattices are the natural setting for tensor networks. The quantum spins sit on a discrete lattice and their local Hilbert space {|i s } has a finite dimension d fixed by the problem under study; e.g. for lattice sites with spin l: d = 2l + 1.
(ii) Bosonic lattice -Boson commutation rules are analogous to the ones for spins, but normally, bosons do not have a compact representation space for their local degree of freedom. Unless some inherent mechanism of the model makes the bosons dhardcore, an artificial cutoff of the local dimension d must be introduced. This is, however, usually not dramatic, since typically the repulsive interactions make highly occupied states very improbable and the consistency of results can be checked by enlarging d slightly.
(iii) Fermionic lattice -The standard way of dealing with fermionic models is mapping them into a spin system via a Jordan-Wigner (JW) transformation [72]. Doing so is very convenient for short-range interacting (parity-preserving) 1D models, as the JW-mapping preserves the short-range nature. Other techniques to encode the fermionic exchange statistics directly in the tensor network contraction rules are known [73,74], but it is argued that these methods are more useful in 2D, where the JW-mapping does not preserve interaction range.
(iv) Hybrid lattices -Tensor networks can also handle hybrid theories which include both bosons and fermions. In this scenario, the two species usually live in two distinct sublattices. Lattice gauge systems such as the lattice Yang-Mills theories are a textbook example of this setting [75]: In these models, matter can be fermionic, while the gauge field is typically bosonic [76,77].
(v) Distinguishable particles -When quantum particles are distinguishable and discrete, they can be thought as the lattice sites, and both their motional and internal degrees of freedom may be described in a first quantization fashion. This scenario appears, for instance, when studying phononic non-linearities in trapped ion systems [78]. In this representation, the motional modes of each particle are often truncated to a compact dimension by selecting d orbitals {|i } to act as local canonical basis, e.g. those with lowest local energy, or the solutions of a mean field approach [79,80].
It is important to mention that the TN ansatz can be extended to encompass variational states for quantum field theories, for instance via the continuous-MPS formalism [65,81]. However, this scenario lies beyond the purpose of this paper and will not be discussed here.

Typically Addressed Problems
Tensor network states are routinely employed to tackle the following physical problems on 1D lattices: (i) Ground states of interacting Hamiltonians, usually (but not necessarily) built on two-body interactions. For the model to be manifestly 1D, the interaction has to be either finite-range or decrease with range sufficiently fast [82]. Ground state search algorithms based on TN technology can usually be extended to target other low energy eigenstates [83].
(ii) Non-equilibrium dynamics of arbitrary QMB states under unitary Hamiltonian evolution. The Hamiltonian itself can be time-dependent, thus encompassing quenches, quasi-adiabatic regimes, and even optimal control [84]. The most common strategies to perform such dynamics on the TN states rely either on a Suzuki-Trotter decomposition of the unitary evolution (TEBD) [46] or on a timedependent variational principle (TDVP) on the TN class [47], however, alternative routes are known [85][86][87][88]. Clearly, these techniques perform best when describing processes with small entanglement growth: Either short time scale quenches, quasiadiabatic quenches [89,90], or controlled dynamics [91]. Long time scale quenches are also accessible for small system sizes [92].
(iii) Annealing and Gibbs states. Strategies to achieve imaginary-time evolution are analogous to real-time unitary evolution, but in order to reconstruct the Gibbs ensemble at finite temperatures one must either perform several pure state simulations or extend tensor networks to mixed states [43,93,94].
(iv) Open system dynamics, usually undergoing a Lindblad master equation evolution (Markovian scenario), with local dissipative quantum channels. Besides the need to reconstruct mixed states, this setup has the additional requirement of designing a scheme for implementing a master equation dynamics into the TN ansatz, either by direct integration [43,93,94] or by stochastic unravelling [95].
Depending on the type of problem one aims to address and on the boundary condition under consideration, either open or periodic, some TN designs are preferable to others. In Sec. 5 we will describe in detail an algorithm for ground state search based on loop-free TNs.

Accessible Information in Tensor Networks
Being able to effectively probe and characterize a relevant quantum many-body state is just as important as determining the state itself to understand the underlying physics and the emergent properties of the system. The TN architectures that we discuss here are meant to describe finite systems, while the thermodynamical limit properties are obtained via extrapolation. The thermodynamical limit can also be directly addressed, e.g. by means of infinite homogeneous tensor networks (such as i-MPS/i-PEPS [96][97][98] or i-TTN/i-MERA [58,99,100]), but these methods will not be discussed here.
Typical quantities of interest for tensor network ansatz states are: (i) State norm Ψ QMB |Ψ QMB − The capability of extracting the norm of a tensor network in polynomial computational time as a function of the number of tensors in the network itself is often simply referred to as efficient contractibility.
Loop-free tensor networks, such as MPS and TTN, have this property built-in (assuming that the link dimensions are bounded from above), as we will remark in Sec. 4. Some tensor networks with loops, such as MERA, also have this property as a result of their specific geometry and algebraic constraints. In general, however, this is not true -the PEPS being the textbook example of a non-efficiently contractible tensor network (although approximate, stochastic methods for PEPS contraction are known [101] [102]) and boundary order, as well as to implement long-range correlations in Jordan-Wigner frameworks.
(iv) Expectation values of operators in a tensor network form − When the observable itself can be expressed in a TN formalism, it is natural to extend the notion of efficient TN contractibility to the state | operator | state expectation value. This is, for instance, the case when measuring the expectation value of a Matrix Product Operator (MPO) over a Matrix Product State, which is indeed an efficiently contractible operation [103].
(v) Entanglement properties − Any bipartition of the physical system corresponds to one or more possible bipartitions of the network. Via contraction of either of the two sub-networks it is then possible to reconstruct the spectrum of the Figure 3: For a given tensor network (a), the link η between tensors T [q] and T [q ] is replaced by a matrix Y and its inverse (b). Contracting Y and Y −1 to their respective neighboring tensors (c) gives an alternative description of the tensor network without changing the physical content of the stored information.
reduced density matrix of either subsystem, and thus extract the entanglement entropy, i.e. the von Neumann entropy of the reduced density matrix, and all the Rényi entropies, ultimately enabling the study of critical phenomena by means of bipartite [104], and even multipartite entanglement [105].

The Tensor Network Gauge
Tensor network states contain some form of information redundancy with respect to the quantum state it describes. This redundancy translates conceptually into a set of linear transformations of the tensors in the network which leave the quantum state, and thus all the physically relevant quantities, unchanged. Since these operations have no impact at the physical level, they are often referred to as gauge transformations, in analogy to field theories. The gauge invariance relies on the fact that performing local invertible manipulations on the auxiliary TN spaces does not influence the physical degrees of freedom. We will therefore introduce the concept of gauge transformation here and use it extensively in the rest of the anthology. Specifically, let η be a given virtual link of the tensor network state from Eqs. (1) and (2), connecting node q to q , with auxiliary space dimension D η (see Fig. 3(a)). , it is insensitive to this transformation. Similarly, the network geometry is unchanged (as long as Y acts on a single auxiliary link). Such operation is sketched using the diagrammatic language in Fig. 3.
Combining multiple (single-link) gauge transformations as in Eq. (3), even on different links, leads to a larger class of transformations, which still leave the quantum many-body state invariant, and can be performed efficiently numerically. In fact, these algebraic manipulations turn out to be very useful tools when designing computational algorithms which operate on TN states. In many architectures (most notably in DMRG) adapting the gauge during runtime is a fundamental step for achieving speed-up and enhanced precision.
We will review linear algebra methods which implement the gauge transformations in Sec. 2, and their symmetric tensor counterparts in Sec. 3. We will explore the advantages of employing gauge conditions in Sec. 4 and Sec. 5.

Tensors and Tensor Operations
Tensor networks organize numerical data in the form of interconnected tensors. In practice, these tensors are double-precision complex numerical data in the computer memory (i.e. RAM or hard drive); they should be stored and organized in a way as to efficiently perform linear algebra operations on them. This section formalizes the notion of a tensor and introduces a set of basic operations serving as the toolbox for designing various tensor network geometries and respective algorithms.
In the following, we first introduce tensors formally and graphically (Sec. 2.1). We then present operations acting on these tensors with technical details and extend the graphical notation (Sec. 2.2).

Tensors
We define a tensor T with n links (n ∈ N 0 ) as an n-dimensional array T i 1 i 2 ...in of complex valued elements. It can be seen as a mapping where each complex scalar element T i 1 i 2 ...in is accessed by a tuple (an ordered integer list) of n indices i 1 , i 2 , . . . , i n . The index i r at position r takes values in the index set I r = {1, . . . , d r } ⊂ N and is referred to as the r-th link of the tensor. For numerical purposes, the link dimension d r := #I r is always finite. Tensors and links allow intuitive graphical representations as network nodes and network edges respectively (see Fig. 4). We do not need to formally distinguish virtual and physical links at the level of basic operations for links and tensors. The distinction between virtual and physical links is instead treated at the level of the network, as we will see later in Sec. 4.
A zero-link tensor is a complex scalar. A one-link tensor of dimension d 1 is a d 1 -dimensional complex vector. A two-link tensor of dimensions d 1 × d 2 is a (rectangular) complex matrix with d 1 rows and d 2 columns, and so forth. A detailed example of a three-link tensor is given in Fig. 5. In general, the n-link tensor is an element of the complex space C d 1 ×···×dn with #T := dim(T ) = r d r independent components. We notice that in certain cases, e.g. for problems with time-reversal symmetric Hamiltonians, the tensor elements can be restricted to real values (see also Appendix A).
For our following discussion, we assume, in accordance with standard programming, that tensor elements are stored in a contiguous section of linearly addressed computer memory (array). Specifically, each element T i 1 i 2 ...in can be stored and addressed in such a linear array by assigning a unique integer index (address offset) to it (see an example in Fig. 5). A simple way to implement this is by the assignment offset(i 1 , i 2 , . . . , i n ) := n r=1 (i r − 1) Figure 4: Graphical representation of (a) a zero-link tensor (i.e. a scalar), (b) a one-link tensor (a vector), (c) a two-link tensor (a matrix) and (d) a tensor with an arbitrary number n of links. Indices of a tensor (i.e. its links) can be denoted with arbitrary symbols; the ordering of the open link-ends reflects the order of the indices in the tensor.
which is known as "column major" ordering. There are, indeed, other models for storing elements. For efficient coding, one should consider the respective advantages of the ordering method (many programming languages and compilers use an intrinsic row major ordering, after all). In Sec. 3, we extend the definition of the tensor object: We will also consider tensors with additional inner structure (given by symmetry), generalizing diagonal or block-diagonal matrices.

Basic Tensor Operations
All tensor network algorithms involve certain basic operations on the tensors, ranging from initialization routines, element-wise operations such as multiplication by a scalar, complex conjugation T * , or tensor addition and subtraction, to very general unary or binary operations [106]. This section presents selected examples from two important categories of tensor operations, namely basic manipulations, and more involved linear algebra operations (see e.g. Fig. 6). Tensor manipulations usually operate on the elements or the shape of a single tensor, including position, number and dimensions of the links. They often reallocate and reshuffle data within the computational memory, but generally require no FLoating point OPerations (FLOPs). On the other hand, linear algebra operations such as contractions and decompositions require FLOPs, such as complex sum and multiplication. Tensor network algorithms, operating extensively on the level of tensors, will spend the majority of their runtime with basic tensor operations. In numerical implementations, it is therefore important that these operations are highly optimized and/or rely on robust and efficient linear algebra libraries such as the Linear Algebra PACKage (LAPACK) [107], the Basic Linear Algebra Subprograms (BLAS) [108][109][110]. Here we will not enter such level of detail and instead outline the algorithms and give estimates on overall computational cost and scaling behavior. Nevertheless, our advice, for readers who want to program tensor network algorithms, is to dedicate specific In the table we list all tensor elements and their indices (in columnmajor order) using the ordering (offset) from Eq. (4). In the additional columns to the right we also added the respective indices after the tensor is reshaped into a matrix and a vector (see fusing in Sec. The tensor T could e.g. encode the three normalized states , spanning a subspace in the two-qubit Hilbert space {|i 2 , i 3 }. Their expansions can be read from the rows of M, with columns corresponding to (i 2 , i 3 ) = (1, 1), (2, 1), (1, 2), (2, 2) from left to right.
functions and subroutines to each of the following operations. It is worth to mention that there are various open-source numerical libraries which handle tensor manipulations with similar interfaces, such as the iTensor [111] or the TNT library [112].

Tensor Manipulations.
Link Permutation. Reordering the link positions i 1 , . . . , i n of a tensor T by moving a link at position l in tensor T to a position k = σ(l) in the new (permuted) tensor T p : c d a b c σ Figure 6: Basic tensor operations. Common TN operations can be separated into elemental steps including: (a) Link permutation. Graphically, the permutation of links of a tensor T is shown by intertwining the links (σ, shaded area) to obtain the new order in T p . (b) Link fusion: Two or more links of a tensor T are joined to form a single one in the new tensor T f . This can be seen as a contraction with a fuse-node F (see Sec. 2.2.2). (c) Link splitting: a single link of a tensor T is split into two or more resulting in a tensor T s . The splitting is equivalent to the contraction with a split-node S. (d) Tensor contractions and (e) tensor decompositions.
As an example, imagine a four-link tensor and a permutation σ(1) = 2, σ(2) = 3, σ(3) = 1, σ(4) = 4. The tensor after permutation then has elements T p c,a,b,d = T a,b,c,d (see Fig. 6(a)). For tensors with two links, permutations correspond to matrix transpositions, M T ab = M ba . Carrying out a permutation has a computational complexity bounded by O(#T ), i.e. linear in the number of elements that must be rearranged in memory.
Link Fusion. Fusing links is an operation that combines m − k + 1 adjacent links at positions k, . . . , m into a single new fused link j ∼ (i k , i k+1 , . . . , i m ) which replaces the original links in the fused tensor (see Fig. 6(b)): The combined index j takes values in the Cartesian product of all fused links J = I k × · · · × I m . Typically we use an integer enumeration within all possible combinations of the fused link indices. Fusing can be realized in negligible computational cost O(1) as with the proper enumeration no elements have to be moved in memory (cf. Eq. 4). As an important example, a tensor can be converted into a matrix by simultaneously fusing both sides of the link bipartition 1, . . . , k − 1 and k, . . . , n into row and column indices a and b respectively (see also Fig. 5): Through link fusion, matrix operations become directly applicable to tensors with more than two links. Important examples include multiplication and factorization, which are discussed in Sec. 2.2.2. Fusing arbitrary combinations of non-adjacent tensor links can be handled by permuting the selected links into the required positions k, . . . , m first.
Link Splitting. Splitting of a link at position k of the original tensor T is the inverse operation to fusing. It produces a tensor T with one original link j k replaced (via inversion of Eq. (5)) by m − k + 1 new links i k , . . . , i m of dimensions d k , . . . , d m , their product being equal to the dimension d k of the original link, so that j k = fuse(i k , . . . , i m ) ( Fig. 6(c)). This operation is uniquely defined once the split dimensions d k , . . . , d m are set. Note that similar to fusing, splitting operations can also be performed in non-scaling time O(1), but may require a link permutation after execution. A special case of splitting is the insertion of an extra one-dimensional virtual link (dummy link) at any position, which may be useful when a certain tensor shape must be matched. Imagine, for example, that some TN-algorithm interface formally asks for a matrix-product operator (see also Sec. 5.1), but all we have is a simple product of non-interacting local operators. Joining those by dummy links allows one to match the generic interface in an elegant manner. A similar situation arises when computing a direct product of two tensors, for which we could simply employ the contraction operation, given we join both tensors by a dummy link, as proposed in Sec. 2.2.2.
Link Index Remapping: Compression, Inflation, and Reorder. A typical class of shape operations on a tensor are those that rearrange the index values from one (or more) of the connected links. These link remapping manipulations either reorder the index values, or increase/decrease the link dimension itself. Compressing one or more links connected to a tensor T i 1 ...in means selecting a subset of their allowed indices and discarding the rest, thus actually reducing the link dimensions. Link compression is carried out extracting a subset of the linked tensor elements by means of subtensor readout S j 1 ...jn := T i 1 ...in for j r ∈ J r ⊆ I r , r ∈ {1, . . . , n} and we call S a subtensor of T . The subtensor still has n links, and each may have smaller dimensions, depending on the number #J r of indices kept per link. Subtensors arise naturally as a generalization of block-diagonal matrices for instance due to the consideration of symmetry constraints, as we will see in Sec. 3.5.1. More specifically, let a link index i r undergo a mapping If M r is injective, the result is a subtensor of reduced link dimension(s). Furthermore, if M r is invertible, it is a permutation, which allows one to access elements in a different order. The latter is useful for sorting or aligning elements (e.g. after the fuse-operation of 2.2.1) into contiguous blocks for fast memory access. Finally, the inflation (padding) of a link corresponds to an increase of its dimension, and compression is its (left-) inverse operation. Inflation is performed by overwriting a subtensor S in a larger tensor holding the padded elements (see Sec. 5.1.3), by means of subtensor assignment T i 1 ...in := S j 1 ...jn for j r ∈ J r ⊆ I r , r ∈ {1, . . . , n} .
The remaining elements in the larger tensor T must then be initialized by the user, e.g. set to zeros. Manipulating, overwriting or extracting subtensors from or within larger tensors is an integral part of variational subspace expansion techniques [113,114]. Further use cases for subtensor assignment and readout arise in the construction of MPOs [103]  have a subset of k links in common -say d a n−k+j = d b j =: d s j for j = 1 . . . k -we can contract over these shared links s j . The result is a contracted tensor C spanned by the remaining n + m − 2k links {a 1...n−k , b k+1...m }. Graphically we represent a contraction by simply connecting the links to be contracted over ( Fig. 6(d)). The elements of C are obtained by multiplying the respective elements of A and B and summing over the shared links C a 1 ...a n−k b k+1 ...bm = s 1 ...s k A a 1 ...a n−k s 1 ...s k B s 1 ...s k b k+1 ...bm .  Figure 8: Variants of contractions. (a) Direct product of two matrices, or two-link tensors A ij and B mn . By attaching a one-dimensional "dummy" link to both tensors, we can indicate the contraction C ijmn = A ij · B mn . Formally, a subsequent (column major, Eq. Equation (8) can be cast into a matrix multiplication after fusing the shared links into a single link s = fuse(s 1 , . . . , s k ) of dimension d s = d s 1 · · · d s k and the remaining links of tensors A and B into row-and column indices a and b of dimensions d a = d a 1 · · · d a n−k and d b = d b k+1 · · · d b m , respectively: In the most common scenario for a tensor contraction, the links are not ordered as in Eq. (8). We then proceed as follows (see also Fig. 7): We first permute the links, so that tensors A and B appear as in Eq. (8). Then the link fusion is carried out, followed by the matrix multiplication as in Eq. (9). Finally we split the row and column links again and possibly permute a second time to obtain the desired link positions at C.
The contraction can be a costly operation in terms of computational complexity. Carried out straightforwardly as written in Eq. Implementing the tensor contraction via the matrix multiplication of Eq. (9) has a twofold advantage: It enables efficient parallelization [109,110,115] and it allows us to exploit optimized matrix multiplication methods [107,108,116]. As an example for the latter, the complexity of a contraction of square matrices can be improved from O(d 3 ) to O(d 2.376 ) when using optimized linear algebra methods [117].
However, an overhead for the required permutations must be considered. A careful initial analysis of the arguments can often help to avoid some of the permutations, or to replace a permutation on a large input or output tensor at the expense of one or two additional permutations on smaller ones. Such optimization is possible by exploiting the full freedom we have in rearranging the internal order in which links are fused into column-and row indices and also by utilizing implicitly transposed arguments available with many matrix-multiplication implementations.
Special Cases of Contractions. Some operations which are very common to tensor network architectures and algorithms can be regarded as special instances of tensor contractions. We list the most relevant ones in this paragraph.
(i) Fusing k links of a tensor T can be achieved by contracting a sparse k + 1 link tensor, the fuse-node F ≡ F i 1 ...i k ,j defined by 0 otherwise over those links i 1 . . . i k of T that shall be fused into j. Analogously, the splitoperation can be carried out by contracting a corresponding split-node S : ,j in a matrix notation. In this formalism, it is clear that fusing and splitting are mutually inverse operations, i.e. F † F = 1 j×j and FF † = r 1 ir×ir , since F is unitary by construction. While possible, fusing or splitting by such a contraction is less efficient than the enumeration procedure of Sec. 2.2.1. Nevertheless, fuse-nodes play an important role in the context of symmetric tensors where fusing and splitting is carried out according to group representation theory (see Sec. 3). We also employ them to encode the fusion graphically, as in Fig. 6(b).
(ii) The Kronecker-product or matrix-direct product C a 1 ...a n−k b k+1 ...bm = A a 1 ...a n−k B b k+1 ...bm (10) corresponds to the case when no links are shared in the contraction of Eq. (8). This operation can be achieved (up to a permutation) by attaching a shared dummy link on both tensors and contracting over it, as visualized in Fig. 8(a) with a practical application in Fig. 11(b).
(iii) In analogy to the multiplication with a diagonal matrix, we can contract a "diagonal" two-link tensor D ij = δ i,j λ i over a single link into some other tensor T . This operation can be carried out in time O(#T ) scaling linearly with the number of tensor elements. We represent those (often real) diagonal matrices or weights λ i over a link graphically by a thin bar as depicted in Fig. 10, emphasizing their sparsity.
(iv) The (partial) trace is a variant of the contraction, involving a single tensor only. It is depicted in Fig. 8(b). This operation sums up all tensor elements that share similar indices on k specified link-pairs at the same tensor where we contracted the first k links at positions 1 . . . k with the following k links at positions 2k + 1 . . . 2k. If the tensor T has n links, the resulting tensor tr[T ] has n − 2k links. If k = n/2, the tensor trace is equivalent to a matrix trace over the given bipartition of links. The complexity of the operation equals the product of dimensions of all paired links and all remaining links (if present), thus (v) A tensor norm, defined as a generalization of the Hilbert-Schmidt-or Frobeniusnorm by can be computed in O(#T ). As shown in Fig. 8(c), this is equivalent to the contraction of T with its element-wise complex conjugate T * over all links.
(vi) In principle, we can also regard permutation as a contraction Fig. 6(a)). However, this is not the most efficient numerical approach.
As a general remark, we stress that finding the optimal strategy to contract multiple tensors efficiently is usually a difficult problem [118]; nevertheless it is a fundamental issue, as contractions are often the bottleneck of the computational costs. This question ultimately encourages the usage of specific network geometries whose optimal contraction strategies are known (see e.g. Ref. [119]). Such geometries will be further discussed in Sec. 5.
Tensor Decompositions. A tensor decomposition takes a single tensor as input, and replaces it with two or more output tensors (see Fig. 6(e)). For every link of the input tensor, there will be a similar link attached to one of the output tensors. Additional "internal" links may connect the output tensors among each other.
For a tensor decomposition to be meaningful, we demand that the contraction of those output tensors over their shared internal links restores the input tensor within a certain precision. The simplest example for a tensor decomposition is the inversion of a contraction: we take the contracted tensor as input and replace it by the tensors that we originally contracted. In practice, however, we are more interested in special decompositions that equip the output tensors with favorable properties, such as an advantageously small size or certain isometries that come in handy when working with larger tensor networks (see Sec. 4).
The most notable decompositions are the QR-and the singular value decomposition (SVD), both instances of so-called matrix factorizations [120]. As a recipe for such an (exact) tensor decomposition, first choose a bipartition of the tensor links i 1 , . . . , i r and i r+1 , . . . , i n of an n-link tensor and fuse them into single row and column indices a and b respectively. If a different combination of fused links is required, we apply a permutation beforehand. We then decompose the matrix into factor matrices A ak , B kb , and eventually the diagonal matrix λ k , such that the matrix multiplication yields Finally, we split the links we initially fused to restore the outer network geometry. Thus we obtained tensors that contract back to T .
QR Decomposition. Following the procedure of Eqs. (12) and (13), the QR decomposition splits the tensor T into two tensors Q and R such that where R kb is an upper-trapezoidal matrix and Q ak is a (semi-)unitary matrix with respect to k (or left isometry), meaning that it satisfies a Q ak Q * ak = δ k,k (as shown in Fig. 9). Finally, T is equal to the contraction of Q and R over the intermediate link k, which has the minimal dimension Singular Value Decomposition. The SVD is defined as being the maximum dimension of the fused links. Even though the SVD carries a computational cost overhead with respect to the QR decomposition (a direct comparison is shown in Fig. 48), it is often useful to have both available, since the SVD has several advantages: (i) In common tensor network algorithms, the singular values reveal entanglement properties. This is true, in particular, for loop-free TNs (see Sec. 4).
(ii) Singular values equal to numerical zeros can be removed from the intermediate link.
The dimension d k of the latter is then reduced to the actual matrix rank of T ab , and the factors V ak , W kb become full rank matrices which can be extracted via subtensor readout (c.f. Eq. (6)) from the retained intermediate link indices.
Subtensors: Figure 11: Two strategies for exponentiating a Kronecker product of two tensors A ⊗ B (i.e. hosting the matrix elements of two local operations on a lattice). Both demonstrate the power of combining several basic tensor operations with a unified graphical notation. (a) The exact exponential exp {A ⊗ B} is formed from first expanding the Kronecker product to a full four-link tensor (by contracting over an inserted dummy link, cf. Fig. 8 (a)). Fused into a square matrix M, this tensor can be exponentiated with matrix numerical routines. Finally, we split the fused links into the original configuration. (b) The Taylor-series expansion up to (small) order n can be rewritten as a contraction of smaller tensors A and B over a third, shared link k of dimension d k = n + 1. To this end, scaled powers of A, B are inscribed as subtensors into A , B at the respective index values of k, as depicted. Note that in practice, conditions on numerical stability apply.
(iii) Small singular values can be approximated with zeros, effectively further reducing the link dimension d k . This truncation, basically equivalent to a link compression as we discussed in Sec. 2.2.1, usually generates an error on the "compressed" state, but in most tensor network geometries the discarded values can be used to estimate an upper error bound (see e.g. Ref. [30]).
(iv) Both output tensors are semi-unitary (isometries) with respect to k, regardless of the amount of singular values kept. This may prove advantageous in establishing specific tensor network gauges, as we will see in Sec. 5.
By discarding negligible singular values (compressing the tensor network) we relax Eq. (15) to an approximated equality, but it can be motivated physically and has a significant impact on performance of TN algorithms (see Sec. 5). If the number of singular values hereby removed from the intermediate link becomes large compared to d k , one can in addition take advantage of efficient approximate algorithms for matrix-factorizations [122][123][124][125], some of which offer good control of the errors arising both from the stochastic method itself and from the discarded singular values.
Other Operations. Other relevant linear algebra operations are those restricted to square matrices, such as diagonalization, which is typically carried out either exactly [107] or via approximate methods such as Lanczos [126], Jacobi-Davidson [127], or Arnoldi [128] as well as exponentiation, usually performed via Taylor or Padé expansion [129] or using the EXPOKIT package [130]. Note that tensor operations, especially when encoded with unified and compatible interfaces, are a powerful and flexible numerical toolbox. For example, in Fig. 11 we combine several basic operations to exponentiate local interaction operators, a central task in time-evolution algorithms like TEBD [46].

Abelian Symmetric Tensors and Operations
Symmetries are ubiquitous in nature, and both in classical and quantum mechanics they provide an essential advantage by reducing the full problem into a set of decoupled, simpler sub-problems: Symmetries are generally expressed in terms of conserved quantities (Noether's theorem); in quantum systems, such conserved quantities are identified by of operators commuting with the Hamiltonian. It follows that these operators and the Hamiltonian can be simultaneously diagonalized: Once the operator is diagonalized, the problem for the Hamiltonian becomes block-diagonal. In numerical contexts, the role of symmetries is then twofold: they provide a substantial computational speed-up, and they allow for precise targeting of symmetry sectors, in a "canonical ensemble" framework. More precisely, one can study any variational problem by restricting only to those states having a well-defined quantum number. This is a valid, robust, more direct approach than using a "grand canonical" framework (e.g. selecting the average number of particles by tuning the chemical potential), which is computationally more expensive as it requires to consider multiple symmetry sectors at once.
Here, we first review the quantum framework for Abelian symmetries and provide examples of the Abelian groups appearing in typical problems (Sec. 3.1). We then discuss symmetries that can be encoded in tensor networks (Sec. 3.2). In the rest of this section, we upgrade the tensors data structures and operations introduced in the previous section to include Abelian symmetry content. We discuss in particular, how to organize the symmetry information in the tensors (Sec. 3.3) and discuss how networks built from such tensors displays the same symmetry properties (Sec. 3.4). Finally, we review how tensor operations benefit from the inclusion of symmetry information (Sec. 3.5).

Quantum Symmetries in a Nutshell: Abelian Groups
In quantum mechanics a symmetry is a typically unitary representation U (g) of a group G of transformations protected by the dynamics, i.e. commuting with the Hamiltonian H: [U (g), H] = 0 ∀ g ∈ G (we briefly discuss anti-unitary representations in Appendix A). As a consequence, it exists an eigenbasis of the Hamiltonian such that each energy eigenstate belongs to a single irreducible representation (irrep) subspace.
Specifically, symmetric eigenstates will be of the form |ψ ,m,∂ and satisfy H|ψ ,m,∂ = E ,∂ |ψ ,m,∂ where labels the irrep subspace, and is often referred to as sector, or charge, or (primary) quantum number. m spans the states within the irrep subspace (secondary quantum number), and the energy level E ,∂ is independent of m as a consequence of Schur's lemma. Finally, ∂ contains all the residual information not classified by the symmetry (symmetry degeneracy label). In this sense, the symmetry group acts as U (g)|ψ ,m, is the -sector irrep matrix associated to the group element g. Symmetries in quantum mechanics can be either Abelian (when [U (g), U (g )] = 0 ∀ g, g ) or non-Abelian (otherwise). From a computational perspective, the former are typically easier to handle as the related Clebsch-Gordan coefficients (fusion rules of the group representation theory) become simple Kronecker deltas [131]. In tensor network contexts, handling fusion rules is fundamental (see Sec. 3.5), thus addressing non-Abelian symmetries is somehow cumbersome (although definitely possible [132][133][134]), requiring the user to keep track of Clebsch-Gordan coefficients for arbitrary groups. For this reason, in this anthology we focus on addressing Abelian symmetry groups: Even when simulating a non-Abelian symmetric model one can always choose to address an Abelian subgroup [135], gaining partial speed-up (in this case, unpredicted degeneracies in the energy levels are expected to occur). Some examples of Abelian reduction strategies are mentioned later in this section.
Abelian symmetries have one-dimensional unitary (complex) irreps. Such irreps take the form of phase factors W [ ] (g) = e iϕ (g) , where the phase ϕ depends on the quantum number and group element g, while no secondary quantum numbers m arise. Then, the symmetric Hamiltonian is simply block-diagonal in the sectors and has no additional constraints.
In the following we present the Abelian symmetry groups commonly encountered in quantum mechanics problems, listing irreps, group operations and fusion rules for each of them. Additionally, for later use, we define the irrep inversion → † as the mapping to the inverse (or adjoint) irrep, defined by ϕ † (g) = −ϕ (g) ∀g∈G. We stress, for later convenience, that inversion is distributive under fusion, ( ⊕ ) † = † ⊕ † . We will also generally write for the identical irrep ident = 0. This irrep exists for every group: It fulfills ϕ 0 (g) = 0 ∀g∈G and is self-adjoint.
(i) Z n discrete planar rotation group, also known as group of the oriented regular polygon with n edges. This is the cyclic Abelian group with n elements, and satisfies g n = 0 (the identity group element) for any element g ∈ Z n . The parity group Z 2 is a particularly relevant case. (ii) U(1) continuous planar rotation group, or group of the oriented circle. This is the symmetry group related to the conservation laws of integer quantities, such as total particle number ( is this number), an extremely common scenario in manybody physics.
(iii) Z infinite cyclic group, or the group of integers under addition. This group is connected to the conservation law of a quantity within a real interval, usually [0, 2π) (such as the Bloch wave-vector conservation under translation of a lattice constant in a periodic potential problem). Group elements parameter: g ∈ Z Group operation: Examples: Discrete translations on an infinite 1D lattice.
(iv) Multiple independent symmetry groups. For a model exhibiting multiple individual Abelian symmetries, which are independent (i.e. mutually commuting and having only the identity element of the composite symmetry in common), the composite symmetry group is given by the direct product of the individual symmetry groups, and is Abelian. Typical composite groups are combinations of the aforementioned basic groups (Z n , U(1), Z). Here we provide a symmetry combination example built from two individual symmetries, generalization to more symmetries being straightforward. In this scenario, one can select the irrep label k for each elementary subgroup k (or subsymmetry), making the composite irrep label actually an ordered tuple of scalar labels, which fuse independently. Group elements parameter: g 1 g 2 = g 2 g 1 , with g 1 ∈ G 1 and g 2 ∈ G 2 . Irrep labels: = [1] , [2] , every element in this ordered tuple is an allowed scalar irrep label of the corresponding elementary symmetry group, with ident = (0, 0). Irrep phases: ϕ (g) = ϕ [1] (g) + ϕ [2] (g) [2] , where ⊕ k is the fusion rule of subgroup k. Examples: XYZ model for an even number of sites: Z 2 × Z 2 = D 2 (π-rotation along any of the three Cartesian axes, applied at every site). D 2 is the simplest noncyclic Abelian group, also known as the group of the rectangle, or Klein four-group.
Regarding the reduction of non-Abelian symmetry groups into their largest Abelian component, typical examples follow. The full regular polygon group D n (present in the standard Potts model) is often reduced to Z n : the rotations are kept, the reflections are discarded. The unitary group of degree N , U(N ), is often reduced into U(1) ×N by considering only the diagonal unitary matrices, which are defined by N independent angles. Similarly, SU(N ) is usually reduced to U(1) ×(N −1) (N angles with 1 constraint). Abelian reduction can be often performed naively, even for more complex symmetry groups: For instance, the symmetry group of the spin- 1 2 Fermi-Hubbard model, which is SO(4)( ∼ = SU(2) × SU(2)/Z 2 ) [137], is often treated just as a U(1) × U(1) symmetry (particle conservation for each spin species).

Encodable Symmetries in Tensor Networks
Not every symmetry (regardless of its group) can be efficiently encoded in the tensor network formalism: Depending on how a symmetry transformation acts on the QMB system it may be suitable or not for TN embedding. Symmetries that are known to be particularly apt for implementation in tensor networks [106,134,138,139] are those which transform independently each degree of freedom without making them interact, namely those satisfying U (g) = j V j (g), where V j (g) is the local representation of group element g at site j. V j (g) acts only on the degrees of freedom at site j, and it may depend explicitly on j. We refer to this symmetry class as "pointwise" symmetries, because they act separately on every point (site) of the real space. There are two main sub-types of this symmetry class which have a major impact on physical modeling: (i) Global pointwise symmetries: These symmetries have support on the full system, i.e. the V j (g) are non-trivial (different from the identity) for every site j. Usually these symmetries also have a homogeneous pointwise representation, that is, V j (g) does not explicitly depend on j. Typical examples are conservation of the total particle number or total spin magnetization.
(ii) Lattice gauge symmetries [140,141]: These pointwise symmetries have restricted supports, which usually extend just to a pair or plaquette of neighboring sites (V j (g) is the identity elsewhere). In turn, lattice gauge models often exhibit an extensive number of such symmetries (e.g. one for every pair of adjacent sites), which makes them again a powerful computational tool.
Most condensed matter system also include symmetric content which is not of the pointwise form. For instance, this is the case for lattice geometry symmetries, such as translational invariance, lattice rotation/reflection and chiral symmetry. These symmetries play often an important role, and characterize some non-local order properties of various lattice systems (e.g. topological order) [142]. Although there are indeed proposals on how to address explicitly these other symmetry classes into specific tensor network architectures [69,143,144], they can not be properly treated on a general ground, and will not be discussed here.
Having reviewed which classes of Hamiltonian symmetries we can efficiently use, we will now provide in-detail documentation on how to exploit these symmetries to achieve a computational advantage on TNs. Such construction relies on precise targeting of quantum many-body states having a well-defined quantum number: In this sense we will talk about invariant and covariant states, as we will see in Sec. 3.4. These network are built on symmetry-invariant tensor objects and link objects, which we will introduce in the next section.

Constructing Symmetric Tensors
In the construction of symmetric tensors, links (both physical and auxiliary) become associated with group representations and tensors become invariant objects with respect to the linked representations. This allows us to upgrade the data structures of links and tensors and their manipulation procedures to accommodate the symmetry content. Ultimately, symmetric links and tensors are the building blocks to construct symmetryinvariant and symmetry-covariant many-body TN states, as we will see in Sec. 3.4.

Abelian Symmetric Links.
A tensor link is upgraded to be symmetric by associating a quantum number to each link index i ∈ I. The number of different quantum numbers¯ can be smaller than the link dimension d. If we then distinguish indices having the same quantum number by a degeneracy count ∂ = 1 . . .∂ , we identify and call the tuple ( , ∂) the (Abelian) symmetric link index. It is useful to organize the symmetric link indices with the same quantum number in a∂ -dimensional degeneracy space or sector D such that the link index set I decomposes into their disjoint union From a physical perspective, we associate a specific unitary representation of the group G with the link. Namely, each tensor-link index i identifies a specific irrep of group G, labeled by the quantum number (or charge) and acting as a phase multiplication e iϕ (g) . In this sense, one can also interpret a symmetric link as a link equipped with a diagonal, unitary matrices group W (g) ∈ C d×d (or representation of G) in which the diagonal elements are irreps e iϕ (g) of G, each appearing∂ times in accordance with the index decomposition of Eq. (17): Thanks to the group representation structure embedded in the links, tensors can share information related to the symmetry.

Abelian Symmetric Tensors.
A tensor with all n links symmetric is called symmetric (invariant) tensor if the tensor is left unchanged by the simultaneous application of the n representations W r (g) of G associated with each tensor link r: . . .

T
. . . Figure 12: Graphical representation of an Abelian symmetric tensor (left) with n directed links. Incoming links are drawn as arrows directed towards the tensor. By definition, the tensor is invariant under application of the representations W rout (g) of G associated with all outgoing links r out (e.g. 3 and 4) and inverse representations W † r in (g) associated with the incoming links r in (e.g. 1 and 2).
With the superscripts † r ∈ {1, †}, we specify on which links the acting representations are meant to be inverted. Thus, we distinguish between incoming links, where the representation is inverted, and outgoing links, where it is not inverted. Such formalism allows for a convenient graphical representation of links as arrows directed away from the tensor or into the tensor (see Fig. 12). Every internal link will then be going out of one tensor and into another: Such requirement guarantees that the group action on internal links of a tensor network acts trivially (as a gauge transformation) W (g)W † (g) = 1.
The product n r=1 W †r r (g) in Eq. (19) represents a symmetry of the tensor and is a prototypical global pointwise symmetry because it acts on all the links of the tensor (global) but as a tensor product of local operations (pointwise).

Inner Structure of Abelian Symmetric Tensors.
With the link-indices i r replaced by the associated quantum numbers r and degeneracy indices ∂ r , the invariance Eq. (19) reads and we conclude that a tensor T is symmetric if it has nonzero elements T ∂ 1 ...∂n 1 ... n only where n r=1 e iϕ r (g) †r = 1. For our convenience, we introduce the Abelian irrep-or quantum number fusion rule and require that the quantum numbers from all the links fuse to the identical irrep where the sign of the individual phases is to be taken opposite for incoming and outgoing links (due to the inversion ϕ † = −ϕ ). On this basis we define the structural tensor We refer to any set of quantum numbers 1 , . . . , n that fuse to the identical one as a matching set of quantum numbers, or match for short. In addition to the structural tensor, according to Eq. (20), a symmetric tensor is identified by degeneracy (sub-)tensors R 1 ... n ∂ 1 ...∂n . The latter are defined over the degeneracy subspaces D 1 ⊗ D 2 ⊗ · · · ⊗ D n of all possible matches 1 , . . . , n : The maximal number of (nonzero) elements of a symmetric tensor T hence reduces to the number of elements inside the degeneracy tensors which is often much smaller than the number of elements of its non-symmetric counterpart (see later in this section for more detailed discussion). In Fig. 13 we give a graphical representation of a symmetry invariant tensor. Attaching a dummy selector link (holding a single, non-degenerate irrep select ) to a symmetry invariant tensor is a practical and yet rigorous way to build a covariant tensor (Fig. 14).

Symmetric Links and Tensor Data
Handling. We have shown in Eq. (24) that symmetric tensors are characterized by an inner block structure. This block structure emerged from the links being organized in quantum numbers and corresponding degeneracy spaces of Eq. (17), together with an assumed tensor invariance under symmetry transformations associated with the quantum numbers (Eq. (19)). In this section, we define all the necessary symmetric tensor operations to run symmetric TN algorithms on a computer. First of all, we define the following three numerical objects, suited for object-oriented programming.
Abelian Symmetry Group Object. This object is a composition of a set of allowed quantum numbers : e iϕ (g) is irrep of G , the quantum numbers fusion rule ⊕ and the quantum number inversion operation → † .
. . . Figure 13: Inner structure of an Abelian symmetric tensor: The underlying cyan shape on the right represents the structural tensorδ. Here it readsδ 1 ..
with inversion ( †) of the quantum numbers on incoming links (such as links 1 and 3). Only sets 1 . . . n which are matches contribute a degeneracy tensor R 1 ... n , depicted as non-symmetric tensor (gray shape) defined over degeneracy indices ∂ 1 . . . ∂ n (black lines). Figure 14: A covariant symmetric tensor is a symmetric (invariant) tensor with one link being a quantum-number selector link. This special link holds only a single, nondegenerate quantum number select . With this artifice, the structural tensor takes the usual formδ 1 2 3 select = δ For practical purposes, it is often sufficient to represent (a subset of) the group's quantum numbers as integers. In the presence of multiple symmetries, acting as a direct-product group G = G 1 × · · · × G K of K independent symmetry groups G k , k = 1 . . . K, it is convenient to operate with ordered tuples of individual quantum numbers instead: = [1] , . . . , [K] . These tuples obey element-wise fusion-rules, as described in (iv) of Sec. 3.2.
Symmetric Link Object. The symmetric link object encompasses a (possibly sorted) It is therefore invariant under a G = Z 2 group of parity symmetry operations: Take W r (g=0) = 1 and W r (g=1) = σ z , e.g. [W r (g = 1)] j,j = e i(j−1)π , on the single-qubit links r = 2, 3. Then the tensor transforms as Since the tensor is symmetry-invariant, we can recast it as a symmetric tensor using the following data structures: (a) The three symmetric tensor links r = 1, 2, 3 as i r = ( r , ∂ r ). (b) The symmetric tensor itself, defined by its list of matches L (in this example all four possible matches are present), plus a degeneracy subtensor R 1 2 3 ∂ 1 ∂ 2 ∂ 3 ×δ 1 2 3 for each match in L. (c) The graphical representation, where we consider the last two qubit-links outgoing and the first link incoming in accordance with the inversion on W † 1 in the defining invariance. Although the Z 2 symmetry is self-adjoint and link-directions can be neglected in this example, this is not true in general. Figure 16: Symmetric tensor network, composed of symmetric tensors. A global pointwise operation on the physical links (left) is absorbed by the network as follows: (a) We apply a gauge transformation with the representations on the virtual links. (b) Now we use the invariance property of each tensor to absorb all surrounding transformations at each tensor. We see that the application of the global pointwise symmetry j V j (g) is equivalent to the application of the phase shift Rη(g) = e iϕ select (g) . The representation at the selector link thus determines the sector select of the physical state.
list of¯ different quantum numbers of the group G and their corresponding degeneracies∂ ≥ 1.
Symmetric Tensor Object. A symmetric tensor T object consists of: (i) n symmetric link objects.
(iv) For each match ( 1 , . . . , n ) within L, a degeneracy tensors R 1 ... n . This is defined as an ordinary, non-symmetric tensor (defined in Sec. 2.1) over the corresponding degeneracy spaces with index-dimensions∂ 1 , . . . ,∂ n which can be queried from the links. Fig. 15 shows a practical example of this data structure. An important optimization is that only matches that come with non-vanishing degeneracy tensor R 1 ... n = 0 have to be listed in L. We say that these matches are "present" in T . In order to unify the handling of multiple symmetries in direct product groups with tuples = [1] , . . . , [K] of K individual quantum numbers, one can define simple integer sectors s and use them to index quantum numbers ≡ (s) within a given link. This way, the entries of L can be replaced by matching sectors (s 1 , . . . , s n ).

Symmetric Tensor Networks
For a symmetric Hamiltonian H, different symmetry sectors = are dynamically decoupled. Therefore, we can always restrict each TN simulation to span a single sector select of the symmetry, without loss of generality. In particular, we will construct, via a symmetric tensor network ansatz, quantum many-body states which are either invariant or covariant under the action of the (pointwise) symmetry. This construction will rely, of course, on symmetric tensor objects.
(i) Invariant states are those which satisfy U (g)|Ψ = |Ψ , i.e. they belong by definition to the sector = 0, the identical irrep. Let now the QMB state |Ψ be a tensor network state, given by Eqs. (1) and (2). Let us also associate a representation W η (g) of the symmetry group G to every auxiliary space (or nonphysical link) η. Then it is easy to see that if every tensor is invariant under the action of the group, i.e. it fulfills Eq. (19), the state |Ψ is symmetry invariant (see Fig. 16): We name this construct a symmetric (invariant) tensor network. For loop-free network geometries the reverse is also true [22]: it is possible to decompose an arbitrary invariant state into the symmetric tensor network formulation We will investigate in detail such construction in Sec. 5.
(ii) Covariant states are states |Ψ that belong to a single, selected irrep select of the symmetry (thus actually extending the concept of invariance to other irreps than the identical one, ident = 0). For Abelian symmetries they are the eigenstates of the whole symmetry group U (G), and transform as U (g)|Ψ = e iϕ (g) |Ψ . They are necessary to address a variational problem for a given (conserved) number of particles .
Encoding covariant states in TN language is a simple extension of the invariant case. Specifically, one can set every tensor to be invariant, as in Eq. (20), but then introduce an additional, special virtual linkη. This selector link is attached to a single tensor (any tensor fits, see e.g. Fig. 16), has dimension Dη = 1 and its symmetry representation Rη(g) is fixed to the select irrep Rη(g) = W [ select ] (g). Equivalently, one can think of a covariant symmetric tensor network as a TN made of a single covariant tensor (i.e. an invariant tensor equipped with a selector link), as in Fig. 14, while all the other tensors are invariant, as in Fig. 13. This is an effective way to target the covariant subspaces while using the same data structure used for invariant states: The symmetric tensor.

Symmetric Tensor Operations
So far, we focused on the upgraded data-structures and fundamental group operations. Now, we will equip symmetric tensors with the same basic functionality that we provided for ordinary tensors in Sec. 2.2. In the modular programming of tensor network algorithms, these symmetric operations constitute an outer layer with respect to the basic tensor operations, as we sketched in Fig. 1. Again, element-wise operations (such as scalar multiplication, complex conjugation or addition and subtraction) are not covered in detail. It is easy to cast them into a symmetric form by carrying out the respective ordinary operation on all degeneracy tensors independently. Such method extends naturally to the linear algebra operations, discussed in Sec. 3.5.2.
3.5.1. Link-and Tensor Manipulations. As for the non-symmetric tensors, we again first discuss operations that involve a single symmetric link or tensor.
Symmetric Tensor Initialization. Four notable ways to create a symmetric tensor object are (see Fig. 17): (A) Initialize from user-provided elements. Create the respective degeneracy tensors containing the provided elements and check the fusion-rule of the corresponding matches.
(B) Initialize with zeros. No match will be present, hence no memory allocated for degeneracy tensors.
(C) Fill all degeneracy tensors with random elements. First, we search for all possible matches given the directed links. We make all of them present, and we initialize the respective degeneracy tensors with elements provided from a random number generator.
(D) Set up an "identity" matrix. this requires a duplicate symmetric link. We then define a two-link tensor connected to the two links, with opposite directions. Then from all quantum numbers in the links we form matches ( , ), and create the corresponding degeneracy tensors, each equal to the identity: R , Note that it might be advantageous to use sparse diagonal matrices for degeneracy tensors (see Sec. 2.2.2).
Symmetric Link Inversion. This operation inverts a symmetric link: it maps all quantum numbers from the list of the link into their inverses → † . Degeneracies remain unchanged∂ (old) . After this, it might be necessary to re-sort the quantum numbers, and thus the degeneracies accordingly.
Inversion of a Link in a Symmetric Tensor. This operation inverts a link r connected to a tensor while preserving the invariance of the tensor. It acts as follows: (i) The corresponding symmetric link r is inverted (see above). (ii) The direction of link r is flipped. (iii) The list of matches is preserved, however every match is manipulated so that r → † r at link r. Notice that this preserves the fusion rule since the direction of r has been flipped and ⊕ †r r = ⊕( † r ) †r † for every and r . (iv) The degeneracy tensors are unaltered. Multiple links connected to a tensor can be inverted, simultaneously or sequentially.
( 1 , ∂ 1 ) Output: Symmetric Link Index Manipulations. Reducing, enlarging, and rearranging the set of indices of a symmetric link all belong to this class of transformations. For every irrep present in link r, its degeneracy indices ∂ ∈ {1,∂ } can be compressed, inflated, or rearranged as in Sec. 2.2.1. Additionally, quantum numbers can be entirely removed from a link, such as when∂ is compressed to 0 (in which case should be explicitly removed from the list), or added anew to the link, in which case a nonzero degeneracȳ ∂ new should be assigned to it.
Two special cases of symmetric link index manipulations should be mentioned here, because they play an important role in algorithms: Index Manipulations at a Symmetric Tensor. Index manipulation within a link of a symmetric tensor impacts the degeneracy tensors separately. On each degeneracy tensor, it acts as ordinary index inflation, reordering or restriction (as described in Sec. 2.2.1). Two special situations may arise: (i) When quantum numbers get entirely removed from a link (i.e. degeneracy drops to zero), corresponding matches and degeneracy tensors must be removed. (ii) When new quantum numbers appear on a link, new matches become possible and can be assigned nonzero degeneracy tensors. For the sake of efficiency, avoid creating new matches unless the related degeneracy tensors are to be filled with nonzero elements.
Symmetric Tensor Element Access. An individual symmetric tensor element can be accessed from its quantum numbers and degeneracy indices. Similarly, an entire degeneracy tensor can be addressed by providing the quantum numbers (match) alone. A more general interface to address tensor elements is provided by subtensors (see Sec. 2.2.1). Also in the symmetric case, a subtensor S is obtained from T by reducing the set of degeneracy indices in all links (see Sec. 3.5.1). As a consequence, S is a subtensor in T if all its degeneracy tensors are ordinary subtensors in the corresponding degeneracy tensors of T .
Subtensor Readout. This operation reads out a subset of elements at specified indices from the tensor T and stores them into a new tensor S, according to a link-Input: Tensor T : Tensor links: Subtensor S: Subtensor links:

Output:
Tensor T with subtensor S inscribed:   Fig. 15. Top: T and the subtensor S come with similar symmetry and number of links, but the second degeneracy index ∂ 1 = 2 has been removed from the degeneracy subspace D 1 =0 of the invariant quantum number in the first subtensor link. This is an exemplary link-index reduction, mapping all the lower degeneracy indices (here ∂ ≡ 1) from S onto the same indices (∂ ≡ 1) in T . Bottom: Output tensor T after assignment, with all elements from S inscribed into respective subtensors of T . index reduction given for every link r. Matches in S are the subset of all matches from T whose quantum numbers r have not been deleted from the respective link. Since each degeneracy tensors of S is a subtensors in the respective degeneracy tensors of T , this operation performs a block-wise ordinary subtensor extraction.
Subtensor Assignment. This operation overwrites elements at specified indices in a tensor T with the content of a subtensor S. Again, the respective element indices are specified by a link-index reduction for every link r, such that we can perform blockwise ordinary subtensor assignment on degeneracy tensors of S and T . It is however important to be aware that degeneracy tensors associated with non-present matches are implicitly filled with zeros. To this end, we compare the lists of matches L T and L S of both tensors T and S, respectively. In detail, we distinguish three cases (see Fig. 18 for an example): (i) Matches in L T ∩L S , present in T and S: We perform ordinary subtensor assignment between the respective degeneracy tensors of similar matches.
(ii) Matches in L T \ L S , present in T but not S: We set corresponding degeneracy tensor elements in T to zero, equivalent to ordinary subtensor assignment where the degeneracy tensor of S is filled with all zeros. In the special case where the respective degeneracy tensors of T and S are of same dimensions, we can instead remove the former from L T as it becomes entirely zero.
(iii) Matches in L S \ L T , present in S but not T : We add these matches to the list of L T , initialize the corresponding degeneracy tensors to zero, and finally inscribe the subtensor from S via ordinary subtensor assignment.
Permutation of Symmetric Links at a Tensor. The reordering of symmetric tensor links from ( r , ∂ r ) into σ(r) , ∂ σ(r) , given a permutation σ, is equal to a permuting all present matching quantum numbers and their associated degeneracy tensors according to σ, and finally keeping track of the symmetric tensor's link directions † σ(r) . The computational cost is dictated by the ordinary permutations on the degeneracy tensors, which is at most linear in the number of elements.
Fusion and Splitting of Symmetric Links. The fuse-and split-operations in Sec. 2.2.1 can be conveniently adapted to symmetric tensors. However, taking into account the special structure of symmetric links, we cannot rely on the simple enumeration scheme of Eq. (5) any more. Instead, we must resort to a more general permutation which is still computationally cheap but also respects the extra symmetry structure of the links. We organize this paragraph as follows: First we define the symmetric link-fusion as a bijection between multiple directed symmetric links and the combined index of the fused link. This mapping is then summarized in the form of a (symmetric) fusenode. Finally, we describe how to obtain the corresponding fused or split tensor in full symmetric representation. Link Fusion: When fusing symmetric links, the goal is to replace multiple links of an n-link tensor T with given directions { † r } at positions r = k, . . . , m by a single, again symmetric fused link j whose associated symmetric index ( , ∂) is a bijection from the original link indices We want to choose this mapping such that the group representation W (g) associated with the fused link and each group element g ∈ G is the matrix direct product (or Kronecker product) of the representations on the original links i.e. such that in agreement with the defining Eq. (19) the fused tensor T remains symmetric. This requirement fixes and we recover the quantum-numbers fusion rule. Note that Eq. (26) holds for an outgoing fused link. To obtain an incoming fused link, has to be inverted. For the degeneracy indices ∂, any enumeration over fused link indices with the same quantum number can be adopted. Such degeneracies arise from two sources: the degeneracies already present in the original links, and "collisions" of similar quantum numbers obtained from different combinations fuse( k , . . . , m ) = fuse( k , . . . , m ), as Eq. (26) is not bijective for itself. We adopt the following strategy: On each combination k , . . . , m we use the ordinary fusion D k × · · · × D m → D k ... m (Eq. (5) Fig. 15) and the fused ( b , ∂ b ) given in (b). The elements are detailed in (c). We have only two matching quantum numbers and hence two degeneracy tensors, which appear as blocks in the block-diagonal matrixrepresentation M shown in (d). Here we present rows and columns in ascending order (degeneracies being the fast index) according to ( a , ∂ a ) = (0, 1) , (0, 2) , (1, 1) and ( b , ∂ b ) = (0, 1) , (0, 2) , (1, 1) , (1, 2). Note that b in this example also corresponds to the non-symmetric link index of Fig. 5, but here we have reordered the fused symmetric link indices b , ∂ b according to quantum numbers b .  Figure 21: Fusing T (Fig. 15) into a vector V j ∂ j : The three-link-fusion ( j , ∂ j ) = fuse(( 1 , ∂ 1 ) , ( 2 , ∂ 2 ) , ( 3 , ∂ 3 )) is detailed in (a). The fused link is summarized in (b) and (c), where the degeneracy∂ j originates from two sources: For one, the original link ( 1 , ∂ 1 ) already comes with degeneracy∂ 1 =0 > 1. Furthermore, the collisions of combinations 1 , 2 , 3 into similar quantum numbers j are cumulated in ∆ α j (see Sec. 3.5.1). The resulting vector has only non-trivial support in the identical sector j = 0, as it must again be invariant. If we again order the indices ascending in j , ∂ j with fast increment in the degeneracy, we obtain the vector representation in (d). Graphically, the vector would be depicted as shown in (e). Note that we would have obtained the same result by fusing links ( a , ∂ a ) and ( b , ∂ b ) at the matrix M from Fig. 20, as link fusions can be nested. by adding the appropriate degeneracy offset ∆ α for each collision: Here α ↔ ( k , . . . , m ) uniquely enumerates all collisions into a given , and ∆ α := α <α∂ α amounts to the cumulate partial degeneracy obtained from fusing all preceding colliding degeneracy subspaces of dimensions∂ α := #D α =∂ k × · · · ×∂ m . This choice of concatenated partial degeneracies makes fusing and splitting of subsequent links at symmetric tensors computationally cheap, because none of the original degeneracy tensor elements needs to be reordered in memory. It only involves moving contiguous sections of memory holding complete degeneracy tensors into, or from, the offset positions Eq. (27) within the fused degeneracy tensors.
Examples are given in Figs. (20) and (21), where we iterate over all possible combinations k , . . . , m in column major ordering (fast increment in k ) and obtain α = 0, 1, . . . as collision counter for each fused quantum number . Simultaneously, we increment the cumulate degeneracy offset with every collision by ∆ α+1 = ∆ α + m r=k∂ r , such that we finally obtain the fused link total degeneracy∂ given we started from ∆ 0 = 0.
Fuse-Node: The mapping of the symmetric link fusion in Eq. (25) is well-defined through Eqs. (26) and (27). Consequently, it is practical to store the relevant information of this mapping in a separate numerical object, the (symmetric) fuse-node F. As we previously introduced, the fuse-node is a special symmetric tensor defined over the links k, . . . , m and corresponding link directions copied from T , plus the fused link ( , ∂) in direction opposite to that on the fused tensor T : The fused tensor T is then obtained by simply contracting F into T over the original links which shall be fused, as shown in Fig. 19. Splitting up the fused link into the original links restores T and can be achieved by a second contraction of T with the split-node F † over the fused link. F † equals the fuse-node but has all link-directions inverted. Since splitting a link is the inversion of fusing, their contraction over the fused link is again the identity F † F = 1.
However, owed to its sparse structure, storing F in the form of a symmetric tensor and carrying out fuse-and split-operations by actually contracting such a fuse-or split-node is computationally inefficient. Therefore, we avoid expanding Eq. (28) into tensor elements and instead store the relation , α ↔ k , . . . , m between the fused quantum number with corresponding collision index α, and each possible combination of quantum numbers k , . . . , m from the original links. One possibility to achieve this is to enumerate the latter by a scalar index similar to Eq. (4), which can also easily be inverted.
Finally, we store the degeneracy offsets ∆ α in the fuse-node, along with the original symmetric links and the fused link, including all link-directions. Such a fuse-node can be viewed as a fully equipped network-node. One can then define further basic operations for these fuse-nodes -just like we did for tensors. The permutation is a common example, which is frequently combined with link-fusion: Permuting a fuse-node merely involves rearranging some link-positions, collision-indices and degeneracy offsets, which is much faster than permuting the fused links in a tensor before fusing or after splitting.
We now outline two algorithms that, given one such fuse node, create the fused-or split tensor. For efficiency, simultaneous fusions of arbitrary (not necessarily adjacent) disjoint groups of links of a tensor can be handled with minor adaptations, allowing the algorithm to process multiple fuse-nodes simultaneously. The total computational cost of both algorithms will be composed of ordinary non-symmetric fusion-and splitoperations on all degeneracy tensors, e.g. it will still be at most linear in the number of symmetric tensor elements. While for ordinary fuse-or split-operations it was possible to directly edit the input tensor in cases where no permutations were required, both our symmetric fuse-and split-algorithms generally reorder the degeneracy spaces. Therefore, a straightforward implementation generates a distinct output tensor object without modifying the input; the ordinary fuse-and split-operations might need to be adapted accordingly.
Fused Tensor: The complete link-fusion at a symmetric tensor T could proceed as follows: (i) Initialize the fused tensor T with n = n − m + k links similar to T except having the links and link directions from positions r = k, . . . , m replaced with the fused link.
(ii) In all tuples of matching quantum numbers present in T , replace k , . . . , m by the fused and remove possible duplicates ("collisions"). This yields the list of matches L of T .
(iv) Go through the list L with matching quantum numbers of T again, but this time fuse the links (k, . . . , m) at each degeneracy tensor R 1 ... n and place these as subtensors into the respective degeneracy tensors of T , which might have a larger degeneracy dimension∂ ≥ m r=k∂ r due to collisions. The subtensor's exact position in the degeneracy subspace D is determined by the interval of degeneracy offsets ∆ α + 1, ∆ α+1 , where α is the enumeration index of collision k , . . . , m into as specified with the fuse-node F.
Note that if the original tensor had non-present matches, the fused tensor does also not necessarily have all matching quantum numbers present in the fused list L . This must be considered when e.g. fusing a tensor into a block-diagonal matrix M ∂a∂ b a b in order to perform matrix operations. For example, in the exponentiation of a square matrix with one incoming and one outgoing link (such as a time-evolution operation), non-present matches ( a , b = a ) / ∈ L must be identified and explicitly reinstalled with identity degeneracy tensors R a b = 1 as result from the exponentiation of the implicit zero blocks.
Split Tensor: Restoring the original links ( k , ∂ k ) , . . . , ( m , ∂ m ) in place of a fused link ( , ∂) at some splitting-position k in the fused tensor T yields the split tensor T .
Step-by-step, the split tensor can be obtained from T of order n and the respective fuse-node F as follows: Note that the tensor T , at which we split a link, does not necessarily have to be the result of a fuse-operation. The required fuse-node could also be obtained from the fusion of similar links at another tensor, or it might have been created manually. If however we subsequently fuse and split the same links, with no further operations performed in between, the original tensor is restored: T = T . However, degeneracy tensors with all elements equal to zero may appear in such a process. For sake of efficiency, one can remove those from T in a post processing step after splitting. (iii) For numerical purposes, it may be advantageous to avoid fusions and keep as many links as possible, since that gives us the freedom not to store certain matching sectors ( 1 , . . . , n ) ∈ L and their corresponding degeneracy tensors R 1 ... n that would otherwise appear as blocks of zeros within the enlarged degeneracy tensors R 1 ... k−1 of the fused tensor.
(iv) The tensor symmetry invariance is not broken by adding a one-dimensional "dummy" link (in either direction) which carries only the identical quantum number = 0 (see Fig. 22).
Downgrade of a Symmetric Tensor into an Ordinary Tensor: The operation of recasting a symmetric tensor (and tensor network) can be useful for debugging and interfacing symmetric to non-symmetric tensor objects. For example, it can be used to measure a non-symmetric observable O for a symmetric state. The links are first downgraded from symmetric links to ordinary links, where each downgraded link dimension D = ∂ is the sum of its previous degeneracy space dimensions∂ , and an index mapping is chosen. The degeneracy subtensors R are assigned into the new ordinary tensor at the right positions, and the remaining tensor elements are filled with zeros.
Link Clean-up. This is a special operation upon a symmetric link, which employs information from the surrounding network links to wipe out redundant data on the link itself. Its goal is to remove indices while preserving the TN state manifold, and it finds extensive use in the random symmetric TN initialization (see Sec. 4.3.1). Namely, assume we want to clean-up the link connecting the network nodes q and q (tensors at these nodes do not have to be defined). We then intersect the link representation with the fusion of all the other (directed) links connected to q and again, with the fusion of all the other (directed) links connected to q . The original link is then replaced by the resulting intersection.

Symmetric Tensor Linear Algebra Operations.
Symmetric Tensor Contraction. In the contraction of two symmetric n-and m-link input tenors A and B over any k shared symmetric links (s, ∂ s ), the (m + n − 2k)-link output tensor C is again symmetric. It is however mandatory that the shared links are similar in quantum numbers and respective degeneracy dimensions∂ s , and that each of them is either directed from A to B or vice-versa. For example, the symmetric version of Eq. (8) reads where (a,  Fig. 8), now cast into a symmetric form: (a) Direct product. Between any two tensors, a dummy link can be inserted e.g. to indicate a contraction. With symmetric tensors, such intermediate product links may acquire a deeper meaning as quantum number "transfer link" when they carry a non-degenerate non-identical quantum number. A standard example is the correlation function c † ⊗ c of particle creation and annihilation operators c † and c under a U(1) particle-conservation symmetry as e.g. in a Hubbard model. In this context, the transferlink may be directed from c † to c and carries a single quantum number = 1 which accounts for the particle hopping associated with the operation c † ⊗ c. Contracting the (covariant) tensors representing both operators over the intermediate link yields again a symmetric invariant tensor. (b) Partial trace, which can be carried out as a contraction over pairs of similar, mutually incoming and outgoing symmetric links on the same tensor. (c) Hilbert-Schmidt norm, performed as contraction with the hermitian conjugate T † which extends the complex conjugation by the simultaneous inversion of all links. R a 1 ...a n−k b k+1 ...bm in C are obtained by adding up (outer sum in Eq. (29)) the standard contractions (inner sum) of degeneracy tensors from the input tensors, directly exposing the block-wise structure of such a procedure.
It seems natural to transform all involved tensors into (block-diagonal) symmetric matrices and carry out the contraction as a (block-wise) matrix-matrix multiplication, similar to ordinary contraction Sec. 2.2.2. Indeed, symmetric fuse-, split-and permute operations allow us to implement this sequence of operations in a simple an elegant manner. However, this may not always be the most efficient strategy, as it does not fully exploit the possible absence of matches within the fused tensor.
In the following, we present a specialized algorithm, and discuss the possible advantages over the "fuse-multiply-split" strategy afterwards: Namely, one first identifies pairs of subsets L s 1 ... , are then to be contracted for all available combinations a 1 , . . . , a n−k and b k+1 , . . . , b m of the quantum numbers not contracted over. The resulting contracted tensors are then stored together with an intermediate, yet non-unique list L C of tuples (a 1 , . . . , a n−k , b k+1 , . . . , b m ). As we required all contracted links to have pairwise opposite directions on tensors A and B, these tuples are already matching quantum numbers in the sense that where the † superscripts only yield the inverse quantum numbers on the incoming links of A and B respectively. In a final reduction step, recurrences of similar entries have to be identified and purged from the intermediate list, while the associated degeneracy tensors must be added up into the final degeneracy tensors R a 1 ...a n−k b k+1 ...bm . Together with their -now unique -list L C , we obtain the contracted and again symmetric tensor C.
Even though our example Eq. (29) again resembles a matrix-multiplication (after fusing row-and column-links), it should be noted that the set of matches L obtained by our specialized procedure usually contains only a subset of all possible matches (a 1 , . . . , a n−k , b k+1 , . . . , b m ) that were allowed by Eq. (30). It is therefore not advisable to first permute the full symmetric input tensors A and B into (block-diagonal) matrices as in Sec. 2.2.2, since the symmetric link-fusion might force us to represent some of the previously absent degeneracy tensors as blocks of zeros in memory -thereby giving up computational speed 1 and precision.
The computational complexity O tot of the symmetric tensor contraction is governed by the degeneracy tensor contractions, each of which can be carried out independently and is bounded linearly in all involved degeneracy dimensions by O deg = O k r=1∂ sr n−k r=1∂ ar m r=k+1∂ br . Consequently, if the degeneracy dimensions of all involved links are peaked around certain sectors, the total cost is governed by the largest contraction of degeneracy tensors involved. If however degeneracies come in flat or slightly peaked distributions over the active number of sectors in the involved links, we have S ≤ k r=1s r n−k r=1ā r m r=k+1b r / ā max ·b max . Consequently, when compared to the full, non-symmetric contraction, we observe a speed-up up to ∝ā max ·b max , that is, quadratic in the maximum number of (active) symmetry sectors among links of A and B.
In a similar way, we can also cast all contraction variants discussed in Sec. 2.2.2 into fully symmetric operations, depicted in Fig. (22).  Figure 23: Block-wise symmetric tensor decomposition (e.g. QR or singular value decomposition). From top to bottom: First the symmetric tensor is fused into a matrix over the indicated splitting. The resulting degeneracy tensors form a block-wise matrix and can be decomposed independently and at often greatly reduced computational cost. Their factors are separated into two symmetric tensors connected over a symmetric splitting link, weighted by singular values in the case of a SVD. Finally, the original links are split again with the help of fuse-nodes that were obtained in the initial fusing step.

Sec. 2.2.2)
. We require that the output tensors of such symmetric decompositions will be element-wise equal to the outcome of the corresponding non-symmetry aware operations. However, we must also require that these output tensors are all symmetric tensors again, and that they are mutually connected by directed symmetric links.
The usual QR-and singular value matrix decompositions indeed fulfill these requirements. Focusing on the SVD, we proceed by restating the procedure in Sec. 2.2.2 for symmetric tensors: Specifically, we want to decompose an arbitrary n-link symmetric input-tensor T over a certain link-bipartition (1, . . . , r), (r + 1, . . . , n) into two (semi-) unitary, symmetric output tensors V and W (holding the singular vectors) and a nonnegative diagonal matrix λ (holding the singular values). Both output tensors shall be connected to λ through an intermediate symmetric link (k, ∂ k ), Therefore, we do the following: (i) Fuse the symmetric input-tensor T over the chosen link-bipartition into a matrix over row-and column links (a, ∂ a ) = fuse( 1 , . . . , r ; ∂ 1 , . . . , ∂ r ) and (b, ∂ b ) = fuse( r+1 , . . . , n ; ∂ r+1 , . . . , ∂ n ) and keep the fuse-nodes. Due to the symmetry invariance, we obtain a block-diagonal matrix in which the degeneracy tensors R ab play the role of blocks.
(ii) Fix the direction of the intermediate link (k, ∂ k ) on V to be opposite to that of a, and find the set of quantum numbers k from the set intersection of row-and column quantum numbers a and b † b (where † b = † inverts quantum numbers if link b is incoming at V, and † b = 1 otherwise). Then identify all blocks R ab = R k by the single block-index k := a, which replaces row-and column quantum numbers a and b (related throughδ ab ).
(iii) Carry out the usual matrix decomposition on all degeneracy tensors independently (Fig. 23). For a SVD, we obtain block-wise matrix-factors V k and W k holding the left-and right singular vectors, multiplied by their respective singular values which we combined into a vector (or diagonal matrix) λ k : The degeneracy dimensions∂ k = min ∂ a ,∂ b of the factorization can be truncated by the exclusion of e.g. small singular values. To this end it is often necessary to first compute λ k for all blocks k and then globally identify the discarded singular values among them. This care is particularly relevant when the implemented Abelian symmetry is a subgroup of a larger non-Abelian one (as we discussed in Sec. 3.1): λ values which are quasi-degenerate should be consistently all kept or all discarded even when they are distributed across different sectors, as they are attempting to spontaneously form a multiplet of the underlying non-Abelian symmetry.
(iv) Compose symmetric output tensors V ak ∂a∂ k , W kb ∂ k ∂ b and λ k ∂ k from the obtained degeneracy tensors (or subtensors therein if selected singular values shall be discarded) According to our choice of intermediate quantum numbers, each k comes with exactly one contributing pair a, b that fulfillsδ ak =δ kb = 1.
(v) With the help of the fuse-nodes obtained in step (i), split the links a and b into the original links 1 . . . r on V and r+1 . . . n on W.
We note that a QR-decompositions follows the exact same procedure yielding two output tensors, one of which is unitary with respect to the intermediate link. Formally, such intermediate link is always given by the link-intersection of the fused partitions, or a subset therein if we compress the link, as we discussed in Sec. 2.2.2. Since the QR and SVD have computational cost highly nonlinear in their operands' dimensions, their symmetric versions can gain a lot of efficiency by operating independently on multiple smaller blocks instead of one large matrix. For example, a square matrix defined over two similar links comes at a full decomposition cost of O(d 3 a ) in the absence of symmetric inner structure. On a symmetric tensor, it might gain a factorā 2 , quadratic in the number of active sectors of a flat or slightly peaked distribution of degeneracy dimensions. Furthermore, these block-wise operations allow for a natural parallelization of the algorithm encoding.

Computational Cost Estimation.
The resources spent in a symmetric tensor operation depends on the number and dimensions of the degeneracy tensors and are thus highly problem-dependent. Nevertheless, we can isolate common scenarios and give cost estimates for the respective operations. As it will turn out, the symmetric operations presented here often display a highly favorable computational complexity compared to equivalent ordinary tensor operations. The latter usually constitute the worst case scenario for symmetric tensors. Therefore, provided that data-structures and algorithms with only a negligible overhead (as outlined in Sec. 3.3) are used for the internal quantum-number bookkeeping, we will always benefit from exploiting the additional inner structure.
We stress that all basic tensor operation listed in Sec. 2.2, on the level of degeneracy sectors, make use of the ordinary procedures presented for non-symmetric tensors. Therefore, we can easily estimate the cost of symmetric operations. Namely, if the complexity of a certain operation upon degeneracy tensors scales like O deg, 1 ... n = O ∂ p 1 1 · · ·∂ pn n , where p r are the maximal power coefficients for the degeneracy dimensions∂ r and, in total, n links are involved, then its cost on a symmetric tensor is bounded by 1} is an operation-specific symmetry constraint on the involved quantum numbers. Depending on the operation under consideration (we saw complexity examples in Sec. 2.2.2), this constraint can be even stricter than the "structural tensor"constraintδ formulated in Eq. (23) (which would e.g. be valid for a block-wise operation on a single tensor). In any case, we haveδ 1 ... n ≤δ 1 ... n .
We can therefore assess an upper-bound to the computational cost for a symmetric operation, corresponding to where O full is the complexity of the ordinary, non-symmetric operation. We observe that the symmetric operations outperform the ordinary ones for two reasons: (i) Due to the constraintsδ posed by symmetry (invariance), the number of blockwise operations between degeneracy tensors, S = 1 ... nδ 1 ... n , is rigorously restricted. Due to the sparsity of the allowed "block-wise" operations on degeneracy tensors, we can naively upper-bound their number by S ≤¯ 1 · · ·¯ n . However, since those degeneracy operations are only performed over symmetry-invariant quantum numbers, they at least fulfill the constraintδ. This allows us to further tighten down the number of combinations in S to S ≤¯ 1 · · ·¯ n /¯ max , where¯ max = max r¯ r is the largest number of sectors found among all links r = 1, . . . , n. When specifying an operation, we can possibly formulate even tighter restrictions, as one encounters e.g. for the symmetric contraction (Sec. 3.5.2). In practice, one finds even smaller S due to non-present matches.
(ii) The most costly operations are those that scale nonlinearly (p r > 1) with the dimensions. Those operations are also the ones to benefit the most from operating on smaller blocks.
For completeness, we need to account for the actual degeneracy dimensions∂ r . To this end, we sketch two extremal and one common degeneracy distributions of the symmetric links and study their cost estimates for all basic symmetric tensor operations. In particular, we demonstrate that we can benefit the more from symmetric tensors operations, the broader the indices of links are spread over different quantum numbers.
Flat distribution. We define a flat distribution as a distribution characterized by constant degeneracy dimensions∂ r = d r /¯ r , which only depend on the total dimension d r and the number of sectors (different quantum numbers)¯ r of a given link r, but not on the quantum numbers r itself. Consequently, all degeneracy tensor operations come at a similar cost, totaling to Out of the degeneracy distributions discussed here, the flat distribution is optimal for symmetric tensor operations, because it allows computations to benefit the most from the additional inner structure. Tensors with this distribution usually have many small degeneracy tensors, which significantly reduces cost in operations. The flat distribution is occasionally realized in symmetries with small finite numbers of available quantum numbers, e.g. Z n with small n, such as in the Ising-model (see Sec. 5.3).
Highly peaked distribution. In the extremal case of a narrowly peaked distribution, we can expect an operation with a dominating large degeneracy dimension.
The computational cost will then be dominated by these maximal degeneracy dimensions, approaching the full dimensions d r of the peaked links. In this case, the cost of performing the symmetric operation becomes roughly equal to performing the equivalent operation on non-symmetric tensors: O sym = O deg = O full . In this worst case scenario, symmetric tensors cannot distinguish themselves by any relevant inner structure any more and basic operations cannot benefit from symmetry invariance.
Slightly peaked distribution. The most common scenario in groups with more than a few accessible quantum numbers is a distribution of degeneracy over a wider, but effectively limited, range of quantum numbers¯ eff . If the degeneracy dimensions are more or less evenly distributed within such a range, we can resort to the results obtained for the flat distribution, replacing¯ by the distribution width¯ eff . Sometimes, degeneracy concentrates around certain quantum numbers, resembling e.g. a Gaussian distribution. The narrower such a distribution becomes, the more the cost will be dominated by the largest degeneracy dimensions in the involved links. Then, the effective number of active quantum numbers¯ eff on those links becomes very small, resulting in less advantageous cost estimates. We will encounter such scenarios in Sec. 5.3.

Loop-Free Tensor Networks
In this section, we address general properties of loop-free tensor networks, i.e. TN states whose network graph contains no cycles. In Fig. 24 some examples are given, the most prominent of which are the linear MPS [22,23], the hierarchical binary Tree Tensor Network (bTTN, discussed in detail in Sec. 5.2) [57,58], and the flat tensor product state, which finds applications in quantum chemistry [145]. Tensor networks without cycles can exploit the full power of the TN gauge transformations: Smart gauging establishes rules on the network which can tremendously reduce the number of necessary contractions, thereby enhancing computational speed (see Fig. 25). It has been argued that the computational advantage of working in such loop-free framework in fact enables the usage of larger link dimensions, and, in turn, it may partially compensate for the lower scaling of entanglement captured with respect to a "loopy" tensor network [146]. The textbook example is the comparison between a tree tensor network [57][58][59] which allows for efficient algorithms, but suffers from the fact that the maximal entanglement between two subsystems depends on the chosen bipartition of the system (entanglement clustering), and a MERA [53][54][55] where the entanglement distribution is "smooth", but calculations are expensive.
Loop-free TNs exhibit advantages also in relation to symmetries. In fact, while upgrading a generic TN (describing a covariant state) to its symmetric version may require an overall increase in link dimensions, for loop-free TNs the upgrade can always be performed in a link dimension-preserving way (see a proof in Sec. 4

.2.4).
In the following, we first introduce some general aspects of the structure of loop-free TNs (Sec. 4.1). We then discuss different possible gauges in loop-free TNs and their application (Sec. 4.2). Finally, we show how to randomly initialize symmetric ansatz states for loop-free TNs (Sec. 4.3.1).

General Aspects of Loop-Free Tensor Networks
The underlying graph structure of a loop-free TN defines a distance dist(a, b) between any two nodes a and b of the network, which is equal to the number of links encountered on the shortest path between them. In particular, for loop-free TNs such a path is always unique. It is practical to call all nodes which share the same distance d to some specified node c a level, L(d; c). We further introduce the maximal link dimension D := max η {D η } among all virtual links, which usually exceeds the physical link dimensions {d s }. As it will become apparent in the following, the maximal link dimension plays a fundamental role in any TN ansatz.
The computational complexity of algorithms operating on loop-free TNs is a smooth polynomial of D. This can be seen, for instance, in the calculation of the scalar product between two QMB states represented in the same loop-free TN geometry. Following a level-structure towards some center node, one can subsequently contract all tensors at the currently outermost level of the ket-state with their bra-state counterparts into temporary nodes, while absorbing any temporary nodes on links in between. The cost Knowledge about the gauge then allows us to simplify of the numerical calculations, here the calculation of the expectation value of some local observable ψ|O i |ψ . The bra TN is drawn as the vertically flipped version of the ket TN, indicating hermitian conjugation of all its tensors (d): By applying the rules, we see that most contractions (that would be necessary in a network without a gauge) do not have to be performed explicitly numerically. We are left with a much simpler network (e) where only the contractions that cannot be "annihilated" using gauge rules have to be executed. of these contractions is then upper bounded by O D Z+1 , where Z is the maximal number of links of a tensor in the network. Tighter bounds can be identified in specific situations, e.g. when certain link dimensions are known to differ substantially from D (such as in MPS, where one link at each tensor carries the physical dimension). Another immediate consequence of the absence of loops is that every link between two nodes q and q induces a bipartition of the QMB system. The Hilbert spaces of the two partitions are spanned by the canonical basis states {|i s : dist(s, q) < dist(s, q )} s and {|i s : dist(s, q) > dist(s, q )} s , respectively, i.e. by fusing all physical links s ultimately connected to either q or q after removing the bipartition link between the two nodes from the network. At any virtual link η, we can therefore rewrite the loop-free TN state by means of a Schmidt decomposition where |A k and |B k are states described in the (contracted) TNs of the two partitions and the Schmidt values λ k are sorted descendingly. On this basis, we make the following observations: • A loop-free TN can be viewed as a simultaneous Schmidt decomposition over all virtual bonds. This point of view is emphasized in the canonical gauge of Sec. 4.2.3.
• At each network-bipartition, the bond dimension D provides an upper bound to the Schmidt rank, which measures bipartite entanglement.
• Discarding the smallest singular values achieves the optimal single link compression. Precisely, given a virtual bond η, we wish to reduce its prior dimension D η to some smaller dimension χ. We achieve the compression now by simply truncating the summation at k = χ in Eq. (33) and discarding (setting to zero) the smallest Schmidt values in the decomposition. Thus, we obtain the following truncated state with normalization constant Any other way of discarding λ k 's leads to a larger error in quantum state fidelity, according to the Eckart-Young-Mirsky theorem [147]. Note that while optimal for a single link, compressing a quantum state by discarding Schmidt values on multiple bonds simultaneously is not the optimal way of doing so (as each compression is a local non-unitary operation; see techniques for an enhanced compression approach, e.g. Ref. [30] and Sec. 5.1.6). The most common purpose for link compression in loop-free TN algorithms, such as the ground state search portrayed in Sec. 5.1, is to preserve the bond dimension D throughout the simulation: This is often ensured by re-compressing bond dimensions after temporary growths.
• Similarly to performing a bipartition through link-cutting, detaching an n-link network node from a loop-free TN induces a partition into n disjunct subspaces. Each virtual index then contributes variational degrees of freedom in form of associated tensor elements.

Gauges in Loop-Free Tensor Networks
In Sec. 4.2.1 we review the freedom introduced by gauges specifically in the framework of loop-free TNs. We present the specific gauges that are related to the orthogonality properties of the tensors in the network. We dub these gauges center gauges and discuss in particular the unitary gauge (Sec. 4.2.2) and its refinement, the canonical gauge (Sec. 4.2.3). Both are of great practical and conceptual importance. They install isometries in each tensor, which strongly simplify contractions of the network with its conjugate (see again Fig. 25). They also allow for efficient local error estimation and application of local operators in time-evolution algorithms and measurements.
In variational algorithms, they maintain the network state's normalization during local optimization (see Sec. 5). Conceptually, the canonical gauge provides an understanding of the loop-free TN ansatz as simultaneous, independent, truncated Schmidt decompositions over all lattice bipartitions induced by the network geometry. Then, we introduce the symmetry gauge (Sec. 4.2.4): we demonstrate that any loop-free network ansatz state of a global pointwise symmetry (with fixed sectors as introduced in Sec. 3.3.4) can be recast into a symmetric TN, i.e. made entirely of symmetric tensors, through gauge transformations. Remarkably, this is compatible with both unitary and canonical gauge, meaning that all those gauges can be installed at the same time, and without increasing the bond dimension. Once installed, the symmetry-gauge is strictly protected for any algorithm formulated in the symmetric tensor operations that we described in Sec 3.
This last property is shared by the real gauge (Sec. 4.2.5), which can be useful in the presence of time-reversal symmetry.

Gauge Freedom.
In the search for the optimal gauge, it is helpful to understand that the whole freedom we have in choosing the tensor-elements to encode a specific ansatz state |Ψ in a loop-free network with given bond dimensions is entirely determined by link-local gauge transformations as introduced in Sec. 1.7. That means, the only possible difference in the choice of tensors are matrices X, Y inserted as products X · Y on virtual links and contracted into the adjacent tensors. As we demonstrate in the following, these matrices will fulfill Y · X = 1, but X · Y = 1 in general.
Gauges on Virtual Links and Minimal Bond Dimension. Let the TN have a virtual link η of dimension D η . We can perform a Schmidt decomposition over the original state |Ψ between the two subnetworks connected by η, and possibly obtain a lower bond dimension χ ≤ D η . We now discuss how to gauge-transform the tensor network via local operations to obtain the explicit decomposition.
The reduction of dimension is non-trivial as it requires the insertion of a pair of matrices X and Y of dimensions (D η × χ) and (χ × D η ) on the link. Theoretically, the existence of the gauge matrices can be shown as follows: • In the given TN, we contract over all links except for the selected bond η and obtain two tensorsÃ andB for the respective bipartitions. We can rewrite these tensors as matricesÃ andB. As every physical site of the system belongs to one of the two partitions, we collect the physical degrees of freedom of the respective subspaces in indices a and b. We can then write the coefficient tensor of the TN state as Ψ a,b = kÃ a,kBk,b where k ∈ {1, . . . , D η } is the index of bond η.
• Starting from the original state |Ψ , we can also compute an exact Schmidt decomposition Ψ a,b = χ k=1 A a,k B k,b where we already absorbed the Schmidt values into the matrices A or B and the indices a and b label the physical degrees of freedom as above. χ denotes the Schmidt rank of Ψ and we assume that χ < D η , i.e. for the description of Ψ a lower bond dimension than given in the TN above suffices.
• We will now determine two gauge matrices X and Y which can then be inserted on bond η and absorbed into adjacent nodes, thereby reducing the original linkdimension from D η to χ, without modifying the encoded state: Equating A · B = A ·B reveals the local gauges and where the expressions in square brackets now define the anticipated gauges X and Y . The right-inverse (left-inverse) B −1R (A −1L ) exists due to full column-rank (row-rank) χ from the lossless compressed Schmidt decomposition. A quick check then assures the existence of X −1L = Y and Y −1R = X respectively: • By repeating the procedure on all virtual links of the network, we obtain the gauges that transform an arbitrary given TN into an equivalent TN of minimal bond dimensions equal to the Schmidt rank on the respective bond.
It follows that two loop-free TN representations TN 1 and TN 2 of the same state |Ψ with the same geometry, are connected by a link-local gauge transformation: We can explicitly construct two sets of local gauges {X 1 , Y 1 } and {X 2 , Y 2 } which transform either one into the (same) TN with minimal bond dimension. Starting from the un-isometrized network (a), tensors with largest distance are isometrized first, the next tensor towards the center is updated (b). Then the isometrization continues on the next level (c). This procedure is repeated until the center (red) is reached.
Gauges on Physical Links. On physical links, gauge transformations shall be restricted to unitary transformations to keep the associated basis states orthonormal. Such a change of basis can e.g. be helpful for entirely real Hamiltonians, as these allow us to solve eigenproblems with entirely real TNs, therefore consuming less computational resources. Gauge transformations on physical links can also be used to install the symmetry gauge (more on this in Sec. 4.2.4). Note that within symmetric TNs, we are restricted to symmetric gauge matrices, and the local basis can only be changed within degenerate manifolds associated to the quantum numbers.

Unitary Gauge.
The unitary gauge is always defined with respect to a single selected node c of the TN and therefore also referred to as "central gauge" (some literature refers to this as the canonical gauge, while we prefer to adopt this term for the gauge defined in Sec  (iv) Advance inwards to d → d − 1 and repeat from step (i) until d = 0 is reached. After this procedure all tensors have been updated toT [q] . Also, virtual links might turn out reduced in bond dimension from D η toD η , since our QR-decompositions applied in step (i) automatically selects the minimum between D η and the product of the remaining link dimensions.
Note that the iteration presented above indeed implements a global gauge transformation, because it does not change the physical state of the network: neither QR-decompositions nor the propagation of R-transformations over virtual bonds alter the contraction of the affected tensors over shared virtual links. Computational complexity is dominated by the QR-factorizations (see Sec. 2.2.2), upper-bounded by O D Z+1 where D is the maximal bond dimension and Z the maximal number of links of a tensor. For algorithms operating in unitary gauge, a common task is to move the center from one node c 1 to another node c 2 (as we will see in the variational search for groundstates in Sec. 5). This can be achieved by a sequence of subsequent QR-decompositions of all nodes q on the path between c 1 and c 2 , scaling linearly with the distance, O(dist(c 1 , c 2 )). In every step, we obtain a possibly different non-unitary part C [η] which is passed over the shared bond η from the re-isometrized nodeT [q] to the next tensor on the path towards c 2 . This process is shown in Fig. 27, and yields the isometry relation of the unitary gauge, shown in Fig. 28(a) where C [η] and C [η] are the (non-unitary) center matrices defined on any two links η andη of q respectively, as they might be encountered on a path moving the center. This relation holds for every network-node q and transforms the tensor T [q] , which is isometric over η, intoT which is isometric overη. If we had access to the single-sided inverses (C [η] ) −1 , the re-isometrization in Eq. (39) could be performed by local contractions avoiding costly matrix factorizations (see Fig. 28(b)) and the unitary gauge transformations could be understood in terms of linklocal gauges as discussed in Sec. 1.7. However, the inversion requires the matrix C to be of full rank, and we need to find a way to reduce the possibly oversized matrix-dimensions first: it exists a very elegant gauge that realizes such minimal bond dimensions together with a very efficient re-isometrization rule. We will discuss this gauge in the following section.

Canonical Gauge.
A canonical tensor network [46] is obtained from singularvalue decomposing the unitary-gauge center matrices C [η] on all bonds η, as in γη,βη .
unitary gauge (3) can therefore be introduced on the respective links at will, see Fig. 30). They are absorbed into the adjacent nodes, thereby updating the network tensors T [q] →T [q] into the canonical tensors as illustrated in Fig. 29. Singular values that are zero or fall below a numerical threshold are discarded, so that the virtual links η attain minimal bond dimension χ ≤ D η equal to the center-matrix rank.
A practical approach to installing the canonical gauge does not have to find the center matrices C [η] on all bonds explicitly. It is sufficient to install the unitary gauge with a single center-bond η towards which all tensors are isometries. Then a series of SVDs (instead of simple QRs) for moving the unitary center, performed in reverse order from the center outwards, establishes the center matrix subsequently on all bonds (as in Fig. 31).
Due to the orthonormality induced by the isometries, every SVD of the center matrix, which can be considered as a reduced amplitudes tensor, also achieves a Schmidt decomposition of the complete TN state, with Schmidt values λ [η] and Schmidt rank D η on every virtual link [149]. Hence, the canonical gauge can be viewed as the explicit and simultaneous Schmidt decomposition of the QMB state on all lattice bipartitions induced by the loop-free network geometry.
By uniqueness of the Schmidt decomposition (up to phases and degeneracies) when (1)  (red) for all links of a network, we can perform a re-isometrization of a node (and consequently of the full network) using only numerically efficient contractions with these "link-weights" and their inverse (see Sec. 2.2). This approach is much faster than the iterative sequence of QR-decompositions in Fig. 27. Due to the minimal bond dimension in canonical gauge, the inversion is always possible. Moreover, the Schmidt values can be efficiently stored.
ordering the Schmidt values descendingly, the canonical gauge becomes similarly unique. This gauge implements minimal bond dimensions simultaneously on all network links, which cannot be further reduced without altering the state. The main reason for actually installing the more costly canonical gauge is that the isometry-relation Eq. (39) translates for the canonical gauge into a mere multiplication with diagonal matrices (scaling tensors with link-weights, cf. Sec. 2.2.2 and Sec. 3.5.2), as shown in Fig. 32: Again, the tensor T [q] , which originally is isometric over η, turns into an isometryT [q] over another of its virtual linksη. As we can easily afford to store and invert the Schmidt values λ [η] of all bonds, we are able to quickly re-isometrize the TN with only local operations. A possible danger when encoding this step is that the inversion of small Schmidt values (if not discarded in a compression) can result in numerical instabilities. However, workarounds to this have been discussed [150]. The re-isometrization relation of the canonical gauge allows for parallelized algorithms (see Ref. [149]) and eliminates directional dependency of errors, arising for instance in the compression of virtual links performed by many TN algorithms. The canonical gauge can therefore be the preferred choice in algorithms driven by (local) unitary evolution, e.g. real-time evolution algorithms, because a unitary operation does not alter the Schmidt decomposition except for the bonds it acts on [149]. Hence, (costly) re-intallation can be avoided. Up to a certain extend, this can also be true for operations which are only approximately unitaries or identities. This includes the aforementioned compression of virtual links, if it is carried out by truncated SVDs which locally restore the gauge.

Symmetry Gauge.
In the symmetric gauge, all tensors in the TN representing a covariant state become symmetric tensors as discussed in Sec. 3. Contrary to the gauges discussed so far, its installation may require a (fixed) transformation in the physical basis. While in practice, one usually never explicitly transforms into symmetric gauge, and no simple recipes exist for such operation, we show that such a transformation is theoretically possible for any TN state: (iii) Contract the whole TN into a single tensor Ψ. This tensor is now by definition invariant or covariant and can directly be written as a symmetric tensor as in Eq. (20). In the covariant case, select = 0, we need to attach a corresponding quantum number selector link onto this tensor.
(iv) Decompose this tensor again into the same geometry, but now resorting to exact symmetric tensor decompositions as described in Sec. 3.3.4. The details of such decompositions (bond dimensions etc.) are not important. We can always obtain the original (loop-free) geometry, representing the same state (basic decompositions do not alter the state); hence we have already shown that it is a gauge transformation.
(v) Install the canonical gauge in the symmetric TN. As it relies only on basic QRs, SVDs, fusing (and subsequent splitting) operations, and contractions -all of which are well-defined symmetric tensor operations -we can always install the canonical gauge in a symmetric TN. By uniqueness of the Schmidt decomposition (up to degeneracy and ordering of singular values), we can obtain the same minimal bond dimension as in a non-symmetric TN. Note that the local basis transformations in step (ii) explicitly do not alter the bond dimensions, because as local operations they cannot increase the entanglement (Schmidt rank).
Note that such a construction is also possible in loopy geometries. However, the canonical gauge of Sec. 4.2.3 does not exist in such networks and the loops introduce further constraints on symmetric links that may lead to larger bond dimensions. The symmetric tensor ansatz can still be beneficial for a loopy network, but a trade-off between the speed-up through the use of symmetric tensors and the unfavorable increase of bond dimensions must be made [151].

Real Gauge.
In the real gauge all tensor elements are in the real domain. As such, it is not restricted to loop-free networks. If it can be installed in the tensors of a TN state and the operators (Hamiltonian) acting on it, the real gauge can increase computational speed and reduce the variational space by 1/2. This is, e.g., the case in diagonalization problems, where it can usually be installed in the presence of a timereversal symmetry (see Appendix A). As for the symmetry gauge, the TN operations we are considering can always be implemented such that they yield a real output if the input is real.
When time-reversal and global pointwise Abelian symmetries are present, we can often find a local basis-transformation that simultaneously installs the real gauge on top of the symmetric gauge, by exploiting the freedom of phases, or more generally unitary transformations in degeneracy spaces, in the local physical basis.

Preparation of Symmetric TN Ansatz States
The initialization of symmetric tensor network states presents an additional level of difficulty with respect to initializing ordinary TNs: This is due to the fact that we need to assign a symmetry group representation to every virtual link, and these representations must be such that the symmetric tensors contain actual matches. In the pathological scenario of a tensor not containing any match due to the link representations, the TN is representing no state at all (zero vector). As as rule of thumb, the more matches are allowed, the less information in the TN is wasted. Constructing meaningful initial representations for virtual links is in general a non-trivial, non-local problem that involves fusion rules of the symmetry, but can nevertheless be treated on a general ground.
The simplest approach is to initialize the symmetric TN state as a product state, i.e. in any configuration which respects the symmetry and for which every virtual link holds only a single sector of degeneracy one. Consider, for instance, a system with N physical sites, each of which can hold up to n particles, and a U(1)-symmetry, which fixes the total particle number N p . We could assign these particles to physical sites and, following the selection rules of the symmetry, assign a single sector of degeneracy one to every virtual link of the network.
In a more sophisticated approach, we could use knowledge about the expected spatial particle distribution in the state of interest (usually the ground state). As discussed earlier, any link η of a loop-free tensor network introduces a bipartition of the underlying physical system into e.g. x and N − x sites. Typically, the expected value of particles in the subsystems then establishes the most relevant sector of η with the largest degeneracy. Around that sector, and in the range of possible quantum numbers, one can now distribute degeneracy dimensions with a certain variance until the maximal bond dimension is exhausted. Suppose we have no prior knowledge about the distribution of the particles and their correlations. For fermions or hard-core bosons we could then make a binomial ansatz and consider initializing the sector (equal to the number of particles in the subsystem of length x) with degeneracȳ Additional information, such as a harmonic confinement of the system, can be taken into account by centering the largest degeneracies around particle expectations from e.g. a Thomas-Fermi density distribution. If we expect an insulating behavior where the quantum numbers are spatially locked, a narrowly peaked degeneracy distribution with small variance might be a good ansatz. Obviously, such initialization schemes are highly problem-dependent. As a very general initialization method, we now introduce a scheme based on randomization, which makes no a priori assumption and thus can be always adopted.

Randomized Symmetric Ansatz States.
In this construction, we start with a loop-free TN where the geometry, the physical links and the selector link are defined and a maximal bond dimension D is given, while other virtual links and symmetric tensors are to be constructed. The recipe we present guarantees that the network is non-trivial in the end, and that all the degeneracy tensors in the network may indeed contribute to the state (and do not drop out in a contraction to the complete amplitudes tensor). This recipe can be applied to an arbitrary symmetry group or loop-free TN geometry, (i) We choose a center node c and assign a level-structure to the network, as we discussed in Sec. 4.1.
(ii) Starting from the outer levels (branches) and moving inwards (towards the root), we fuse the outer links (already defined) of a tensor and assign the inner link to be the exact fusion of the outer links. We iterate this step towards the center node, so that the outer links are always well-defined (they possess a representation), by recursion. Option: We can limit the dimensions of the inner links by intersecting the exact fusion of the outer links with a user-defined "maximal" link (for symmetric link intersection see Sec. 3.5.1). This option presents a possible bias to the random state, but it can make the approach numerically more efficient.
(iii) Once we reach the center node and all the links have been defined by the previous step, we start to clean-up the links from the center outwards. First, we clean-up all the virtual links connected to the root tensor. Then we move outwards along the branches until all the virtual links are cleaned-up. At this stage, no truncation has been performed yet.
(iv) Select randomly a virtual link whose total dimension is larger than D. Within this link, remove randomly a single symmetric link index. Then clean-up all the other virtual links, starting from the closest ones in the distance and moving outwards, to update this change. No link is completely emptied by the clean-up procedure as long as D > 1. Repeat this step until all virtual links have dimension D or less. This completes the virtual links setup.
(v) Finally, generate the tensors. Within each tensor, include all the possible matches given the links, and assign a random degeneracy tensor to every match (see Sec. 3.5.1). Eventually, install a gauge.
Note that in the reduction step (iv), at every iteration we reduce a link dimension by one and then propagate the truncation: This is a conservative choice, which keeps the probability of failure of the scheme as low as possible. Nevertheless, quicker reduction strategies can be considered (always follow a dimensional reduction with a clean-up round of the whole link network). Now we have highlighted the main features of loop-free TNs, and identified the complete freedom in defining the tensors and links therein, exploited in the unitary, canonical and finally the symmetric gauge. In the next section we take advantage of those features and describe exemplarily a TN algorithm which searches ground states in a loop-free TN.

Ground State Search with Loop-Free Tensor Networks
In this section we discuss a robust scheme for achieving ground states (GS) of Hamiltonians by means of tensor networks with no loops (or cycles) in their geometry. This is a natural extension of the DMRG algorithm [30,31] adapted to any tensor network architecture without cycles [145,146,152]. The absence of cycles leads to several computational advantages: we have already encountered one of them (namely the TN gauge freedom) in the previous section and we will exploit it extensively in what follows. Another striking advantage of a loop-free geometry is the possibility to formulate local optimization problems, which are at the heart of efficient variational energy minimization procedures, such as simple eigenvalue problems. This feature is not present in TNs with cycles, where often no exact strategies for the solution of local optimization problems are available, and one has to rely on approximate approaches (as occurring e.g. for MERA [153], PEPS [98,154]). Moreover, loopy tensor networks often imply performing more expensive tensor contractions, and dealing with symmetries is less straightforward (as mentioned in Sec. 4.2.4): this is, for instance, the case for periodic boundary DMRG, where specific methods must be engineered to work around these issues [155].
For these reasons, we choose to focus on ground state search algorithms for loop-free TN setups instead. This will break down the linear algebra demands to only three basic procedures (contraction, SVD/QR decomposition, simple eigenvalue problem) and take advantage of adaptive TN gauges [145,146].

Ground State Search Algorithm for the Adaptive-Gauge Loop-Free TN
In this section we will describe in detail the algorithm we adopt to achieve the QMB ground state variationally with loop-free TNs. The basic idea is to break down the diagonalization problem for the full QMB ansatz into a sequence of smaller, manageable diagonalization problems, each of which is formulated for a small subset of tensors (in the simplest case for one single tensor, as illustrated in Fig. 34(a)). Starting from some (e.g. random) initial state in the manifold parametrized by the TN ansatz, we thus employ an iterative scheme of local tensor optimizations ("sweeping", see Fig. 34(b)) that converges to the state which best approximates the true QMB ground state. This approach is akin to what is known in applied mathematics as domain decomposition [156,157].
Before listing the algorithm in Sec. 5.1.2, we first describe in detail the high-level operations (based on the elementary ones defined in Sec. 2 and 3) that it will rely on. Note that what follows applies both to symmetric and non-symmetric loop-free TNs, with the obvious requirement that in the symmetric case all tensors and tensor operations have to be formulated in a symmetry-aware fashion as detailed in Sec. 3. The variational tensor forms the "optimization center", while the fixed tensors form an "environment". The latter can be efficiently contracted with the Hamiltonian H, leading to a reduced "effective" Hamiltonian H eff . (b) By solving the reduced optimization problem sequentially (for varying locations of the optimization center), the energy expectation value is gradually reduced. This iterative minimization is commonly called "sweeping".

High-Level Operations.
In what follows, we consider a loop-free TN of tensors T [q] sitting at network nodes q = 1 . . . Q, mutually joined by virtual links η (with dimension D η ). We can arbitrarily choose any node to be the center node: Here, we exclusively work in the unitary (center) gauge (see Sec. 4.2.2) and extend the concept of a root to the notion of an effective Hamiltonian.
Hamiltonians. For the GS algorithm we consider Hamiltonians of the form and we focus on Tensor Product Operator (TPO) [145] interactions of the form where h

[s]
p,k(s) is the k-th local operator of the TPO acting on site s and the operators are interconnected by links µ = (k, k ), labeled by the two operators k and k they connect. The sum runs over all TPO links {µ}, and we define {µ} k(s) ⊆ {µ} as the set of all links µ attached to the k-th operator of the TPO. The outer product can be restricted to sites on which H p acts non-trivially. TPOs can be viewed as a generalization of Matrix Product Operators (MPOs) [30,45,158], widely used in the context of MPS, to arbitrary TN geometries. Some examples for how a TPO H p can look like are given in Fig. 35. Computational efficiency is maintained by keeping the number and bond dimensions of TPO links small. Specifically, we consider interactions efficient if they are compatible with the TN geometry, in that every network bipartition over any virtual link η corresponds to a cut of two non-trivial TPO links (see Fig. 36). More generally, the condition to be met in order to keep computational efforts manageable is that the maximal total dimension D µ of links crossing any bipartition should be non-scaling (or at least weakly scaling) with the system size. Typical 1D lattice Hamiltonians fulfill this criterion, including sums of nearest-neighbor interactions, long-range two-site interactions, and TPOs in alignment with (i.e. having the same arrangement as) the physical sites, as exemplified in Fig. 35.
As the representation in Eqs. (43) and (44) is not unique, it has to be decided individually whether e.g. a large number P of lightweight interactions (i.e. with small bond dimensions D µ ) is more efficient than an equivalent representation with smaller P but more or larger TPO links {µ}. On the one hand, there are certain interaction types (e.g. exponentially decaying interactions) which are known to have a very efficient global TPO representation, characterized by a constant, non-scaling link dimension D µ [30]. On the other hand, it has been noted that a single global TPO representation can be less efficient than treating the individual interaction terms separately [145], for example when interactions are long-range and inhomogeneous (e.g. in spin glass models or quantum chemistry applications). In particular, the algorithm can easily exploit the locality (i.e. the spatially restricted non-trivial action) of individual interaction terms, as shown in the next paragraph.
Renormalization. To begin with, every physical TN link s on which the "bare" Hamiltonian term H p acts non-trivially is associated to exactly one local operator h [s] p,k . We renormalize the system by mapping all local Hamiltonian operators acting on a region (bipartition part) S ζ , formed by the lattice sites separated over link ζ, into a single "renormalized" Hamiltonian term h [ζ] p,k , where k is the set of all local operators of the TPO in the renormalization k(ζ) := ∪ s∈S ζ k(s); in particular, k(s) = k(s) on physical sites.
We now define a recursive relation that practically defines renormalized Hamiltonians on all virtual network links (to be performed for each contribution H p separately). We start with collecting all terms on all links {ξ} q not directed to the root at some node q and contracting over the mutual virtual links among those: The result combines k(ζ) = ∪ ξ k(ξ) physical operators of the TPO. To obtain the renormalized Hamiltonian over link ζ we have to contract over the network links {ξ} q p,k(η) , each with a different p. Not for every p in the Hamiltonian's sum Eq. (43) there must be a tensor defined at every link η; the missing ones are implicitly treated as identities. (b) Renormalizing the local interactions from the links ξ, ξ , ξ towards the virtual link ζ. Each interaction p is contracted independently into a renormalized term, and the global sum over p is not carried out explicitly. An exception are renormalized operators p that have no TPO links left (for example the two independent interactions indicated in red), which can already be summed into a single new renormalized tensor. (c) The effective Hamiltonian involves the contraction of all renormalized Hamiltonian terms over all tensor links {η} c at the root node c. Often, the final sum over p is postponed, e.g. until contracting the individual terms h [c] p,eff into the tensor T [c] as described in Sec. 5.1.3 (solving the reduced optimization problem). The figure also shows the root tensor (semi-transparent) to highlight how it is acted upon by the effective Hamiltonian, which is a key ingredient for the GS algorithm. with the node-tensor T [q] and its complex conjugate as depicted in Fig. 37(b): When working in the unitary gauge, the tensor T [q] appears as an isometry over the links {ξ}. Therefore, it is immediately clear that the result of Eq. (46) will be the identity whenever Eq. (45) produced an identity, i.e. whenever the TPO acts trivially on all links {ξ}. This feature can save a lot of computational time on renormalizing interactions with spatially restricted (local) support. Another optimization which can be applied at this stage is to carry out the sum p over all renormalized, non-interacting parts where "env" stands for the TN contraction over all tensors except T [c] , i.e. the environment to the root node. With this definition, the energy expectation value E of the TN state can be written as (cf. Fig. 34(a)) In practice, to obtain H [c] eff , we do not need to contract the entire environment, if we have already the renormalized Hamiltonians (as defined in the previous paragraph) on all links of node c available. Then, Eq. (47) reduces to effective Hamiltonian terms (see Fig. 37 p,eff .
The whole process of building the effective Hamiltonian is summarized in Fig. 39.
Projectors. Adding projective operators to the Hamiltonian is useful e.g. for exploring degenerate GS manifolds or for targeting excited states (we will comment further on this paradigm in Sec. 5.1.5 below). Precisely, we apply the ground state search algorithm to the Hamiltonian H = H + H p where H p = ε p |Ψ p Ψ p | to find the first excited state of H, assuming its ground state |Ψ p is known and where ε p is much larger than the energy gap. Here we discuss the explicit construction of projective summands H p = ε p |Ψ p Ψ p |, where |Ψ p is expressed as a TN with the same geometry as the variational TN ansatz state. The scaling factor ε p will play the role of an energy penalty for occupying the state |Ψ p . The projectors are thus defined via the (not necessarily isometric) tensors T

[q]
p with possibly different dimensions in the virtual links (and symmetry sectors if they are symmetric; but the global symmetry group has to be the same). Projective operators can be implemented within the same structure and almost the same equations as for the renormalized and effective Hamiltonians. For a graphical representation, see Fig. 40 Note that we dropped the internal TPO label k here because the h p are defined on every link without TPO links. It is important to keep in mind, however, that the projective terms H p have global support. Therefore here we can never carry out the sum p , in contrast to renormalized TPO interactions with local support which have no TPO p have no TPO links, but in contrast to the rule for TPO interactions they cannot be summed up over p (every projector results in a separate renormalized two-link tensor on each link). c) The effective projector at the root c is given by the contraction of the renormalized projectors over all links with the conjugated projector-tensor T  links left. Furthermore, isometry can no longer be exploited here, and the resulting renormalized projectors h [ζ] p will in general be different from the identity. The effective projector at a node c becomes ε p Ψ env |Ψ p Ψ p |Ψ env , which is analogous to the definition of the effective Hamiltonian. Written out explicitly it reads whereT Note that this lengthy expression is rather simple to compute, by contracting all renormalized projectors into T p,eff , see Fig. 40(c). The whole process of building effective projectors is summarized in Fig. 41.
Some specific applications, such as variational addition of TNs in Sec. 5.1.6, employ projective operators |Ψ p Ψ p | onto superpositions of TN states. Those are of the form |Ψ p = i ε p,i |Ψ p,i with all |Ψ p,i given as separate TNs sharing the same geometry. In such a case, one computes renormalized projectors h [ζ] p,i for each state in the superposition separately -just as before, by simply replacing the single projector index p with p, i in Eq. (50). A more relevant adjustment is to be made in the effective Hamiltonian, which now reads i,j ε p,i ε * p,j Ψ env |Ψ p,i Ψ p,j |Ψ env . Correspondingly, as a replacement for the single factor ε p in Eq. (51), one additionally has to sum over superposition states in Eq. (52):T Nevertheless, evaluating such an effective Hamiltonian remains a simple computation by successive absorption of all effective projectors into T [c] * p,i .

Ground State Search Algorithm in Detail
We now describe the algorithm, which is based on the high-level operations introduced in the previous section, and follows the flowchart shown in Fig. 42. In the following, we list additional explanations and remarks, useful when implementing such an algorithm.
Setup and Configuration Tasks.
(i) Definition of the TN geometry. From the physical point of view, the TN should be designed such that it complies best with the system and the Hamiltonian being targeted. On the one hand, the TN has to be compatible with the interaction terms in the Hamiltonian. On the other hand, it has to be capable of hosting the required amount and spatial distribution of entanglement, controlled by its virtual links. Finally, the entanglement arises from the interactions and (at least n n+1 Optimization loop • . . . for ground states) can often be theoretically proven to be upper bounded (e.g. by area laws [26]). Fig. 43 summarizes schematically these requirements. At the purely technical level, any network fulfilling the following requirements is allowed: • The network has N physical links, supporting the system Hamiltonian H (defined in Eq. (43)). • The network is acyclic: for any pair of nodes q, q there exists a unique path connecting q and q . • The dimension of any virtual link in the network does not exceed D (bond dimension of the TN). • While in principle it is not prohibited, the employment of tensors with less than three links is redundant and may deadlock symmetries. Absorbing one-link and two-link tensors into adjacent tensors is strongly suggested, as it leaves the variational TN state manifold unchanged while reducing the technical issues with symmetries (see Sec. 5.1.3).
(ii) Definition of sweep sequence: this is the rule according to which the optimization centers are chosen sequentially in the optimization loop. In principle, any sequence that visits every tensor of the TN (and every variational link, in case of multi-tensor optimization) is allowed. The specific form of the optimal sweep sequence, i.e. the one that achieves fastest energy convergence within the shortest computation time, depends on the TN structure and the model under consideration. For regular hierarchical networks (e.g. binary TTNs, see Sec. 5.2) it is natural to select a sweep sequence based on the distance from the physical links: Assign an optimization precedence p := min s [dist(c, s)] to every optimization center c, where s is a physical link. Within a sweep sequence, optimization centers with smaller p take precedence over centers with larger p. Centers sharing the same p are chosen in such a way that the distance that has to be covered in order to visit all of them is minimal. For less regular networks (e.g. TNs for quantum chemistry problems) other schemes (such as depth-first search with backtracking) have been proposed [145].
(iii) Initialization of TN state: an obvious choice is to start from a random state (see Sec. 4.3.1 on how to construct such a random state for a symmetric TN with a defined global quantum number). This choice guarantees the least possible bias and therefore bears the smallest risk of ending up in a local energy minimum state, which would corrupt the convergence to the GS (by definition the global energy minimum state). Another valid option is to start from an already computed GS, previously obtained with the same algorithm on another Hamiltonian, assuming the symmetries and quantum numbers are the same.
(iv) Targeting of initial optimization center c: • Transform the network into unitary gauge with respect to c (see Sec. 4.2.2).
• Build up the effective Hamiltonian for c, according to the procedure outlined in Sec. 5.1.1.
In the following, we will refer to an optimization center c for which these two tasks are carried out as the root node of the network. In case of multi-tensor optimization, the root node is constituted by two or more network nodes. This is achieved by contracting some adjacent nodes {q} to a single node c. The renormalized Hamiltonian terms and isometry properties have to be computed on all remaining links and tensors, respectively. After the root node has been set up, the effective Hamiltonian is given by Eq. (49). The energy expectation value E of the TN state is then (see Eq. (48)) where T [c] is the root node tensor. Now, the generic local (reduced) optimization problem consists in determining the variational tensor T [c] in such a way that it minimizes E. This optimization problem is solved sequentially for all centers c in the following optimization loop.
Optimization Loop. Since E(s) is expected to approach the GS energy asymptotically from above, a possible convergence criterion is to demand where is a constant (positive) threshold. In practice has to be adapted according to the bond dimension D in order to avoid unnecessary sweeps that would push the convergence too far beyond the accuracy limit imposed by finite bond dimension. Usually, the bond dimension dependence of this accuracy limit (and hence the form of (D)) is well modeled by some polynomial or exponential decay (or combination thereof), e.g. one might choose the ansatz (D) = a+b D −c exp(−D/d). In practice, the parameters a, b, c, d need to be determined empirically by some fitting procedure because the form of the decay highly depends on the model parameters (system size N , proximity to a critical point, etc.).

Optimizer -Strategies.
Here we describe in detail various possible strategies when interfacing with the reduced optimization problem (while how to practically perform the numerical optimization routine will be discussed in Sec. 5.1.4). The most commonly employed schemes for this purpose are the single-tensor update and the double-tensor update, where the optimization centers consist of one tensor and two tensors, respectively. We additionally introduce a single-tensor update with subspace expansion.
The three schemes differ in the complexity of the variational problem they solve, and have different requirements on computational resources. The first two schemes, single-and double-tensor update, are a generalization of DMRG to arbitrary loopfree TN architectures. They are directly related to the traditional single-center site and double-center site DMRG [31] respectively. Finally, the subspace expansion allows to reconcile the improved convergence behavior of the double-tensor update with the favorable computational cost scaling of the single-tensor update. This scheme is similar in spirit (though not the same) to known techniques for curing the shortcomings of the traditional single-center site DMRG [113,159].
The single-tensor update is the most straightforward and computationally inexpensive scheme.
However, it can experience serious shortcomings: If the Hamiltonian under consideration exhibits a certain symmetry, the same symmetry will usually be present in the effective Hamiltonian of the reduced diagonalization problem. As a consequence, local optimizations are constrained to preserve the link representations on the individual links of the optimization center. The present quantum numbers (sectors) and their associated degeneracies can therefore not be handled variationally. This issue is particularly acute for symmetries that are encoded in the TN ansatz (e.g. by the methods described in detail in Sec. 3): in that case the mentioned constraints are hard and will be exactly (by construction of the tensors) respected in each local diagonalization. For TNs made from tensors that are unaware of symmetry rules the constraints are softer: In this case they are only metastable configurations which can be violated through numerical fluctuations, caused e.g. by finite precision floating point operations or by the fact that the environment (i.e. the tensors surrounding the optimization center) does not obey the symmetry. Nevertheless, the existence of these metastabilities still often leads to poor convergence behavior (as observed for example in the standard single-site DMRG algorithm [159]).
The standard way to cure this issue is to formulate the diagonalization problem not for a single tensor, but for several (usually two) adjacent tensors that previously have been contracted together. After optimizing this compound tensor, the original tensor structure is restored by means of SVDs. The accompanying truncation of the state spaces associated with the compound's internal links provides the variational handle lacking for external links, ss illustrated in Fig. 45. The downside of the multitensor optimization strategy is its higher computational cost resulting from the increased number of links of the involved tensors.
In order to avoid the increase in computational costs, we can employ so-called subspace expansion techniques. These techniques still operate on a single tensor T (1) , but temporarily expand the state space of one selected virtual link η o of T (1) in order to allow for a variation of the symmetry representation on this link. While the same ideas apply to ordinary and symmetric TNs, we now describe the technique considering symmetric tensors since some extra steps are required for this case. As a first step, an expansion of the selected link's state space can be achieved in the following way: Start by fusing all the links of T (1) other than η o to one large link η 1 . Repeat the operation for SVD diag. Figure 45: (a) Single-tensor update: the reduced diagonalization problem is formulated for one tensor. The optimized tensor is given by the eigenvector of H eff with the smallest eigenvalue. Note that link representations are preserved if H eff obeys some symmetry. (b) Double-tensor update: the reduced diagonalization problem is formulated for two tensors. The number of links of the optimized tensor is given by the number of links on which H eff acts. Link representations of these links are again preserved, in contrast to the one of the link which joins T [1] and T [2] : This link results from an SVD and can differ from the original link, if the SVD performs a non-trivial separation of the links of T opt . Note that if one of the two tensors is a two-link tensor, this link separation is trivial and the link created by the SVD will be the same as the external link of the two-link tensor.
tensor T (2) , which is connected to T (1) through η o , to obtain another large link η 2 . Now determine the intersection η max := η 1 ∩ η 2 (see Sec. 3.5.1). Note that the resulting η max has the full link representation accessible to the double-tensor optimization scheme, i.e. it can capture any link resulting from an SVD of the optimized double-tensor. Since our goal is to reduce the computational cost with respect to the double-tensor optimization, we introduce a parameter d pad to truncate all degeneracies∂ η max = d pad (for all sectors η max in η max ). The resulting link η max pad is used to pad (see Sec. 3.5.1) the link η o , according to η o := (η o + η max pad ) ∩ η max (the additional intersection makes sure that no redundant degeneracies are generated by the padding operation). The parameter d pad now allows to smoothly interpolate between a single-tensor update and a double-tensor update. The latter limiting case can be achieved by choosing d pad sufficiently large such that η o = η max , while the former limiting case is obtained by setting d pad = 0 which results in η o = η o . Note that in case the link η o exceeds the minimal bond dimension determined by the Schmidt rank (see Sec. 4.2.1), that is to say, in case there exist some sectors for which∂ ηo >∂ η max , the procedure to obtain η o may also cause some sectors to have reduced degeneracies, effectively removing redundant indices from η o .
The optimization itself is carried out in the following way (see Fig. 46): First the link η o at the tensors T (1) and T (2) is replaced by the expanded link η o . The hereby newly created tensor elements in the two tensors are filled (via subtensor assignment, see Sec. 3.5.1) with random entries (this step corresponds to the subspace expansion). Then, the tensor T (1) is made the root node of the TN and it is optimized as usual (by diagonalizing H eff ). Since the subspace expansion also affects T (2) , this tensor has to be optimized as well (analogously to T (1) ), keeping the same expanded link η o . The two alternating optimizations of T (1) and T (2) are repeated until convergence is achieved (usually one or two repetitions are enough). Finally, the original bond dimension D is restored via an SVD, discarding the smallest singular values (see Fig. 46). We will show and discuss numerical results obtained with this method in Sec. 5.3.
A diagrammatic comparison of the various optimization strategies is presented in Fig. 46.

Optimizer -Execution.
A crucial step of the optimization routine (common to all update schemes) is the diagonalization of the effective Hamiltonian H eff as this step usually constitutes the computational bottleneck of the whole GS algorithm. Thanks to the central gauging we are merely dealing with a standard eigenvalue problem (as opposed to a generalized eigenvalue problem arising in TNs with loops in their geometry). While this already provides a computational advantage, the identification of a fast eigensolver is still of utmost importance. Two observations are helpful in this context: H eff is usually sparse 2 , and only the ground-eigenstate of H eff has to be computed. Therefore, direct diagonalization algorithms (such as tridiagonal reduction, implemented by e.g. LAPACK's zheev [107]) are not convenient since they compute the whole spectrum, for which they usually require the explicit construction of the full matrix H eff . It is instead advantageous to employ iterative algorithms based on power methods, e.g. Lanczos/Arnoldi iteration [126,128]. These algorithms exploit the sparsity of H eff , only require knowledge of the action H eff |T [c] on a root node tensor |T [c] (see Fig. 37(c)) and, in addition, are specialized to return only a few eigenpairs. The computational costs of the different update schemes are thus given by the numerical effort of evaluating H eff |T [c] : All schemes share a factor f = N app P D 3 TPO , where N app is the number of operator applications in the iterative eigensolver (see Sec. 5.2 for an analysis of the behavior of N app in a specific TN architecture) and P is the number of terms in the effective Hamiltonian (including possible projectors), labeled by the index p in Eqs. (49) and (51). Here we assumed for simplicity that each TPO tensor (see Eq. (44)) has two virtual links {µ}, all of uniform dimension D TPO . The leading cost for the diagonalization in the single-tensor update is then where {η} 1 is the set of links of the tensor T (1) under optimization. For the single-tensor update with subspace expansion we get a diagonalization cost of where n is the number of times that the tensors T (1) and T (2) are alternately optimized, and the sets {η} 1,2 contain one expanded link η o with bond dimension D η o = D ηo + d pad .
For the double-tensor update the cost for the diagonalization scales like The SVD needed for the re-compression of the expanded link at the end of the singletensor update with subspace expansion has a cost of (see Sec.
while the final SVD in the double-tensor update has a computational cost scaling of In Sec. 5.2 we will see how the general computational cost estimations given here specialize for specific TN architectures. This will also allow for an easier, more evident comparison of the various costs for the different update schemes.
for targeting the n-th excited state we now need to solve the minimization problem Here, P p = |Ψ p Ψ p | is the projector on the p-th eigenstate and p is an energy penalty which has to be set large enough, i.e. at least as large as the energy difference |E p − E n | to the target state. Since the target state is not known at the beginning of the algorithm, this energy difference is also unknown; in practice, one therefore simply estimates a value for p which is guaranteed to be large enough (e.g. by setting p one order of magnitude larger than a typical energy scale of the system). For solving Eq. (57) (51)) to the effective Hamiltonian.

Variational Compression, Addition and
Orthogonalization. An interesting feature of the presented variational ground state search algorithm is that we can readily use it for compressing a generic loop-free tensor network state. This feature then enables further operations such as the addition and orthogonalization of entire tensor networks. For the case of MPSs, the procedure has been described in detail in [30].
Compression of a Tensor Network. Let a state |Ψ be given by some loop-free tensor network of bond dimension D . We then seek the best approximation |Ψ encoded in the same network geometry, but with a smaller bond dimension D < D .
As long as the change in bond dimension affects only one virtual link, an efficient and optimal strategy is to truncate the Schmidt values associated with that bond, as discussed in Sec. 4.1. However, during simultaneous compression of multiple bonds, the truncation errors of such local compressions may unfavorably accumulate into a less-than-optimal result. Namely, the optimal compressed state can be found by running the GS algorithm of Sec. 5.1 in bond dimension D for the Hamiltonian H ≡ −|Ψ Ψ |, i.e. with a single projector as described in Sec. 5.1.1. Ideally, the algorithm then converges to the lowest energy E = − | Ψ|Ψ | 2 that can be achieved with bond dimension D. In other words, one maximizes the quantum fidelity between compressed and uncompressed TN.
While it is still advisable to consider extended single-or double-tensor update schemes for reasons discussed in Sec. 5.1.3, it is however no longer required to invoke a sophisticated eigensolver in the optimization step: Namely, the effective Hamiltonian of a single projector is an outer product of the form as written in all detail in Eqs. (51) and (52). Up to a normalization factor, the lowest eigenstate solution for the root-tensor is thus given by T [c] =T [c] * , which is easily constructed.
Addition of Tensor Networks. A particular need for compression arises in the addition of two or more TNs |Ψ (i) , i = 1, . . . , k. We require that these TN share the same geometry, including the outcome |Ψ , which shall be the closest approximation to the sum that can be achieved within a fixed maximal bond dimension D. Once more, we can find |Ψ by means of the variational GS algorithm, which handles projections on superpositions of TNs efficiently: After setting up the projector |Ψ Ψ | defined by the sum of TNs |Ψ = i |Ψ (i) the algorithm for compression can be run as outlined before.
Note that alternatively, one could also first construct the exact sum as a network of bond dimension D = i D (i) as follows: Set each resulting tensor T [q] to zero and subsequently inscribe the respective tensors T (i) [q] from TNs |Ψ (i) , i = 1, . . . , k, as subtensors (Sec. 3.5.1) into disjunct index ranges [1 η ] for each virtual link η. The resulting network has to be compressed to the final bond dimension D -a task that requires computational resources scaling polynomially with the bond dimension D , analogous to computing TN scalar products. Unless one resorts to specific data structures for the block-diagonal structure in the contractions of the effective projectors, a drastic compression from dimension D is therefore less efficient compared to the direct variational compression of a sum of multiple states with smaller bond dimensions each.
Orthogonalization of Tensor Networks. As an example for the addition of TNs, one might want to orthonormalize a TN state |Ψ 2 with respect to another, |Ψ 1 . This can be achieved (up to a normalization) by subtracting their common overlap as in |Ψ ⊥ := |Ψ 2 − |Ψ 1 Ψ 1 |Ψ 2 . After evaluating the involved scalar product, we can employ the variational addition to obtain |Ψ ⊥ . Note however that the compression involved in maintaining a fixed bond dimension D usually prevents this approach from delivering an exact orthogonal solution. However, it returns the highest possible fidelity to the exact solution |Ψ ⊥ at the expense of strict orthogonality. Of course, such a result can nevertheless be useful, and one can directly extend this procedure to more than two states, e.g. in a (Modified) Gram-Schmidt process. Interestingly, the specific task of orthogonalization can be addressed more directly with variational compression, similar to targeting excited states: For this purpose, we add projectors on all those states that we want to orthonormalize against in the effective Hamiltonian of Eq. (58). These additional projectors have to be multiplied by a positive energy penalty ε p . With ε p → ∞, orthonormality can be encoded as a hard constraint during compression, and the optimizer's eigenproblem takes the form of a simple orthogonalization against effective projectors. Conversely to the additive approach, we now expect full orthonormality at the expense of fidelity to the exact D d l Figure 47: A binary TTN: The black dots at the bottom represent the physical sites of local dimension d. Each link in the l-th layer (counting from below towards the top) has dimension min(d 2 l , D), where D is a constant that we will refer to as the bond dimension. The dangling blue link at the top left tensor has dimension one and carries the global symmetry sector of this symmetric TN state. Its position in the network can be arbitrarily chosen. All tensors are gauged towards a center node (red). solution.

A Practical Application: Binary Tree Tensor Networks
In this section, we discuss a specific loop-free tensor network and see it in action: the binary tree tensor network (bTTN). Unlike MPS, this type of tensor network has the practical advantage of treating open-boundary and periodic-boundary onedimensional systems on equal footing [146]. It can thus efficiently address periodic systems without the limitations typical of periodic-DMRG [155]. The bTTN is composed entirely of tensors with three links. As we will see, this guarantees moderate scaling of computational cost with the bond dimension D. The bTTN architecture with builtin Abelian symmetry handling that we use (as introduced in Sec. 3) is illustrated in Fig. 47. The essential ingredients for achieving the ground state of a many-body Hamiltonian using a loop-free TN ansatz have been outlined previously in Sec. 5.1.2 and in Refs. [145,146,160]. Here, we summarize some practical technical details that are important when implementing a bTTN ground state algorithm: • Choice of decomposition algorithm for central gauging: In order to transform the bTTN into a central gauge, tensor decompositions of the type shown in Fig. 48(a) have to be performed repeatedly (see Sec. 4.2.2). Both the QR decomposition and the SVD are suited to carry out this task. However, as we show in Fig. 48(b), the QR decomposition can be performed in roughly two thirds of the time of the SVD, while exhibiting the same scaling O(D 3.8 ) in the bond dimension (at least in the range D < 500 commonly accessible to bTTN simulations). Therefore, it is advisable to implement any gauging routine in a bTTN by means of QR decompositions, and employ SVDs solely for compression purposes. • Benchmark of diagonalization methods: As mentioned in Sec. 5.1.3, the eigensolver is the component of the GS algorithm which consumes most of the runtime. The performance of this routine therefore crucially influences the whole algorithm runtime and should be first looked at when optimizing the algorithm speed. In Tab. 1 we list three publicly available diagonalization methods, and compare their performances in Fig. 49. Our benchmarks suggest that ARPACK [161] (an implementation of the Arnoldi method [128]) is the most suitable eigensolver in the context of a bTTN ground state algorithm. On a side note, we remark that the setting of a sensible initialization guess can further reduce runtimes of iterative eigensolvers. For the eigenvalue problems occurring in the GS algorithm presented in Sec. 5.1.2, an obvious choice for such an initialization guess is the root node itself (before optimization). Especially during the final sweeps, when the tensor elements are already close to their optimal configuration, this measure can significantly reduce the number of necessary operator applications N app . Conversely, one could reduce N app in the early sweeps of the algorithm by selecting looser starting threshold values for convergence (not included in Fig. 49).
• Computational cost of a bTTN GS algorithm: We observe that the total number of tensors Q scales linear with the system size N , similarly to an MPS: When using a single-tensor update scheme one deals exclusively with three-link tensors, while for a double-tensor update scheme intermediate four-link tensors  ARPACK [161] Implementation of the Arnoldi algorithm [126,128]. Iterative diagonalization, designed to deliver a few eigenpairs of H eff . Requires only knowledge of the action H eff |v on a given vector |v . The number of necessary operator applications N app strongly depends on the size of the Krylov subspace (parameter ncv in ARPACK). A consequence of choosing ncv too small is a fast growth of N app with the problem size dim(|v ) (where we empirically find that "fast" usually means super-logarithmic, see Fig. 49).
Jacobi-Davidson (jdqz ) [163] Implementation of the Davidson [127] algorithm. Requirements and features similar to Arnoldi. Here N app crucially depends on the maximum number of matrixvector multiplications in the linear equation solver (parameter mxmv in jdqz). We achieve best performance when choosing mxmv ≈ 5, i.e. at the lower end of the recommended interval [5 .. 100] (especially when only one eigenpair is demanded). center-gauged towards the tensor q attached to site s. This is shown in Fig. 51 where O is the result of a renormalization of the observable O [s ] (obtained from contractions involving all tensors along the path between sites s and s ), can be evaluated with a very small number of contractions O(log 2 N ), owing to the enhanced reachability of two arbitrary sites s and s due to the hierarchical tree structure (see Fig. 51(b)). This feature not only makes the calculation of observables a computationally cheap task in a bTTN, but also lies at the heart of their predisposition to accurately describe critical systems with algebraically decaying correlations [146].

Binary Tree Tensor Networks -Examples
In this subsection we present bTTN ground state solutions for two different 1D manybody Hamiltonians, both equipped with Abelian symmetries. Namely, we will consider the quantum Ising model in a transverse field (as a typical benchmark model with Z 2 parity symmetry and an easy, well-known analytic solution), and a Bose-Hubbard ring with a rotating barrier [164] (with U(1) boson number symmetry) as a more complex example. In particular, we provide a field test of the different update schemes presented in Sec. 5.1.3 by comparing both their convergence behaviors and computational resource usage. Spin-1/2 Ising Model. We use the following Hamiltonian where σ x,z [s] are Pauli matrices acting on site s. We employ periodic boundary conditions (N + 1 ≡ 1) and fix the transverse field strength |λ| = 1, so that the model becomes critical in the thermodynamic limit. At the critical point, the ground-state energy has a simple closed-form solution [7,165]  The Hamiltonian defined in Eq. (60) has a global pointwise parity (i.e. group Z 2 ) symmetry, dividing the state space into two sectors, one with even and one with odd parity (corresponding to the two representations of Z 2 , see details in Sec. 3.1. The parity is therefore a good quantum number. Specifically, the Hamiltonian commutes with the symmetry group elements U (g) ∈ {1 ⊗N , s σ z [s] }.
For |λ| < 1 (ferromagnetic phase) the ground state of H exhibits symmetry breaking, i.e. in the thermodynamic limit there are two degenerate ground states (one in the even sector and one in the odd sector), while for |λ| > 1 (paramagnetic phase) the thermodynamic ground state is unique and even. For finite system sizes with an even number of sites, the lowest energy state is always in the even sector (i.e. the trivial irrep of Z 2 ); therefore we will target this global sector in our bTTN simulations. Since there are only two possible symmetry sectors ∈ {e = 0, o = 1}, any link representation is defined by two degeneracy subspace dimensions∂ e and∂ o . In this sense, it is easy to perform a convergence analysis for virtual link representations during the algorithm runtime. While the ability of a bTTN to accurately describe the ground state of the critical 1D Ising model has been discussed in detail in Ref. [146], here we focus on a comparison between the different algorithm types (with/without inbuilt Z 2 symmetry handling, single-tensor simple update/single-tensor with subspace expansion update/double-tensor update). In Fig. 52(a) we show the convergence to the ground state energy as a function of the number of optimization sweeps. Since the size of the search space is reduced when exploiting symmetries, the convergence behavior towards the ground state is improved in this case. This is most clearly seen for large system sizes, e.g. N = 1024. In Fig. 53 we compare the two update schemes: Single-tensor update with subspace expansion and double-tensor update, as described in Sec. 5.1.3. We benchmark their capability to converge to the optimal link representations, which are characterized by∂ e =∂ o = D/2 for all links. For the sake of this benchmark, we initialize all virtual links with strongly unbalanced degeneracy dimensions∂ e ∂ o and check whether the ratio∂ e /∂ o converges to one as a function of the number of sweeps. Although the double-tensor update converges with slightly fewer sweeps, it is evident from Fig. 53 that both update schemes eventually manage to reach the optimal link representations.
A careful consideration of the theoretically possible runtime reduction resulting from the presence of two symmetry sectors with a flat distribution of degeneracy dimensions (see Sec. 3.5.3) yields a factor of 1/4 for the specific case of a bTTN singletensor update ground state algorithm. Moreover, the memory consumption is expected to be halved in this case (because half the sectors of each tensor do not couple and therefore do not need to be stored). In practice, we obtained resource savings very In the rotating frame the barrier is at rest, but the bosons experience an artificial gauge field B giving rise to a flux Ω through the ring, which leads to a boson current I(Ω) ∝ ∂E/∂Ω. close to these predicted values (see Fig. 52(b)), demonstrating that symmetric tensor operations can be implemented with very little bookkeeping overhead.
Bose-Hubbard Ring With a Rotating Barrier. A graphical illustration of the system is shown in Fig. 54. In the rotating frame the Hamiltonian can be written as [164] where the barrier of height β is now at a fixed lattice site s b and the (artificial) gauge flux Ω is proportional to the barrier's rotation velocity, t is the hopping amplitude and U the on-site repulsion, while b † [s] , b [s] , and n [s] are bosonic creation, annihilation, and number operators, respectively. The Hamiltonian defined in Eq. (62) has a total boson number conservation symmetry, which is a global pointwise symmetry with group U(1) (see Sec. 3.1 for details). Specifically, H commutes with the total boson number operator n tot = s n [s] , and is thus invariant under V (ϕ) = e iϕ ntot = s e iϕ n [s] . Hence, the total boson number N b (with n tot |Ψ = N b |Ψ ) is a good quantum number which we can enforce exactly by targeting a specific boson filling in a bTTN algorithm with inbuilt U(1) symmetry handling. Since the U(1) group entails many irreps -potentially all quantum numbers = 0, . . . , N b could be present on some links -there is no obvious way a priori to assign the sector degeneracies∂ ( = 0, 1, . . .) at each virtual link. Therefore, we necessarily need to select the∂ via a variational procedure, as explained in Sec. 5.1.3. We compare again the performances of the two different algorithm types: Double-tensor update strategy against single-tensor update with subspace expansion strategy. The results are summarized in Fig. 55, leading to the same conclusions already obtained from the benchmarks on the Ising model: Both algorithms manage to converge to the optimal link representations, with the double-tensor update requiring a slightly smaller number of sweeps. Nevertheless, the overall computational cost of the double-tensor update is significantly higher (both in terms of runtime and memory consumption), as expected (see Sec. 5.2). We verified that the obtained link representations are indeed optimal by comparing the converged ground state energies to the ones resulting from a bTTN simulation of the same model without inbuilt symmetries. (Without symmetries we have no longer a way of fixing N b exactly. Nevertheless a desired global filling can be achieved on average by working in a grand canonical ensemble, i.e. by adding an extra term µ s n [s] to the Hamiltonian and tuning the chemical potential µ.) A numerical comparison of the runtime behaviors of the different algorithm types is shown in Fig. 55(c). It appears that the double-tensor update algorithm with inbuilt U(1) performs roughly ∼ 4 times faster than the single-tensor update algorithm without symmetries, throughout the numerically accessible regime of D (albeit exhibiting a slightly steeper power-scaling). Moreover, the single tensor update with subspace expansion reduces the runtime by another considerable factor (∼ 3 times faster for D = 50), on top of having a favorable scaling in D, demonstrating the huge potential speed-up accessible by performing TN algorithms with symmetric tensors, especially when the underlying symmetry exhibits a large number of sectors.

Conclusions
In this anthology we reviewed some cutting-edge computational methods, based on tensor network states, for simulating low-dimensional many-body quantum systems at finite size, approached from the numerical user's perspective. After introducing the concepts and abstract objects which are the building blocks for every tensor network algorithm, we described in detail how these objects are meant to be embedded in a computer. We introduced which manipulations, including (multi-)linear algebra routines, are useful for simulation purposes, stressing the interfaces we suggest for the computation in order to be efficient in terms of time and memory usage. We showed how the same data structures and routines have to be adapted in order to encode pointwise Abelian symmetries, stressing how the conserved quantities help the computation by sensitively reducing the effective variational spaces involved. We reviewed in detail the geometrical and algebraic properties of loop-free tensor networks, focusing on the advantages presented by various gauges and canonical forms. Then, we gave a full description of an algorithm for capturing the properties of ground states and low excited states of Hamiltonians based on loop-free tensor networks: We presented several optimization strategies, and discussed their respective advantages and scaling, while showing practical numerical comparisons on their precision and performance. Altogether, these components form a useful and versatile toolkit which can deal with many low-entanglement scenarios for one-dimensional quantum systems, encompassing the most common simulations in finite-size realizations.

Outlook and Perspectives
Tensor network methods are still far from having reached their ultimate potential. Despite intrinsic difficulties, several steps have been made in the last decade in the direction of simulating two-dimensional quantum systems: 2D quantum physics is inherently fascinating for the non-trivial topological properties of the phases of matter, and consequent exotic quasi-particle statistics (anyons), equilibrium properties still ruled by quantum fluctuations and entanglement (far from mean field description), but it is more computationally challenging. In tensor network language the challenge translates into the fact that it is difficult to design a tensor network geometry which is both efficiently contractible and capable of capturing the 2D area-law of entanglement [50,98]. Devising powerful, versatile TN methods able to deal with this scenario is with no doubt the challenge of the present decade [51,154]. Regardless, we stress that recently the same loop-free tensor network architectures that we focused on in this work were employed to study topological properties of interacting two-dimensional models (with toric and open boundary conditions), and showed simulation capabilities well-beyond exact diagonalization [166]. Moreover, loop-free TNs seem to be a promising platform also for out-of-equilibrium simulations, a task likely achievable by adapting the Time-Dependent Variational Principle for Matrix Product States [47,167], or similar strategies [85,88], to the more flexible loop-free network geometries, possibly to include finite having to diagonalize H explicitly. Specifically we consider R|HR = T R|T HR * = R|HR * (.1) by means of anti-unitarity, T -invariance of |R and |R , and commutation with the Hamiltonian [170].
In the context of lattice models suitable for TNs, our interest lies on cases where we can transform into the basis |R with a local unitary transformation, i.e. one that acts separately on the physical sites. The simplest example is a Hamiltonian that is already real in |S to begin with, e.g. that is invariant under T = K. As a consequence, H becomes real symmetric and all of its eigenstates have real expansions, too. Then, in TN algorithms, by means of a (generalized) gauge transformation, acting locally on links (see Sec. 4.2.1), we can encode certain cases of anti-unitary symmetry by simply restricting tensors to real elements. When realized, we call this the "real gauge" of a TN (see Sec. 4.2.5).

B. Graphical Notation Legend
Fig. B1 contains the graphical notation legend, reporting the pictorial symbols that we use throughout the manuscript to identify the various TN components (links, tensors, etc.). Gauge Centers: Figure B1: Graphical notation legend. Pictorial symbols for links, tensors, their symmetric counterparts, and geometrical properties (such as isometry) are listed.