Sparse sampling and tensor network representation of two-particle Green ’ s functions

Many-body calculations at the two-particle level require a compact representation of two-particle Green’s functions. In this paper, we introduce a sparse sampling scheme in the Matsubara frequency domain as well as a tensor network representation for two-particle Green’s functions. The sparse sampling is based on the intermediate representation basis and allows an accurate extraction of the generalized susceptibility from a reduced set of Matsubara frequencies. The tensor network representation provides a system independent way to compress the information carried by two-particle Green’s functions. We demonstrate efficiency of the present scheme for calculations of static and dynamic susceptibilities in singleand two-band Hubbard models in the framework of dynamical mean-field theory.

Several methods have been proposed to address the storage issue. Conventional approaches are based on a separate treatment of the low-and high-frequency parts [17][18][19][20][21]. The frequency dependence is treated exactly in a small low-frequency box while in the outside region an asymptotic form is used. This works efficiently at relatively high temperatures. As the temperature is lowered the size of the low-frequency box grows, until it becomes prohibitively large.
Recently, the intermediate representation (IR) basis was introduced as a promising solution to the storage issue [22]. In IR the size of the data grows only logarithmically with the inverse temperature β and the bandwidth. A fitting procedure allows IR expansion of numerical data in the Matsubara frequency domain. Nevertheless, two obstacles remain: the computational cost of the IR expansion and the size of the IR tensor.
Regarding the first obstacle, the input data for the fitting scheme is very large, having a dense support in the Matsubara frequency domain. The fitting procedure thus becomes prohibitively expensive at low T . As for the second one, the IR represents a 2P quantity as a high-order tensor involving spin and orbital dimensions, in addition to those for the IR basis itself. Further compactification of the tensor is required for solving realistic multi-orbital systems at low T . In this paper, we address these two issues. First, we introduce a sparse grid in the Matsubara frequency domain, which contains the desired information about the 2P Green's functions This extends the approach developed in Ref. [23] for one-particle (1P) Green's functions. We introduce an efficient tensor network representation of the IR tensor and a fitting (regression) algorithm to determine it. Reduction of the data to be sampled thanks to the sparse grid makes evaluation of the IR coefficients very efficient and solves the first issue. The tensor regression provides a model-independent way to compress the IR tensor and tackles the second issue.
We demonstrate the performance of the present method in the context of DMFT. First, we test the accuracy of sparse sampling and tensor network representation by calculating the static susceptibility of the single-band Hubbard model on a square lattice. Next, we show the efficiency of the present method for dynamical susceptibility calculation for a two-band Hubbard model with low symmetry.
The paper is organized as follows. In the next section, the IR for 1P and 2P Green's functions is reviewed. Sparse sampling of 2P Green's functions is introduced in Sec. 3. In Sec. 4, the tensor network representation is presented. Its accuracy for computing of static susceptibilities of the single-band Hubbard model is demonstrated in Sec. 5. In Sec. 6, we present numerical results for dynamic susceptibility calculations, in the more demanding context of an ordered phase of the two-band Hubbard model. In Sec. 7, we summarize and conclude.

Intermediate Representation (IR) for Green's functions
Here we review the IR for 1P and 2P Green's functions introduced in [22] and [24].

One-particle Green's function
The IR for 1P Green's functions was introduced in Ref. [24]. The spectral (Lehmann) representation of the 1P Green's function G(τ ) in the imaginary-time domain reads where we assume = 1. The superscript α specifies statistics: α = F for fermion and α = B for boson. The spectrum ρ(ω) is assumed to be bounded within the interval [−ω max , ω max ]. The kernel K α (τ, ω) reads for 0 ≤ τ ≤ β (β is the inverse temperature). Here, the + and − signs are used for fermions and bosons, respectively. The extra ω factor for bosons in Eq. (2) is introduced in order to avoid a singularity of the kernel at ω = 0.
For fixed values of ω max and β, the IR basis functions are defined through the singular value decomposition (SVD) where the singular values S α l (> 0) decrease with increasing l exponentially. For fermions, the Green's function can be expanded as where . The exponential decay of S F l ensures a fast decay of G l if the spectrum is bounded in [−ω max , ω max ]. The accuracy of the expansion can be controlled by applying a cut-off for the singular values. The Matsubara-frequency representation of the Green's function reads

General form of IR for Two-particle Green's functions
The problem of the three-and four-point Green's functions was considered in Ref. [22]. The four-point Green's function can be expressed in the Matsubara domain as where ω, ω , ω stand for r-dependent combinations of ω n , ω n , and ν m listed in Table 1. Similarly, the indices α, α and α take the value F orB depending on r as indicated in Table 1. The indices i, j, k, l denote the flavor (combined spin and orbital), while O is a composite index representing the quadruplet (i, j, k, l). Formally, the tensor G contains full information about G (meaning in particular, information about its values at all bosonic and fermionic frequencies for all flavors).

Simplified form for fixed bosonic frequency
We now derive a variant of Eq. (8), which holds for a fixed bosonic frequency. The principle of the derivation is the same as above. Nevertheless, if one is interested in only a few bosonic  . Each circle or rectangle with N legs represents a N -way tensor. When two tensors share a leg, the summation over the corresponding index must be taken. A rectangle with a diagonal line represents an identity tensor. The N -way identity tensor t is defined as t i 1 ···i N = 1 iff i 1 = · · · = i N , and t i 1 ···i N = 0 otherwise. Table 2: 12 different notations of Matsubara frequencies and statistics for four-point Green's functions in the particle-hole notation at a fixed bosonic frequency.
frequencies, the reduced number of degrees of freedom allows us to represent the 2P Green's function as a tensor of lower rank. In the conventional particle-hole notation, the Green's function depends only on two fermionic frequencies for a fixed bosonic frequency. It is shown in Appendix A that in this case, this frequency dependence can be written as where ω n and ω n are fermionic frequencies, and ν m is a bosonic frequency. The index r relates to the 12 distinct representations generated from the three terms in Eq. (18). The imaginary-time frequencies (iω, iω ) and the statistics of the basis functions depend on r as summarized in Table 2. This expression is formally similar to Eq. (8), being meant to store the full information for a single bosonic frequency. The same sparse sampling strategy, which we shall introduce in Sec. 3, can therefore be employed in both situations.

Graphical representation
For the sake of clarity, we introduce a graphical representation for the tensor operations involving 2P Green's functions. As an example, we present in Fig. 1 the diagram corresponding to the right-hand side of Eq. (9). Each rectangle/circle represents a tensor whose indices are denoted by legs (the nature of the shape does not matter). The set of expansion coefficients G O (r, l 1 , l 2 ; iν m ) appears as a purple four-legged box, representing a rank four tensor. The green rectangle represents the basis functions terms in Eq. (9). A detailed view of their action is shown in the upper panel, where W is introduced as a composite index of (iω, iω ).

Sparse sampling
The sparse sampling scheme was originally proposed for a 1P Green's function in Ref. [23]. For fermions, the expansion of G(iω n ) reads where the number of coefficients N l = l max + 1 determines the accuracy of the expansion. It was shown that the full frequency dependence of a 1P Green's function can be reconstructed from the values of the Green's function on a carefully chosen sparse subset of sampling points in the Matsubara domain. U F l (iω n ) is real (odd l) or pure imaginary (even l), and oscillates around zero. The procedure described in Ref. [23] is based on picking the positions of the extrema of |U F lmax (iω n )|. The procedure generates N l (even l) or N l +1 (odd l) sampling points. The same procedure generates N l + 1 (even l) or N l (odd l) sampling points for bosons.
We extend this procedure in a straightforward manner for the expansion of the 2P Green's function. Each summand indexed by r in Eq. (8) (Eq. (9) ) is handled in turn. For a fixed value of r, the sets of sampling points relative to each factor in the corresponding product of basis functions are built. Then, the triplets (pairs) of direct products of such sets are determined, and make up the r th set of points to be sampled, in Z 3 (Z 2 ).
For simplicity of implementation, we use the same N l for fermions and for bosons in the expansion. In general, for a given value of Λ ≡ βω max , the singular values decay slower for fermions than for bosons. In practice, we determine N l based on a given singular-value cutoff for fermions, and use the same N l for bosons.
As an illustration, for Eq. (8), we obtain N 3 l + O(1) sampling points. The final set of sampling points we need to consider is the union of the sets of sampling points obtained for all r. The size of the union is less than the sum of the sizes of the individual sets, thanks to the overlap between them (in particular at low frequencies). In Sec. 6, we will use Λ = 10 4 and N l = 24 (cutoff value 10 −4 for singular values). For this parameter set, the procedure generates 165 912 sampling points for Eq. (8), which is slightly smaller than 16N 3 l = 221 184 due to the overlap. Figure 2 shows the distribution of these sampling points. One can see that their distribution is more dense at low frequencies, getting sparse at high frequencies. Figure 3 shows the distribution of the sampling points generated for Eq. (9) with N l = 19 (cutoff value of 10 −5 ). We obtain 2 972, 1336, 2 516, 2 972 sampling points for m = −20, 0, 10, 20, respectively. We will use these parameters in Sec. 5.
One technical caveat needs to be pointed out: the expansion of the 2P Green's functions involves the so-called "extended" bosonic basis set [22]. A basis function U B l from this set only exhibits max(0, l − 2) sign changes, due to the extra basis functions at l = 0, 1. The procedure above would thus yield l max −1 = N l −2 or N l −1 sampling points for this basis set. Therefore the actual process is slightly altered from the above description. The sampling points relative to the extended bosonic basis are generated from the extrema of U B lmax+2 instead of U B lmax . This ensures that the number of unknown coefficients matches the number of sampling points.
In the following sections, we will demonstrate that the sampling on the sparse grid is sufficient to evaluate the 2P Green's function with the desired precision for any Matsubara frequency. Figure 4: Graphical representation of the tensor decomposition in Eq. (12). Each circle or rectangle with N legs represents a N -way tensor. When two tensors share a leg, the summation over the corresponding index must be taken. A rectangle with a diagonal line represents a identity tensor.

Tensor network representation
In this section, we introduce an efficient fitting algorithm based on a tensor network representation for the IR tensor. In principle, numerical data on the sampling points can be fitted using either Eq. (8) or Eq. (9) by using the least squares method. The computational load of this naive approach scales as O(N 6 l N 4 orb ) or O(N 9 l N 4 orb ) for two-and three-frequency quantities, respectively. Here N orb is the number of orbitals, and N l grows logarithmically with respect to β. The fitting rapidly becomes too costly at low temperature, especially for three-frequency quantities.

Low-rank tensor decomposition
We introduce the following low-rank decomposition of G: where r runs over different representations and O ≡ (i, j, k, l). This type of tensor decomposition is known as a Canonical Polyadic (CP) decomposition and is widely used in many fields, e.g., for accelerating quantum chemistry calculations by factorizing Coulomb integrals [25][26][27].
For the simplified form with fixed bosonic frequency in Eq. (9), the low-rank decomposition reads which is illustrated in Figure 4. Both expressions become exact for a large enough D. The decomposition is beneficial for the fitting procedure if a good approximation of the full tensor is obtained for a reasonably small value of D. We discuss further in the text how this condition can be checked numerically. Note that the dependence on the orbital indices is not decomposed and they still appear as the composite index O in Eqs. (11) and (12). In the CP decomposition, we do not assume any orthogonality conditions for the decomposed tensors, unlike the SVD of a matrix. Recently, some of the authors and co-workers have proposed strong-coupling formulas for computing momentum dependent susceptibilities in DMFT [28]. Their formulas can be regarded as a special case of Eq. (12) with D = 1 and x dr = δ r1 . Our preliminary results indicate that the further decomposition of the dependence on O into individual spin/orbital indices requires typically an even larger D, which is not beneficial.

Fitting algorithm
We explain how to fit some existing data of the 2P Green's function on the sampling points in the Matsubara domain using Eq. (11) or Eq. (12) without explicitly constructing the big tensors on the left-hand side. The x tensors in the equation can be regarded as free fitting parameters. For instance, for Eq. (11), we define a cost function for the fitting as where || . . . We introduce the small parameter α > 0 in order to regularize the optimization problem. Without this parameter, the problem would be ill-posed due to the overcompleteness of the representation. We have not observed any visible systematic errors in interpolated data for small values of α, i.e., α ≤ 10 −5 . We used α = 10 −8 and 10 −5 in Sec. 5 and Sec. 6, respectively. The minimization of this cost function is a non-convex optimization problem. We found that despite its non-convex nature, the cost function can be minimized efficiently using standard methods starting from randomly initialized parameters. In some cases, we observed the existence of multiple solutions, being different only slightly in terms of the cost function. This issue thus does not matter in practice. We refer the interested reader to Appendix B for more details on the optimization method.

DMFT calculations for single-band Hubbard model on a square lattice
As a test bed for our method, we first consider a single-band Hubbard model on a square lattice at half filling for U = 12t (1.5× the bandwidth W = 8t), where the hopping t = 1 sets the unit of energy. The inverse temperature β = 2.5 is slightly above the antiferromagnetic transition. We use (approximate) Hubbard-I solver, which provides semi-analytic representation of local susceptibilities and thus allows precise analysis of our data compression approach. We first compute and fit the local (impurity) generalized susceptibility X loc by subtracting the relevant disconnected parts from the local 2P Green's function. Interpolating the local generalized susceptibility in the Matsubara frequency domain, we solve the Bethe-Salpeter equation (BSE) and compute the static DMFT susceptibilities of the model. We compute X loc for all spin-orbital components on 1 336 sampling points generated for Λ = 100 and N l = 19 at zero bosonic frequency m = 0. The distribution of the sampling points is shown in the right top panel of Fig. 3. Then, we fit the data using Eq. (12) (G is simply replaced by X loc ). Figure 5 shows how the fitting errors decay as D is increased. We found that the residual of the fit vanishes quickly with increasing D. Figure 6 compares the exact and interpolated values of the local susceptibility in Matsubara frequency space. One can see that for D = 15, the fit matches the exact values on the sampling points and precisely interpolates the data. Increasing D further does not improve the fit substantially, which may be due to the truncation of the basis.
The inversion of the Bethe-Salpeter equations (for the determination of the lattice susceptibilities) directly in the IR and tensor network format is still an open question (see the discussion in Sec. 7). In this study, we execute the inversion itself using the Matsubara representation, based on the interpolation of the generalized susceptibility in a box of width [−N ω , N ω − 1] for fermionic frequencies. Corrections from higher frequencies outside the box are treated using the procedure described in Appendix B of Ref. [28]. The corresponding physical quantities are obtained by summation over the fermionic frequencies. Figure 7 shows the physical lattice susceptibility using the fitted results for D = 5 and iν = 0. For the calculations in this section, we took N ω = 100. We found that the results for D = 5 are already indistinguishable from the exact ones at the scale of the figure. There is a pronounced peak at M = (π, π), corresponding to an antiferromagnetic spin order. The other three eigenvalues, which correspond to charge susceptibility, are not enhanced.
The storage of X loc computed on the sampling points takes up 340 kB (including all spin sectors), while the compressed data for D = 5 only 5.2 kB.

DMFT calculations for two-band Hubbard model
Next, we apply the present IR approach to a state-of-art linear response DMFT calculation. To this end, we choose the two-band Hubbard model in an intermediate coupling regime and low-spontaneously broken-symmetry, a problem that some of us have studied recently [29]. The model Hamiltonian reads   where a † iσ and b † iσ are fermionic operators that create electrons with the respective orbital flavors and spin σ at site i of a square lattice. The first term describes the nearest neighbor hopping. The rest, expressed in terms of local densities n c i,σ ≡ c † iσ c iσ , captures the crystal-field ∆, the Hubbard interaction U and Hund's exchange J in the Ising approximation. We use the same hopping parameters t a = 0.4118, t b = −0.1882 as in [29], but choose weaker interaction U = 2 (J = U/4, U = U − 2J) and ∆ = 1.55. At the studied temperature β = 60, an ordered phase called polar excitonic condensate [30] is realized. We follow the same the algorithmic programme as in Ref. [29], except for the representation of the 2P Green's function.
The local 2P Green's function G loc is sampled on a non-uniform grid in the Matsubara frequency domain, shown in Fig. 2 (Λ = 10 4 , N l = 24), using a modified version of the ALPS/CT-HYB impurity solver [31,32] based on the continuous-time hybridization expansion algorithm [33,34]. The regression (11,13) for G loc provides us with the x (i) , i ∈ {0, . . . , 4} coefficients, Eq. (11), which in turn are used to interpolate the local 2P Green's function at all Matsubara frequencies. The generalized local susceptibility X loc is then evaluated by subtracting the disconnected part from the local 2P Green's function. Figure 8 illustrates the convergence of the process with increasing value of D in this situation. The solid lines represent the real part of G loc in the Matsubara representation, obtained from Eq. 8. Both panels show the profile for a fixed value of the bosonic frequency in the particle-hole notation (m = 0 and m = 10), along a cut in the two-dimensional fermionic frequency space, slightly away from the main diagonal (shifted by ten Matsubara frequencies). The crosses represent the actual sparse frequencies, where the data was sampled. The structure of the data at zero bosonic frequency is relatively simple, so that the fit is excellent for D = 30, but the tenth bosonic frequency requires D = 60.
Using the above data we can solve the BSE to get the linear response of our system. As    Figure 9: Selected components of the static susceptibility χ(q) throughout the Brillouin zone, for the two-band Hubbard model described in the text, for β = 60. an example we calculate the diagonal susceptibilities for local operators [29] which capture the low-energy dynamics of the polar condensate. We have chosen the ordered phase such that only Rφ x is non-zero. In this set-up Rφ y generates a spin rotation of the order parameter (Goldstone mode) and Iφ y couples to S z (only in the ordered phase). In Fig. (9) we show the static susceptibilities on a fine grid in the 2D Brillouin zone. Unlike in the strong coupling case of Ref. [29], the spin susceptibility is dominated by a Fermi surface nesting present both in the ordered and disordered (not shown here) phases, which gives rise to a peak at an incommensurate vector on the X-M zone boundary. The response corresponding to the Goldstone mode in the middle panel exhibits the expected divergence at the ordering wave vector (q = 0). The excitonic susceptibility in the right panel exhibits, in addition to the main (finite) peak at q = 0, additional peaks that coincide with the maxima of the spin susceptibility. This reflects the coupling between S z and Iφ y induced by the symmetry breaking. While in the strong coupling regime of Ref. [29] S z was a slave to the dynamics of Iφ y , here we can see that S z affects Iφ y . In Fig. (10) we show the absorptive (imaginary) parts of the dynamical susceptibilities obtained by the analytic continuation described in Supplemental Material of Ref. [29]. It reveals the Goldstone nature of the Rφ y response and complex nature of the spin response. Interestingly, the sharp response in the vicinity of Γ does not reflect formation of a bound state, but is a consequence of parallel bands upon opening of the excitonic gap. The low-energy hot spot on the X-M linearand its counterpart in the static susceptibility reflect the vicinity of an antiferromagnetic phase [35].
The presented susceptibilities obtained from the IR inputs are in excellent agreement with benchmark data obtained using the Legendre representation used in Ref. [29]. The current setup is more flexible, insofar as it is not limited by the sampling window, neither in the bosonic nor the fermionic Matsubara frequency domain. It is also very compact. The sparse grid solely saves computational time and memory footprints for QMC significantly. The tensor regression further compresses the sparsely sampled data by several orders of magnitude: The measured data is 700 MB large on the sparse grid, while the tensor network representation takes up only 330 kB for D = 60.
In general, if a too large D is employed, one could overfit QMC noise, giving a rise to oscillatory behavior between the sampling points in the interpolated data. A practical recipe for avoiding overfitting is to use the value of D that minimizes test errors rather than training errors. In the present study, however, we did not observe overfitting behavior. This may be because the fitting parameters is still much smaller than the fitted QMC data in size. More detailed analysis of the stability of the fitting procedure is a topic of future studies.

Summary
Based on the IR basis, we have introduced a procedure for generating sparse grids in the Matsubara frequency domain and a fitting algorithm based on a tensor network representation. These two enable efficient an transformation of numerical data from Matsubara to IR domain. The tensor network representation provides a model-independent way to compress the IR expansion coefficients (IR tensor) by decoupling the frequency and spins/orbital dependence. Low-temperature calculations for multi-orbital systems benefit from this compression. We have demonstrated the efficiency and accuracy of the present method in DMFT calculations: static susceptibility calculations for single-band Hubbard model and dynamic susceptibility calculations for two-band Hubbard model with low symmetry. We have shown that accurate susceptibilities can be obtained already with low-rank approximation of the IR tensor.
Potential applications of the present scheme include DFT+DMFT calculations on realistic multi-orbital models and diagrammatic extensions of DMFT. It is highly desirable to develop efficient methods for solving equations at the 2P level such as Bethe-Salpeter and parquet equations directly in the tensor network format with the sparse sampling. This requires efficient evaluation of contractions of 2P quantities, e.g., a vertex function and a generalized susceptibility. Potentially useful techniques for manipulating matrix product states and tensor networks have already been developed in other fields of condensed matter theory. [36]

A Intermediate representation at fixed bosonic frequency
The frequency dependence of G(iω n , iω n , iν m ) can be decomposed into 16 distinct components shown in Table. 1. For instance, the first component (r = 1) depends on the three frequencies through iν m + iω n , −iω n , iω n , which defines the structure of discontinuity planes in the imaginary-time domain. To be more specific, as discussed in Ref. [22], the first component can be discontinuous at three equal-time planes: τ 1 = τ 4 , τ 2 = τ 4 and τ 3 = τ 4 (mod β). Such a function with this discontinuity structure may be well approximated by where ρ (1) ( 1 , 2 , 3 ) is an auxiliary spectrum bounded in [−ω max , ω max ]. Applying the same procedure to all the 16 components, we obtain the assumption that .
For instance, the term for r = 1 can be decomposed as One can see the first and second terms in the last line are the products of two fermionic kernels. The first term can be represented compactly as where the coefficient in the parenthesis decays as fast as the singular values with respect to l 1 and l 2 .
Applying the same procedure to all the terms in Eq. (15), one obtains a compact overcomplete representation Here we defined for s = 0, 1. The sum can be restricted to s, s = 0 in the case of zero bosonic frequency, i.e., ν = 0. In the present study, for simplicity of the implementation, we also keep s, s = 1 in the sum also for ν = 0, which does not harm. We can recast Eq. (18) into a form more analogous to Eq. (8) as G(iω n , iω n , iω m ) = 12 r=1 l 1 ,l 2 G(r, l 1 , l 2 ; iν)U (r)

B Optimization algorithm for tensor regression
We minimize the cost function in Eq. (13) by means of a accelerated alternating least squares (ALS) method. The essential idea of ALS is to optimize each tensor in {x (0) , x (1) , · · · } at one time. The optimization of a single tensor reduces to a convex optimization problem. In ALS, we sweep through all the tensors until the value of the cost function is converged. In addition, we introduce recently proposed acceleration techniques to improve the convergence of ALS. In the following, we detail the procedure of ALS and the acceleration techniques.

B.1 Alternating least squares
We explain how to optimize the tensors in Eq. (13) by alternating least squares.

B.1.1 Optimization of x (0)
The minimization of Eq. (13) with respect to x (0) can be recast into min which is a regularized convex optimization in the well-known form of Ridge regression. The tensor A reads where In matrix form, Eq. (21) reads min where y and x (3) are flatted 1D arrays. The matrix is size of (N W N O , N r D), where N r is the number of different representations (=12), N O is the size of the combined index for spin and orbitals.
In practice, we solve Eq. (24) by an iterative method, LSQR [41], without constructing the matrix A explicitly. In LSQR, the matrix A is used only to compute Av and A † u for various v and u. Hence we store the (precomputed) tensor B in memory, and compute these products by means of tensor contractions. We illustrate the tensor contractions for computing Av and A † u in Figs. 11(a) and 11(b), respectively. This approach not only reduces memory footprints but also reduces the computational complexity from

B.1.2 Optimization of x (1)
The optimization of x (1) can be done in a way very similar to that of x (0) . Thus, we focus only on the differences. The reduced least squares problem reads min where where As illustrated in Fig. 11, we can readily compute A v and A † u by tensor contractions.

B.1.3 Optimization of x (2)
The tensor x (2) can be optimized in exactly the same way as x (1) . Thus, we do not describe the optimization of x (2) for simplicity.

B.1.4 Optimization of x (3)
The optimization of x (3) is rather simple. The reduced least squares problem reads min where the tensor A can be stored in memory. Furthermore, this optimization problem is separable with respect to D and thus can be solved independently.

B.2 Acceleration techniques and convergence condition
In the previous subsection, we have explained how to perform one sweep through the tensors. This single ALS sweep corresponds to the function ALS in Algorithm 1. The function ALS takes an array obtained by flattening tensors of fitting parameters as input. After a single sweep, the updated tensors are returned as a flattened array. Here, flattening means recasting multiple tensors of complex numbers into a single one-dimensional array of real numbers (the order is arbitrary). The whole procedure of the accelerated ALS is illustrated in Algorithm 1. The main difference from the plain ALS is the existence of β. In the second last line, β(x k − x k−1 ) acts as a momentum term for β > 0. Although this momentum term accelerates the convergence by updating the parameters aggressively, this sometimes leads to oscillatory behavior or divergence. We use a restarting mechanism to stabilize the accelerated ALS. In practice, when the restarting condition f (x k ) > f (x k−1 ) is met (f is the cost function), a ALS sweep is forced by setting β = 0 (see the comment in Algorithm 1). For more details on the acceleration techniques, please refer to Ref. [42].
The loop is exited when a convergence condition is met. δ in Algorithm 1) is a relative tolerance.

B.3 Technical details and numerical results
We parallelize the whole fitting procedure by MPI with respect to frequencies. This parallelization is efficient particularly for G. We parallelize the LSQR implementation in the SciPy Python package [43] using MPI with respect to sampling frequencies. Figure 12 shows the convergence of the root squared errors of the local susceptibility for the single-band Hubbard model analyzed in Sec. 5. One can see that the fitting errors quickly converge. The small oscillatory behavior is due to the acceleration.  Flatten updated tensors end function Exit loop end if end for