Equilibration towards generalized Gibbs ensembles in non-interacting theories

Even after almost a century, the foundations of quantum statistical mechanics are still not completely understood. In this work, we provide a precise account of these foundations for a class of systems of paradigmatic importance that appear frequently as mean-field models in condensed matter physics, namely non-interacting lattice models of fermions (with a straightforward extension to bosons). We demonstrate that already the translation invariance of the Hamiltonian governing the dynamics and a finite correlation length of the possibly non-Gaussian initial state provide sufficient structure to make mathematically precise statements about the equilibration of the system towards a generalized Gibbs ensemble, even for highly non-translation invariant initial states far from ground states of non-interacting models. Whenever these conditions are met, the system equilibrates rapidly, following a power law in time, as long as there are no long-wavelength dislocations in the initial second moments that would render the system resilient to relaxation. Our proof technique is rooted in the machinery of Kusmin-Landau bounds. Subsequently, we numerically illustrate our analytical findings through a quench scenario with an initial state corresponding to an Anderson insulator, observing power-law equilibration. We consider some possibilities for realizing distinct instances of generalized Gibbs ensembles in optical lattice-based quantum simulators and for studying their stability in the presence of interactions.


I. INTRODUCTION
Over more than a century, it has become clear that the methods of statistical mechanics work incredibly well in a vast range of physical situations. But, to this day, a complete understanding of why this is the case remains elusive. Based on both experimental and theoretical work, a good deal of progress has already been made [1][2][3][4][5][6]. Nevertheless, the key objective, finding a set of physical assumptions from which we can demonstrate that quantum systems reach thermal equilibrium, has yet to be achieved. And there are exceptional cases where this simply does not occur, which typically involve the existence of locally conserved quantities.
The process whereby a system locally relaxes to a thermal state or a generalized Gibbs ensemble (which we call generalized thermalization) can be broken down into two components (see Figure 1). The first is simply that it equilibrates, meaning the system spends most of the time locally close to some time-independent steady state. This should be true at least for large classes of important observables, e.g., local observables. A crucial aspect (sometimes overlooked) is that the equilibration time must be realistic: in experiments, physical systems can only be observed to relax over reasonable times. The second component, in the case of thermalization, is that the equilibrium steady state has no detailed memory of the initial state (beyond, e.g., temperature or chemical potential), namely it is a thermal state.
It has become clear, however, that some specific classes of physical systems do not equilibrate [7-10], at least over the times one can assess in the laboratory. Furthermore, some systems equilibrate but not to a thermal state, instead retaining some memory of the initial state [11-13]. A major distinction arises in this context between non-integrable systems, which indeed are expected to equilibrate to a thermal state, and integrable systems, which are expected not to fully thermalize, but to equilibrate to generalized Gibbs ensembles [14-20]. Many-body localized systems [11,21], in which disorder and interactions interplay, can be seen as reminiscent of the latter systems, as instances of quantum systems which also do not thermalize. In both cases, local (or quasi-local) conserved quantities play a major role. Whenever initial states with inequivalent values of these conserved quantities are experimentally accessible, the resulting steady states will retain a memory of these differences that can be measured. A rigorous dynamical derivation of generalized thermalization must therefore overcome several difficulties arising from these observations: we must identify what properties most physical systems have that lead them to thermalize or relax to a generalized Gibbs ensemble.

FIG. 1. Thermalization and equilibration are often studied in a dynamical quench scenario, where a parameter in the Hamiltonian is suddenly quenched to zero, which knocks the system out of equilibrium (1). The subsequent process of (generalized) thermalization has two components. First, the system must relax to a steady state (2a) with respect to meaningful quantities. Exceptions to this are typically characterized by oscillations, as in (2b). Second, if equilibration occurs, the equilibrium state must be thermal (exemplified here by the Fermi-Dirac distribution in (3a)), or correspond to a generalized Gibbs ensemble (3b) in case further constants of motion are relevant.
There are several different theoretical approaches to this challenging long-standing problem. One is to focus on what can be proven for abstract quantum systems with as few assumptions as possible [4,[22][23][24]. In this case, powerful results have been found, though often without reproducing the relevant equilibration times [25][26][27][28]. Another approach is to use randomness to attack the problem [29][30][31][32][33][34]. Suggestions for the mechanism underlying the relatively fast process of equilibration in the general setting have been offered [35,36], but a consensus together with more concrete estimates for equilibration times have yet to emerge.
In this work, we first analyse quenches of lattice fermions (and, less explicitly, bosons) to non-interacting Hamiltonians. Our first main result is that they locally equilibrate quickly. Two tools we employ are the Kusmin-Landau bound [49] and fermionic Gaussification from Ref. [50]. The latter showed that non-interacting fermions on a lattice locally Gaussify, meaning the state becomes locally indistinguishable from a Gaussian state for relatively long times. However, this Gaussian state may be time dependent. Not only do we show that one of the assumptions of Ref. [50] is unnecessary for Gaussification, but we also show that the Gaussian state that the system approaches will be time independent. This is a proof of equilibration over realistic times for these models, and it also proves that the equilibrium state can be described by a generalized Gibbs ensemble (GGE). In fact, our work can be seen as a comprehensive rigorous proof of convergence to generalized Gibbs ensembles, bringing the program initiated in Refs. [14,19,51,52] to an end by widely generalizing the previous results while keeping the discussion fully rigorous. We then turn to the question of whether one does indeed need the extra degrees of freedom of a GGE (as opposed to simply a thermal state). We show numerically that initial states corresponding to thermal states of an Anderson insulator equilibrate, after the on-site disorder is quenched away, to a thermal (grand canonical) state, except in cases with highly correlated noise; in this latter case, the equilibrium state must be described by a GGE. It is easy to see that if one has strongly inhomogeneous initial conditions, the equilibration times can be of the order of the system size, see, e.g., Ref. [38]. Finally, we consider some possibilities for realizing distinct instances of generalized Gibbs ensembles in optical lattices and systematically studying their stability in the presence of interactions.

II. SUFFICIENT CONDITIONS FOR LOCAL EQUILIBRATION TO A GENERALIZED GIBBS ENSEMBLE
A quantum system locally equilibrates if, for all times $t$ between some relaxation time $t_0$ and some recurrence time $t_R$, the state at time $t$ is practically indistinguishable from the time-averaged state $\hat\varrho^{(\mathrm{eq})}$ with respect to local observables. In other words, the extent of non-equilibrium fluctuations is bounded by some small $\epsilon > 0$ such that for every local observable $\hat A$ we have
$$ \bigl|\langle \hat A\rangle_{\hat\varrho(t)} - \langle \hat A\rangle_{\hat\varrho^{(\mathrm{eq})}}\bigr| \le \epsilon $$
for all $t \in [t_0, t_R]$, where $\langle \hat A\rangle_{\hat\varrho} = \mathrm{tr}[\hat\varrho\,\hat A]$. Clearly, whenever a system equilibrates, the equilibrium state must be the infinite time average
$$ \hat\varrho^{(\mathrm{eq})} = \lim_{T\to\infty}\frac1T\int_0^T \mathrm dt\; \hat\varrho(t). $$
In this assessment, it is particularly challenging to identify the equilibration time $t_0 > 0$. When equilibration does indeed occur, a most natural question is how to precisely characterize this equilibrium state. Statistical physics is built upon the assumption that systems equilibrate to a thermal state. The thermal (or Gibbs) state of a quantum system with Hamiltonian $\hat H$ is defined to be
$$ \hat\varrho_G = \frac{e^{-\beta(\hat H - \mu \hat N)}}{\mathrm{tr}\bigl[e^{-\beta(\hat H - \mu \hat N)}\bigr]}, $$
where $\beta > 0$ is the inverse temperature, which fixes the value of the expected energy, $\mu$ is the chemical potential, which determines the expected particle number, and $\hat N$ is the particle number operator. We say that a system with initial state $\hat\varrho$ thermalizes locally if it equilibrates in the sense defined above and if $\hat\varrho^{(\mathrm{eq})}$ is locally indistinguishable from a thermal state (for some values of $\beta$ and $\mu$). For the case of non-interacting, quasi-free models, thermal states of quadratic Hamiltonians are called Gaussian or quasi-free and are the target equilibrium ensemble upon quenches to quasi-free dynamics.
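For quadratic Hamiltonians, the Gibbs state is itself Gaussian, so its full content is the Fermi-Dirac function of the coupling matrix, $\Gamma = (e^{\beta(h-\mu\mathbb 1)} + \mathbb 1)^{-1}$. The following minimal numpy sketch (nearest-neighbor ring; the values of $L$, $\beta$, $\mu$ are illustrative assumptions, not taken from the text) computes such a thermal covariance matrix:

```python
import numpy as np

# nearest-neighbor hopping ring with L sites (illustrative parameters)
L, J, beta, mu = 8, 1.0, 2.0, 0.3
h = np.zeros((L, L))
for x in range(L):
    h[x, (x + 1) % L] = h[(x + 1) % L, x] = J

# thermal covariance of the quadratic Hamiltonian: the Fermi-Dirac
# function of h, evaluated in the eigenbasis of h
w, V = np.linalg.eigh(h)
occ = 1.0 / (np.exp(beta * (w - mu)) + 1.0)   # mode occupations, each in (0, 1)
Gamma = V @ np.diag(occ) @ V.conj().T

# <N> = tr(Gamma) is the expected particle number fixed by mu
print(np.trace(Gamma).real)
```

All local expectation values of this Gibbs state then follow from $\Gamma$ via Wick's theorem.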

A. Statement of the main result
Our main result is the following. Take a system of non-interacting fermions on a line described by a translation invariant (with periodic boundary conditions) short-ranged Hamiltonian. Assume that the couplings are generic, such that there are no points with coinciding roots $E'(p) = E''(p) = 0$ of the derivatives of the dispersion relation $E$. Initialize the system in a state with finite correlation length and non-resilient second moments (defined presently). Then local equilibration occurs according to the following statement.
Theorem 1 (Emergence of statistical mechanics). There exist a constant relaxation time $t_0$ and a recurrence time $t_R$ proportional to the system size such that for all times $t \in [t_0, t_R]$ the system locally equilibrates to a Gaussian generalized Gibbs ensemble $\hat\varrho^{(\mathrm{eq})}_G$, with non-equilibrium fluctuations bounded as $\epsilon(t) \le C t^{-\gamma}$ for some $C, \gamma > 0$ independent of the system size.

The generalized Gibbs ensemble is parametrized by an intensive number of generalized temperatures that scales with the correlation length of the initial state, and the thermodynamical potentials are exclusively local. These are two defining features of statistical mechanics, and indeed they are present in our equilibrium ensemble. We argue this by invoking Jaynes' principle of looking for the maximum entropy state given expectation values of interest. In our case, these are the tunneling currents $\hat I_z$, which are quadratic operators; e.g., $\hat I_0$ is the mean on-site particle density and $\hat I_1$ corresponds to nearest-neighbor tunneling. Since the equilibrium ensemble is Gaussian, we can also use the property that characterizes these states, namely that they are the maximum entropy states given fixed second moments [50]. Hence fixing the values $\langle \hat I_z\rangle$ is a way of parametrizing a Gaussian state. Say that $\epsilon = C t^{-1/6}$ is our desired experimental resolution, and deviations from equilibrium should not be larger than it. Then within that precision we neglect all currents with range significantly above the correlation length, $z > z_\xi \approx \xi \ln(\epsilon^{-1})$, and aim at reproducing the values of the relevant currents in the initial state for $z \le z_\xi$. This condition is met by setting the state to be parametrized as
$$ \hat\varrho^{(\mathrm{eq})}_G = \frac1Z \exp\Bigl(-\sum_{z=0}^{z_\xi} \lambda_z \hat I_z\Bigr), $$
where $Z$ ensures normalization. Note that for fixed $\epsilon > 0$, e.g., determined by the experimental resolution of the apparatus, only an intensive number of generalized temperatures $\lambda_z$ significantly contributes to the parametrization of this ensemble. It remains to argue that for $z \ge z_\xi$ all correlation functions are smaller than the desired resolution $\epsilon$.
By the result in Ref. [53], any one-dimensional thermal state has exponentially decaying correlations, and hence indeed we recover asymptotically $\langle \hat I_z\rangle_{\hat\varrho^{(\mathrm{eq})}_G} \sim C_{\mathrm{Clust}}\, e^{-z/\xi_A}$. Here we can identify the chemical potential as $\mu = \lambda_0$ and, oftentimes, $\beta = \lambda_1$ in the case of the nearest-neighbor hopping quench Hamiltonian. If we find that $\sum_{z=0}^{z_\xi} \lambda_z \hat I_z = \beta\hat H + \mu\hat N$, where $\hat H$ is exactly the quench Hamiltonian and $\hat N$ the particle number operator, then we would say that the equilibrium ensemble is thermal. Whenever this is not the case, one concludes that relaxation towards a generalized Gibbs ensemble (GGE) has taken place.

B. Discussion of the main result
The novel feature beyond known non-interacting results [14,19,42,50-52,54-57] is that, for the first time, we show equilibration over a reasonable time in a closed quantum system to occur generically within a class of models and initial states. Such ubiquitous validity is one of the defining features of statistical mechanics. Roughly speaking, in our case it occurs as a result of the translation invariance of the dynamics, even if the initial state is non-Gaussian and not translation invariant, as long as it does not have unnatural initial correlations. Note that our argument does not require knowledge of the actual values of the couplings or of the specific initial configurations of the particles, as long as these satisfy our general assumptions. This generality is a crucial feature of statistical mechanics and is to a large degree responsible for its success.
Throughout the work, it will be our goal to give intuition that grounds the proof of this result. Let us begin by explaining how equilibration can fail, or becomes physically implausible, if any of the ingredients of Theorem 1 is relaxed, so that other assumptions become necessary. By our result, equilibration occurs via dynamics generated by non-interacting Hamiltonians: while strong results are possible even in the general interacting case [24,25,28,38,58-60], deriving a rigorous bound on the equilibration time of the type $\epsilon \in O(t^{-\gamma})$ has been elusive so far. In fact, it may be impossible on grounds of quantum computational complexity [61-64], because equilibrated time evolution is concomitant with the converging results of a quantum algorithm, and often the runtime should be longer than polynomial [65].
In the main text, we present the results for non-interacting fermions, even though the same statements hold for non-interacting bosons with a little technical fine print due to the local Hilbert space being unbounded; one then needs additional assumptions on the correlations, as in Ref. [51]. Concerning geometry, we consider a ring configuration, mostly for the clarity of the argument, while of course thermodynamics should not change with the choice of boundary conditions. However, in higher dimensions additional complications could occur, as the group velocity, i.e., the derivative of the dispersion relation, could vanish along curves instead of at isolated points [66]; but certainly our techniques should generalize when supplemented with additional assumptions that exclude such technical issues. One of the core physical assumptions enabling sufficient scrambling of the initial conditions is translation invariance of the Hamiltonian. Relaxing it, one can find that particles do not propagate; without mixing, ergodicity breaks down and with it relaxation. As a prime example, the Anderson insulator model [67] is a non-translation invariant Hamiltonian where equilibration is obstructed due to localization.
Long-ranged non-interacting models often violate causality [68,69]. That is to say, if equilibration occurs in such models, one would need to develop an entirely new intuition for its mechanisms. Here, we assume a short-ranged local Hamiltonian, which is already enough to ensure effective causality by means of the Lieb-Robinson bound [70-72]. By additional technical calculation, it should be possible to extend the results to couplings that asymptotically decay exponentially. Note that we consider a closed system described by a static Hamiltonian. If we relax the condition of exponentially decaying correlations, then one can consider as the initial state a state evolved backwards for extensively long times, which would suddenly acquire, "out of nowhere", non-equilibrium dynamics, while the system should be expected to be equilibrated.
Finally, it has turned out to be necessary to demand that the second moments of the fermionic state be non-resilient. The simplest example of a state without this property occurs when particles occupy half of the system and the other half is empty. Then, for any short-ranged Hamiltonian, by the Lieb-Robinson bound it will take extensive times for the particles to even explore the system and for equilibration to occur. This property will be precisely stated below in the form of a definition after the necessary notation has been introduced. Summarizing this discussion, in trying to establish equilibration one can encounter numerous obstructions, some of them fundamental difficulties and some rather technical. In this work we identify precise conditions, mostly concerning locality of couplings and correlations, which are physically very natural and general, and at the same time are sufficient to establish local equilibration with explicit time scales for a closed quantum system.

III. SETTING

A. Non-interacting fermion models
We denote fermionic annihilation operators by $\hat f_x$ and will discuss bosons in the appendix. The annihilation operators obey the canonical anticommutation relations $\{\hat f_x, \hat f^\dagger_y\} = \hat f_x \hat f^\dagger_y + \hat f^\dagger_y \hat f_x = \delta_{x,y}$. Note that any fermionic initial state satisfies the parity superselection rule [73,74], meaning physical states can never involve a superposition of even and odd numbers of fermions. More precisely, we assume that the density operator $\hat\varrho$ commutes with $(-1)^{\hat N}$, where $\hat N = \sum_{x=1}^L \hat N_x$ is the total number operator with $\hat N_x = \hat f^\dagger_x \hat f_x$. A non-interacting fermionic model conserving particle number is characterized by a quadratic Hamiltonian of the form
$$ \hat H(h) = \sum_{x,y=1}^L h_{x,y}\, \hat f^\dagger_x \hat f_y, $$
where $h = h^\dagger \in \mathbb C^{L\times L}$ is the coupling matrix for a finite system size $L$. By a linear transformation of the fermionic operators preserving the anticommutation relations, any such Hamiltonian can be brought into diagonal form. Whenever the system is translation invariant, $h$ is circulant, and so $h$ can be diagonalized by a discrete Fourier transform. Throughout, we make the assumption that $h \in \mathbb R^{L\times L}$ is real, translation invariant and has range $R$, that is, $J_z := h_{1,1+z}$ vanishes for $z > R$, and hence we consider hopping models of the form
$$ \hat H(h) = \sum_{x=1}^L \Bigl( J_0\, \hat N_x + \sum_{z=1}^{R} J_z \bigl(\hat f^\dagger_x \hat f_{x+z} + \hat f^\dagger_{x+z} \hat f_x\bigr)\Bigr). $$
By this, we can define the dispersion relation $E : \mathbb R \to \mathbb R$ as
$$ E(p) = J_0 + 2\sum_{z=1}^R J_z \cos(pz) \qquad (9) $$
and, evaluating at $p_k = 2\pi k/L$, we can write the eigenvalues of $h$ as $\omega_k = E(p_k)$ for any finite system size $L > 2R$. Here $E(p)$ is analytic and its derivative can be used to express the dispersion gaps, e.g., $\omega_{k+1} - \omega_k = E'(\tilde p_k)\, 2\pi/L$ for some $\tilde p_k \in [p_k, p_{k+1}]$ by the mean value theorem. It will be useful to define $J_{\max} = \max_{z=1,\dots,R} |J_z|$. The Heisenberg evolution of mode operators reads
$$ \hat f_x(t) = \sum_{y=1}^L G^*_{x,y}(t)\, \hat f_y, \qquad (10) $$
where $G^*(t) = e^{-ith}$ is the propagator, given by
$$ G_{x,y}(t) = \frac1L \sum_{k=1}^L e^{it\omega_k}\, e^{2\pi i k(x-y)/L} $$
in the translation invariant case, see Appendix A.
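As a sanity check of the diagonalization by discrete Fourier transform, the following sketch (range $R = 2$ with hypothetical coupling values $J_1$, $J_2$) verifies numerically that the spectrum of a circulant hopping matrix coincides with the dispersion $E(p_k)$ evaluated at the lattice momenta:

```python
import numpy as np

L = 12
Jz = {1: 1.0, 2: 0.25}          # hopping amplitudes J_z, range R = 2 (illustrative)

# real, translation invariant coupling matrix on a ring
h = np.zeros((L, L))
for z, J in Jz.items():
    for x in range(L):
        h[x, (x + z) % L] += J
        h[(x + z) % L, x] += J

# dispersion E(p) = 2 sum_z J_z cos(p z) (J_0 = 0 here), at p_k = 2 pi k / L
p = 2 * np.pi * np.arange(L) / L
E = sum(2 * J * np.cos(p * z) for z, J in Jz.items())

# circulant matrices are diagonalized by the discrete Fourier transform,
# so the spectrum of h must coincide with {E(p_k)}
assert np.allclose(np.sort(np.linalg.eigvalsh(h)), np.sort(E))
```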
The covariance matrix is defined as the collection of second moments of a state $\hat\varrho$, given by
$$ \Gamma_{x,y} = \mathrm{tr}\bigl[\hat\varrho\, \hat f^\dagger_x \hat f_y\bigr]. $$
Note that we consider states with no pairing correlations, $\mathrm{tr}[\hat\varrho\, \hat f_x \hat f_y] = 0$. Our methods can be generalized to that case as well [75,76], but this complicates the presentation. Using (10), we see that the covariance matrix at time $t$ is
$$ \Gamma(t) = G(t)\, \Gamma(0)\, G^\dagger(t). $$
Of particular relevance for us will be fermionic Gaussian states, which are completely specified by their second moments and Wick's theorem for higher-order correlation functions [75].
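In this picture, time evolution enters only through a unitary conjugation of the covariance matrix. A short numpy sketch (nearest-neighbor ring and a charge-density wave initial state, chosen for illustration) showing that particle number, hermiticity and the 0/1 mode occupations are preserved exactly:

```python
import numpy as np

L, J, t = 10, 1.0, 3.7
h = np.zeros((L, L))
for x in range(L):
    h[x, (x + 1) % L] = h[(x + 1) % L, x] = J

# charge-density wave |0,1,0,1,...>: diagonal covariance matrix
Gamma0 = np.diag(np.arange(L) % 2).astype(complex)

# free evolution of second moments: Gamma(t) = e^{ith} Gamma(0) e^{-ith}
w, V = np.linalg.eigh(h)
U = V @ np.diag(np.exp(1j * w * t)) @ V.conj().T
Gamma_t = U @ Gamma0 @ U.conj().T

# particle number and hermiticity are preserved under the quasi-free evolution
assert abs(np.trace(Gamma_t) - L / 2) < 1e-10
assert np.allclose(Gamma_t, Gamma_t.conj().T)
```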
To prove many of our results later, we will require that the initial state has exponential decay of correlations, meaning there exist positive constants $C_{\mathrm{Clust}}, \xi > 0$ such that correlations decay like
$$ \bigl|\langle \hat A\hat B\rangle - \langle \hat A\rangle\langle \hat B\rangle\bigr| \le C_{\mathrm{Clust}}\; s(\hat A)\, s(\hat B)\, e^{-d/\xi}, \qquad (14) $$
where $\hat A$ and $\hat B$ are observables acting non-trivially only on lattice regions separated by a distance $d$, with sizes $s(\hat A)$ and $s(\hat B)$, respectively. For simplicity, we have chosen $\|\hat A\| = \|\hat B\| = 1$, where $\|\cdot\|$ is the operator norm.

B. Constants of motion
What are the relevant constants of motion for translation invariant dynamics? The most obvious candidates are the momentum occupation numbers
$$ \hat n_k = \hat f^\dagger_k \hat f_k, \qquad \hat f_k = \frac1{\sqrt L}\sum_{x=1}^L e^{-2\pi i kx/L}\, \hat f_x. $$
Another set of conserved quantities are the current operators
$$ \hat I_z = \frac1L \sum_{x=1}^L \bigl(\hat f^\dagger_x \hat f_{x+z} + \hat f^\dagger_{x+z} \hat f_x\bigr). $$
These are indeed conserved quantities, which follows because they are linear combinations of the momentum occupation numbers, $\hat I_z = (2/L) \sum_{k=1}^L \cos(2\pi kz/L)\, \hat n_k$. The current operators allow us to judge how many conserved quantities are really necessary to describe the steady state with finite experimental resolution $\epsilon$. Due to the exponential decay of correlations, Eq. (14), we have $|\langle \hat I_z\rangle| \le C_{\mathrm{Clust}}\, e^{-z/\xi}$, and so $|\langle \hat I_z\rangle| \le \epsilon$ for $z \ge \xi \ln(C_{\mathrm{Clust}}/\epsilon)$. So there are only $z \sim \xi$ non-negligible values of $\langle \hat I_z\rangle$, which form the only relevant local conserved quantities. Thus, whenever equilibration occurs, the equilibrium ensembles of any set of non-local momentum occupation numbers $\{\langle \hat n_k\rangle\}$ with the same current content will agree.
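To make the conservation explicit, the following numpy sketch (conventions as assumed above, with an illustrative period-3 initial density pattern) checks that the current expectation values $\langle \hat I_d\rangle$, read off from the bands of the covariance matrix, are invariant under a quasi-free quench:

```python
import numpy as np

L, J = 12, 1.0
h = np.zeros((L, L))
for x in range(L):
    h[x, (x + 1) % L] = h[(x + 1) % L, x] = J
w, V = np.linalg.eigh(h)

# period-3 density pattern |1,0,0,1,0,0,...> as initial second moments
Gamma0 = np.diag((np.arange(L) % 3 == 0).astype(float)).astype(complex)

def current(Gamma, d):
    # <I_d> = (1/L) sum_x (Gamma_{x,x+d} + Gamma_{x+d,x}) on the ring
    idx = np.arange(L)
    return (Gamma[idx, (idx + d) % L] + Gamma[(idx + d) % L, idx]).sum().real / L

# the band sums <I_d> are constants of motion under the circulant evolution
I0 = [current(Gamma0, d) for d in (1, 2, 3)]
for t in (1.3, 7.9):
    U = V @ np.diag(np.exp(1j * w * t)) @ V.conj().T
    Gt = U @ Gamma0 @ U.conj().T
    assert np.allclose([current(Gt, d) for d in (1, 2, 3)], I0)
```

The density $\langle \hat I_0\rangle = \mathrm{tr}\,\Gamma/L$ is likewise conserved.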
For initial states $\hat\varrho(0)$ with short-range correlations we prove in the appendix, assuming minimal degeneracy of the dispersion relation $\omega_k$, that the steady state obtained from the infinite-time average $\Gamma^{(\infty)}_{x,y}$ is translation invariant up to a small error term, i.e., it depends (up to that error) only on $|x - y|$, where $|x - y|$ is meant in the sense of the distance on the ring geometry. By this, we define our target equilibrium ensemble $\hat\varrho^{(\mathrm{eq})}_G$ to be the Gaussian state whose matrix elements $\Gamma^{(\mathrm{eq})}_{x,y}$ are given by the translation invariant part of $\Gamma^{(\infty)}_{x,y}$. Note that due to this, $\hat\varrho^{(\mathrm{eq})}_G$ will be translation invariant, and hence the current operators can be evaluated by a strictly local measurement, $\langle \hat I_d\rangle_{\hat\varrho^{(\mathrm{eq})}_G} = 2\,\mathrm{Re}\,\Gamma^{(\mathrm{eq})}_{x,y}$, where $d = |x - y|$.
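The infinite-time average can be computed directly by dephasing $\Gamma$ in the eigenbasis of $h$: only matrix elements between (numerically) degenerate eigenvalues survive the averaging. A minimal sketch under the conventions assumed above (random short-range initial moments; the degeneracy tolerance is an implementation choice), verifying that the dephased matrix is an exact steady state:

```python
import numpy as np

L, J = 10, 1.0
h = np.zeros((L, L))
for x in range(L):
    h[x, (x + 1) % L] = h[(x + 1) % L, x] = J
w, V = np.linalg.eigh(h)

rng = np.random.default_rng(0)
d0 = rng.uniform(0, 1, L)
Gamma0 = np.diag(d0).astype(complex)       # uncorrelated initial second moments

# infinite-time average: in the eigenbasis of h, keep only the entries
# belonging to (numerically) equal eigenvalues
Gtilde = V.conj().T @ Gamma0 @ V
mask = np.abs(w[:, None] - w[None, :]) < 1e-9
Ginf = V @ (Gtilde * mask) @ V.conj().T

# Ginf is an exact steady state of the quasi-free evolution
t = 5.21
U = V @ np.diag(np.exp(1j * w * t)) @ V.conj().T
assert np.allclose(U @ Ginf @ U.conj().T, Ginf)
```

The residual dependence of `Ginf` on $x - y$ through the degenerate pairs $\omega_k = \omega_{-k}$ is exactly the small error term referred to in the text.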

IV. POWER-LAW EQUILIBRATION
Our goal in this section is to bound how quickly the time-evolved second moments $t \mapsto \Gamma_{x,y}(t)$ relax towards the time-averaged value. The culmination of this is a bound of the form
$$ \bigl|\Gamma_{x,y}(t) - \Gamma^{(\infty)}_{x,y}\bigr| \le C_\Gamma\, t^{-\gamma}, $$
where $C_\Gamma, \gamma > 0$ are constants independent of the system size. Let us begin by defining the decomposition of the covariance matrix $\Gamma$ into its currents $\Gamma^{(d)}$ with entries
$$ \Gamma^{(d)}_{x,y} = \Gamma_{x,y}\, \delta_{y,x+d}, \qquad (23) $$
where we use the convention $\delta_{a,b+L} = \delta_{a,b}$. Intuitively, one can find $\Gamma^{(d)}$ by picking out bands from $\Gamma$ parallel to the diagonal, and we will show that each band equilibrates individually to the conserved current value $I_d$, using that the evolution is linear in the bands, $\Gamma(t) = \sum_{d=-\lceil (L+1)/2\rceil + 1}^{\lfloor L/2\rfloor} \Gamma^{(d)}(t)$. Now we expand $\Gamma^{(d)}$ via the discrete Fourier transform, writing the band entries in terms of Fourier weights $X^{(d)}_n$ and dynamical functions $f_n(t)$ (24), and after a technical calculation we obtain
$$ |f_n(t)| \le C_\#(n\pi/L)\; t^{-\gamma}. \qquad (25) $$
This step is of crucial importance. We have separated out a dynamical function $f_n$ which, when it decays, does so independently of the initial state; colloquially, it scrambles the initial state. To prove our result, we show in Appendix C that $f_n$ dephases in time with $\gamma > 1/(6R + 6)$. Here, one should note that $C_\#(n\pi/L)$ will be constant in time but could depend on the system size. Indeed, for $n \approx 1$ we will have $C_\#(n\pi/L) \sim L^2$. However, we will see that this is not an artefact of the technique that we use to obtain the bound (25): points $n$ with constant larger than some threshold, $C_\#(n\pi/L) > C_{\mathrm{th}}$, are resilient points where $f_n$ dephases slowly; they will be discussed in detail below. In the end, $C_\#$ has a simple form and it does not scale in the system size for very many natural initial configurations. In order to derive the bound of Eq. (25), we will study the phase function $\Phi_{t,\alpha} : [0, 2\pi] \to \mathbb R$ defined as
$$ \Phi_{t,\alpha}(p) = Dp - 4t \sum_{z=1}^R J_z \sin(z\alpha)\sin(zp + z\alpha). \qquad (26) $$
Choosing $D = 2\pi(x - y + d)/L$ and $\alpha = \pi n/L$, we have
$$ f_n(t) = \frac1L \sum_{k=1}^L e^{i\Phi_{t,\alpha}(p_k)}. \qquad (27) $$
The relation (27) is called an exponential sum, and its dephasing is instrumental for the state to dephase itself. In order to bound it, we make use of the Kusmin-Landau technique [49].
This powerful machinery allows one to arrive at quantitative bounds, as opposed to the intuitive estimates obtained from stationary phase approximations [77,78]. The crux of the method is, however, similar: dephasing is determined by the gaps of $\omega$, or specifically by the first derivative of $\Phi$. By analyzing the dispersion relation $E$, we find a lower bound to the gaps by appropriate Taylor expansions. The bound is then determined by the values of the derivatives of $\Phi$ at points that one could view as stationary points.
We define the set of stationary points
$$ S^{(1)} = \bigl\{ p \in [0, 2\pi] : \Phi'_{t,\alpha}(p) = 0 \bigr\}, $$
and correspondingly $S^{(2)}$ for the second derivative. Due to the finite range $R$ of the Hamiltonian, there are at most $2R+2$ stationary points, which we prove in the appendix. While in the appendix we prove a more general statement, here we discuss the generic case only, where we assume that there are no points such that $\Phi''_{t,\alpha}(p) = \Phi'''_{t,\alpha}(p) = 0$. Hence, for any first-order root $r \in S^{(1)}$ we either have $\Phi''_{t,\alpha}(r) \ne 0$ or $\Phi'''_{t,\alpha}(r) \ne 0$. For the Taylor expansion we want to take the minimal derivative that does not vanish at $r$, so we take $\kappa_r = 1$ if $\Phi''_{t,\alpha}(r) \ne 0$ and otherwise we set $\kappa_r = 2$. In Appendix C we show a more general statement, but in the generic case the bound is governed by the minimal derivative value $M_\alpha$ used for lower bounding dephasing through a Taylor expansion, namely the minimal value of $|\Phi^{(\kappa_r+1)}_{t,\alpha}(r)|$, suitably normalized, over the stationary points $r \in S^{(1)}$. Note that this constant is time independent; hence the time scaling is governed by the smallest next-order derivative which does not vanish at a stationary point. Hence, as proved in Appendix C, we obtain a bound on the dephasing of the form (25), which is a huge simplification, as the bound is now encoded in the minimal value of derivatives at stationary points, which form a sparse set. As an example, let us study $M_\alpha$ of $\hat H(h)$ with only one non-trivial coupling value $J_1 \ne 0$. Then we have the simplification
$$ \Phi_{t,\alpha}(p) = Dp - 4tJ_1 \sin(\alpha)\sin(p + \alpha). $$
Then we find that $S^{(1)}$ has at most 2 roots, and we should evaluate the second derivative
$$ \Phi''_{t,\alpha}(p) = 4tJ_1 \sin(\alpha)\sin(p + \alpha) $$
at these points. Now, we notice that for $n \approx 0$ we have $\alpha = n\pi/L \approx 0$, which means that $M_\alpha \sim \alpha \sim L^{-1}$, and hence $C_\# \sim L^2$ becomes size dependent. In this case $C_\#$ can be independent of the system size only if $n$ is a significant fraction of $L$. However, inspecting (24) for $\alpha = n\pi/L \approx 0$, we find that it will in fact not dephase, for the same reason that our bound yields a large $C_\#(\alpha)$ constant: we have $\Phi_{t,\alpha}(p) \approx Dp$ for times $t \ll L$. Therefore we would need times $t$ scaling in the system size for dephasing to even set in. This is an effect that we call resilience.
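The resilience mechanism can be seen directly in the exponential sum (27). The sketch below (single coupling $J_1$, with $D = 0$; the discretization follows the conventions assumed above) evaluates $|f_n(t)|$ for a generic harmonic, where it decays, and for the near-resilient harmonic $n = 1$, where it stays of order one up to times comparable with the system size:

```python
import numpy as np

L, J = 400, 1.0

def f(t, n, D=0.0):
    # exponential sum (1/L) sum_k exp(i Phi_{t,alpha}(p_k)) for the
    # single-coupling phase Phi(p) = D p - 4 t J sin(alpha) sin(p + alpha)
    alpha = np.pi * n / L
    p = 2 * np.pi * np.arange(L) / L
    phase = D * p - 4 * t * J * np.sin(alpha) * np.sin(p + alpha)
    return np.abs(np.exp(1j * phase).sum()) / L

# generic harmonic (alpha bounded away from 0): dephases like t^{-1/2}
assert f(0.0, L // 4) == 1.0
assert f(50.0, L // 4) < 0.2

# near-resilient point alpha = pi/L: essentially no decay until t ~ L
assert f(50.0, 1) > 0.3
```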

A. Definition of non-resilient second moments
Choosing the initial state such that $\Gamma$ has substantial $X^{(d)}_n$ around a resilient point will render the covariance matrix resilient against equilibration. This should be expected and has been discussed in the literature [2], with the simplest example being a system with a linear dispersion relation. By Eq. (9), we see that generically we will find regions in momentum space where the dispersion relation is indeed approximately linear, and populating the initial state with quasiparticles from these regions will obstruct dephasing. More generally, resilience to equilibration can be characterized within the framework of resource theories [79]. Here, we have enough structure to be able to phrase a sufficient condition for correlations to be non-resilient using the above intuition.
Definition 2 (Non-resilient second moments). For a threshold constant $C_{\mathrm{th}} > 0$ independent of the system size $L$, we call the points $n$ with $C_\#(n\pi/L) > C_{\mathrm{th}}$ resilient. If for all $d$ there exist constants $C_{\mathrm{RS}}, C_{\mathrm{NRS}} > 0$ independent of the system size such that the distribution $X^{(d)}$ has little weight (controlled by $C_{\mathrm{RS}}$) at the resilient points and is bounded (by $C_{\mathrm{NRS}}$) outside of them, then we say that the correlations $\Gamma$ are non-resilient second moments at the level $C_{\mathrm{th}}$.
The crucial mathematical feature of this definition that is needed to ensure equilibration is the system size independence of the constants such that constants derived in further bounds are also system size independent.

B. Equilibration of non-resilient second moments
With this definition, we can easily give a bound on how fast the individual currents (23) relax, using the bound (25), where now we have the promise that $C_\# \le C_{\mathrm{th}}$ outside the resilient points. At the resilient points themselves we use the trivial upper bound $|f_n(t)| \le 1$ together with the small weight of $X^{(d)}$ there, and in a second step we use $t \le t_R \in O(L)$. By the decay of correlations, only currents with range of the order of the correlation length, $d_\xi(t) = \xi \ln(t^\gamma)$, will be relevant. In the appendix we show, using the unitarity of the propagator, that the contribution of the remaining bands is negligible, and hence one easily arrives at a bound for the fluctuations of the covariance matrix entries away from equilibrium,
$$ \bigl|\Gamma_{x,y}(t) - \Gamma^{(\infty)}_{x,y}\bigr| \le C_\Gamma\, t^{-\tilde\gamma}, $$
where $C_\Gamma$ is obtained by appropriately collecting the system size independent constants and $\tilde\gamma \approx \gamma$ is chosen accordingly. The following proposition encapsulating these ideas is proven in full detail in Appendix E.
Proposition 3 (Equilibration of second moments). Consider a fermionic system with initially exponentially decaying correlations and non-resilient second moments $\Gamma$. Then there exist a constant relaxation time $t_0$ and a recurrence time $t_R$ proportional to the system size such that for all $t \in [t_0, t_R]$
$$ \bigl|\Gamma_{x,y}(t) - \Gamma^{(\mathrm{eq})}_{x,y}\bigr| \le C_\Gamma\, t^{-\gamma}, $$
where $C_\Gamma, \gamma > 0$ are constants.
As we will see, this general bound must have $\gamma \le 1/2$, which follows from a specific example with a tight relaxation scaling via the Bessel function asymptotics. On the other hand, the exponent is lower bounded, $\gamma \ge 1/(6R) - \varepsilon$ for any $\varepsilon > 0$, as explained in Appendix E.

C. Examples of non-resilient second moments
As the simplest example of non-resilient second moments, consider the covariance matrix $\Gamma^{(0,1)}$ of the charge-density wave corresponding to the Fock state vector $|0,1,0,1,\dots\rangle$, which will equilibrate under the nearest-neighbor model. More generally, if there is no shift symmetry of the dispersion relation, any $P$-periodic configuration of currents will be non-resilient for intensive $P$ not scaling in the system size, see Appendix F. This continues to hold true even in the presence of sparse defect sites at random points. This is the most important case and captures the intuition about what one should physically expect to be necessary for equilibration, namely that the mass distribution (and concomitantly the currents) is already distributed over the system, albeit with possibly intensive random configurations at microscopic scales.
On the other hand, a $P$-periodic state with extensive $P$ will be resilient and will not relax towards a translation invariant steady state according to a power law. Specifically, second moments of the form $\Gamma_{x,x} = 1$ for $x \le L/2$ and all other entries vanishing are resilient. Intuitively, this is a block of particles over an extensive part of the system, and it is resilient because, by the Lieb-Robinson bound, one would need to wait extensively long times for the current to become evenly distributed. Such a covariance matrix would violate our definition of non-resilient second moments already on the level of $X^{(d)}_n$, see Appendix F. Let us finally remark that the definition of non-resilient second moments has a linear structure: mixtures of different $P$-periodic covariance matrices $\Gamma^{(P)}$, i.e., $\Gamma = \sum_P a_P \Gamma^{(P)}$, are again non-resilient, as long as the weights $a_P$ decay fast enough.
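The distinction between these two cases is visible directly in the Fourier weights of the density band. A small numpy sketch (the normalization of the harmonics is a convention chosen here for illustration) comparing the charge-density wave, whose weight sits entirely at $n = 0$ and $n = L/2$, with the extensive block of particles, which has substantial weight at the long-wavelength harmonics $n \approx 0$:

```python
import numpy as np

L = 64
cdw   = (np.arange(L) % 2 == 0).astype(float)    # |1,0,1,0,...>
block = (np.arange(L) < L // 2).astype(float)    # half the system filled

def harmonics(diag):
    # Fourier weights of the main diagonal of Gamma (the d = 0 band)
    return np.abs(np.fft.fft(diag)) / len(diag)

Xc, Xb = harmonics(cdw), harmonics(block)

# CDW: all weight at n = 0 (conserved density) and n = L/2 (period-2 harmonic)
assert np.isclose(Xc[0], 0.5) and np.isclose(Xc[L // 2], 0.5)
assert np.allclose(np.delete(Xc, [0, L // 2]), 0.0)

# block state: substantial weight at the long-wavelength (resilient) harmonics
assert Xb[1] > 0.1
```

For the nearest-neighbor model, the harmonic $n = L/2$ ($\alpha = \pi/2$) is far from resilient, consistent with the charge-density wave equilibrating, while the low harmonics populated by the block state sit exactly at the resilient points.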
If we would like to quantify the resilience in the generic case, we may neglect physical constraints on the covariance matrix and choose $\Gamma^{(\mathrm{rnd})}_{x,y} \in [a, b]$ uniformly at random. In this case, we will indeed find non-resilience on average, $\mathbb E[X^{(d)}_n] = (a+b)\,\delta_{n,L}/2$. However, the fluctuations are rather large: we find $\mathrm{Var}[X^{(d)}_n] \sim L^{-1}$, so drawing a random selection from the uniform distribution will often yield a significant number of the $L$-many harmonics of the order $(X^{(d)}_n)^2 \sim L^{-1}$, which is too large and could lead to resilience. Yet, constructing a mixture of $K$ such matrices smoothens the distribution, so that the harmonics concentrate around $\mathbb E[X^{(d)}_n]$ up to fluctuations decaying as $K^{-1/2}$, i.e., one gets closer to the average behavior, which is non-resilient. Observe that, by Eq. (24), dephasing could also occur if the Fourier weights $X^{(d)}_n$ are larger than what we allow for in Definition 2, provided they fluctuate uniformly on the scale where $f_n(t)$ does not change strongly. Later, in order to discuss equilibration of a random selection of second moments which are physically admissible and have a finite correlation length, we will discuss thermal states of the Anderson insulator; numerically, we indeed find equilibration in that case too.
Finally, note that our definition of non-resilient second moments characterizes initial states that equilibrate to translation invariant steady states. However, it is important to note that non-translation invariant steady states can also occur, due to possible shift symmetries of the dispersion relation, such that $\omega_k = \omega_{k+n}$ for all $k = 1, \dots, L$. The simplest example is to notice that $\omega_k = \omega_{k+L/2}$ for the next-nearest-neighbor model, so that $f_{L/2}(t) = \mathrm{const}$. In this case, our definition of non-resilient second moments excludes any $\Gamma$ which has significant $X^{(d)}_n$ for $n \approx L/2$, via the condition on the constant, $C_\#(n\pi/L) \le C_{\mathrm{th}}$. These are very special cases, see Fig. 2, and we have chosen to study equilibration exclusively towards translation invariant steady states. Notably, the nearest-neighbor model has no shift symmetry, hence only states with long-range dislocations, or a population of long-wavelength quasiparticles, are excluded by the definition of non-resilience.

D. P-periodic initial density distributions and nearest-neighbor hopping

A specifically instructive case is the situation in which the initial state is such that the covariance matrix is diagonal with a $P$-periodic structure, and the system is quenched to evolve via the nearest-neighbor hopping model. This means that the density distribution repeats every $P$ sites, in that $\Gamma_{x,x} = \Gamma_{x+P,x+P}$ for all $x$. It is one of the strengths of our result that we need not care about the structure within the block, because any such distribution for an intensive $P$ is non-resilient. The steady state will be translation invariant and diagonal, with the second moments given by $\Gamma^{(\mathrm{eq})}_{x,y} = \delta_{x,y}/F$, where $1/F$ is the filling ratio. For example, if the initial covariance matrix was $\Gamma^{(1,0)} = \mathrm{diag}(1,0,1,0,\dots)$, then we have half filling, $1/F = 1/2$, and for $\Gamma^{(1,0,0)} = \mathrm{diag}(1,0,0,1,0,0,\dots)$ we get $1/F = 1/3$.
When considering the evolution under a nearest-neighbor fermionic hopping Hamiltonian with such initial conditions, we find that the propagator decays as O(t^{−1/2}) in time, as laid out in Appendix A 2, dictated by the asymptotics of Bessel functions of the first kind. This decay is inherited by the actual correlation decay, in that for any P-periodic initial condition one finds |Γ_{x,y}(t) − Γ^(eq)_{x,y}| ∈ O(t^{−1/2}) for all x, y. It is also interesting to note that the resulting steady states can be seen as infinite-temperature Gibbs states at a specific chemical potential, which imposes the value of the total particle number. Specifically, one finds Γ^(eq)_{x,y} = δ_{x,y}/F for the equilibrium covariance matrices for any x, y, due to translation invariance, from which one can obtain the value of the chemical potential in explicit form, µ = −ln(F − 1).
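The relaxation of a P = 2-periodic charge-density wave towards half filling can be illustrated with a short numerical sketch. The script below is an illustration under the stated conventions (second moments evolve as Γ(t) = G(t) Γ G(t)†, with G(t) = e^{−ith}); it quenches Γ^(1,0) with a nearest-neighbor hopping Hamiltonian and monitors the deviation of the density profile from 1/F = 1/2:

```python
import numpy as np

def hopping_matrix(L, J=1.0):
    # nearest-neighbor hopping with periodic boundary conditions
    h = np.zeros((L, L))
    idx = np.arange(L)
    h[idx, (idx + 1) % L] = J
    h[(idx + 1) % L, idx] = J
    return h

def evolve(Gamma0, h, t):
    # free evolution of the second moments: Gamma(t) = G(t) Gamma0 G(t)^dagger
    w, V = np.linalg.eigh(h)
    G = (V * np.exp(-1j * t * w)) @ V.conj().T
    return G @ Gamma0 @ G.conj().T

L = 100
Gamma0 = np.diag([1.0, 0.0] * (L // 2))   # P = 2 charge-density wave |1,0,1,0,...>
Gamma_t = evolve(Gamma0, hopping_matrix(L), t=20.0)

density = np.real(np.diag(Gamma_t))
print(np.max(np.abs(density - 0.5)))      # deviation from half filling
```

The deviation shrinks with time in accordance with the O(t^{−1/2}) Bessel asymptotics, while the total particle number is conserved exactly.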

V. QUASI-FREE ERGODICITY
One of the key questions of statistical mechanics is what precise properties of the Hamiltonian governing the dynamics can be held responsible for the emergence of aspects of quantum statistical mechanics. In classical mechanics it results from sufficient transport properties, which is evident already in Boltzmann's H-theorem. In the quantum regime for free systems, a notion with a similar operational meaning can also be identified, namely that propagators decay quickly, which holds with surprising generality and can be interpreted as a lower bound to particle transport.
Theorem 4 (Free fermionic ergodicity). Let t → G(t) be the propagator for a non-interacting translation invariant fermionic Hamiltonian Ĥ(h) which is off-diagonal on the one-dimensional real-space lattice. Then, for all times t between a relaxation time t_0 ∈ O(1) and a recurrence time t_R, the propagator entries are suppressed as |G_{x,y}(t)| ≤ C t^{−γ}, where C, γ > 0 are constants. We can take γ = 1/3, provided there are no points p such that E''(p) = E'''(p) = 0, which is true for generic models.

FIG. 2. The covariance matrix of a charge-density wave Γ^(0,1), which corresponds to the Fock state vector |0, 1, 0, 1, . . .⟩, has varying equilibration behavior depending on the locality of the Hamiltonian and the parity of the system size. For the next-nearest-neighbor model, the system in this special initial state splits up into two independent sublattices and is in an exact steady state whenever the system size L is even. However, for odd L the symmetry of the density distribution is incommensurate with the system size and there is necessarily a defect of the type |. . . , 0, 1, 0, 0, 1, . . .⟩ or |. . . , 1, 0, 1, 1, 0, 1, . . .⟩, around which the charge-density wave pattern starts becoming homogeneous. Note that away from the defect point the charge-density wave looks locally like a steady state of the Hamiltonian, and one can prove by Lieb-Robinson bounds that the middle region will remain unaffected for extensively long times. The left plot shows Γ^(0,1)(t = 1.5) after a quench to the next-nearest-neighbor model (the inset throughout shows the sub-block of the first 10 sites). On the other hand, if we quench to the nearest-neighbor model, then there is no transient symmetry present and the charge-density wave is completely non-resilient and homogeneously tends towards equilibrium, as seen in the right plot for the same initial state.
We can interpret Theorem 4 as proving free-particle ergodicity for these models. This notion of ergodicity is motivated by the classical notion, which states that an ergodic system essentially explores the whole available phase space, and does so homogeneously. In free systems, we have to respect the linear constraint in the relation (10) at all times, and given that, the suppression (47) allows us to show that the particles must spread over the lattice. Indeed, the unitarity of the propagator implies that if particles could not spread by more than a localization length ℓ_0, one would have Σ_{x=y−ℓ_0}^{y+ℓ_0} |G_{x,y}(t)| ∈ Ω(1) for all times, in contradiction with the suppression (47). The proof of Theorem 4 is given in Appendix D and is again based on the Kusmin-Landau inequality [49]. It could also be of general interest as a method for deriving error bounds for stationary phase arguments in field theory. Note that one can explicitly calculate the relaxation time and the recurrence time, t_0 and t_R, using only the dispersion relation.
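The generic power-law suppression of the propagator can be probed numerically. The sketch below (an illustration, not part of the proof) computes the largest entry of G(t) = e^{−ith} for the nearest-neighbor model at two times and estimates the decay exponent, which should come out close to the γ = 1/3 of Theorem 4, since the maximum sits on the wavefront:

```python
import numpy as np

def max_propagator_entry(L, t):
    # largest matrix element of G(t) = exp(-i t h) for nearest-neighbor hopping
    p = 2.0 * np.pi * np.arange(L) / L
    omega = 2.0 * np.cos(p)                       # dispersion relation
    # G_{x,y}(t) depends only on d = x - y for a translation invariant model:
    # G_d = (1/L) sum_k e^{-i t w_k} e^{2 pi i k d / L}
    G_row = np.fft.ifft(np.exp(-1j * t * omega))
    return np.max(np.abs(G_row))

early = max_propagator_entry(L=400, t=10.0)
late = max_propagator_entry(L=400, t=80.0)
gamma_est = np.log(early / late) / np.log(8.0)
print(early, late, gamma_est)
```

The system size is chosen large enough that the Lieb-Robinson cone does not wrap around the ring before the later time, i.e., we stay below the recurrence time t_R.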

A. Gaussification is generic
Combining the above results with insights from Ref. [50] leads to a remarkably strong result. Ref. [50] presented results on how non-interacting fermionic quantum systems that show delocalizing transport "Gaussify", that is, turn into a quantum state that is Gaussian to an arbitrarily good approximation in time. Theorem 4 provides precisely this ingredient: Non-interacting one-dimensional models generically exhibit delocalizing transport. We hence arrive at a statement of rigorous convergence to a generalized Gibbs ensemble with enormous generality. When stating this Gaussification theorem, we define the state ρ̂_G(t) to be the Gaussian state with the same covariance matrix as ρ̂(t).
Theorem 5 (Fermionic generic Gaussification). Consider an initial fermionic state ρ̂(0) with exponential decay of correlations and a non-interacting translation-invariant post-quench Hamiltonian with dispersion relation E(p) such that there are no points with E''(p) = E'''(p) = 0. Then the expectation value of any local observable in ρ̂(t) converges to that in the Gaussified state ρ̂_G(t), with a deviation bounded by a power law in time.

VI. PROVING THEOREM 1

In this section we collect all our findings that lead to the statement of Theorem 1. Within the setting described above, the two crucial ingredients are an initial state featuring exponentially decaying correlations and a quench Hamiltonian which is translation invariant. By Theorem 5, at sufficiently large times any local correlation function can be approximated by the value obtained from the Gaussified state. That is, it suffices to take the second moments Γ of the initial state ρ̂(0), evolve them according to the quench Hamiltonian, and evaluate ⟨Â⟩_{ρ̂(t)} by appropriately employing Wick's theorem for Γ(t). We hence find equilibration, ⟨Â⟩_{ρ̂(t)} ≈ const, if Γ(t) ≈ const is time independent. This is already the case if we perform the quench starting from a translation-invariant non-Gaussian state, because then the covariance matrix Γ is also translation invariant, and so Γ(t) = Γ(0) because ∂_t Γ(t) = i[h, Γ(t)] = 0. For such cases, Gaussification is sufficient for equilibration [50]. However, thanks to Proposition 3 we obtain a much more general statement: any non-resilient covariance matrix will equilibrate. This result applies to very natural initial conditions that can dramatically deviate from a homogeneous configuration. The relaxation takes the form of a power law O(t^{−1/6}) determined by the Gaussification times. This, however, is an artifact of our rigorous uniform bounds; one should expect the calculation for a special configuration from Refs. [42, 54] to be generic, O(t^{−1/2}). A proof that this is the standard time-scale may be possible, but would involve a significantly more detailed treatment of the wavefront, which is responsible for our scalings not being tight as compared to the behavior in the bulk of the Lieb-Robinson cone [50].

[Displaced figure caption] The green line is a guide to the eye scaling as ∼ t^{−1/3}. At some point, the power-law relaxation must level off, either due to the finite system size, with the ultimate small parameter being ∼ t_R^{−α} ∼ L^{−α}, or due to the specific quasiparticle content X^(d).

A. Quenches of the Anderson insulator to an ergodic translation invariant Hamiltonian
As a numerical illustration, in this section we discuss the situation arising from starting in the thermal state of a disordered Anderson insulator, initially not translation invariant, followed by a quench to a perfectly translation invariant ergodic hopping Hamiltonian. Needless to say, the equilibrium states emerging are once again generalized Gibbs ensembles and Gaussian states. It is interesting to note, however, that they resemble fully thermal states to a rather good approximation with high probability.
To be specific, as a starting point we choose an initial covariance matrix which is not translation invariant and has a finite correlation length. A natural way of assigning such initial conditions is to consider a Gibbs state of the Anderson insulator with Hamiltonian Ĥ = Σ_x (f̂†_x f̂_{x+1} + h.c.) + Σ_x ξ_x f̂†_x f̂_x, where the noise is uniformly distributed in the interval ξ_x ∈ [−w, w] for w > 0. We study the quench consisting in switching off the disorder, i.e., setting ξ_x = 0 for all x. Following a numerical calculation, the quenched state Γ^(Quench)(t) = Γ^(β,Anderson)(t) can be seen to become largely homogeneous for a sufficiently long duration of the evolution, see Fig. 3. As a measure of equilibration, we make use of the max-norm distance ‖Γ^(1) − Γ^(2)‖_max = max_{x,y} |Γ^(1)_{x,y} − Γ^(2)_{x,y}| between two covariance matrices Γ^(1), Γ^(2). Whenever ‖Γ^(1) − Γ^(2)‖_max is small, a large fidelity between the two states is implied [80, 81]. Fig. 3 provides further substance to the rigorous insights established above, in that a significant part of the equilibration is indeed governed by a power law, by comparing Γ^(Quench)(t) to the infinite-time average Γ^(∞).

[Displaced figure caption] This can be quantified with a comparison to the thermal state of nearest-neighbor hopping, Γ^(β,µ,fit), obtained from fitting over the temperature β and chemical potential µ (bottom-left). While deviations are seen, the inverse system size L^{−1} = 10^{−2} is not a stringent small parameter. However, if equilibration occurs for larger systems, then it will be towards the infinite-time average Γ^(∞), which looks thermal already at small system sizes (bottom-right), and this property is retained when going towards the thermodynamic limit.

[Displaced figure caption] The best fit to a thermal state is given by an infinite-temperature state with the corresponding filling ratio and strongly deviates from the steady state Γ^(∞). This is despite the density distribution becoming homogeneous, because the state deviates from the thermal ensemble by the absence of the nearest-neighbor current I_1 and the presence of the next-nearest-neighbor tunneling I_2. Note that there are particles and currents present on the initially unoccupied sublattice too. This showcases a general approach to creating initial conditions that demand a description in terms of a GGE, by exploiting the existence of memory in terms of conserved local currents.

To further elaborate on this setting, we discuss the features of the equilibrium state, see Fig. 4. We begin by investigating the difference between the quenched state Γ^(Quench)(t) and a fit to a thermal covariance matrix Γ^(β,µ,fit) of the quench Hamiltonian, obtained from fitting over the temperature β and chemical potential µ. We find that the discrepancy is already diminished for L = 100 and that |Γ_{x,y}| is homogeneously distributed. For the infinite-time average we have ‖Γ^(∞) − Γ^(β,µ,fit)‖_max ≈ 10^{−3} as the distance to the Gibbs state. The upshot of these findings is that, due to a concentration-of-measure effect, the resulting generalized Gibbs ensembles are with high probability close to an actual Gibbs state, with stray fluctuations being detected. We discuss further details of this argument in Appendix F 5.
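The homogenization after the disorder quench is easy to reproduce in a small-scale sketch. The script below is an illustration under assumed conventions (fermionic thermal covariance matrix Γ = (e^{βh} + 1)^{−1} at µ = 0; the disorder strength and temperature are arbitrary choices, not the values used in our figures):

```python
import numpy as np

rng = np.random.default_rng(7)
L, w, beta, t = 200, 2.0, 1.0, 30.0

# Anderson insulator: nearest-neighbor hopping plus uniform on-site disorder
idx = np.arange(L)
h_dis = np.zeros((L, L))
h_dis[idx, (idx + 1) % L] = 1.0
h_dis[(idx + 1) % L, idx] = 1.0
h_dis[idx, idx] = rng.uniform(-w, w, size=L)

# fermionic thermal covariance matrix Gamma = (exp(beta h) + 1)^(-1), mu = 0
e, V = np.linalg.eigh(h_dis)
Gamma0 = (V * (1.0 / (np.exp(beta * e) + 1.0))) @ V.conj().T

# quench: switch off the disorder and evolve with the clean hopping model
h_clean = h_dis.copy()
h_clean[idx, idx] = 0.0
w2, V2 = np.linalg.eigh(h_clean)
G = (V2 * np.exp(-1j * t * w2)) @ V2.conj().T
Gamma_t = G @ Gamma0 @ G.conj().T

# the density profile homogenizes after the quench
s0 = np.std(np.real(np.diag(Gamma0)))
st = np.std(np.real(np.diag(Gamma_t)))
print(s0, st)
```

The spatial fluctuations of the density shrink markedly after the quench, while the total particle number is conserved; a full reproduction of Fig. 3 would additionally track the max-norm distance to the infinite-time average.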

B. Realizing generalized Gibbs ensembles in optical lattices
Ultra-cold atoms trapped in optical lattices [82] have proven to be an excellent platform for studying relaxation phenomena [83,84] in instances of quantum simulators [85][86][87], because the system is well isolated from the environment during the evolution, and one can prepare, with a high level of control, states that display very visible non-equilibrium dynamics after the quench. Here we point out that, with techniques already used in various settings, one can prepare two different initial states that will equilibrate to two easily distinguishable steady states, despite the Hamiltonian governing the dynamics being the same in both cases. This is expected to be possible at least for intermediate times in instances of prethermalization, before interaction effects lead to genuine full thermalization. The first steady state would be the one obtained from simply letting the gas equilibrate on the lattice. For the second type of steady state, we would prepare the initial state in the same way and perform the quench by suddenly doubling the lattice by adding in-between sites, exploiting optical super-lattices, similar to the situation described in Ref. [83]. In that situation, the initial covariance matrix will feature a checkerboard pattern, with only the odd sites being occupied and currents being non-zero only between odd sites, see Fig. 5. Note that the specific details of how the doubling is performed are not important, as long as the initial state preparation features a charge-density wave pattern; however, it is absolutely crucial for our example that the charge-density wave is also present in the current structure. The quench then consists in allowing for tunneling between all sites. By our analytical result, the density pattern, which is a P = 2-periodic block structure, will equilibrate to a uniform distribution at each site. The same, again, will occur for each current individually.
Usually, the nearest-site tunneling current will be the strongest, so if we had I_1 = ⟨Î_1⟩ before the doubling, then the current will equilibrate to I_1/2 after the doubling of the lattice. However, the surprising aspect lies in the fact that this will be the next-nearest-neighbor current I_2 in the new lattice, and in the steady state the final nearest-neighbor current I_1 should not be present. That is, after the quench, one will observe that there are only currents in the system in multiples of two sites, cf. Fig. 5. This is a non-trivial observation, because the sites that have been unoccupied immediately after the doubling will become occupied, and there will be currents flowing out of them to the next-nearest sites, i.e., the neighboring initially unoccupied sites. In contrast, in the steady state there will be no tunneling between nearest-neighbor sites, which is unintuitive as the Hamiltonian is nearest-neighbor, showcasing the peculiar memory effects that can be obtained with quenches to quasi-free evolution. Realizing such a setup in gases where interactions can be controlled by a Feshbach resonance would also allow one to study how nearest-neighbor currents can be generated by many-body scattering, an effect not present in a non-interacting Hamiltonian [88][89][90][91].
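This current-memory effect can be checked in a small simulation. The sketch below uses a "boosted" Fermi sea as an illustrative current-carrying initial state (an assumption for the demonstration, not the experimental preparation), embeds it on the even sites of a doubled lattice, and evolves with nearest-neighbor hopping; the current convention I_d(x) = 2 Im Γ_{x,x+d} is adopted up to an overall factor:

```python
import numpy as np

Lh, L, t = 50, 100, 20.0   # lattice sizes before/after doubling, evolution time

# initial state on the half lattice: a "boosted" Fermi sea carrying a nonzero
# nearest-neighbor current (an illustrative choice, not the experimental protocol)
k = np.arange(Lh)
occ = ((k > Lh // 8) & (k <= Lh // 8 + Lh // 2)).astype(float)
F = np.exp(2j * np.pi * np.outer(np.arange(Lh), k) / Lh) / np.sqrt(Lh)
Gamma_half = (F * occ) @ F.conj().T

# doubling the lattice: particles (and their currents) sit on even sites only
Gamma0 = np.zeros((L, L), dtype=complex)
Gamma0[::2, ::2] = Gamma_half

def currents(Gamma, d):
    # bond-resolved current over distance d (up to a convention-dependent factor)
    return 2.0 * np.imag(np.diag(np.roll(Gamma, -d, axis=1)))

# quench to nearest-neighbor hopping on the doubled lattice
idx = np.arange(L)
h = np.zeros((L, L))
h[idx, (idx + 1) % L] = h[(idx + 1) % L, idx] = 1.0
w, V = np.linalg.eigh(h)
G = (V * np.exp(-1j * t * w)) @ V.conj().T
Gamma_t = G @ Gamma0 @ G.conj().T

# total currents are conserved: I1 stays zero, I2 keeps its initial value,
# while the bond-resolved I2 pattern spreads homogeneously over the lattice
print(currents(Gamma0, 1).sum(), currents(Gamma_t, 1).sum())
print(currents(Gamma0, 2).sum(), currents(Gamma_t, 2).sum())
```

The total nearest-neighbor current remains zero at all times and the total next-nearest-neighbor current is conserved, while its initially staggered bond pattern homogenizes, exactly the memory effect described above.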

VIII. DISCUSSION AND OUTLOOK
In this work, we have established a widely applicable and very general situation in which the convergence to generalized Gibbs ensembles can be proven. Specifically, we have shown in large generality that, for large classes of natural initial conditions, local expectation values of systems relaxing under unitary dynamics generated by non-interacting Hamiltonians take the values of translation invariant generalized Gibbs ensembles. The emerging steady state is parametrized by thermodynamical potentials whose number is intensive, namely of the order of the initial correlation length in units of the lattice spacing. Our assumption is that the quadratic Hamiltonian is translation invariant, which leads to homogeneous spreading of particles on the lattice, a generic effect which we describe as a possible notion of ergodicity for quasi-free quantum systems.
We have given numerical examples illustrating our rigorous statements and explained how to observe non-trivial generalized Gibbs ensembles in, e.g., an optical lattice experiment.
Specifically, we saw that locally the memory of initial, possibly non-Gaussian, correlations is lost via the process of Gaussification, which relies only on a finite correlation length in the initial state. Hence, even if the initial state preparation involves intricate interactions, a quench to quasi-free evolution will lead to a loss of memory of these initial strong correlations, and the state will obey Wick's theorem up to an error decaying algebraically in time. Such (Gaussian) states are determined only by their covariance matrix, which we show to equilibrate. A necessary condition for this was that the initial current and density distributions do not have large-scale structure (states with such structure may still equilibrate, but only after a time of the order of the system size [38]). Thus, we derived a rapid polynomial time-scale for equilibration, independent of the system size. More precisely, the deviation from equilibrium of any normalized local correlation function is bounded as O(t^{−γ}), and the scaling is functionally tight, as we showed numerically.
The goal of our work was to show that it is possible to make rigorous statements concerning the dynamical emergence of statistical mechanics in mean-field models. For this reason, we had to leave several aspects of the subject unanswered. Within our setting, we have not discussed in detail the possibility of infinite-time dephasing leading to steady states which are non-translation invariant due to degeneracies of the dispersion relation; the proof of Lemma 11 in the appendix hints at that. It would also be interesting to understand in more detail whether Gaussification is possible for Green's functions which have only very weak quasi-free ergodicity, i.e., |G_{x,y}(t)| ∈ O(t^{−γ}) for γ < 1/4 for a significant number of entries x, y. If the argument in Ref. [50] is optimal, then one should observe for γ < 1/4 a temporal persistence of deviations from Wick's theorem for quenches of non-Gaussian states.
Concerning the question of adding small interactions, one would expect the GGE examples that we have given to eventually thermalize. Understanding the dynamical stability of the GGE description is important for applications, e.g., work extraction protocols [92], but is also instrumental for our conceptual understanding of the emergence of thermalization. Above, we have hinted at the open problem of characterizing the structure of dephased states as being thermal in light of computational complexity, and at the idea that an interesting approach would be to first make progress concerning high-temperature quenches.

NOTE ADDED
Upon completion of this manuscript, a preprint presenting closely related results appeared [93]. Our work puts significantly more emphasis on including rigorous error bounds, whereas Ref. [93] stresses more the physical intuition underlying the phenomena observed. The methods are also somewhat different (though related in spirit), as Ref. [93] uses stationary phase approximations, while we employ the machinery of Kusmin-Landau bounds.
In this section we will derive the propagator representation from the main text. All statements concern quasi-free Hamiltonians conserving the total particle number.
a. Fermions. A fermionic annihilation operator acting on mode x is denoted by f̂_x. These operators obey the canonical anticommutation relations {f̂_x, f̂†_y} = δ_{x,y} and {f̂_x, f̂_y} = 0. Quasi-free fermionic Hamiltonians conserving the particle number are of the form Ĥ(h) = Σ_{x,y} h_{x,y} f̂†_x f̂_y, where h = h† ∈ C^{L×L} is the coupling matrix for a finite system size L.
Proof. We begin by noticing that f̂_x(t) = e^{itĤ} f̂_x e^{−itĤ} is differentiable and take a time derivative, obtaining ∂_t f̂_x(t) = i[Ĥ, f̂_x(t)], which is the Heisenberg equation of motion. We further notice that [Ĥ, f̂_x(t)] = e^{itĤ}[Ĥ, f̂_x]e^{−itĤ}, which means that we need to evaluate the commutator at t = 0. Next, we calculate the commutator [f̂†_y f̂_z, f̂_x] = −δ_{x,y} f̂_z, which gives by linearity [Ĥ, f̂_x] = −Σ_y h_{x,y} f̂_y. This allows us to write the above Heisenberg equation of motion explicitly as ∂_t f̂_x(t) = −i Σ_y h_{x,y} f̂_y(t). This is a system of L linearly coupled ordinary differential equations and is solved by f̂_x(t) = Σ_y G*_{x,y}(t) f̂_y, where G*(t) = e^{−ith} ∈ U(L). Indeed, this becomes apparent if one considers a vector f̂ = (f̂_1, . . . , f̂_L); then, in vector notation, f̂(t) = e^{−ith} f̂. b. Bosons. Quasi-free bosonic Hamiltonians conserving the particle number are of the form Ĥ(h) = Σ_{x,y} h_{x,y} b̂†_x b̂_y, where h = h† ∈ C^{L×L} is again the coupling matrix for a finite system size L.
Proof. Again, the Heisenberg equation of motion reads ∂_t b̂_x(t) = i[Ĥ, b̂_x(t)], and it suffices to evaluate the commutator at t = 0. We have [b̂†_y b̂_z, b̂_x] = −δ_{x,y} b̂_z, which gives by linearity [Ĥ, b̂_x] = −Σ_y h_{x,y} b̂_y. This is again a system of L linearly coupled ordinary differential equations, with the solution b̂_x(t) = Σ_y G_{x,y}(t) b̂_y, where G = e^{−ith} ∈ U(L). Here, we have used the general correspondence between the many-body Heisenberg dynamics and the one-particle matrix exponential. c. Translation invariance. Let us consider Ĥ(h), where â stands either for f̂ in the case of fermions or b̂ for bosons, and h = h† ∈ C^{L×L} is the coupling matrix for a finite system size L. The above two paragraphs have shown that the time-evolved mode operators are obtained by acting with the propagator e^{−ith} on the vector of mode operators. In this paragraph we will be interested in translation invariant Hamiltonians.

Lemma 8 (Translation invariant propagator).
Let h be translation invariant couplings with hopping amplitudes J_z. The propagator is given by G_{x,y}(t) = (1/L) Σ_{k=1}^{L} e^{−itω_k} e^{2πik(x−y)/L}, where ω_k = J_0 + 2 Σ_{z=1}^{L/2} J_z cos(2πkz/L).
Proof. A translation invariant model has couplings which satisfy h_{x,y} = h_{x+z,y+z} with periodic boundary conditions. Below we recall that such matrices are called circulant and are diagonalized by a discrete Fourier transform. Hence we can write h = Σ_k ω_k ψ_k ψ†_k, with ω_k as above, which is obtained by an explicit calculation using the Fourier modes ψ_k. Using the formula G(t) = e^{−ith}, we hence get G_{x,y}(t) = (1/L) Σ_{k=1}^{L} e^{−itω_k} e^{2πik(x−y)/L} for the propagator.
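The formula of Lemma 8 can be verified directly against the matrix exponential. The following sketch (an illustration; the chosen amplitudes are arbitrary) builds a hermitian circulant coupling matrix with range R = 2 and compares the two expressions for the propagator:

```python
import numpy as np

L, t = 12, 0.7
J = {0: 0.5, 1: 1.0, 2: 0.25}             # hopping amplitudes J_0, J_1, J_2

# hermitian circulant coupling matrix built from the amplitudes
idx = np.arange(L)
h = np.zeros((L, L))
h[idx, idx] = J[0]
for z, Jz in J.items():
    if z > 0:
        h[idx, (idx + z) % L] += Jz
        h[idx, (idx - z) % L] += Jz

# direct matrix exponential G(t) = exp(-i t h)
w, V = np.linalg.eigh(h)
G_exp = (V * np.exp(-1j * t * w)) @ V.conj().T

# Fourier formula of Lemma 8: G_{x,y}(t) = (1/L) sum_k e^{-i t w_k} e^{2 pi i k (x-y)/L}
k = np.arange(L)
omega = J[0] + 2.0 * sum(Jz * np.cos(2.0 * np.pi * k * z / L)
                         for z, Jz in J.items() if z > 0)
G_fourier = np.zeros((L, L), dtype=complex)
for x in range(L):
    for y in range(L):
        G_fourier[x, y] = np.mean(np.exp(-1j * t * omega)
                                  * np.exp(2j * np.pi * k * (x - y) / L))

print(np.max(np.abs(G_exp - G_fourier)))
```

The two constructions agree to machine precision, and G(t) is unitary as it must be.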

Circulant matrices
In this section we gather some basic facts about circulant matrices, leading up to the characterization that these are exactly the matrices diagonalizable by a discrete Fourier transformation. Additionally we describe simple formulas for the spectrum in the general case and for periodic boundary conditions. We begin by giving a precise definition of a circulant matrix.
The name comes from the fact that in a circulant matrix the k-th row is a circular shift of the first row by k − 1 steps to the right. That is, if (J_0, J_1, . . . , J_{L−1}) is the first row, then the second is (J_{L−1}, J_0, . . . , J_{L−2}), the third (J_{L−2}, J_{L−1}, J_0, . . . , J_{L−3}), and so on. We see that it is enough to know the vector of (hopping) amplitudes J_z = h_{1,1+z} for z = 1, . . . , L − 1 to describe the whole matrix h. A translation invariant Hamiltonian Ĥ(h) has a coupling matrix which is circulant but also hermitian. This means that J_{L−1} = J_1, J_{L−2} = J_2, and in general J_{L−z} = J_z. In that case J_0, J_1, . . . , J_{L/2} are necessary to parametrize the matrix.

Lemma 10 (Circulant matrices and discrete Fourier transforms).
A matrix h is circulant if and only if it is diagonalized by a discrete Fourier transform. For k = 1, . . . , L the eigenvectors are (ψ_k)_x = φ_k^x/√L, where φ_k = e^{2πik/L}, and the corresponding eigenvalues read λ_k(h) = Σ_{z=0}^{L−1} J_z φ_k^z. We are interested in circulant matrices which represent couplings or covariance matrices, and in the real symmetric case we have λ_k(h) = ω_k as in Lemma 8. Proof. To show the first direction, we will show that ψ†_{k'}(hψ_k) = λ_k(h)δ_{k',k}. We have (hψ_k)_x = Σ_y h_{x,y}(ψ_k)_y = (ψ_k)_x Σ_z J_z φ_k^z, which by definition of J is what we were looking for. For the converse direction, we must show that a rotation by the discrete Fourier transform matrix of a spectrum λ yields a circulant matrix. We do this by checking the defining property h̃_{x+z,y+z} = h̃_{x,y}. Thus, the matrix h̃ is circulant.
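The characterization is easily checked numerically. The sketch below (an illustration with arbitrary random amplitudes) builds a circulant matrix from its first row and verifies that a Fourier mode is an eigenvector with the eigenvalue λ_k = Σ_z J_z φ_k^z given in Lemma 10:

```python
import numpy as np

L = 8
rng = np.random.default_rng(1)
row = rng.normal(size=L)                   # amplitudes J_0, ..., J_{L-1}

# circulant matrix h_{x,y} = J_{(y-x) mod L}: each row is a shifted copy of the first
h = np.array([np.roll(row, x) for x in range(L)])

# eigenvalues lambda_k = sum_z J_z phi_k^z with phi_k = e^{2 pi i k/L}
z = np.arange(L)
lam = np.array([np.sum(row * np.exp(2j * np.pi * k * z / L)) for k in range(L)])

# an explicit Fourier eigenvector, here for k = 2
v = np.exp(2j * np.pi * 2 * z / L) / np.sqrt(L)
print(np.max(np.abs(h @ v - lam[2] * v)))
```

Since this circulant is not hermitian in general, the eigenvalues are complex; the full spectrum returned by a general eigensolver matches the Fourier formula as a set.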

Bessel function asymptotics
A particularly insightful situation is the special case of a nearest-neighbor fermionic hopping Hamiltonian, setting J_0 = 0 and J_1 = 1. In this situation, we simply obtain ω_k = 2cos(2πk/L), and hence G_{x,y}(t) = (1/L) Σ_{k=1}^{L} e^{−2it cos(2πk/L)} e^{2πik(x−y)/L} for the propagator. In the limit of large L, this can be seen as a Riemann sum approximation [54] to the integral (1/2π) ∫_0^{2π} e^{−2it cos(p)} e^{ip(x−y)} dp = (−i)^{x−y} J_{x−y}(2t), where J_l : R → R is the Bessel function of the first kind. The error in this approximation can be bounded from above by a quantity vanishing in the limit of large L. These Bessel functions satisfy |J_{x−y}(2t)| ∈ O(t^{−1/2}) for fixed x, y. That is to say, in this situation one gets an equilibration following a O(t^{−1/2}) behavior. This behavior of the propagator is actually inherited by the actual correlation decay.
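As a quick numerical check of the Bessel asymptotics (an illustration; we take the standard large-argument envelope sqrt(2/(π·2t)) of J_0, an assumption grounded in classical Bessel asymptotics rather than in our proofs), one can evaluate the diagonal propagator entry, which for fixed d follows (−i)^d J_d(2t):

```python
import numpy as np

def G_entry(L, t, d=0):
    # G_{x,x+d}(t) for J_0 = 0, J_1 = 1 via the discrete Fourier sum
    omega = 2.0 * np.cos(2.0 * np.pi * np.arange(L) / L)
    return np.fft.ifft(np.exp(-1j * t * omega))[d]

# for fixed d, the entry follows (-i)^d J_d(2t); its magnitude stays below
# the large-argument Bessel envelope sqrt(2/(pi * 2t)), i.e., O(t^(-1/2))
for t in (5.0, 10.0, 20.0, 40.0):
    print(t, abs(G_entry(400, t)), np.sqrt(2.0 / (2.0 * np.pi * t)))
```

Note that this O(t^{−1/2}) statement holds entrywise; the maximum over all entries sits on the wavefront and decays only as O(t^{−1/3}), in line with Theorem 4.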
Proof. Let us consider the action of the dephasing map on the initial covariance matrix, Γ^(eq) = lim_{T→∞} (1/T) ∫_0^T Γ(t) dt, for two sites x, y. Next, we will use the assumption concerning the minimal degeneracy of the dispersion relation. We notice that the condition k = k′ gives a contribution depending only on x − y. The other condition, k = L − k′, also leads to a simplification, but one needs to be careful to observe that for L even we may obtain k′ = L − k with k = k′ = L/2, and such terms are already included in the previous sum. Additionally, k = L gives no solution to k = L − k′, and so together we can split the dephased expression into three contributions. Here we identify the first line to give a current expectation value ⟨Î_{|x−y|}⟩. The inner sum in the second line gives Lδ_{x+y−x′−y′} − 1, and the third is either 0 or can be bounded from above. Finally, we make use of the exponential decay of correlations, |Γ_{x,x+z}| ≤ C_Clust e^{−z/ξ}. Employing a similar bound for the first term, we obtain the claim.

Lemma 12 (Relevant currents). Consider a state with covariance matrix Γ and exponentially decaying correlations parametrized by the correlation length ξ > 0. Then, for any time t, the contribution of the currents Γ^(d) is exponentially small in d/ξ.

Proof. After a technical calculation using the definition of Γ^(d), we find a bound in terms of the entries |Γ_{z,z+d}|. The second line follows from the inequality ‖Γ_{z,z+d} δ_{w,z+d}‖ ≤ max_z |Γ_{z,z+d}|, where we are thinking of Γ_{z,z+d} δ_{w,z+d} as a matrix, with operator norm given by max_z |Γ_{z,z+d}|. The third line follows because G(t) is a unitary matrix, so its rows and columns are orthonormal vectors. In the last line, we have used the definition of exponentially decaying correlations.

Appendix C: Bound on oscillatory sums of sequences with compact Fourier representation
We would like to prove a general bound on oscillatory sums of the type appearing in the main text, where the phase sequence can be decomposed into a Fourier series with a bounded number of harmonics. Specifically, we define a smooth phase function Φ_t : [0, 2π] → R of the form Φ_t(p) = tΦ(p) + dp, where t, d, J_1, . . . , J_R, α_1, . . . , α_R ∈ R with J_R ≠ 0. It will be convenient to define Φ(p) = Σ_{z=1}^{R} J_z cos(zp + α_z), which plays, e.g., the role of a dispersion relation, and we have Φ^(κ)_t = tΦ^(κ) for all higher-order derivatives κ > 1. If we additionally define p_k = 2πk/L, then the sequence of interest will be ϕ_k = Φ_t(p_k) for k = 1, . . . , L, where L, as in the main text, stands for the system size. Note that our results become non-trivial for L ≥ t_0, where t_0 is the relaxation time, which depends only on the J_z. Physically, it is always given that L is asymptotically large, giving a uniform small parameter, so for system sizes of interest our requirements should be fulfilled. Mathematically, all our statements remain correct by defining [t_0, t_R] = ∅ if t_0 ≥ t_R, but it should be stressed that when L is large enough we obtain a very non-trivial bound with t_0 < t_R. Before we state the main theorem of this section, let us make the following definitions. We will use the Kusmin-Landau bound, and the role of stationary points will be taken by the roots of the group velocity Φ′_t, collected in S^(1), together with the extremal points of the group velocity, i.e., the roots of Φ″_t, collected in S^(2). The former set of points are exactly the points of vanishing group velocity, while for the latter the band curvature vanishes. Let us additionally make the following definition, useful for Taylor expansions around roots r ∈ S = S^(1) ∪ S^(2). In general, for r ∈ S^(a), where a = 1, 2, we define κ_r to be the minimal integer such that the (κ_r + a)-th derivative does not vanish, |Φ^(κ_r+a)_t(r)| ≠ 0. Additionally, this will set the scaling of the final bound in the following theorem.
This theorem will, for example, allow us to bound the oscillatory sums Σ_k e^{iϕ_k} as a function of time, with constants expressed in terms of the analytic properties of Φ_t. The following lemma, attributed to Kusmin and Landau [49], will be our key tool.
To apply Lemma 14, we need to understand the discrete Kusmin gaps, defined by δ_k = ϕ_{k+1} − ϕ_k, and show that they are separated from 0 and 2π by some λ > 0 on a constant number of intervals where they are also monotonous. Because ϕ_k = Φ_t(p_k), we can use the mean value theorem, obtaining δ_k = (2π/L) Φ′_t(p̃_k) for some p̃_k ∈ [p_k, p_{k+1}], where 2π/L is the size of the interval to which we apply the theorem. We denote the summation domain by D and collect the corresponding p̃_k points in I. Observe that, by the mean-value relation (C14), the Kusmin gaps are monotonous on some I_r ⊂ I if Φ′_t is monotonous on conv(I_r). By using the mapping k → p̃_k, we can define intervals in D associated to any subset I_r ⊆ I by setting D_r = {k ∈ D s.t. p̃_k ∈ I_r}. We want to divide D into intervals where the δ_k are monotonous. After an elementary application of the triangle inequality, to each such region we want to apply the Kusmin-Landau bound. It is important to notice that the number of such regions is bounded above in terms of the maximal coupling range R, according to the following lemma.
Lemma 15 (Roots). The function Ω : [0, 2π] → C defined by Ω(p) = Σ_{j=0}^{R} (a_j cos(jp) + b_j sin(jp)), where a_j, b_j ∈ C, has at most 2R roots, as long as not all of the a_j and b_j vanish.
Proof. Define the complex polynomial Y(u) = Σ_{j=0}^{R} [a_j (u^{R+j} + u^{R−j})/2 + b_j (u^{R+j} − u^{R−j})/(2i)] and observe that it is not identically zero and has degree at most 2R, and hence at most 2R roots. Further, note that when restricted to the unit circle S¹ in the complex plane we have Y(e^{ip}) = e^{iRp} Ω(p) for any p ∈ [0, 2π]. From this we see that whenever Ω(p) = 0 for some p ∈ [0, 2π], then u = e^{ip} is a root of Y, because the multiplicative prefactor e^{iRp} does not remove any roots. Thus the number of roots of Ω cannot exceed the number of roots of Y, which is upper bounded by 2R.
Corollary 16 (Number of roots of phase functions). The phase function Φ_t and all its derivatives Φ′_t, Φ″_t, etc., have at most 2R roots.
As we can easily check, without loss of generality we can assume that Φ′_t(0) = Φ′_t(2π) = 0, and hence the interiors between consecutive extremal points in S = S^(1) ∪ S^(2) define at most 4R + 2 intervals on which Φ′_t(p) and Φ″_t(p) have a fixed sign. Specifically, we define these intervals as the points in I that lie between two consecutive roots from S, denote them by I_r ⊂ I, and note that r ranges from 1 to some R_0 ≤ 4R + 2. If, e.g., Φ(p) = cos(p), then R = 1 and we have R_0 = 4 regions. We now establish condition i) of the Kusmin-Landau Lemma, which concerns monotonicity.
Lemma 17 (Monotonicity). Let r < r 2 be two consecutive points belonging to S. Then the Kusmin-Landau gaps δ k are monotonous for all points k corresponding to the interval I r = I ∩ [r, r 2 ].
Proof. Between r and r_2, the first derivative of the phase function, Φ′_t, must be non-zero, for otherwise there would be an intermediate root, which is not possible as r and r_2 are consecutive. There is also no intermediate root of the second derivative Φ″_t, so it must have a fixed sign on the interior of the interval. Hence the derivative Φ′_t is either weakly increasing or weakly decreasing, and so the Kusmin-Landau gaps δ_k must be monotonous.
It will be useful to observe that any κ r can be bounded by the range R.
Proof. Consider again the non-zero polynomial Y associated to Φ′_t or Φ″_t as described above. Then Y has degree at most deg(Y) ≤ 2R, and because J_R ≠ 0 we have that Φ_t is not constant, and so Y is not identically zero. Now, if we had Y(z_0) = Y′(z_0) = . . . = Y^(2R)(z_0) = 0, then, for any z, a Taylor expansion would yield Y(z) = 0. Thus, if Y(z_0) = 0 at some point z_0 = e^{ip}, then Y^(n)(z_0) ≠ 0 must hold for some n ≤ 2R.
With this definition we can further set the constants entering the bound. This quantity is finite and independent of L by the above remark and the definition of κ_r, and because the numerator can be upper bounded. Furthermore, we define the required time-independent constant. We will now show that, after removing a small number of points close to the border from each of the intervals I_r, for the remaining points the Kusmin gaps will be lower and upper bounded. More precisely, we define the two scalings q_t and p_t that we shall use below. Proof of Theorem 13. Let us use the elementary observation that the absolute value of a sum is unchanged under a global phase, together with the fact that our phase function is always periodic up to a constant. Observing that in the absolute value of the total sum any constant term in Φ_t drops out, we may assume that Φ_t(0) = 0 without loss of generality. Then, using that the cosine functions are 2π-periodic, we also find that Φ_t(2π) = 0. With this step we have reduced the total sum to a sum over the intervals I_r, where the boundary points are appropriate roots. Consider r ∈ S and the corresponding interval I_r. Without loss of generality, we may assume that Φ_t(r) < Φ_t(r_2), and hence our task is to lower bound the Kusmin gaps around r. (If on I_r the gaps are negative, then we can simply lower bound Φ^(−)_t = −Φ_t, while in the case Φ_t(r) > Φ_t(r_2) we would have to lower bound the Kusmin gaps around r_2, which can be done in the same way.) By the monotonicity lemma, this assumption implies Φ′_t > 0 on I_r.
Step 1: Lower bound δ_k ≥ λ.

Step 1, case 1: r ∈ S^{(1)}. In this step we expand around r, where the Lagrange error term in the first line is evaluated at some q̃ ∈ [r, r + q_t]. We will show that the desired bound holds for t ≥ t_0. We have Φ_t'(p) > 0, so by Eq. (C29) we infer, using (C30), that Φ_t^{(κ_r+1)}(r) > 0, and hence we have a non-trivial lower bound of the stated form; observe that 2πλ_r/L ≥ λ. It thus remains to show (C30), which follows easily by noticing that q_t can be made sufficiently small using t ≥ t_0. This condition is implied by an inequality equivalent to (C34), which can be shown to be true by invoking the definition of t_0.
Step 1, case 2: r ∈ S^{(2)}. Expanding around r + q_t, the Lagrange error term in the first line is evaluated at some q̃ ∈ [r + q_t, r + q_t + p_t]. Note that we choose q_t and p_t small enough that there are no repeated roots at this step.
We will show below that an analogous bound holds. We next continue to expand Φ_t'(r + q_t) around r using the Taylor expansion, where the last term is the Lagrange error term, so q̃ ∈ [r, r + q_t], and κ_r ≥ 1 was defined above. We check that q_t is sufficiently small, which after a simple rearrangement leads to that observation. This in turn implies the bound (C40). Note that this is a non-trivial bound: due to the monotonicity on I_r we have Φ_t' > 0, and so Φ_t^{(κ_r+2)}(r) cannot be negative, as otherwise the other term on the right-hand side of (C40) would be too small to make the whole right-hand side positive. We are now in the position to check that condition (C38) is satisfied, which is implied by an inequality that, using Φ_t^{(κ_r+2)}(r) ≥ 0, is seen to hold by comparing with the definition (C22). With this result we obtain the lower bound (C39) and, explicitly inserting the time dependence, arrive at the claimed estimate, where again we find 2πλ_r/L ≥ λ, as desired.
Step 1 summary: Using (C14) we obtain the uniform lower bound δ_k ≥ λ for k ∈ I_r ∩ S_t^c and any r ∈ S.
Step 2: Upper bound |δ_k| ≤ 2π − λ. We show this via the following bound. Note that we can always take |d| ≤ L/2. To see this, suppose that, e.g., L > d = x − y > L/2. Then we can replace x by x' = x − L, which does not affect the propagator, but now |x' − y| ≤ L/2. A similar trick works if x − y < −L/2. So we can upper bound 2π|d|/L by π, and we obtain |δ_k| ≤ 2πtC. Next, we make use of Eq. (C20). Hence, for each I_r we can apply the Kusmin bound for times t_0 ≤ t ≤ t_R.
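The |d| ≤ L/2 reduction used above is ordinary modular arithmetic on the ring; a small sketch (the function name is ours) makes the invariance explicit:

```python
def ring_distance(x, y, L):
    """Representative of x - y (mod L) with smallest absolute value.

    Shifting x by a multiple of L leaves every phase exp(2j*pi*k*(x - y)/L),
    and hence the propagator, invariant, while guaranteeing |d| <= L/2.
    """
    d = (x - y) % L
    return d - L if d > L // 2 else d


L = 10
assert ring_distance(9, 1, L) == -2   # raw distance 8 wraps to -2
assert ring_distance(1, 9, L) == 2    # raw distance -8 wraps to 2
assert all(abs(ring_distance(x, 0, L)) <= L // 2 for x in range(L))
```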
Step 3: Use the Kusmin-Landau lemma and obtain the final bound. Summing up the discarded contributions and taking into account the bound on the number of monotonic intervals, we obtain the bound, where we have used that there are at most 4R + 2 Kusmin-Landau intervals, that we restrict each of them at the edges by fewer than 2L(p_t + q_t)/π points, and where the last term is the Kusmin-Landau bound. Inspecting the definition of C_1 we find (C52). Here, we have placed the absolute values such that the bound in this form remains valid for both monotonically increasing and decreasing intervals. Hence, the constant C_# reads C_# := 6(2R + 1) max{1, min(…)}, which evaluates as stated for p = 2π/L and d = x − y. By Theorem 13, we hence find the bound with C_# given in Eq. (C53) being system-size independent, as the couplings are fixed. The relaxation and recurrence times t_0 and t_R are given implicitly by the constraints in the proof of Theorem 13. The generic behavior of the exponent γ = 1/(3κ_0) is obtained for κ_0 = 1, which is attained at the wavefront of the nearest-neighbor hopping model [50].
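The κ_0 = 1 wavefront exponent can be checked against the nearest-neighbor hopping chain, whose infinite-volume propagator has magnitude |J_{x−y}(2Jt)| in terms of the Bessel function. At the wavefront |x − y| ≈ 2Jt one sees the Airy-type decay t^{−1/3} matching γ = 1/3, while deep in the bulk the decay is the faster t^{−1/2}. A quick numerical sketch (we set 2J = 1 so the wavefront sits at |x − y| = t; the prefactor ≈ 0.4473 is the standard asymptotic constant Γ(1/3)/(2^{2/3} 3^{1/6} π)):

```python
import numpy as np
from scipy.special import jv

# wavefront: |J_n(t)| at n = t decays as t^(-1/3)
for n in [100, 1000]:
    assert abs(jv(n, n) * n ** (1 / 3) - 0.4473) < 0.01

# bulk: at fixed separation the envelope decays as t^(-1/2)
t = 10000.0
assert abs(jv(5, t)) <= np.sqrt(2 / (np.pi * t)) * 1.01
```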
Appendix E: Equilibration of the covariance matrix

Proposition 20 (Equilibration of second moments). Consider a fermionic system with initially exponentially decaying correlations and non-resilient second moments Γ. Then there exist a constant relaxation time t_0 and a recurrence time t_R such that the stated bound holds, where C_Γ, γ > 0 are constants.
Proof. Our goal is to bound how quickly Γ_{x,y}(t), where Γ(t) = G(t)ΓG(t)†, relaxes towards its time-averaged value. Let us define the decomposition of Γ into its currents, that is, Γ = Σ_{d∈[L]} Γ^{(d)} with Γ^{(d)}_{x,y} = Γ_{x,y} δ_{x,y+d}, where we use the convention δ_{a,b+L} = δ_{a,b}. The evolution is linear, so Γ(t) = Σ_{d∈[L]} Γ^{(d)}(t). Our target equilibrium ensemble has matrix elements constructed by taking, for each Γ^{(d)}, the uniform average, which gives the value I_d of the d-th current in the initial state. Hence, we have the bound obtained from the triangle inequality over the currents. After these steps organizing the notation, we will present a first non-trivial bound showing that in the above sum only the currents with d ≤ d_ξ(t) contribute significantly. This is natural because of the exponentially decaying correlations, so it suffices to use Lemma 12 with d_ξ(t) = ξ ln(t^{1/(3κ_0)}).
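The bookkeeping of this decomposition is easy to make concrete. The sketch below (notation ours) splits an L × L matrix into its L periodic diagonals, Γ^{(d)}_z = Γ_{(z+d) mod L, z}, and builds the equilibrium matrix by replacing each current by its uniform average I_d, which yields a circulant matrix:

```python
import numpy as np

def currents(Gamma):
    """d-th current of Gamma: the periodic diagonal Gamma_{(z+d) mod L, z}."""
    L = Gamma.shape[0]
    return [np.array([Gamma[(z + d) % L, z] for z in range(L)]) for d in range(L)]

def equilibrium(Gamma):
    """Replace each current by its uniform average I_d (a circulant matrix)."""
    L = Gamma.shape[0]
    Geq = np.zeros_like(Gamma)
    for d, cur in enumerate(currents(Gamma)):
        I_d = cur.mean()   # value of the d-th current in the initial state
        for z in range(L):
            Geq[(z + d) % L, z] = I_d
    return Geq

rng = np.random.default_rng(0)
L = 8
A = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
Gamma = (A + A.conj().T) / 2        # generic Hermitian correlation matrix
Geq = equilibrium(Gamma)

assert np.isclose(np.trace(Geq), np.trace(Gamma))   # averaging preserves I_0
assert np.allclose(equilibrium(Geq), Geq)           # averaging is idempotent
```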
Then the currents with d > d_ξ(t) contribute negligibly to ‖Γ(t) − Γ^{(eq)}‖_max for sufficiently large t > t_0, so we consider d ≤ d_ξ(t). Now we expand Γ^{(d)} via the discrete Fourier transform. Recall the definition of the propagator; applying it to the sum over r while summing over s, n, we find that −r + s + n = μL has solutions with either μ = 0 or μ = 1, but not both at the same time, because of the variable range r, s, n ∈ [L]. Indeed, 2 ≤ s + n ≤ 2L, and we have the unique solutions r = s + n for s + n ≤ L and r = s + n − L for s + n ≥ L. Thus, using ω_{k+μL} = ω_k, we obtain the expression in whose last line we have defined f_n(t). The equilibrium currents are uniform, as established above, so we need to bound the remaining terms. As we have observed in the main text, we have K_z = −4J_z sin(πzn/L). In order to use our general theorem, we define Φ_t(p) = −4t Σ_{z=1}^{R} J_z sin(αz) sin(zp + zα) + p(x − y − d), and by evaluating at α = πn/L and p_k = 2πk/L we obtain φ_k = Φ_t(p_k) = (ω_{k+n} − ω_k)t + 2πk(x − y − d)/L.
We hence obtain the bound |f_n(t)| ≤ C_#(α) t^{−1/(3κ_0)} (E23), where now C_# depends on the derivatives of (E21) evaluated at the roots, and we indicate the dependence on α because for α ≈ 0 the constant would not be system-size independent. As long as the K_z have no stray dependence on L, these values are constants in the system size, so we can scale up the system size and retain a non-trivial bound. Hence, we now use the assumption of non-resilient correlations, obtaining the estimate in which we have used L^{−1} ≤ t^{−1/(3κ_0)}, which holds true for t ≤ L^{3κ_0}, a regime compatible with t ≤ t_R. We now finalize the total bound by |Γ_{x,y}(t) − Γ^{(eq)}_{x,y}| ≤ 2d_ξ(t) max_{x,y} |Γ_{x,y}(t) − Γ^{(eq)}_{x,y}| + C_Clust (1 + e^{−1/ξ}) t^{−1/(3κ_0)} (E27), where we have defined the constant accordingly. Observe that κ_0 ≤ 2R. Thus, for sufficiently large t we have, for some ε > 0, the final bound
‖Γ(t) − Γ^{(eq)}‖_max = max_{x,y} |Γ_{x,y}(t) − Γ^{(eq)}_{x,y}| ≤ C_Γ t^{−1/(3κ_0)+ε} . (E30)
Appendix F: Examples of non-resilient second moments

1. m-step periodic states

Suppose Γ_{z+d,z} is m-step periodic, with L/m ∈ ℕ, so that Γ_{z+d+m,z+m} = Γ_{z+d,z}. We get
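Whatever the explicit form of the surviving harmonics, the mechanism is easy to verify numerically (the construction below is ours): an m-periodic current has non-vanishing Fourier harmonics only at multiples of L/m, so at most m of them, each of size independent of L, which is why such states are non-resilient.

```python
import numpy as np

L, m = 24, 4
rng = np.random.default_rng(0)
block = rng.uniform(-1, 1, m)
cur = np.tile(block, L // m)          # m-step periodic current, L/m integer

F = np.fft.fft(cur) / L               # normalized Fourier transform
support = np.flatnonzero(np.abs(F) > 1e-12)

# only harmonics at multiples of L/m survive: at most m of them,
# independently of the system size L
assert all(k % (L // m) == 0 for k in support)
assert len(support) <= m
```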

2. Random dislocations
Suppose Γ^{(d)} can be decomposed as Γ^{(d)} = Γ^{(d,NR)} + Γ^{(d,SR)}, where Γ^{(d,NR)} is non-resilient and Γ^{(d,SR)} has sparse support. Then Γ^{(d)} is again non-resilient. This follows directly, as for the sparse part the Fourier transform is bounded by the inverse system size times the number S ∈ O(1) of sparse entries in Γ^{(d,SR)}.
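The sparse-part estimate is a one-line triangle inequality, checked numerically below (construction ours): the normalized Fourier transform of a current with S non-zero entries is at most S·max|entry|/L, which vanishes as the system is scaled up.

```python
import numpy as np

rng = np.random.default_rng(1)
for L in [64, 256, 1024]:
    S = 3                                   # O(1) sparse dislocations
    cur = np.zeros(L)
    cur[rng.choice(L, size=S, replace=False)] = rng.uniform(-1, 1, S)
    F = np.abs(np.fft.fft(cur)) / L         # normalized Fourier transform
    # triangle inequality: each harmonic is at most S * max|entry| / L
    assert F.max() <= S * np.abs(cur).max() / L + 1e-12
```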

Uniformly random currents
Take Γ_{z+d,z} ∈ [a, b] to be uniformly and independently distributed. Then Γ^{(d)} is not non-resilient. Indeed, we find that on average the non-trivial harmonics vanish, and we can furthermore calculate their second moment, which holds with probability greater than 1 − (CK)^{−2}.
Therefore, there is no chance of obtaining a non-trivial bound of the required form, because it would scale with the system size as ∼ LC_th t^{−γ}/2 due to the number of non-trivial harmonics.
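The obstruction can be seen numerically (sketch with our own normalization): an i.i.d. random current has Θ(L) non-negligible harmonics, each of typical size √(Var/L) rather than 1/L, so a triangle-inequality bound summed over the harmonics grows with the system size instead of staying L-independent.

```python
import numpy as np

rng = np.random.default_rng(0)
for L in [256, 1024, 4096]:
    cur = rng.uniform(0.0, 1.0, L)          # i.i.d. entries, [a, b] = [0, 1]
    F = np.fft.fft(cur) / L                 # normalized Fourier transform
    # Parseval: the k != 0 harmonics carry the sample variance (~1/12 here),
    # spread over L - 1 harmonics of typical squared size Var/L each
    rms = np.sqrt(np.mean(np.abs(F[1:]) ** 2))
    assert 0.5 / np.sqrt(12 * L) < rms < 2.0 / np.sqrt(12 * L)
    # summing the harmonics grows like sqrt(L): no L-independent bound
    assert np.abs(F[1:]).sum() > 0.1 * np.sqrt(L)
```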

Details of quenches from disordered to translation invariant models
In this section, we provide more details on the discussion of the similarity of averaged generalized Gibbs ensembles to thermal ensembles. In this context, it is useful to discuss the exact quench state Γ^{(Quench)}(t) and the infinite-time average Γ^{(∞)} on the level of quasiparticle occupation numbers in momentum space. For the quenched state these stay constant for all times due to unitarity, and we have checked that typically there are initially fluctuations around the idealized Fermi-Dirac distribution, but a Fermi edge can be observed. However, in order to obtain the infinite-time average, we apply a dephasing channel which, not being unitary, does change the occupation numbers while conserving the relevant local currents. In that case we have observed a much smoother quasiparticle number distribution, resembling the thermal Fermi-Dirac distribution much more closely. Additionally, we noticed that for the same model with the noise uniformly distributed in [0, w] the resulting equilibrium state is not thermal, but additionally taking a chemical potential into account leads to agreement. These observations suggest that the equilibration process we have discussed analytically leads to a thermal steady state for the particular selection of initial states discussed here, which are thermal states of a Hamiltonian with the same tunneling range as the quench Hamiltonian. It is an interesting question whether we can in general expect GGEs that are simply thermal steady states in the natural situation, occurring in many physical instances, where the kinetic energy is an inherent property of the system over which we have little control and we prepare thermal initial states by controlling only the on-site potential through external forces.
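The dephasing channel referred to above admits a compact sketch (notation ours): dropping the momentum-space off-diagonal elements of a correlation matrix is non-unitary, yet it leaves every spatially averaged current I_d = (1/L)Σ_z Γ_{z+d,z} untouched, since these depend only on the momentum-space diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 8
x = np.arange(L)
# plane-wave basis shared by all translation-invariant hopping Hamiltonians
V = np.exp(2j * np.pi * np.outer(x, x) / L) / np.sqrt(L)

A = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
Gamma = (A + A.conj().T) / 2          # generic Hermitian correlation matrix

# dephasing channel: keep only the momentum-space diagonal (non-unitary)
Gk = V.conj().T @ Gamma @ V
Geq = V @ np.diag(np.diag(Gk)) @ V.conj().T

def current(G, d):
    L = G.shape[0]
    return np.mean([G[(z + d) % L, z] for z in range(L)])

# every spatially averaged current is conserved by the dephasing channel,
# even though the state itself has changed
for d in range(L):
    assert np.isclose(current(Gamma, d), current(Geq, d))
assert not np.allclose(Gamma, Geq)
```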
A peculiarity stemming from quasi-free integrability is that access to only one translation invariant quench Hamiltonian suffices to prepare thermal states of any other translation invariant model, simply by assigning the initial correlation content. In particular, being able to tune the correlation length is a crucial ingredient, but the initial correlations need not be translation invariant or even Gaussian, as we have discussed above. Thanks to the Gaussification result, one can also use many-body interactions to tune close to a phase transition in order to increase the correlation length, even if the state obtained is non-Gaussian. Indeed, we have a proof of equilibration to a Gaussian state, but now the steady state may acquire an unusually large correlation length for a thermal state of the quench Hamiltonian. This "one to rule them all" result shows that the properties of the equilibrated state may be unrelated to the range of the dynamics, which is slightly at odds with the usual approach to inferring microscopic properties of various materials. It would be interesting to see whether experiments measuring only conductivity or other linear-response properties could be adversarially tricked into always indicating different dynamical models by receiving different states as input, while the true Hamiltonian is always merely the nearest-neighbor model. Such interactive experiments may be possible with existing quantum simulation technologies [82]. On the other hand, if precise microscopic measurements are limited, then observing just the fundamental qualitative properties, such as the formation of the Fermi edge which determines solid-state properties, should be a generic feature independent of the memory effect due to integrability. As there are only a few trailblazing works concerning what happens to a GGE in the presence of weak interactions [88][89][90][91], it would be exciting to study this systematically in optical-lattice experiments.