Time-evolution of local information: thermalization dynamics of local observables

Quantum many-body dynamics generically results in increasing entanglement that eventually leads to thermalization of local observables. This makes the exact description of the dynamics complex despite the apparent simplicity of (high-temperature) thermal states. For accurate but approximate simulations, one needs a way to keep track of essential (quantum) information while discarding inessential parts. To this end, we first introduce the concept of the information lattice, which supplements the physical spatial lattice with an additional dimension and on which a local Hamiltonian gives rise to a well-defined, locally conserved von Neumann information current. This provides a convenient and insightful way of capturing the flow, through time and space, of information during quantum time evolution, and gives a distinct signature of when local degrees of freedom decouple from long-range entanglement. As an example, we describe such decoupling of local degrees of freedom for the mixed-field transverse Ising model. Building on this, we then construct algorithms to time-evolve sets of local density matrices without any reference to a global state. With the notion of information currents, we can motivate algorithms based on the intuition that information, for statistical reasons, flows from small to large scales. Using this guiding principle, we construct an algorithm that, at worst, shows two-digit convergence in time evolutions up to very late times for diffusion processes governed by the mixed-field transverse Ising Hamiltonian. While we focus on dynamics in 1D with nearest-neighbor Hamiltonians, the algorithms do not essentially rely on these assumptions and can in principle be generalized to higher dimensions and more complicated Hamiltonians.


Introduction
A numerical simulation of a many-body quantum system generally requires significantly more computational resources than its classical counterpart. This discrepancy is due to entanglement: a quantum state typically holds information that cannot be separated into sums of local parts, resulting in resources growing exponentially with the number of degrees of freedom.
In equilibrium, the local nature of physical theories partially alleviates this problem, as thermal states generically have only short-range correlations [1][2][3][4]. Nevertheless, even for local theories, out-of-equilibrium time-evolution generically leads to a rapid buildup of correlations involving degrees of freedom spread over large scales [5].
Entanglement spread over large scales is, however, not directly observable. Instead, the set of density matrices of all small regions, the local density matrices, suffices to answer all physically relevant questions. In practice, measurement is mostly limited to either very few local degrees of freedom (e.g., single-spin polarization), or thermodynamic quantities such as the specific heat, susceptibilities to external fields, or transport properties such as heat and charge currents. Such quantities can generally be reframed as sums of local operators that act only within a small region and are therefore also captured by the local density matrices. Describing them requires resources increasing only linearly with system size, as opposed to exponentially for the entire wave function.
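As an illustrative sketch (our own example, not from the paper), the following Python snippet shows how an observable that is a sum of local operators, here the total s^z magnetization, is determined entirely by the single-site reduced density matrices; the state and the helper `single_site_rdm` are hypothetical choices for the illustration.

```python
import numpy as np

# A global observable that is a sum of local terms, such as the total
# magnetization, is fully determined by the local (here: single-site)
# reduced density matrices -- no access to the global state is needed.

sz = np.diag([0.5, -0.5])  # spin-1/2 s^z with eigenvalues +-1/2

def single_site_rdm(psi, site, n_sites):
    """Reduced density matrix of one qubit from a pure state vector."""
    psi = psi.reshape([2] * n_sites)
    other = [i for i in range(n_sites) if i != site]
    # contract out all other sites: rho[i,j] = sum_a psi[i,a] conj(psi[j,a])
    return np.tensordot(psi, psi.conj(), axes=(other, other))

n = 3
# GHZ state (|000> + |111>)/sqrt(2): highly entangled, yet each local
# s^z average follows from a 2x2 matrix rather than 2^n amplitudes
psi = np.zeros(2**n)
psi[0] = psi[-1] = 1 / np.sqrt(2)

total_sz = sum(np.trace(single_site_rdm(psi, k, n) @ sz).real
               for k in range(n))
print(total_sz)  # 0.0
```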
While entanglement buildup is unavoidable, there are reasons to believe that most of that entanglement does not influence the local density matrices. Generically, in the late-time steady state, the local density matrices coincide with those of a thermal density matrix, which one can predict without knowing any nonlocal degrees of freedom. This convergence can be understood in the following way: more quantum states have correlations on large scales than on small ones. Consequently, the information (a quantification of correlation) in small subsystems will, for statistical reasons, generically decrease or, equivalently, the entropy will increase; this accounts for the second law of thermodynamics for the entanglement entropy [6]. This statistical drift is the same for all scales, meaning that information will continue to flow to larger and larger scales, bounded only by system size. This motivates the guiding intuition of this article: generically, when information has reached a large enough scale, it will not flow back and affect local observables and can therefore be disregarded.
To utilize the idea that certain information can be disregarded, we need a way to specify where information is and to quantify how information flows. Only then can we know which information leaves the local (small) scales for good, and thus can be discarded. Unitary time-evolution implies that information is conserved, but it is fundamentally different from hydrodynamic conserved quantities such as energy. If we have a local Hamiltonian, energy is a "substance" in the sense that we have a well-defined notion of where it is and how it flows. The same cannot be said for information: because of the existence of non-local degrees of freedom, there is no well-defined notion of where information is located. To remedy this problem we introduce in this article a way to organize information into a local structure which we dub the information lattice. In the information lattice the physical space is supplemented with an extra dimension, quantifying how spread out the information is, thereby allowing information to be treated as a locally conserved quantity. The decomposition of information on the information lattice is the primary tool we will consequently use to analyze quantum dynamics, and is discussed in detail in section II.
In the context of quench dynamics, we ask in section III the question: can one tell from the information distribution how and when the local density matrices decouple from long-range correlations? When a system reaches equilibrium or, more exotically, when it approaches a state with localized excitations bouncing around like billiard balls, we can show from the information distribution that there is an exact (and numerically easy to implement) decoupling of the local observables. In these situations we find an information gap, a range of scales with no information, which implies a decoupling of the local density matrices from long-range correlations.

Figure 1: A one-dimensional lattice with the lattice sites indexed by integers and a few line segments depicted. We let C^l_n denote the line segment with diameter l centered at n. If a line segment contains an even number of sites, its center lies in between two sites and n is a half-integer, exemplified by C^3_{5.5} above.
Unfortunately, an information gap does not appear (in a finite time) in a generic setting. Nevertheless, one can simply try to time-evolve the local density matrices using some truncation [7][8][9][10][11][12][13]. The general idea is as follows: for each time-step δt, the evolution of the density matrix ρ of the entire system can be decomposed into two pieces. First the exact δt time-evolution is performed, and then ρ is truncated by some function T, ρ → T(ρ), designed such that part of the local observables is preserved (exactly what is preserved varies [7][8][9][10][11][12][13]). As the truncation variable increases, more and more local degrees of freedom are preserved, and the exact time-evolution is recovered when the truncation variable is taken to infinity. If the time-evolution of the local observables converges at a finite value as one increases the truncation, it is plausible that one has captured the true time-evolution. This article's guiding principle (when information has reached a large enough scale, it will not come back) motivates why such a truncation scheme could work: if the error is introduced on a large enough scale, these erroneous correlations will propagate to larger scales and not affect the local observables.
A detailed analysis of specific quench dynamics reveals what can go wrong in such an approach. In the most straightforward truncation scheme with the mentioned properties, T transforms the state ρ into a state with correlations decaying exponentially with scale, and the decay length λ increases with the truncation variable. Such an approximation, unfortunately, generically leads to a systematic underestimation of the information flow at scales ∼ λ, leading to a buildup of erroneous correlations at scales ∼ λ. Even if information generically flows from smaller to larger scales, if the erroneous correlations become significant, only a tiny fraction of them returning to small scales would alter the time-evolution of the local density matrices. To remedy the underestimation of the information current we require an additional property of T: it should both preserve the local observables and accurately approximate the information current out of the smallest scales. The second main result of our work is to construct an algorithm based on this idea. This algorithm shows good convergence properties and thus provides an example of how analyzing dynamics using the information lattice can lead to valuable insights on simulating quantum dynamics efficiently.
Figure 2: The value at a point is the information in the corresponding line segment that cannot be found on any smaller scale. Every triangle for which (n, l) is the top consists of points that correspond to line-segment subsets of C^l_n. Therefore, summing all values in a triangle with base at l = 0 adds up to the total information in the line segment corresponding to the top of the triangle. As an example, summing the values in the blue triangle gives the total information in C^2_7 = [6, 8]. If i^l_n is zero in some region, such as the red region in the top left, the density matrices in that region can be reconstructed from smaller density matrices corresponding to the green region in the bottom left.

To summarize, in this paper, we construct the information lattice, a way to quantify where information is and how it flows. We present it with the required information-theoretic background in section II. From the information distribution, one can directly derive a decoupling of the local observables under certain circumstances. When one cannot, by analyzing quantum quench dynamics using the information lattice, it becomes evident that the most direct algorithms trying to utilize a decoupling of the local observables will, at some scale,
underestimate the information current and can therefore readily be improved. We present this analysis of quantum quench dynamics using the information lattice in section III. Finally, in section IV, we construct an algorithm that implements the correct information flow, and in section V, we analyze its convergence properties.

The information lattice
To discuss quantum dynamics in terms of where information is located and how it flows, we need to quantify these notions, and to this end we introduce the information lattice. To define it, we first need to review the concept of total information in a quantum state. Intuitively, the total information in a quantum state should quantify how much one can predict knowing the whole state via the density matrix ρ. The von Neumann information, the deficit of the von Neumann entropy S(ρ) [14] from its maximum, gives a precise meaning to this intuition. To understand it, consider a state ρ which is a product state of maximally mixed states on all sites except one, where it gives a statistical prediction for a single yes/no measurement. If ρ predicted with certainty the outcome of this measurement, we could with ρ answer exactly one yes/no question. Thus, ρ would provide a single bit of information. With the conventions implied by the definition of entropy (1), a bit of information is given the value 1. If ρ instead only gives a probability for the different outcomes, then ρ does not provide a definite prediction for any observation. Repeating the measurement a significant number of times, one gets a well-defined average number of bits per measurement, k, needed to reproduce the string of outcomes [15]. The state can thus on average provide at most (measuring in a suitable basis) 1 − k bits per measurement. There are thus, in this average sense, 1 − k bits of information in the system. So the von Neumann information is in this case 1 − k.
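The "deficit of entropy from its maximum" can be made concrete with a small numerical sketch (our own illustration; `vn_information` is a hypothetical helper): a pure qubit carries one bit, a maximally mixed qubit zero bits, and a biased qubit 1 − k bits, with k the binary entropy of the outcome distribution.

```python
import numpy as np

def vn_information(rho):
    """Von Neumann information: deficit of the entropy from its maximum,
    measured in bits, I = log2(dim) - S(rho)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop zero eigenvalues (0*log 0 = 0)
    S = -np.sum(evals * np.log2(evals))   # von Neumann entropy in bits
    return np.log2(rho.shape[0]) - S

pure = np.array([[1.0, 0.0], [0.0, 0.0]])  # definite outcome: 1 bit
mixed = np.eye(2) / 2                      # maximally mixed: 0 bits
biased = np.diag([0.9, 0.1])               # partial prediction: 1 - k bits

print(vn_information(pure))    # 1.0
print(vn_information(mixed))   # 0.0
print(vn_information(biased))  # ~0.531 (k = H(0.9) ~ 0.469 bits)
```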
In general, the von Neumann information is the total information in a state in the average sense of the previous example. Depending on the measurement, one can, knowing the full density matrix, predict different amounts of information about the measurement outcomes. The von Neumann information is the maximum average number of bits one could predict. Similarly, the von Neumann information of a reduced density matrix on a region A provides the information in A, quantifying how many observables in A can be predicted from knowing ρ.
We define the 1D information lattice as the decomposition of the total information, I(ρ) = Σ_C i_C(ρ), into the irreducible information i_C(ρ) on all possible continuous line segments C. Specifically, i_C(ρ) is the information in ρ_C not contained in any ρ_C′ on a line segment C′ that is a proper subset of C (from now on we often refer to the information in a line segment C as shorthand for the information in the reduced density matrix ρ_C of the line segment). i_C(ρ) quantifies what the reduced density matrix ρ_C can predict which cannot also be predicted by the set of reduced density matrices of the proper subset line segments: {ρ_C′}_C′⊂C. The information lattice can be generalized to arbitrary dimensions by letting C run over connected clusters instead of line segments. However, the expressions for i_C(ρ) in higher dimensions do not take forms as simple as they do in 1D; we leave such higher-dimensional generalisations to future work.
The set of line segments is naturally organized into a 2D lattice (motivating the name information lattice), and on this lattice the decomposition makes information reminiscent of a hydrodynamic conserved quantity with well-defined local densities and currents. We label the line segments by their location n and diameter l (which we also refer to as scale); n is an integer if l is even and a half-integer if l is odd, see Fig. 1. The lattice sites are labeled by these indices and naturally take the form of a 2D lattice. Every triangle with (n, l) at the top and base at scale l = 0 consists of points that correspond to line-segment subsets of C^l_n, see Fig. 2. Therefore, summing all values in a triangle with base at l = 0 adds up to the total information in the density matrix corresponding to the top of the triangle. To translate these definitions of i^l_n into explicit expressions, we begin with l = 0. Since C^0_n has no proper subset line segment, the information in C^0_n not also present in any subset is simply the total information on site n. For the definition for l = 1, we require the concept of the mutual information I_ρ(A; B) between two disjoint regions A and B.
This is defined as the information in AB = A ∪ B that is neither in A nor in B. With this, i^{l=1}_n, the information in [n − 1/2, n + 1/2] not also present on site n − 1/2 or n + 1/2, is just the mutual information between the two sites. To define i^l_n for l > 1, we generalize I_ρ(A; B) to overlapping sets. To get the expression for I_ρ(A; B), we take the full information in AB, subtract the information in A and B separately, and add back the information in A ∩ B, since otherwise it is subtracted twice. One might worry that the information is not additive in the way assumed by these subtractions: what if there are situations where one could predict some observables not in A ∩ B but in B from the density matrix ρ_A? Then, apart from the information in A ∩ B (already corrected for), there could be information counted both in I(ρ_A) and I(ρ_B), and thus subtracted twice in (8). However, that would mean that there would exist some state ρ for which the expression (8) is negative, which is not the case [16,17]. Information thus has the assumed additive property (strong subadditivity) and the expression (8) is correct. With I_ρ(A; B) defined also for overlapping sets, we have expressions for all the information lattice values.
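The displayed equations referred to in this passage appear to have been lost in extraction; a reconstruction consistent with the surrounding definitions (information measured in bits, I = log2 dim − S) would read:

```latex
% On-site values: C^0_n has no proper subset line segments
i^{0}_{n} = \log_2 d - S\!\left(\rho^{0}_{n}\right),

% Mutual information between disjoint regions A and B
I_\rho(A;B) = S(\rho_A) + S(\rho_B) - S(\rho_{AB}),
\qquad
i^{1}_{n} = I_\rho\!\left(n-\tfrac{1}{2};\, n+\tfrac{1}{2}\right),

% Generalization to overlapping sets: subtract A and B, add back A \cap B.
% Strong subadditivity guarantees this is non-negative.
I_\rho(A;B) = S(\rho_A) + S(\rho_B) - S(\rho_{A\cup B}) - S(\rho_{A\cap B}),

% Information lattice values for l \ge 1: the two subsegments of C^l_n
% of diameter l-1 overlap in C^{l-2}_n and their union is C^l_n
i^{l}_{n} = I_\rho\!\left(C^{l-1}_{n-1/2};\; C^{l-1}_{n+1/2}\right).
```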

Reconstruction of density matrices from subsets
One interpretation of a reduced density matrix ρ_AB is as an encoding of the knowledge of the values of all observables on AB. If most information about the values of observables on AB is known already from the subsets A and B, then the density matrix ρ_AB can be approximated from the density matrices of the subsets. For small I_ρ(A; B) there are several ways to approximately construct ρ_AB from ρ_A and ρ_B [18]. Of these, the twisted Petz recovery map [18], Φ_TPRM(ρ_A, ρ_B), has a known bound [19] on the error, stating that for given I_ρ(A; B) and reduced density matrices ρ_A and ρ_B, all density matrices ρ_AB must lie within a trace-norm ball of radius 2 I_ρ(A; B) centered at Φ_TPRM(ρ_A, ρ_B). Turning to the information lattice, this means that if the information in a region (e.g., the red region in Fig. 2) is small, then one can reconstruct the corresponding density matrices from the density matrices corresponding to the information lattice values below (such as the green region in Fig. 2).
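The truncated expression in the text suggests a recovery map of the form exp(ln ρ_A + ln ρ_B − ln ρ_{A∩B}), each term extended by identities to the full region; the details of the "twisted" map of Ref. [18] may differ, so the sketch below is an assumption for illustration. For a state that is an exact quantum Markov chain, such as a product state, this reconstruction is exact.

```python
import numpy as np
from scipy.linalg import expm, logm

# Illustrative (untwisted) Petz-type recovery: reconstruct rho on A u B
# from overlapping marginals as exp(ln rho_A + ln rho_B - ln rho_{A n B}),
# with each logarithm extended by identity factors to the full region.

I2 = np.eye(2)

def recover(rho_A, rho_B, rho_mid):
    """A = sites {0,1}, B = sites {1,2}, A n B = site {1} (3 qubits)."""
    term = (np.kron(logm(rho_A), I2)
            + np.kron(I2, logm(rho_B))
            - np.kron(I2, np.kron(logm(rho_mid), I2)))
    rec = expm(term)
    return rec / np.trace(rec)

# A full-rank product state is an exact Markov chain: recovery is exact.
r0, r1, r2 = np.diag([0.7, 0.3]), np.diag([0.6, 0.4]), np.diag([0.8, 0.2])
rho = np.kron(r0, np.kron(r1, r2))
rec = recover(np.kron(r0, r1), np.kron(r1, r2), r1)
print(np.linalg.norm(rec - rho))  # ~0: exact reconstruction
```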

Summation of information lattice values
From our definition of i^l_n it follows that the information lattice values in a triangle with base at l = 0 sum up to the total information corresponding to the line segment at the tip of the triangle. To be consistent, this should also follow from the analytical definition in Eq. (9). We show this in general below, but to gain intuition we first consider a few specific examples to see how the sum over the entire lattice comes about. First, consider ρ to be a pure local product state. The information in the total system is then L log_2 d, where L is the number of sites and d is the local Hilbert space dimension. Since all single-site density matrices are pure, the information on each site is i^0_n = log_2(d), and since there are L sites these terms add up to L log_2(d). All other terms are zero, since there is no shared information between sites. As a second example, consider the dimerized state of spin-1/2's where every other pair of adjacent spins is in a singlet state. Then all single-site density matrices are maximally mixed, so all terms i^0_n vanish. The pairs of sites sharing a bond have a mutual information of 2, and there are L/2 such pairs, adding up to L. The pairs of adjacent sites not sharing a bond are maximally mixed and their corresponding mutual information is zero. There are no correlations between nonadjacent sites, so all values with higher l vanish, and the left- and right-hand sides of Eq. (13) again coincide.
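The dimerized-state example can be checked numerically. The sketch below (our own illustration; the helpers `segment_entropy` and `info_lattice` are hypothetical names) computes all i^l_n for a 4-qubit dimer state from the segment entropies, using i^l_n = S^{l−1}_{n−1/2} + S^{l−1}_{n+1/2} − S^l_n − S^{l−2}_n with S(∅) = 0, and verifies that they sum to the total information L log2 d = 4 bits.

```python
import numpy as np

def segment_entropy(psi, sites, L):
    """Von Neumann entropy (bits) of the reduced state on `sites`."""
    psi = psi.reshape([2] * L)
    keep = list(sites)
    drop = [i for i in range(L) if i not in keep]
    psi = np.transpose(psi, keep + drop).reshape(2 ** len(keep), -1)
    p = np.linalg.svd(psi, compute_uv=False) ** 2  # Schmidt spectrum
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def info_lattice(psi, L):
    """i^l_n for all segments [a, b] (keys (a, b), diameter l = b - a)."""
    S = {(a, b): segment_entropy(psi, range(a, b + 1), L)
         for a in range(L) for b in range(a, L)}
    i = {}
    for (a, b), s in S.items():
        l = b - a
        if l == 0:
            i[(a, b)] = 1.0 - s              # log2(2) - S on a single site
        else:
            inner = S[(a + 1, b - 1)] if l >= 2 else 0.0
            i[(a, b)] = S[(a, b - 1)] + S[(a + 1, b)] - s - inner
    return i

# Dimerized state: singlets on sites (0,1) and (2,3)
singlet = np.array([0, 1, -1, 0]) / np.sqrt(2)
psi = np.kron(singlet, singlet)

vals = info_lattice(psi, 4)
print(sum(vals.values()))           # 4.0: the total information of 4 qubits
print(vals[(0, 1)], vals[(1, 2)])   # 2.0 on a singlet bond, 0.0 off-bond
```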
The general case (12) is proved by induction. That the sum (12) holds for l = 0 in I(C^l_n) follows directly from the expression (5) for i^0_n. Assume that (12) holds for all l′ < l. Using the property, we have, where the equalities above the expressions follow from the induction assumption. From the definition of i^l_n in Eq. (9), and the definition of I(A; B) in Eq. (8), we get the correct sum (12) also for l, completing the proof.

Information currents
For a local Hamiltonian, the conservation of the total information is not just a global conservation law; in analogy to how currents are defined given a locally conserved operator, the local structure of the information lattice gives rise to well-defined local information currents, see Fig. 3. Consider an operator Q = Σ_n q_n, where q_n acts on site n, that commutes with a 1D nearest-neighbor Hamiltonian, where {n, n′} denotes an unordered pair of sites and h_{n,n′} acts only on the sites n, n′. The time-derivative of the density matrix ρ decomposes into terms, each stemming from a term in the Hamiltonian. Therefore, the time-derivative of the conserved charge at each site, dq_n/dt = (d/dt) Tr(q_n ρ), is in turn decomposed into local terms, each stemming from a single δ_{n,n+1}: dq_n/dt = α^n_{n−1,n} + α^n_{n,n+1}.
Each δ_{n,n+1} contributes to the time-derivative of the conserved charge on two sites, n and n + 1, and the assumption [H, Q] = 0 implies that the contributions are equal up to a sign: α^n_{n,n′} = −α^{n′}_{n,n′}. Therefore, the decomposition of dρ/dt into Σ_n δ_{n,n+1} gives rise to a well-defined definition of the flow of charge from site n + 1 to site n, j_{n+1→n}. Turning to the information lattice, we let α_{n,n+1} denote the term in di^l_n/dt stemming from δ_{n,n+1} in the decomposition of dρ/dt (16). Analogously to the usual conserved charge, each term is only present in the decomposition of the time-derivative of a single other information lattice value, but then with reversed sign. So, we have a well-defined notion of the local currents. At first sight it might seem odd that the leftmost term h_{n−l/2,n−l/2+1} is responsible for the current from the right line-segment subset, and not the other way around. This is, however, not as unintuitive as it might seem: the term which can make correlations between the l rightmost sites in C^l_n spread and become a correlation involving all l + 1 sites is precisely h_{n−l/2,n−l/2+1}.
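The displayed equations of this construction did not survive extraction; a hedged reconstruction consistent with the surrounding definitions is:

```latex
% Decomposition of the time derivative into bond terms of the Hamiltonian
\frac{d\rho}{dt} = \sum_{n} \delta_{\{n,n+1\}}(\rho),
\qquad
\delta_{\{n,n'\}}(\rho) \equiv -i\left[h_{\{n,n'\}},\,\rho\right],

% Each site's charge changes only through its two adjacent bond terms
\dot{q}_n = \alpha^{n}_{\{n-1,n\}} + \alpha^{n}_{\{n,n+1\}},
\qquad
\alpha^{n}_{\{n,n'\}} \equiv \mathrm{Tr}\!\left(q_n\,\delta_{\{n,n'\}}(\rho)\right),

% [H,Q] = 0 forces the two contributions of a bond to cancel pairwise,
% which defines the charge current
\alpha^{n}_{\{n,n'\}} = -\alpha^{n'}_{\{n,n'\}},
\qquad
j_{n+1\to n} = \alpha^{n}_{\{n,n+1\}}.
```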
The given expressions for the currents are in terms of derivatives of i^l_n, which we now want to write as closed-form expressions. The gradient ∇f of a smooth scalar function f on the space of Hermitian matrices is the matrix satisfying (23) for any Hermitian matrix Δρ. From this definition the gradient ∇S[ρ] of the von Neumann entropy follows. Since i^l_n is a sum of von Neumann entropies, see Eqs. (9) and (8), we can use this result to get an expression for the gradient of i^l_n.
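The gradient expressions themselves were lost in extraction; a reconstruction consistent with Eqs. (8) and (9) (writing i^l_n = S^{l−1}_{n−1/2} + S^{l−1}_{n+1/2} − S^l_n − S^{l−2}_n) would read:

```latex
% Definition of the gradient on the space of Hermitian matrices
\frac{d}{d\epsilon}\, f\!\left[\rho + \epsilon\,\Delta\rho\right]\Big|_{\epsilon=0}
  = \mathrm{Tr}\!\left(\nabla f[\rho]\,\Delta\rho\right),

% Gradient of the von Neumann entropy S(\rho) = -\mathrm{Tr}\,\rho\ln\rho
\nabla S[\rho] = -\ln\rho - \mathbb{1},

% Gradient of i^l_n (each logarithm implicitly extended by identity
% factors to C^l_n; the constant terms cancel since \mathrm{Tr}\,\Delta\rho = 0)
\nabla i^{l}_{n} = \ln\rho^{l}_{n} + \ln\rho^{l-2}_{n}
  - \ln\rho^{l-1}_{n-1/2} - \ln\rho^{l-1}_{n+1/2}.
```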
The coefficient α_{n−l/2,n−l/2+1} has the form of the right-hand side of the definition of the gradient (23), where we introduced the short-hand notation h_n ≡ h_{n,n+1}. Inserting the expression (25) for the gradient ∇i^l_n, we thus have a closed-form expression for the current j_{(n+1/2,l−1)→(n,l)} involving only the reduced density matrices. Doing the analogous rewriting for the three other currents, we get closed-form expressions for all currents. Finally, for later reference, we also introduce the notation J_{l→l+1} for the total current from scale l to l + 1, and j_{l→l+1} (without any position index) for the total current per site. The total current is a 1D current, which means it can also be defined directly from the continuity equation.

Figure 4: a) The information lattice values i^l as a function of l at three different times, in units of the maximum information lattice value i^l_max = 5/3 − log_2(3) (n is suppressed due to translation invariance). At t = 3J^−1 (blue dots), nearly all the information remains local at scales l < 4. At t = 10J^−1 (orange squares), two peaks have formed with almost zero information in between (note that up to l = 2 the green curve lies directly on top of the orange curve, obscuring it). As the system continues to evolve, at t = 14J^−1 (green diamonds) the information for l ≤ 2 has essentially stabilized to its infinite-time value, while the peak at long range travels to larger and larger scales. b) The information lattice values i^l for a continuum of times. Notice the gap between the information localized at the smallest scales and the peak traveling to larger and larger scales, which begins to form slightly after t = 6J^−1. c) The total information current per site j_{l→l+1} in units of the maximum current j^max_{l→l+1} ≈ 0.021. d) The total information current for a continuum of times. (In all plots the values at non-integer l, obtained by third-order spline interpolation, are added as a guide to the eye.)

Thermalization dynamics
We are now in a position to discuss the general properties of thermalization dynamics from the perspective of the information lattice. To this end, we study the evolution of i^l_n on the information lattice in two different situations: first from a homogeneous initial state, and then from an initial state which is homogeneous except at one point where there is a perturbation. In both cases we employ the nonintegrable transverse- and longitudinal-field quantum Ising Hamiltonian, where the operators s^x_n and s^z_n are spin-half (with eigenvalues ±1/2) operators on site n. The specific values of the Ising parameters are not very important; for easy comparison we take them as in Ref. [10]: h_L = 0.25J and h_T = −0.525J.
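The Hamiltonian (34) itself was lost in extraction; given the stated operators and parameter names, a plausible reconstruction (the overall sign conventions are our assumption) is:

```latex
H = J \sum_{n} s^{z}_{n} s^{z}_{n+1}
  + h_L \sum_{n} s^{z}_{n}
  + h_T \sum_{n} s^{x}_{n},
\qquad h_L = 0.25\,J, \quad h_T = -0.525\,J,
```

with s^z and s^x spin-half operators with eigenvalues ±1/2, as stated in the text.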
For the first example we consider a quench from the initial state, time-evolved with the Hamiltonian (34). The information in the initial state is purely local and, as shown in Fig. 4, remains so at short times. As can be seen in Fig. 4a, later, at t = 10J^−1 and t = 14J^−1, the information has split into two main parts: one part travels to larger and larger scales at the Lieb-Robinson speed [20] (reminiscent of the entanglement tsunami in holographic systems [21]), and the other remains stationary and purely local at small scales. Note also how the curves in Fig. 4a for l ≤ 2 at Jt = 10 and Jt = 14 are indistinguishable. This local part corresponds to the local density matrices of the thermalized infinite-time state. In Fig. 4b, slightly after t = 6J^−1, the splitting of the information is visible: a gap opens up, forming two separate information bumps. If the information at scale l is zero, it means that the local density matrices at scale l can be reconstructed from the density matrices at scale l − 1. In turn, this means that the (l − 1)-local density matrices can be time-evolved without any knowledge of longer-range correlations; the local degrees of freedom have decoupled from the rest. It is, however, not required that the information at a scale completely vanishes for decoupling to occur. In fact, the information current, depicted in Fig. 4, also vanishes at the smallest scales when the information wave-packet is well separated. This vanishing of the information current is sufficient for decoupling. For statistical reasons, information generically flows from small scales to large. When the information current from l to l + 1 vanishes, one therefore generically expects that, up to local constraints, the information in the l smallest scales is minimal. In this case we can reconstruct the (l + 1)-local density matrices from the l-local density matrices via the state with minimal information given the l-local density matrices: the l-local Gibbs state, see App. E.
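The l-local Gibbs state invoked here can be stated as a maximum-entropy problem (our paraphrase of the construction detailed in App. E):

```latex
% Maximum-entropy state compatible with the given l-local density matrices
\rho^{(l)}_{\mathrm{Gibbs}}
  = \underset{\sigma}{\arg\max}\; S(\sigma)
  \quad \text{subject to} \quad
  \mathrm{Tr}_{\overline{C^{l}_{n}}}\,\sigma = \rho^{l}_{n} \;\;\forall\, n,
```

and by the standard maximum-entropy (Lagrange-multiplier) argument the solution takes the form ρ ∝ exp(Σ_n K_n), with each operator K_n supported on the segment C^l_n.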
The reconstructed (l + 1)-local density matrices then give the time-derivative of the l-local density matrices, making the time-evolution of the l-local density matrices closed.
Figure: Quench from the initial state (36), the product state of maximally mixed states except at a single site where the spin points in the positive x-direction. a) The total information current, in units of the maximum total information current J^max_{l→l+1} ≈ 0.45, at several different times (before the dynamics is dominated by diffusion). b) The total information current, in units of the maximum total information current J^max_{l→l+1} ≈ 0.45, for a continuum of times. c) The information currents at late times in units of J_{1→2}, which equals ≈ 15 × 10^−5 for t = 50, ≈ 4.2 × 10^−5 for t = 100, and ≈ 2.1 × 10^−5 for t = 150. As a comparison, the current in a 3-local Gibbs state with the same local density matrices at t = 50J^−1 is shown. The Gibbs state underestimates the current by several orders of magnitude; for example, at t = 50J^−1 the J_{3,4} current is underestimated by a factor of 1.3 × 10^4, it continues to decay, and J_{7,8} is underestimated by a factor of 1.4 × 10^12. In a) and b) the values at non-integer l, given by third-order spline interpolation, are added as a guide to the eye.

It is important to note that care must be taken in choosing l when approximating a state with an l-local Gibbs state. In the example illustrated in Fig. 4, we get at t = 10J^−1 an accurate approximation of the derivative of the 3-local density matrices using a 3-local Gibbs state defined by the 3-local density matrices. However, if we instead use, e.g., a 7-local Gibbs state defined by the 7-local density matrices, we do not get an accurate approximation of the time-derivative of the 7-local density matrices. The reason is that such an l-local Gibbs state would severely underestimate the information currents at scales > 7.
The accumulation of information at scale 7 will lead to an erroneous flow back to smaller scales and spoil the dynamics of the local density matrices. The same would be true if we tried to approximate the derivative using a matrix product state (MPS) or a matrix product density operator (MPDO) (or any other technique aimed at approximating equilibrium-type states): using the minimal bond-dimension MPS or MPDO which captures the 7-local density matrices will generically severely underestimate the information current on larger scales.
In the example of Fig. 4, the local density matrices are static after the local degrees of freedom have decoupled, and the time-evolution to infinite time is captured by just time-evolving until that decoupling time. However, decoupling of local degrees of freedom does not necessarily imply that the local density matrices are static: consider as an example a state which thermalizes into local excitations that then bounce around like billiard balls. The dynamics continues forever and the full dynamics cannot be captured by time-evolving until some finite time. At the same time, the information that left the small scales before local equilibrium was reached will continue to travel to larger and larger scales, such that the resources for time-evolving the full state grow exponentially with time.
A perfect splitting of information into two bumps is not generic. An inhomogeneous distribution of a locally conserved quantity has to spread diffusively before the last part of the information in the small scales can leave. Therefore, such an initial distribution leads to a slow trickle of information from small to large scales, with a magnitude only decaying algebraically with time. However, it is not only, e.g., H itself which is conserved; products, e.g., H², H³, etc., are also conserved operators. Generically the corresponding correlation functions, e.g., ⟨h_n h_{n′}⟩, approach their equilibrium values polynomially [22, 23]. However, the operators become less local as you consider larger products; thus, the impact on local density matrices becomes smaller and smaller. In the example here, we both start from a product state and the eventual equilibrium state has a minimal correlation length, which implies that the prefactor of the algebraically decaying correction to the local density matrices is minuscule. Here, we see the almost perfect gap between the bump of information going to infinity and the one staying at local scales. (A closer inspection shows a minor correction to the information current at intermediate scales, which decays slowly.)
In our next example we consider a time-evolution where decoupling of the local degrees of freedom by l-local Gibbs states does not become a good approximation. We consider the time-evolution of a state which initially has an inhomogeneous distribution of a conserved charge and eventually relaxes to an infinite-temperature state. This inhomogeneous distribution diffuses and smoothens over time, leading to a slow trickle of information out of the smallest scales, meaning that the information current will at no time and scale become small compared with the information at the smallest scales. We use the same Hamiltonian as before, on an infinite one-dimensional chain, with the initial state the product state of maximally mixed states on all but one site (as in Ref. [10]),

ρ(0) = · · · ⊗ I₂ ⊗ I₂ ⊗ |↑ₓ⟩⟨↑ₓ| ⊗ I₂ ⊗ I₂ ⊗ · · · , (36)

where I₂ is half the identity matrix. The conserved charge in this case is energy, and there is an excess energy around the site where a spin initially points up. This energy will spread out, leading to a gradual decrease of the excess energy in the local density matrices. This can be seen in Fig. 5, which shows the time-evolution of the information current. As in the first example in Fig. 4, there is an information-current wave packet that travels to larger and larger scales. Now, however, it leaves behind a substantial tail extending to small scales, and the information current never vanishes. Eventually, everything but the diffusive dynamics is damped out. The smallest scales carry information about the energy, and there is a constant information flow from the smallest scales that slowly decreases over time (since diffusion slows down as the energy distribution becomes increasingly smooth). Since there is nothing that constrains this information, we expect it to flow with a constant speed toward infinite scales. This means that there is no sharp scale l at which the total information current, J_{l→l+1}, becomes much smaller than on other scales. Instead, J_{l→l+1} slowly increases with l, for l small compared to the scale that the main information wave packet, traveling to infinity, has reached. An intuitive picture of the increase of the information current with l is available if we assume that information leaving the smallest scales travels only in one direction, namely to larger and larger scales. Looking at the information current at larger l is then akin to looking back in time, as it carries the information which left the smallest scales in the past. This behavior can be seen in Fig. 5c, where the information current is slowly increasing as a function of l, with a slope that decreases with time. The only exception is J_{0→1}, which reflects dynamics on a scale smaller than the range of the Hamiltonian, where the above argument is not valid.
In this case there is no scale at which an l-local Gibbs state provides a good approximation. As an example, in Fig. 5c, we also show the information current for a 3-local Gibbs state, which severely underestimates the current at scale l and larger. The same is true for an MPS or MPDO: even if they are chosen to correctly capture the l-local density matrices, they will severely underestimate the information current on scales ∼ log_d χ. In the next section we will discuss an idea for how to capture this situation.

Figure 6: a) The diffusion coefficient as a function of time t and cut-off scale l_c (at this scale the curves are on top of each other), starting from the initial state defined in Eq. (41). After a brief initial ballistic evolution (for a duration of order ∼ J⁻¹), we observe a significantly longer crossover period before normal diffusion is reached, with a constant diffusion coefficient. b) The relative error of the three truncation variables (the error is defined by comparing to the largest truncation value, l_c = 9).

Time-evolving local density matrices
In this section, we build on the intuition gained from our study of information flow during thermalizing dynamics to develop algorithms to time-evolve the l-local density matrices. We first introduce the general framework for such algorithms, before discussing a concrete algorithm.
As before, we take Ω_l and Ω_{l+1} to be the l- and (l + 1)-local density matrices of a given quantum state. For a Hamiltonian with nearest-neighbor couplings, the time-derivative Ω̇_l is a linear map Φ of Ω_{l+1}, i.e., Ω̇_l = Φ(Ω_{l+1}), as follows directly from the properties of the partial trace and the Heisenberg equation of motion. As a concrete example, consider a 1D system and the time-derivative of an element ρ_[n,n+l] ∈ Ω_l. If the Hamiltonian H only has nearest-neighbor terms h_[n,n+1], then the time-derivative ρ̇_[n,n+l] can be obtained from elements exclusively in Ω_{l+1},

ρ̇_[n,n+l] = −i [H_[n,n+l], ρ_[n,n+l]] − i T_L([h_[n−1,n], ρ_[n−1,n+l]]) − i T_R([h_[n+l,n+l+1], ρ_[n,n+l+1]]), (37)

where H_[n,n+l] collects the Hamiltonian terms supported in [n, n + l], and the operator T_L (T_R) is the trace operator tracing out the leftmost (rightmost) site of any operator on a line segment, e.g., T_L ρ_[n,n+l] = ρ_[n+1,n+l]. We introduce a cut-off l_c in the locality of the information by approximating Φ(Ω_{l_c+1}) by a compatible function Ψ of Ω_{l_c} only, such that

Ω̇_{l_c} ≈ Ψ(Ω_{l_c}). (40)

Compatible means that there exists some set of (l_c + 1)-local density matrices Ω̃_{l_c+1} such that

Ψ(Ω_{l_c}) = Φ(Ω̃_{l_c+1}), (41)

with

T_{l_c+1→l_c} Ω̃_{l_c+1} = Ω_{l_c}, (42)

where T_{l+1→l} is the trace operator, a linear map from the (l + 1)-local density matrices to the l-local density matrices; in 1d it acts by tracing out a boundary site, e.g., (T_{l+1→l} Ω_{l+1})_[n,n+l] = T_R ρ_[n,n+l+1]. The compatibility requirement means that at each time step errors are only introduced on scales larger than l_c. One consequence is that any l-local conserved quantity, with l ≤ l_c, is left invariant, i.e., the expectation value of any operator O of the form O = Σ_n O_[n,n+l], such that [O, H] = 0, is conserved by the time-evolution. We want to capture dynamics in which the information not constrained to stay at small scales can be assumed to flow by statistical drift to larger and larger scales, and therefore never comes back to affect the local degrees of freedom. Any Ψ which does not obstruct this flow can then be used to predict the dynamics of the local degrees of freedom: for large enough l_c, the global flow of information guarantees that the algorithm accurately captures the dynamics of the l′-local density matrices, for small l′. The question is then how to find a Ψ which does not obstruct the information flow.
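To make concrete the statement that ρ̇_[n,n+l] follows from elements exclusively in Ω_{l+1}, the self-contained sketch below checks, for a small chain with a generic (randomly drawn, purely illustrative) nearest-neighbor Hamiltonian, that the derivative of a 2-site reduced density matrix obtained by evolving the full state agrees with the bulk commutator plus boundary-trace combination of 3-site density matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
I2 = np.eye(2)

def rand_herm(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def ptrace_keep(rho, n, a, b):
    """Reduced density matrix on sites a..b of an n-site qubit chain."""
    dl, dm, dr = 2**a, 2**(b - a + 1), 2**(n - 1 - b)
    r = rho.reshape(dl, dm, dr, dl, dm, dr)
    return np.einsum('aibajb->ij', r)

def comm(x, y):
    return x @ y - y @ x

# Chain of 4 sites with random nearest-neighbor couplings h[i] on sites (i, i+1).
n = 4
h = [rand_herm(4) for _ in range(n - 1)]
H = (np.kron(np.kron(h[0], I2), I2)
     + np.kron(np.kron(I2, h[1]), I2)
     + np.kron(np.kron(I2, I2), h[2]))

psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Exact: evolve the full state, then reduce to sites (1, 2).
exact = ptrace_keep(-1j * comm(H, rho), n, 1, 2)

# Via the (l+1)-local density matrices only: bulk commutator plus the two
# neighboring couplings, with the extra boundary site traced out afterwards.
rho_012 = ptrace_keep(rho, n, 0, 2)
rho_123 = ptrace_keep(rho, n, 1, 3)
rho_12 = ptrace_keep(rho, n, 1, 2)

t_left = np.einsum('aiaj->ij', comm(np.kron(h[0], I2), rho_012).reshape(2, 4, 2, 4))
t_right = np.einsum('iaja->ij', comm(np.kron(I2, h[2]), rho_123).reshape(4, 2, 4, 2))
approx = -1j * (comm(h[1], rho_12) + t_left + t_right)

assert np.allclose(exact, approx, atol=1e-10)
```

The agreement holds for any state and any nearest-neighbor H, since the partial trace of a commutator with an operator supported outside the traced sites reduces to a commutator with the correspondingly reduced density matrix.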
Using Petz recovery maps, if the information in layer l_c + 1 is small, we can extend the density matrices from scale l_c to scale l_c + 1 with a controlled error given by the bound (11). We use this method in the first simulation in Fig. 4, and at early times also in the other simulation; that is, we define

Ψ(Ω_{l_c}) = Φ(M_Petz(Ω_{l_c})), (45)

where M_Petz is defined by first using a Petz map to extend the density matrices on scale l_c to density matrices on scale l_c + 1 and then projecting this set of density matrices onto the space fulfilling the consistency condition (42) (see App. C for details). We can thus time-evolve the local density matrices with a known bound on how far they are from the true density matrices which one would have gotten by time-evolving the entire state according to the Schrödinger equation. If information is initially local, i.e., i^l_n ≈ 0 for l > l′ with l′ < l_c, then it will take a time T ∼ (l_c − l′)/v, where v is the Lieb-Robinson speed, before any information reaches scale l_c, and we can thus always initially time-evolve until a time ∼ T with a small bound on the error. If, during time T, an information gap opens at some scale l < l_c, then we can, using the above choice for Ψ, continue to time-evolve the local density matrices accurately to arbitrarily late times if we use the cut-off l_c = l. So in that case, one can time-evolve local density matrices to arbitrarily late times without needing resources growing exponentially with time.³
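The basic Petz extension step can be illustrated on three qubits: for a state that is an exact quantum Markov chain A–B–C, the rotated Petz map σ_ABC = ρ_BC^{1/2} ρ_B^{−1/2} ρ_AB ρ_B^{−1/2} ρ_BC^{1/2} (with identity factors implied) reconstructs the global state exactly from the two overlapping marginals. The following is a minimal self-contained sketch of that standard construction, not the implementation of App. C:

```python
import numpy as np

rng = np.random.default_rng(2)
I2 = np.eye(2)

def mpow(rho, p):
    """Matrix power of a PSD Hermitian matrix via eigendecomposition (pseudo-inverse style)."""
    w, v = np.linalg.eigh(rho)
    w = np.clip(w, 0.0, None)
    wp = np.where(w > 1e-12, w**p, 0.0)
    return (v * wp) @ v.conj().T

def rand_pure(d):
    x = rng.normal(size=d) + 1j * rng.normal(size=d)
    x /= np.linalg.norm(x)
    return np.outer(x, x.conj())

def ptrace_first(rho):   # trace out the first qubit
    d = rho.shape[0] // 2
    return rho.reshape(2, d, 2, d).trace(axis1=0, axis2=2)

def ptrace_last(rho):    # trace out the last qubit
    d = rho.shape[0] // 2
    return rho.reshape(d, 2, d, 2).trace(axis1=1, axis2=3)

# A Markov state: B is a classical register correlated with A and C.
p = np.array([0.3, 0.7])
rhoA = [rand_pure(2), rand_pure(2)]
rhoC = [rand_pure(2), rand_pure(2)]
ket = [np.outer(e, e) for e in np.eye(2)]
rho_ABC = sum(p[k] * np.kron(np.kron(rhoA[k], ket[k]), rhoC[k]) for k in range(2))

rho_AB = ptrace_last(rho_ABC)     # marginal on A ⊗ B
rho_BC = ptrace_first(rho_ABC)    # marginal on B ⊗ C
rho_B = ptrace_first(rho_AB)      # marginal on B

# Petz extension of rho_AB using rho_BC.
left = np.kron(I2, mpow(rho_BC, 0.5))            # I_A ⊗ rho_BC^{1/2}
sandwich = np.kron(I2, mpow(rho_B, -0.5))        # I_A ⊗ rho_B^{-1/2}, acting on A ⊗ B
mid = np.kron(sandwich @ rho_AB @ sandwich, I2)  # extended to A ⊗ B ⊗ C
sigma = left @ mid @ left

assert np.allclose(sigma, rho_ABC, atol=1e-8)    # exact recovery for a Markov state
```

When the state is only approximately Markov (small information at the new scale), the reconstruction error is correspondingly small, which is the content of the bound (11) used in the text.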
The challenge that remains is to time-evolve the l_c-local density matrices if no such gap opens. At first glance it might seem like a good idea to define Ψ by removing the information on scales larger than l_c. At every time step, such an algorithm discards all information at scales larger than l_c. However, while it does not create any erroneous information, it will in general underestimate the information flow leaving the l_c smallest scales when applied to more generic situations, as shown in Fig. 5. Almost all information that should have disappeared to large scales, with the main wave packet, instead builds up at scale l_c. Since most of the information typically disappears to infinity, the time-evolution sees an erroneous buildup of information, which can become much larger than the information in the degrees of freedom we are trying to capture.
To avoid this unphysical information buildup we construct an algorithm by assuming, based on statistical arguments, that the precise correlations on intermediate scales are of no importance beyond carrying the information leaving smaller scales to infinity. We therefore approximate the currents {j_{(l_c,n)→(l_c+1,n′)}}_{n,n′} as a function of the l_c-local density matrices. In general, one expects that in addition to the general flow to larger and larger scales there is a diffusion of information, so that information flows from points in the information lattice with more information to points with less information. For the sake of simplicity we assume that it suffices to correctly capture the total flow toward larger scales, that is to say, to approximate the total current J_{l_c→l_c+1} instead of the entire set {j_{(l_c,n)→(l_c+1,n′)}}_{n,n′}; extensions to local flows are in principle possible. A more precise treatment of the information diffusion is left for later work.
At short times, no information leaves the l_c smallest scales, and the state is an l_c-local Gibbs state. As can be seen in Fig. 5, as time progresses, the total current becomes roughly constant as a function of l: J_{l→l+1} ≈ J_{l−1→l}.
These two extremal situations can be connected through the following insight: if I_l, the total information on scale l, is large, the flow leaving scale l should also be large. We model this by assuming that the current J_{l→l+1} is proportional to I_l, which gives us the approximation

J_{l_c→l_c+1} = J_{l_c−1→l_c} I_{l_c} / I_{l_c−1}. (47)

While being a somewhat rough approximation, it is also (partially) self-correcting: if we underestimate the current J_{l_c→l_c+1}, then I_{l_c} will grow, and therefore the current will also grow. Specifying the current does not suffice to specify Ψ and thus the time-derivative Ω̇_{l_c}. The remaining degrees of freedom, though assumed to be globally unimportant, cannot be chosen completely arbitrarily. The self-correcting property of the current condition (47) guarantees a certain average current flow. However, certain choices of the remaining degrees of freedom could still result in information oscillating with a large amplitude, which we expect would lead to a slow convergence as a function of l_c. To avoid this situation, we try to make I^{l_c}_tot = Σ_{l′=0}^{l_c} I_{l′} smooth. More precisely, we use the second-order Taylor expansion of I^{l_c}_tot as a measure. Let χ be a possible choice for the time-derivative of Ω_{l_c}, i.e., χ ∈ Φ(C^{l_c+1}_{Ω_{l_c}}), where C^{l+1}_{Ω_l} denotes the space of (l + 1)-local density matrices compatible with Ω_l,

C^{l+1}_{Ω_l} = {ψ^{l+1} : T_{l+1→l} ψ^{l+1} = Ω_l}. (49)

If we change Ω_{l_c} in the direction χ, I^{l_c}_tot changes as

I^{l_c}_tot(Ω_{l_c} + εχ) = I^{l_c}_tot(Ω_{l_c}) + ε a_{Ω_{l_c}}(χ) + ε² b_{Ω_{l_c}}(χ, χ) + O(ε³). (50)

The first-order term is directly specified by the current condition (47). So, we choose χ ∈ Φ(C^{l_c+1}_{Ω_{l_c}}) to minimize the bilinear map b_{Ω_{l_c}}(χ, χ), given that the current condition is fulfilled. The bilinear form b_{Ω_{l_c}} is positive definite, so we simply have to minimize it to get the map Ψ. However, doing the Taylor expansion to define b_{Ω_{l_c}} and the following minimization naively leads to a slow numerical algorithm. In App. B we show how it can be done efficiently by first doing the Taylor expansion and part of the minimization analytically before a numerical step.
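The structure of this step is a quadratic minimization under a single linear constraint: minimize a positive-definite form b(χ, χ) = χᵀBχ subject to a prescribed total current cᵀχ = J. With a Lagrange multiplier the solution is χ* = J B⁻¹c / (cᵀB⁻¹c). The toy sketch below (generic B and c, not the actual operators of App. B) checks this closed form against feasible perturbations:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8

# A random symmetric positive-definite form B and a constraint vector c.
a = rng.normal(size=(n, n))
B = a @ a.T + n * np.eye(n)
c = rng.normal(size=n)
J = 0.7                                   # prescribed total current (illustrative number)

# Minimizer of chi^T B chi subject to c^T chi = J (Lagrange multiplier).
Binv_c = np.linalg.solve(B, c)
chi = (J / (c @ Binv_c)) * Binv_c

assert np.isclose(c @ chi, J)             # the current condition is fulfilled

# Any feasible perturbation (c^T v = 0) increases the objective.
v = rng.normal(size=n)
v -= (c @ v) / (c @ c) * c                # project onto the constraint surface
assert (chi + v) @ B @ (chi + v) >= chi @ B @ chi - 1e-12
```

The cross term vᵀBχ* vanishes for any feasible v, which is why the closed form is the unique minimizer; the efficient version in App. B exploits the same structure without ever forming B explicitly.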

Numerical Simulations
We now discuss the time-evolution of the local density matrices Ω_{l_c} with the initial state (36), using the information-flow algorithm of the last section, with Ψ defined by the current condition (47) and minimizing the expansion of I^{l_c}_tot (50). At early times, when the flow of information from scale l_c to scale l_c + 1 is approximately zero, the analytical expression for Ψ in the information-flow algorithm is a good approximation of the exact time-derivative of the l_c-local density matrices. However, at the same time the denominator in the current condition (47) is small, leading to potential numerical instability, which we fix by first time-evolving using the Petz recovery map (45).
The information-flow algorithm uses l_c as a truncation variable. For l_c → +∞, it trivially reproduces the exact time-evolution at any finite time. At finite l_c, we estimate the error by the speed of convergence with l_c of a few observables of interest. As the main estimator we use the relative error in the diffusion coefficient D, which characterizes the spreading of the energy distribution,

D = (1/2) dL²/dt,

where L is the diffusion length,

L² = Σ_n (n − n₀)² ε_n / Σ_n ε_n,

with ε_n the excess energy at site n. Here n₀ denotes the lattice site of the spin initially in the state |↑ₓ⟩.

Figure 7: a) The polarization s_x at the site n₀, starting from the initial state (36) and for truncation values l_c from 6 to 9. At late times, it follows the 1/√t behavior expected from conventional diffusion. b) The relative error on s_x, using the largest truncation value l_c = 9 as reference value. We indicate on the graph the largest relative error for each truncation value. Although the maximum error is larger than for the diffusion constant, the two largest truncation values agree everywhere on the two leading digits. Also, the error stabilizes to a roughly similar, but somewhat smaller, value than for the diffusion constant.
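As a toy check of this estimator (assuming the conventional definition D = ½ dL²/dt, and with illustrative numbers rather than the paper's data), a Gaussian excess-energy profile of variance 2Dt reproduces the input diffusion constant:

```python
import numpy as np

D_true = 0.45
n0 = 0
n = np.arange(-200, 201)

def diffusion_length_sq(t):
    """L^2 from a Gaussian excess-energy profile of variance 2*D*t."""
    eps = np.exp(-(n - n0) ** 2 / (4 * D_true * t)) / np.sqrt(t)
    return np.sum((n - n0) ** 2 * eps) / np.sum(eps)

# D = (1/2) dL^2/dt, estimated by a central finite difference.
t, dt = 10.0, 0.1
D_est = 0.5 * (diffusion_length_sq(t + dt) - diffusion_length_sq(t - dt)) / (2 * dt)

assert abs(D_est - D_true) < 1e-3
```

Since L² = 2Dt is linear in time for normal diffusion, the central difference is essentially exact here; in the actual simulations ε_n comes from the l_c-local density matrices instead of a closed-form profile.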
At short times, one generally expects a ballistic spread L ∼ vt. However, our initial state is time-reversal invariant, enforcing v = 0. At short times, the diffusion length is therefore quadratic: L ∼ at². (Since the initial state is a product state, the acceleration can be calculated analytically: a = h_T J / (2√3).) Later in the time-evolution, we instead expect no local reversibility, and thus random-walk behavior L ∝ √t. The diffusion coefficient then equals a constant, the diffusion constant. This behavior is seen in Fig. 6a. The dashed line at small times ≲ 1J⁻¹ corresponds to cubically growing D, corresponding to the quadratically growing diffusion length. At late times ≳ 50J⁻¹ the diffusion coefficient is approximately the constant D ≈ 0.45J, indicated by another dashed line. In between these limits there is a long crossover period ∼ 50J⁻¹ with non-universal physics.
Our exact criterion for algorithmic convergence is that the maximum relative difference between the approximation of the diffusion coefficient with a truncation at scale l_c − 1 and a truncation at scale l_c is smaller than 1%. In Fig. 6b we see that this requires a truncation variable l_c = 9 (this is also the highest truncation variable our optimized Mathematica code on a powerful desktop machine can handle). In the same figure we also see that, except for early times, the diffusion coefficient is always overestimated: the diffusion coefficient converges, as a function of l_c, from above.
Since we are time-evolving only sets of density matrices and not a quantum state, one might want to check whether a global state exists with the reduced density matrices we get from our algorithm. The general problem of verifying that a set of density matrices is compatible with a global state is QMA-complete⁴ [28]. However, in the numerical examples we consider in this paper, the density matrices have at late times large smallest eigenvalues, and one can verify that there exist l-local Gibbs states with short coherence length that have the l-local density matrices as reduced density matrices. (This does not mean that the global state is necessarily a Gibbs state, just that there exists a compatible Gibbs state.) This can be verified with our algorithm in App. E. Nevertheless, we would like to stress that compatibility is not as crucial as one initially might think. It is not necessary to distinguish errors resulting in the l_c-local density matrices being incompatible with a global state from other errors. What matters is to estimate the total error made on the local density matrices. One could imagine working with density matrices incompatible with a global state but still only slightly away from the correct local density matrices. In such a situation, these density matrices would only give a small error to any local observable. Indeed, what matters for the local observables is not whether a compatible global state exists but the error in the local density matrices. As we discussed, in certain situations we do have a controlled bound on the error in the local density matrices. When we do not, we control the error with the convergence as a function of our truncation variable l_c.
Still, an important question for controlling the validity of our approach is whether the diffusion constant is an observable that is easier to capture accurately than others, since it is a purely universal property. In this particular quench most observables decay to zero exponentially fast and their relative error quickly becomes meaningless. However, the polarization s_x at n₀ (the site of the initial perturbation) only decays algebraically. Having a large s_x correlates with having a large energy. Even when most local information is gone, s_x is then simply tied to the energy diffusion, as shown in Fig. 7a. As seen in Fig. 7b, the convergence is at first slower than for the diffusion coefficient, but still, at all times, agrees on the two leading digits for the two largest truncation values. However, as seen in the inset of Fig. 7b, the late-time convergence is roughly the same as, or even slightly better than, for the diffusion coefficient.
Finally, we show in Fig. 8 that the information current also converges quickly with l_c. In Fig. 8a it can be seen that the total information current J_{2→3} initially converges faster than s_x and slower than the diffusion coefficient. At late times it shows roughly the same level of convergence. However, in Fig. 8b it can be seen that for the truncation value l_c = 6, J_{5→6} has quite a substantial error of almost 20%. This is a generic behavior: for all truncation values, the l_c'th truncation gives a bad approximation for the current J_{l_c−1→l_c}. The maximal relative error is 20%, 15% and 7% for l_c = 6, 7 and 8, respectively. This is simply a reflection of our approximation in the current condition in Eq. (47): an error in the first unavoidably results in an error in the second.
It is worth noting that even the simple and imperfect current condition used here allowed for a high level of convergence in a long time-evolution, in a nonintegrable model, at a remarkably low numerical cost. The difference between consecutive estimates of the diffusion constant with different l_c decreases exponentially and reaches a level of less than 1%. This leads us to the conclusion that we could get a controlled estimate for the diffusion constant. Nevertheless, we expect that the current condition (47) is far from optimal, and by improving it the convergence of the algorithm will be significantly faster.
⁴ A quantum computer can verify that the solution is correct in polynomial time. So NP-complete is a subset of QMA-complete.
Figure 8: a) The information current J_{2→3}, with the relative error as inset (using the largest truncation value l_c = 9 as reference). b) The equivalent plot for J_{5→6}. The information current J_{l→l+1} shows good convergence as long as l_c > l + 1, and is on par with the convergence of the other observables we considered.

Conclusion and Outlook
We have introduced the information lattice as a convenient and insightful way of capturing, in time and space, the flow of information during quantum time-evolution. It extends the physical lattice by an additional half-infinite dimension representing the scale on which the information in a quantum state is found. The information on a given scale, with the corresponding information lattice coordinate l, is information that cannot be found in any reduced density matrix of a size smaller than l. This allows for a more fine-grained separation of entanglement, compared with, for example, matrix product states, which primarily focus on the largest entanglement eigenvalues of a given bipartition. Since not all details of the entanglement are relevant for local observables, as much of the entanglement mainly serves to provide an effective bath for local degrees of freedom, such a separation of entanglement scales offers new insights into quantum dynamics. First, with the mixed transverse field Ising model as an example, we discussed dynamics where there is a finite time after which the flow of information vanishes at some scale. One can, in principle, capture such dynamics over an infinitely long time with finite resources with the methods we introduced. One could also use other methods, e.g., based on matrix product states with limited bond dimensions; however, without the information lattice, it is hard to know how to implement them.
More generically, there is no finite time at which the information flow vanishes, and then all known algorithms with a controlled error break down. This situation is characterized by a slow flow of information to larger and larger scales. As most of the information that flows to larger scales never comes back to smaller scales and does not affect local observables, we can still obtain a long time-evolution of local observables. This requires keeping track of, and resolving, not only the information (or entanglement) on small scales but also, crucially, the flow of information at small scales.
With these insights we proposed a simple but highly efficient algorithm for the time-evolution of quantum systems. Instead of time-evolving the full quantum state, we only time-evolve the local density matrices Ω_{l_c}, which is the set of reduced density matrices of some size l_c. The exact time-evolution requires extending the scale l_c at each time step, but by simple assumptions about the structure of the information flow at the maximum scale we can close the time-evolution of the local density matrices Ω_{l_c}, essentially by reconstructing Ω_{l_c+1} from Ω_{l_c} together with a physical assumption about the current flow out of scale l_c. The latter is essential: not keeping track of the information flow and only reconstructing Ω_{l_c+1} from Ω_{l_c} using a maximum-entropy consideration invariably results in unphysical backflow of information from large scales to small scales that can affect local observables. We have shown that this algorithm successfully captures diffusion at long times as well as the decay of local observables in the mixed transverse Ising model after a local quench from a thermal state with extra energy at one site.
While we have focused our discussion on 1D models with nearest-neighbor Hamiltonians, the essential concepts are readily generalized to both higher dimensions and longer-range Hamiltonians. As our algorithm is based on local density matrices, it can likely also be generalized to include dissipation through local coupling to a bath. The algorithm does not rely on the presence of any symmetries, including translational invariance, and can therefore be applied also to disordered systems. The complexity further only scales linearly with system size, assuming that a finite thermalization length scale emerges in the dynamics. Potential applications therefore include thermalization and many-body localization (or its absence) in higher dimensions, where no appropriate and efficient algorithms exist at the moment. We also expect that the information lattice will be useful in constructing analytical theories of information flow in thermalizing quantum systems. In particular, a more accurate and efficient modeling of the information flow at a given length scale will likely significantly improve the efficiency and accuracy of our algorithm.

In the appendices we work with sets of Hermitian operators ψ^l = {ψ^l_n}_n, where ψ^l_n is an operator on C^l_n. As above, we use a spatial subscript to denote elements of such sets. A Greek letter with a superscript and a subscript, like ψ^l_n, should always be interpreted as the element of a set of Hermitian operators ψ^l which acts in the neighborhood indicated by the sub- and superscripts. Using the same Greek letter with different scale superscripts, i.e., ψ^l and ψ^{l−1}, it should be understood that the sets are related via taking traces, in this case ψ^{l−1} = T_{l→l−1} ψ^l. As before, Ω^l is reserved to denote the l-local density matrices. The sets of Hermitian operators form a real Hilbert space inherited from the real Hilbert space of Hermitian matrices, i.e., the vector addition and scalar multiplication are defined elementwise, (ψ^l + αφ^l)_n = ψ^l_n + αφ^l_n, and the inner product is defined by extending the trace inner product to sets of Hermitian matrices, ⟨ψ^l|φ^l⟩ = Σ_n Tr(ψ^l_n φ^l_n). Maps between or in Hilbert spaces of Hermitian matrices, or Hilbert spaces of sets of Hermitian matrices, are denoted by bold-face capital Roman or Greek letters, e.g., T_{l→l−1}. We will refer to the adjoint of an operator with a superscript T, or with the word transpose, since the Hilbert space is real. The transpose of an operator O from a set of Hermitian matrices of scale l to a set of Hermitian matrices of scale l′ is the unique operator with the property ⟨ζ^{l′}|O ζ^l⟩ = ⟨O^T ζ^{l′}|ζ^l⟩ for all ζ^l and ζ^{l′}. If the operator O is represented as a matrix, the transpose amounts to the usual matrix transpose. We denote the Moore-Penrose pseudoinverse (or just pseudoinverse) of an operator by a superscript +. The symbol P denotes orthogonal projectors, and if O is an operator then P_O denotes the orthogonal projector onto ker(O), the kernel of O. It can be written in terms of the pseudoinverse as P_O = 1 − O⁺O. If S is a linear space, then P_S denotes the orthogonal projection onto the space S.
We will use ⊥S to denote the orthogonal complement of S. The symbol Q_O denotes the orthogonal projector onto ⊥ker(O); in terms of the pseudoinverse, Q_O = O⁺O. Finally, I_O denotes the orthogonal projector onto im(O), the image of O; in terms of the pseudoinverse it can be written as I_O = OO⁺. We will make use of the pseudoinverses TL⁺_L ≡ (TL_L)⁺ and TL⁺_R ≡ (TL_R)⁺. For the specific case of the mixed-field Ising Hamiltonian it is possible to derive analytical expressions for these,⁵ where TL^T_{L/R} ≡ (TL_{L/R})^T. Using these definitions, the linear map Φ in Eq. (37) that gives the derivative Ω̇_l from Ω_{l+1} takes a simple form: if Ψ^l is defined as Ψ^l = Φψ^{l+1}, then the elements Ψ^l_n follow from Eq. (37). We now write Φ as

Φ = Φ Q_{T_{l+1→l}} + Φ P_{T_{l+1→l}}.

The result when the first term, Φ Q_{T_{l+1→l}}, acts on Ω_{l+1} can be calculated using only Ω_l, so the interpretation of Φ Q_{T_{l+1→l}} is that it gives the part of the derivative of the l-local density matrices which can be deduced from the l-local density matrices themselves. The other part, Φ P_{T_{l+1→l}}, then gives the unknown part of the derivative of Ω_l. Using the above expressions (74) and (64) we get a simple expression for it: if we define Γ^l as Γ^l = Φ P_{T_{l+1→l}} γ^{l+1}, its elements take a simple form. Here we used the fact that TL_L = TL_L P_{T_L}, and similarly for the operator with subscript R.
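The pseudoinverse identities P_O = 1 − O⁺O, Q_O = O⁺O and I_O = OO⁺ are standard and easy to check numerically; the sketch below verifies them for a generic rank-deficient map (a plain illustration, not tied to the specific operators TL_{L/R}):

```python
import numpy as np

rng = np.random.default_rng(4)

# A generic rank-deficient linear map O (rank 3, mapping R^6 -> R^5).
O = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 6))
O_pinv = np.linalg.pinv(O)

P = np.eye(6) - O_pinv @ O     # projector onto ker(O)
Q = O_pinv @ O                 # projector onto the orthogonal complement of ker(O)
I_O = O @ O_pinv               # projector onto im(O)

for proj in (P, Q, I_O):
    assert np.allclose(proj @ proj, proj)      # idempotent
    assert np.allclose(proj, proj.T)           # orthogonal (symmetric)

assert np.allclose(O @ P, 0)                   # ker(O) is annihilated by O
assert np.allclose(I_O @ O, O)                 # O maps into im(O)
assert np.allclose(P + Q, np.eye(6))           # the decomposition used for Phi
```

The last identity, P + Q = 1, is exactly what underlies the split Φ = ΦQ + ΦP into a part deducible from Ω_l and an unknown part.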
We now want to write the projector onto the space of what the unknown part of the derivative could be, that is to say, the projector onto the image im(Φ P_{T_{l+1→l}}) of Φ P_{T_{l+1→l}}. If Γ^l ∈ im(Φ P_{T_{l+1→l}}), then there are constraints imposed on each of the elements {Ψ^l_n} in Ψ^l separately. By an extended derivation it can be shown that the orthogonal projector onto the space fulfilling these constraints, which we denote I^D_{Φ P_{T_{l+1→l}}}, can be constructed explicitly. The superscript D marks that this projector projects onto the "diagonal" constraints imposed by Γ^l ∈ im(Φ P_{T_{l+1→l}}), i.e., the constraints imposed on each of the elements in Ψ^l separately. However, there are also non-diagonal constraints, i.e., if Γ^l ∈ im(Φ P_{T_{l→l−1}}), then the elements Γ^l_n and Γ^l_{n′} are generally not independent. So, we write the operator I_{Φ P_{T_{l+1→l}}} in terms of I^D_{Φ P_{T_{l+1→l}}}, where the operator I^D_{Φ P_{T_{l+1→l}}} is extended from an operator acting on Hermitian matrices to act on sets of Hermitian matrices, elementwise. By an extended derivation it can be shown that there is an operator which produces the projector I_{Φ P_{T_{l+1→l}}} together with σ^l, with elements given explicitly.

B.2 The information-flow derivative
We are now ready to write a closed-form expression for the derivative of Ω_l in the information-flow algorithm. Specifying this derivative is equivalent to choosing an element χ_l ∈ Φ(C^{l+1}_{Ω_l}), where C^{l+1}_{Ω_l} is the space of (l+1)-local density matrices compatible with Ω_l, see (49). A general element ψ_{l+1} ∈ C^{l+1}_{Ω_l} can be taken to be of the form given below, where ψ̄_{l+1} is the minimum-norm solution to T_{l+1→l}ψ̄_{l+1} = Ω_l and ψ̃_{l+1} ∈ ker(T_{l+1→l}). The elements of the minimum-norm solution are given below. We now define χ̄_l = Φ(ψ̄_{l+1}), and a general χ_l ∈ Φ(C^{l+1}_{Ω_l}) is thus of the corresponding form. Operators with an R subscript commute with operators with an L subscript, so their ordering is not important. When operators commute we will use the convention of keeping pseudoinverses furthest to the left. The idea is now to constrain χ̃_l in steps to finally make χ_l unique. First we constrain χ̃_l such that the current condition (47) is fulfilled. The current J_{l→l+1} is given below, where the primed sum indicates that the sum runs over all n except the ones corresponding to the leftmost and rightmost neighborhoods. The equality on the second line is explained in Fig.
9. We now write the time-derivatives in terms of the gradient, which has a closed-form expression. The function S(Ω^l_n) can be interpreted both as a function on the space of Hermitian matrices on C^l_n and as a function on the space of sets of Hermitian matrices; the gradient takes a different form in each case, and we let the context determine which definition we are using. Here "all n" has an analogous meaning as the primed sum in (86): it means all n except the ones corresponding to the leftmost and rightmost neighborhoods (those elements of the set are instead taken to be zero). From this rewriting of the current (and the analogous rewriting for J_{l−1→l}) it follows that complying with the current condition (47) amounts to fixing the inner product of χ̃_l with the set of logarithms of the local density matrices, which takes the form given below. We can then write the expression for a general χ̃_l with the current condition fulfilled. However, to specify χ_l fully we need to constrain χ̃_l further. We use the prescription from the main text and choose χ̃_l (the degrees of freedom which do not affect the current condition) by minimizing b_{Ω_l}(χ, χ) in (50), where H_{I^l_tot} is the Hessian of I^l_tot as a function of Ω_l, and "const." denotes terms independent of χ. If there is a unique solution χ to the resulting linear equation, then this solution is the unique minimizer of b_{Ω_l}. The projector P_{S⊥} acts in a way which is easy to implement numerically when acting on any set of matrices ζ_l. We now discuss how to solve such a linear equation numerically. If one can construct a good preconditioning matrix, a linear system AX = B (100) can be solved using the preconditioned conjugate-gradient method, see e.g., Ref. [29]. One can then obtain a solution of the linear equation with numerical resources of the same order of magnitude as it takes to apply the operator A to an element. A preconditioning matrix M is a good approximation to the inverse, M ≈ A⁻¹, which can be applied using the same numerical resources as applying A itself. We here use the pedestrian definition of "good" to simply mean that the preconditioned conjugate-gradient method converges in only a few (≲10) steps. Using this equation (and its mirror image), and if I(B; C) = I(A; B), we average over the above two choices. If I(A; C|B) ≠ 0 then this approximation does not necessarily preserve ρ_AB and ρ_BC, so we add a projection step and write the final approximation, ρ̃_ABC, of ρ_ABC as the orthogonal projection of the recovered state onto the space of density matrices which have ρ_AB and ρ_BC as partial traces.
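To make the preconditioned conjugate-gradient step concrete, here is a minimal textbook implementation (our own sketch, not the paper's code), with a simple Jacobi preconditioner standing in for the problem-specific conditioning matrix M:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, maxiter=200):
    """Preconditioned conjugate gradient for SPD A; M_inv approximates A^-1."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    z = M_inv @ r          # preconditioned residual
    p = z.copy()           # search direction
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

With a good M the loop terminates after only a few iterations, so the total cost is a small multiple of the cost of applying A, as stated above.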
The approximation ρ̃_ABC of ρ_ABC provides an approximation of the (l + 1)-local density matrices, given the l-local density matrices. For example, if we take AB .
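A minimal sketch of the basic Petz-type extension for three qubits (our own illustration; the paper's algorithm additionally averages over the two recovery directions and adds the projection step described above). Given the marginals ρ_AB, ρ_BC and ρ_B, it builds the candidate ρ̃_ABC = (1_A ⊗ ρ_BC^{1/2})(1_A ⊗ ρ_B^{-1/2} ⊗ 1_C)(ρ_AB ⊗ 1_C)(1_A ⊗ ρ_B^{-1/2} ⊗ 1_C)(1_A ⊗ ρ_BC^{1/2}):

```python
import numpy as np

I2 = np.eye(2)

def mfunc(rho, f):
    """Apply a scalar function f to the eigenvalues of a Hermitian matrix."""
    w, v = np.linalg.eigh(rho)
    return (v * f(w)) @ v.conj().T

def _isqrt(w, cut=1e-12):
    """Pseudo-inverse square root of an eigenvalue array (safe at zero)."""
    out = np.zeros_like(w)
    out[w > cut] = 1.0 / np.sqrt(w[w > cut])
    return out

def petz_extend(rho_AB, rho_BC, rho_B):
    """Petz-type candidate for rho_ABC from its AB and BC marginals (qubits)."""
    X = np.kron(I2, mfunc(rho_BC, np.sqrt)) \
        @ np.kron(np.kron(I2, mfunc(rho_B, _isqrt)), I2)
    return X @ np.kron(rho_AB, I2) @ X.conj().T
```

When I(A; C|B) = 0 this recovery is exact; for a product state ρ_A ⊗ ρ_B ⊗ ρ_C it reproduces the state identically.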

D Integration schemes D.1 Runge-Kutta methods
In this work we integrate all differential equations with Runge-Kutta methods, where Ψ is one of the compatible derivative functions (40) and {b_i} and {a_ij} are Runge-Kutta parameters. We use the parameters⁶ from Ref. [30] with a step-size error of O(∆t¹²). We also use a dynamic step size [31] ensuring a step-size error smaller than 10⁻⁵. In numerically more demanding situations one would want to allow for a bigger step-size error to allow for faster runtimes. It is worth noting that this does not affect conservation of constants of the motion. Since Ψ is compatible, it follows that the expectation value (119) of any constant of motion O of the given form vanishes for all κ_{l,i}. It follows that the expectation values of all constants of motion are exactly the same for Ω_l(t + ∆t) and Ω_l(t), no matter the value of ∆t.
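Both points can be illustrated with a generic explicit Runge-Kutta step (here the classical fourth-order one, rather than the twelfth-order scheme of Ref. [30]): since the update is a linear combination of derivative evaluations, any linear functional that vanishes on the output of the derivative function is conserved to machine precision regardless of ∆t.

```python
import numpy as np

def rk4_step(f, y, dt):
    """One classical 4th-order Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

For example, for f(y) = Ay with cᵀA = 0, the quantity c·y is exactly conserved by every step, mirroring how compatibility of Ψ preserves the expectation values of constants of motion.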

D.2 Dealing with small eigenvalues
If some of the matrices in the set Ω_l(t) have small eigenvalues, then one of the intermediate values could have matrices with negative eigenvalues. The functions Ψ we consider are defined only for positive semi-definite matrices, and the Runge-Kutta methods can therefore fail in this case. In the simulations in this paper this is not a problem. There are no small eigenvalues in the case with the translationally invariant initial state (35). For the initial state (36) there are initially matrices with vanishing eigenvalues, but these can be dealt with as follows. We first shift the state ρ(t) with the maximally mixed state to form ρ_shift(t). Since the full Schrödinger equation is linear, we can time-evolve this shifted state and at a later time shift back. For the local density matrices this shift amounts to the corresponding shift of each element, where Ω_l = {Ω^l_n}_{all n} is the unshifted l-local density matrices. If the function Ψ which estimates the derivative gives an equally good estimate (i.e., converges equally fast as a function of l) for the derivative of the shifted matrices as it does for Ω_l, we can just as well time-evolve the shifted matrices and then shift back. This is the case when using the Petz algorithm for the simulation with the initial state (36). However, there is in general no guarantee that the estimates Ψ for the derivatives converge as quickly with l for the shifted case as for the unshifted, requiring a larger truncation than if the unshifted local density matrices could be time-evolved directly. To solve the general situation of small eigenvalues one must instead use a different integration scheme. The smallest eigenvalues generically increase when there is a flow of information from small to large scales. So it is only either early in the time-evolution, or in situations where there is no flow of information to larger scales, that such an integration scheme is needed. In both these situations we can use the Petz-recovery-map algorithm, and then we have access to a function E of the l-local density matrices Ω_l which approximates the (l + 1)-local density matrices, Ω_{l+1} ≈ E(Ω_l). (125) If one knows the (l + 1)-local density matrices of a state ρ, one can calculate the l-local density matrices of the state e^{iA_{n,n+1}} ρ e^{−iA_{n,n+1}}, where A_{n,n+1} is any operator acting on sites n and n + 1. So, the function E provides a prescription for how to act with any operator of the form e^{iA_{n,n+1}} on Ω_l. Using the Suzuki-Trotter decomposition, see e.g., [32], we can write the time-evolution operator as

U(∆t) = ∏_{k=1}^{K} ∏_{n odd} e^{i∆t α_k h_{n,n+1}} ∏_{n even} e^{i∆t β_k h_{n,n+1}} + O(∆t^N),

where {α_k, β_k} are parameters which can be chosen to make N arbitrarily large at the cost of a larger order K. We can then use the above prescription for acting with an operator of the form e^{iA_{n,n+1}} to act with every factor in this expansion, and thus get an approximation for Ω_l(t + ∆t) from Ω_l(t). This integration method has no problems with positivity, and can thus be used also when there are small or vanishing eigenvalues. However, when possible it is advantageous to use Runge-Kutta methods. The first reason is that for the same order of approximation N the Suzuki-Trotter decomposition typically requires more steps K than the best Runge-Kutta method for the same N. This means that one has to apply E more times, which is the most numerically demanding part of the algorithm. Furthermore, for Runge-Kutta integration there is no time-step error in constants of motion, but for Suzuki-Trotter integration, constants of motion are on the same footing as everything else. Typically, errors in constants of motion are more severe than errors in other operators, and therefore one typically requires a smaller time-step error when using Suzuki-Trotter integration.
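A minimal sketch of a second-order Suzuki-Trotter step for a two-term Hamiltonian H = A + B (a toy stand-in for the odd/even bond decomposition; the exponentials are computed by eigendecomposition):

```python
import numpy as np

def expi(H, t):
    """exp(-i t H) for Hermitian H, via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def trotter2_evolve(A, B, t, steps):
    """Approximate exp(-i t (A+B)) by the symmetric (2nd-order) splitting
    exp(-i dt A/2) exp(-i dt B) exp(-i dt A/2), repeated `steps` times."""
    dt = t / steps
    step = expi(A, dt / 2) @ expi(B, dt) @ expi(A, dt / 2)
    U = np.eye(A.shape[0], dtype=complex)
    for _ in range(steps):
        U = step @ U
    return U
```

The global error of this splitting scales as O(∆t²), so halving ∆t reduces the error by roughly a factor of four; higher-order splittings with coefficients {α_k, β_k} follow the same pattern with more factors per step.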

D.3 Infinite systems
We now address the question of how to integrate the local density matrices in an infinite system. When we have translation symmetry this is straightforward: if Ω^l_n = Ω^l_{n+k}, we only have to keep track of the k density matrices Ω̄_l = {Ω^l_n}_{n=1,...,k}. A function Ψ(Ω̄_l) which approximates the time-derivative of Ω̄_l is straightforwardly inherited from the definition of Ψ for a finite space.
The initial condition (36) is, however, not translation invariant, requiring some care. As before we use n_0 to denote the site where the spin initially pointed up in the s_x direction. At any finite time t there will be some finite length Λ(t) such that, to high precision, the local density matrices further than Λ(t) from n_0 are given by tensor products as in (129). So up to time t we only need to consider a finite number, 2Λ + l − 1, of local density matrices, and we define the time-derivative by assuming that the rest are given by tensor products as in (129).
To utilize this we start out with Ω_l(0) consisting of the 2Λ_0 + l − 1 density matrices centered around n_0. Before the first time-step we add k sites on either side using (129). We then time-evolve a finite time step ∆t and afterwards remove from Ω_l(∆t) all density matrices which can be approximated by (129) with a given error ε. If we remove no density matrix, we have kept track of too few density matrices for the approximation (129) to be valid, and need to redo the time-step with a larger k. If we removed some density matrices, we end up with Ω_l(∆t) consisting of 2Λ_1 + l − 1 density matrices with Λ_1 ≥ Λ_0. We then continue the procedure of first adding density matrices, then making a time step and removing density matrices. The number of elements in Ω_l(t) we keep track of then grows, with accompanying growth of the numerical resources required to do a time-step. For the time-evolution we focused on in the main text, the growth of the number of elements is asymptotically constrained by the energy diffusion, and the number of elements (and thus the numerical resources) grows as √t.
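The bookkeeping just described can be sketched as follows (our own sketch; the actual time step is elided, with `evolve` a placeholder for one application of the integrator). We pad the window with k copies of the asymptotic product-state density matrix ρ_∞, evolve, and then trim boundary matrices within ε of ρ_∞ in Frobenius norm, flagging the case where nothing could be trimmed so the step can be redone with a larger k:

```python
import numpy as np

def step_window(omegas, rho_inf, evolve, eps, k):
    """One time step of a growing window of local density matrices."""
    padded = [rho_inf] * k + list(omegas) + [rho_inf] * k
    padded = evolve(padded)  # placeholder for one integration step
    n_before = len(padded)
    # trim boundary matrices indistinguishable from the asymptotic state
    while len(padded) > 1 and np.linalg.norm(padded[0] - rho_inf) < eps:
        padded.pop(0)
    while len(padded) > 1 and np.linalg.norm(padded[-1] - rho_inf) < eps:
        padded.pop()
    trimmed_any = len(padded) < n_before
    # if trimmed_any is False, the window was too small: redo with larger k
    return padded, trimmed_any
```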

D.4 Utilizing discrete symmetries
If the system under consideration has a unitary symmetry, one can in general use it to reduce the numerical resources required to time-evolve the local density matrices. For the simulation with initial state (35) we use reflection symmetry to speed up the time-evolution. By a unitary symmetry we mean that the Hamiltonian commutes with a unitary operator, [U, H] = 0. If a state ρ(t) satisfies this symmetry at a given time t, then it will satisfy it for all times. The above equality manifests itself in a corresponding relation for the local density matrices. For example, if U is translation by one site, then (132) implies that all density matrices on a given scale are equal. The converse is not necessarily true: if Ω_l satisfies the constraint (133), it does not necessarily imply that the full state upholds the corresponding symmetry (132). Even if all density matrices of scale l are equal, the state could still differ on scale l + 1. Discrete symmetries are therefore not automatically built into the compatibility condition of the time-derivative (41). So, if there is a symmetry, we can use it to reduce the numerical resources required. Translation invariance is straightforward to utilize. In particular, translation invariance by one site means that all density matrices are equal, and we do not have to keep track of a set of density matrices; we only need to keep track of one. Apart from translation symmetry, the only other symmetry we utilize in this paper is reflection symmetry. In the simulation with the translationally invariant initial state (35) we have reflection symmetry around every point. This means that every density matrix, for all l and n, satisfies Ω^l_n = R Ω^l_n R†, (135) where R is the operator which reverses the direction of the spatial axis, e.g., acting on product states in C^l_n by reversing the order of the factors. This means that Ω^l_n = Ω^{l,+}_n + Ω^{l,−}_n, where Ω^{l,+}_n (Ω^{l,−}_n) is an operator on the space of states with R-eigenvalue 1 (−1). Knowing this form of the density matrix allows for roughly four times faster diagonalization of Ω^l_n and subsequently a faster evaluation of Ψ.
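A sketch of this block diagonalization for the smallest nontrivial case, l = 2, where the reflection R is the two-site SWAP (our own toy example). The ±1 eigenspaces of R have dimensions 3 and 1, so an R-symmetric Ω can be diagonalized sector by sector:

```python
import numpy as np

# reflection on two qubits: |ab> -> |ba>, i.e. the SWAP operator
R = np.zeros((4, 4))
for a in range(2):
    for b in range(2):
        R[2 * b + a, 2 * a + b] = 1.0

def block_eigvals(omega, R):
    """Eigenvalues of an R-symmetric Hermitian omega, computed per sector."""
    w, v = np.linalg.eigh(R)  # R is Hermitian with eigenvalues +-1
    vals = []
    for sign in (-1.0, 1.0):
        V = v[:, np.isclose(w, sign)]  # orthonormal basis of the sector
        vals.append(np.linalg.eigvalsh(V.conj().T @ omega @ V))
    return np.concatenate(vals)
```

Since [Ω, R] = 0, Ω preserves each sector, and the union of the block spectra equals the full spectrum while each diagonalization runs on a smaller matrix.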

E l-local Gibbs states
An l-local Gibbs state, ρ^l_Gibbs, is the maximum-entropy state with given l-local density matrices Ω^l_Gibbs. An example is the usual Gibbs state, which is the maximum-entropy state given the energy. From Kim's inequality one can conclude that the difference between the k-local density matrices obtained from minimizing I^λ_tot and those obtained from minimizing I_tot (the error in Ω^k_Gibbs) is bounded by max_m(i^{λ+1}_m). However, one can also estimate the error by comparing the minimizations of I^λ_tot and I^{λ−1}_tot, and typically the error is much smaller than that given by Kim's inequality.
As we discussed, an l-local Gibbs state is of the form ρ^l_Gibbs = e^{−Σ_n ω^l_n} (149) for some operators ω^l_n that act only on the sites C^l_n. Unless ρ^l_Gibbs is a critical ground state of O = Σ_n ω^l_n, i^L_m decays exponentially as a function of L. For the minimization done to get the data in Fig. 5, this fast decay meant that we could let λ be large enough for the error to be limited only by machine precision.
Next comes the question of how to minimize I^λ_tot. We begin by discussing the case λ = l + 1. We first need a starting point Ω̃_{l+1}, that is, some (l + 1)-local density matrices with the property that T_{l+1→l}Ω̃_{l+1} = Ω^l_Gibbs. To get a starting point we use the Petz recovery maps, as in App. C, to get an approximation Ω^{l+1}_Petz. The Hessian of I^λ_tot can be written in terms of Hessians of sums of entropies (102), meaning that we can use the Newton-Raphson method to find the minimum of I^λ_tot. If λ = l + 2, we start by using the algorithm above to find the Ω_{l+1} which minimizes I^{l+1}_tot. We then extend this as before, using the Petz recovery maps, to get a starting point Ω̃_{l+2}, i.e., some (l + 2)-local density matrices with the property T_{l+2→l}Ω̃_{l+2} = Ω^l_Gibbs. For λ > l + 1, the first term in the expression (102) for the Hessian H_{I^λ_tot} does not vanish when restricted to ker(T_{λ→l}). When both terms are present there is no guarantee that the Hessian is positive definite; I^λ_tot is in general not convex. However, for a maximally mixed set of density matrices it is positive definite and smooth, so we expect this to be a problem only for density matrices with very small eigenvalues. For the minimization done to get the data in Fig. 5, the Hessian has been positive definite close to the starting points Ω̃_{l+2}, and we have been able to use the Newton-Raphson method to find the minimum closest to the starting point. We then use this minimum to generate a starting point to find the minimum of I^{l+3}_tot, and then use that minimum to find the minimum of I^{l+4}_tot, etc. We stop when the λ-local density matrices obtained from minimizing I^{λ+1}_tot are the same (up to the precision used) as the local density matrices obtained from minimizing I^λ_tot. Since I^λ_tot is not convex, we cannot be sure that we have found the global minimum. However, in a region close to a maximally mixed set of density matrices the Hessian H_{I^λ_tot} is positive definite, so one would expect that this typically is not a problem. Furthermore, we know that I^λ_tot is bounded from below by min(I^{l+1}_tot) (the minimal value of I^{l+1}_tot), which we can find with certainty. Using Kim's inequality (148), this gives us a region in which the global minimum must lie. For the local Gibbs state in Fig. 5 we can be certain that H_{I^λ_tot} is positive definite within a region which must contain the global minimum of I^λ_tot, and we can then be certain that we have found the global minimum.
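The inner loop is a standard Newton-Raphson iteration on the gradient, sketched below for a generic smooth convex function (our own illustration; in the paper the gradient and Hessian are those of I^λ_tot, with the Hessian built from the entropy Hessians of (102)):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-12, maxiter=50):
    """Minimize a smooth convex function given its gradient and Hessian."""
    x = np.array(x0, dtype=float)
    for _ in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)  # Newton step
    return x
```

When the Hessian fails to be positive definite, the Newton step need not be a descent direction, which is why positive definiteness is checked near the starting points above.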

E.2 Finding the logarithm of an l-local Gibbs state
When we have found the k-local density matrices Ω^l_Gibbs (k > l) of an l-local Gibbs state ρ^l_Gibbs, we can use the result to also find the terms ω_l = {ω^l_n}_{all n} of the operator O = Σ_n ω^l_n, which is the negative logarithm of the Gibbs state. There are in principle several ways to decompose O into a set ω_l: any set with the stated property for all Hermitian matrices on the full space will do, so ω_l is only defined up to an arbitrary element in ⊥im(T_{→l}). If we assume that the algorithm described in the previous subsection converged at stage λ, then I_tot = I^λ_tot up to the precision used. Since I_tot − I^λ_tot is non-negative, its gradient must vanish, from which the relation below follows, where as before the primed sum indicates that the sum runs over all n except the ones corresponding to the leftmost and rightmost neighborhoods. In the last equality we used the rewriting of the formula explained in Fig. 9 and the expression ∇I(Ω^l_n) = log₂(Ω^l_n) + 1. For an arbitrary Hermitian matrix on the entire space we then get an expression in which n_right labels the rightmost scale-λ neighborhood. Since O ∈ ⊥ker(T_{→l}) this is equivalent to the corresponding condition on O. Furthermore, it can be shown that a simplified expression holds when acting on elements in im(T), where N is the total number of sites. Using this expression in the previous equation and comparing with (155), it follows that the result is a decomposition of O. In fact, since it is an element of im(T_{→l}), it is the unique minimum-norm decomposition.

Figure 2 :
Figure 2: Each point (n, l) in the information lattice corresponds to a line segment C^l_n = [n − l/2, n + l/2]. The value at a point is the information in the corresponding line segment that cannot be found on any smaller scale. Every triangle for which (n, l) is the top consists of points that correspond to line segments that are subsets of C^l_n. Therefore, summing all values in a triangle with base at l = 0 adds up to the total information in the line segment corresponding to the top of the triangle. As an example, summing the values in the blue triangle gives the total information in C^2_7 = [6, 8]. If i^l_n is zero in some region, such as the red region in the top left, the density matrices in that region can be reconstructed from smaller density matrices corresponding to the green region in the bottom left.

Figure 3 :
Figure 3: With a nearest-neighbor Hamiltonian the information current j_{(n',l')→(n,l)} only connects nearest neighbors in the information lattice.

Figure 4 :
Figure 4: Time-evolution of information in the transverse- and longitudinal-field Ising model with the infinite product state with ρ_n = (2/3)|↑⟩⟨↑| + (1/3)|↓⟩⟨↓| as initial state. a) The information lattice values i^l as a function of l at three different times, in units of the maximum information lattice value i^l_max = 5/3 − log₂(3) (n is suppressed due to translation invariance). At t = 3J⁻¹ (blue dots), nearly all the information remains local at scales l < 4. At t = 10J⁻¹ (orange squares) two peaks have formed with almost zero information in between (note that up to l = 2 the green curve lies directly on top of the orange curve, obscuring its view). As the system continues to evolve, at t = 14J⁻¹ (green diamonds) the information for l ≤ 2 has essentially stabilized to its infinite-time value while the peak at long range travels to larger and larger scales. b) The information lattice values i^l for a continuum of times. Notice the gap between the information localized at the smallest scales and the peak traveling to larger and larger scales, which begins to form slightly after t = 6J⁻¹. c) The total information current per site j_{l→l+1} in units of the maximum current j^max_{l→l+1} ≈ 0.021. d) The total information current for a continuum of times. (In all plots the values at non-integer l, obtained by a third-order spline interpolation, are added as a guide to the eye.)

Figure 5 :
Figure 5: Time-evolution of information in the transverse- and longitudinal-field quantum Ising model starting from the initial state (36), that is, the product state of maximally mixed states except at a single site where the spin points in the positive x-direction. a) The total information current in units of the maximum total information current J^max_{l→l+1} ≈ 0.45 at several different times (before the dynamics is dominated by diffusion). b) The total information current in units of J^max_{l→l+1} ≈ 0.45 for a continuum of times. c) The information currents at late times in units of J_{1→2}, which equals ≈ 15 × 10⁻⁵ for t = 50J⁻¹, ≈ 4.2 × 10⁻⁵ for t = 100J⁻¹ and ≈ 2.1 × 10⁻⁵ for t = 150J⁻¹. As a comparison, the current in a 3-local Gibbs state with the same local density matrices at t = 50J⁻¹ is shown. The Gibbs state underestimates the current by several orders of magnitude; for example, at t = 50J⁻¹ the J_{3→4} current is underestimated by a factor of 1.3 × 10⁴, it continues to decay, and J_{7→8} is underestimated by a factor of 1.4 × 10¹². In a) and b) the values at non-integer l, given by third-order spline interpolation, are added as a guide to the eye.

Figure 7: a) Expectation value of the spin on the central site n_0 as a function of time, for the initial state defined in Eq. (36) and for truncation values l_c from 6 to 9. At late times it follows the 1/√t behavior expected from conventional diffusion. b) The relative error on s^x, using the largest truncation value l_c = 9 as the reference. The largest relative error for each truncation value is indicated on the graph. Although the maximum error is larger than for the diffusion constant, the two largest truncation values agree everywhere on the two leading digits. The error also stabilizes at a value roughly comparable to, but somewhat smaller than, that for the diffusion constant.


Figure 9: I^tot_l corresponds to summing over an isosceles trapezoid in the information lattice. As visualized in the figure, this sum can be recast into a sum over triangles, each of which adds up to the total information of the neighborhood at the tip of the triangle (12). We thus get I^tot_l = Σ'_n I(Ω^l_n) − Σ'_n I(Ω^{l−1}_n), where the prime indicates that the sums run over all n except those corresponding to the left- and right-most neighborhoods.

ρ_{[n_0+Λ, n_0+l+Λ]} ≈ ρ_{[n_0+Λ, n_0+Λ+l−1]} ⊗ I/2, (129)

and similarly ρ_{[n_0−Λ−l, n_0−Λ]} ≈ I/2 ⊗ ρ_{[n_0−Λ−l+1, n_0−Λ]}, of expectation values of local constants of the motion. Also the generalizations of the usual Gibbs states to spatially dependent generalized forces are l-local Gibbs states; e.g., a state with spatially varying temperature,

ρ({β_n}) = e^{−Σ_n β_n h_n} / Tr e^{−Σ_n β_n h_n}. (138)

To see that this complies with the definition of an l-local Gibbs state, imagine making a small change to this state, forming the density matrix ρ({β_n}) + E. The entropy then changes as

S(ρ({β_n}) + E) = S(ρ({β_n})) + Σ_n β_n Tr(E h_n) + O(E^2). (139)

Here we assumed Tr E = 0; otherwise ρ({β_n}) + E would not have unit trace. Now, if ρ({β_n}) + E is to have the same reduced density matrices on every pair of consecutive sites, we must have

Tr_{[n,n+1]^c}(E) = 0 for all sites n. (140)

This means that Tr(E h_n) = 0, and we can conclude that, to first order in E, ρ({β_n}) + E and ρ({β_n}) have the same entropy. Since the entropy is concave, it follows that ρ({β_n}) is the maximum-entropy state given the l-local density matrices. It is straightforward to generalize this argument and show that any density matrix ρ ∝ e^{−O}, for some operator

O = Σ_n ω^l_n; ω^l_n acts on C^l_n, (141)

is an l-local Gibbs state. The argument can also be used in reverse, to show that any l-local Gibbs state can be cast in the form ρ ∝ e^{−O}, with O as above. If ρ^l_Gibbs is an l-local Gibbs state, then the inner product of the gradient of the entropy with any perturbation E of ρ^l_Gibbs that does not change the l-local density matrices must be zero. That is,

Tr(E ln(ρ^l_Gibbs)) = 0 (142)

for all Hermitian matrices with

E ∈ ker(T_{→l}), (143)

where T_{→l} is the trace operator which takes a density matrix on the full space and maps it to the corresponding l-local density matrix. Equivalently, the logarithm ln(ρ^l_Gibbs) is an element of the orthogonal complement of the kernel ker(T_{→l}): ln(ρ^l_Gibbs) ∈ ⊥ker(T_{→l}). From the expression (144) of the kernel ker(T_{→l}) it follows that ⊥ker(T_{→l}) is spanned by operators of the kind ω^l_n, where ω^l_n acts as the identity outside C^l_n. So,

ln(ρ^l_Gibbs) = Σ_n ω^l_n; ω^l_n acts on C^l_n, (145)

which concludes the proof.
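The stationarity argument above is easy to check numerically. The following sketch uses purely illustrative assumptions (two sites, h_n taken as the Pauli-z operator on site n, arbitrary β_n): it builds ρ({β_n}) as in (138), projects a random Hermitian traceless perturbation E onto the subspace with Tr(E h_n) = 0 for all n, and confirms that the entropy change is then second order in E, as the argument around (139)-(140) requires.

```python
# Check: for rho({beta_n}) = exp(-sum_n beta_n h_n)/Z, a Hermitian traceless
# perturbation E with Tr(E h_n) = 0 for all n leaves the entropy unchanged to
# first order.  Two-qubit toy example; h_n = sigma_z on site n is an
# illustrative assumption, not the paper's Hamiltonian.
import numpy as np
from scipy.linalg import expm

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-14]
    return float(-np.sum(w * np.log(w)))

sz, I2 = np.diag([1.0, -1.0]), np.eye(2)
h = [np.kron(sz, I2), np.kron(I2, sz)]        # h_0, h_1
beta = [0.3, 0.7]                             # spatially varying "temperatures"

H = sum(b * hn for b, hn in zip(beta, h))
rho = expm(-H)
rho /= np.trace(rho)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
E = (A + A.conj().T) / 2                      # Hermitian
E -= (np.trace(E).real / 4) * np.eye(4)       # enforce Tr E = 0
for hn in h:                                  # enforce Tr(E h_n) = 0
    E -= (np.trace(E @ hn).real / np.trace(hn @ hn).real) * hn
eps = 1e-4
E *= eps / np.linalg.norm(E)

dS = entropy(rho + E) - entropy(rho)
print(abs(dS))  # O(eps^2): far below eps = 1e-4
```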

E.1 An algorithm to calculate the reduced density matrices in an l-local Gibbs state

In this section we show how to numerically obtain the k-local density matrices in an l-local Gibbs state, given access to the l-local density matrices. By definition, an l-local Gibbs state is the state that minimizes the total information I_tot given the l-local density matrices Ω^l_Gibbs. The idea is now to instead minimize the truncated total information Tr(ρ_AB − σ_AB)^2 ≤ 2 I_ρ(A; B) + I_σ(A; B).
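To make the task concrete, here is a toy sketch of obtaining a 2-local Gibbs state from given 2-local density matrices on a three-qubit chain. It is not the algorithm of this section: the parametrization ρ ∝ exp(Σ_n O_n) follows the form (145), but the iterative-scaling update, the step size, and the example marginals are all assumptions made for illustration.

```python
# Sketch: fit a 2-local Gibbs state rho ~ exp(O_01 + O_12) on 3 qubits to
# given nearest-neighbor marginals, by iteratively nudging each local
# generator with the log-marginal mismatch (a heuristic update, not the
# paper's algorithm).
import numpy as np
from scipy.linalg import expm, logm

I2 = np.eye(2)

def gibbs(O01, O12):
    """Normalized rho ~ exp(O_01 x I + I x O_12) on a 3-qubit chain."""
    K = np.kron(O01, I2) + np.kron(I2, O12)
    rho = expm(K)
    return rho / np.trace(rho)

def marg01(rho):  # trace out qubit 2
    return np.einsum('abcdec->abde', rho.reshape([2] * 6)).reshape(4, 4)

def marg12(rho):  # trace out qubit 0
    return np.einsum('abcaef->bcef', rho.reshape([2] * 6)).reshape(4, 4)

def rand_herm(rng, d, scale):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    A = (A + A.conj().T) / 2
    return scale * A / np.linalg.norm(A)

rng = np.random.default_rng(1)
# Target marginals, taken here from a known 2-local Gibbs state so that an
# exact solution exists within the parametric family.
sigma = gibbs(rand_herm(rng, 4, 0.2), rand_herm(rng, 4, 0.2))
t01, t12 = marg01(sigma), marg12(sigma)

O01 = np.zeros((4, 4), dtype=complex)
O12 = np.zeros((4, 4), dtype=complex)

def err():
    rho = gibbs(O01, O12)
    return np.linalg.norm(marg01(rho) - t01) + np.linalg.norm(marg12(rho) - t12)

e0 = err()
eta = 0.3  # illustrative step size
for _ in range(300):
    rho = gibbs(O01, O12)
    O01 = O01 + eta * (logm(t01) - logm(marg01(rho)))
    O12 = O12 + eta * (logm(t12) - logm(marg12(rho)))
print(e0, err())  # the marginal mismatch shrinks by orders of magnitude
```

The fixed point of this update is exact marginal matching, so when it converges it lands on the l-local Gibbs state with the prescribed 2-site density matrices.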