Optimal compression of quantum many-body time evolution operators into brickwall circuits

Near term quantum computers suffer from a degree of decoherence which is prohibitive for high fidelity simulations with deep circuits. An economical use of circuit depth is therefore paramount. For digital quantum simulation of quantum many-body systems, real time evolution is typically achieved by a Trotter decomposition of the time evolution operator into circuits consisting only of two qubit gates. To match the geometry of the physical system and the CNOT connectivity of the quantum processor, additional SWAP gates are needed. We show that optimal fidelity, beyond what is achievable by simple Trotter decompositions for a fixed gate count, can be obtained by compiling the evolution operator into optimal brickwall circuits for the $S = 1/2$ quantum Heisenberg model on chains and ladders, when mapped to one dimensional quantum processors without the need of additional SWAP gates.

1 Introduction
Quantum processors are a rapidly evolving technology which is expected to be pivotal for many classically hard problems like integer factorization, database search, optimization and many others [1][2][3][4]. While truly universal quantum computing is still a long shot, one of the most promising near-term applications is the simulation of complex quantum systems, due to their relative similarity to the quantum hardware itself. The simulation of such systems on classical computers is extremely hard due to the exponential complexity in terms of storage and computer time, while both problems are naturally solved on quantum hardware.
There are two different approaches: analog and digital quantum simulators. Analog simulators are systems specifically engineered to mimic the dynamics of the target system and are often based on quantum optical setups. This technique has been successfully applied to condensed matter systems [4][5][6][7][8][9] and lattice gauge theories [10][11][12] and is in principle extremely powerful, but requires a tailored experimental setup for a given type of problem.
In contrast, digital quantum simulators [13] rely on a discrete representation of the wave function on an array of two level systems (dubbed qubits), which can be fully controlled by a universal set of quantum gates. In principle this allows any unitary operation on the many-body wave function to be represented as a sequence of gates. Due to the universal representation of the wave function, this is an attractive approach which is extremely flexible once a suitable mapping of the system of interest to qubits is devised. Recent applications include condensed matter systems [14][15][16][17], simulations from quantum chemistry [11][18][19][20] and high-energy physics [21,22]. Digital quantum simulations were also used to realize exotic phases of matter like time crystals [23,24] and quantum spin liquids [25].
The state-of-the-art method for simulating the real time dynamics of complex quantum systems involves a factorization of the time evolution operator into a sequence of gates using Trotter decompositions of different orders [26][27][28][29][30], introducing discrete time steps to get an approximation of the exact time evolution of the system. This introduces a discretization error, which can be systematically controlled by using smaller step sizes. As a downside, small step sizes require a larger number of gates. Due to the fragility of the quantum state stored in the machine, and due to hardware imperfections, each additional gate potentially introduces new sources of error due to dissipation processes. Hence a trade-off between discretization errors and errors due to intrinsic machine noise during the simulation is required. To achieve optimal fidelity in light of this trade-off, it is therefore important to minimize the resource costs for a given simulation. Recent work yielded tighter bounds for the discretization errors [31]. Furthermore, it was also argued recently that beyond a certain step size the fidelity of the Trotter decomposition breaks down in a universal fashion, leading to a regime of quantum chaos [32,33]. This also sets upper bounds for possible step sizes. It remains unclear, however, whether better alternatives to Trotter decompositions exist.
One promising approach in this regard is the use of quantum variational algorithms. Their main idea is to approximate a time-evolved state using a parametrized circuit [34][35][36][37]. The parameters are then fixed using optimization algorithms on a quantum computer. Recent numerics suggest that the number of parameters needed to describe time-evolved states or ground states scales favorably even in comparison to matrix-product states [38,39]. Most of these algorithms involve optimization where gradients are measured directly on the quantum devices, or they use deep learning approaches. However, the measurement of gradients on a quantum device is currently infeasible due to the high error rates, while optimization using deep neural networks is not controlled.
In this paper we take a more universal approach. Rather than focussing on the wave function, we directly target the time evolution operator, aiming at a compact representation as a shallow circuit. As an ansatz for the time evolution operator we use brickwall circuits in which the gates are parametrized two-qubit unitaries, connecting neighboring qubits in the architecture of the quantum processor. This parametrized circuit can be optimized classically to represent the time evolution operator for a given time step with high fidelity. The resulting circuit can then be repeated to evolve the quantum state to later times. We show that such an optimized circuit can yield significantly higher fidelity time evolution for a fixed gate count compared to the traditional Trotter decomposition and is thus superior for digital quantum simulation.
We also show that this strategy allows us to obtain similar accuracy using significantly fewer gates, even for systems where the physical geometry does not coincide with the proposed circuit architecture, essentially "baking in" the SWAP gates that would otherwise be required to match geometries. As an interesting benchmark problem, we use our approach to compute out-of-time-ordered correlators (OTOCs) and show that we achieve better accuracy than Trotter methods with similar resource cost. Finally, we analyze the gate structure of the optimized gates, as a first step towards further improvements of this approach.

Model
For concreteness and simplicity, we focus on simulating finite systems of $s = 1/2$ spins on a lattice with $L$ sites, designed to be performed on a quantum processor with an identical Hilbert space $\mathcal{H}$, which is the product space of $L$ two-level quantum systems (qubits), $\mathcal{H} = \bigotimes_{i=1}^{L} Q_i$, and has an exponentially growing dimension $\dim \mathcal{H} = 2^L$. Specifically, we discuss spin-1/2 systems with SU(2) symmetric Heisenberg couplings between nearest neighbor (NN) spins on a chain $c$ and a triangular ladder $l$, both with open boundary conditions, i.e.
Here $\sigma^{x,y,z}$ are the usual Pauli matrices while $\langle i, j\rangle$ and $\langle\langle i, j\rangle\rangle$ denote the NN sites of the chain and the NN sites of our triangular ladder geometry, respectively (note that the latter is identical to a chain with nearest and next nearest neighbor (NNN) interactions). These lattice geometries are illustrated in Fig. 1.
Most current quantum devices using superconducting qubits are not capable of all-to-all connectivity, i.e. due to the chip setup two-qubit gates can only be applied between neighboring qubits, which are arranged in different geometries [40][41][42]. In order to apply gates between distant qubits, one has to use a sequence of swap gates, which exchange the quantum state of neighboring qubits, such that effectively the states of distant qubits are moved to neighboring qubits in the processor geometry. On these, any two-qubit gate can be applied, after which the swap sequence needs to be applied in reverse order. This requires a great number of additional gates and therefore introduces further possible sources of errors.
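The two lattice geometries described above can be sketched numerically for small $L$. The following NumPy snippet is a minimal illustration, not the authors' code: it assumes a bond term $\sigma^x_i\sigma^x_j + \sigma^y_i\sigma^y_j + \sigma^z_i\sigma^z_j$ with unit coupling (the normalization is an assumption), and the helper names `site_op`, `bond` and `heisenberg` are hypothetical. The ladder is built as a chain with NN and NNN bonds, as stated in the text.

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def site_op(op, i, L):
    """Embed a one-site operator at site i (0-based) of an L-site system."""
    out = np.eye(1, dtype=complex)
    for k in range(L):
        out = np.kron(out, op if k == i else I2)
    return out

def bond(i, j, L):
    """Heisenberg bond term; unit coupling and Pauli normalization are assumptions."""
    return sum(site_op(s, i, L) @ site_op(s, j, L) for s in (SX, SY, SZ))

def heisenberg(L, ladder=False):
    """Open chain (NN bonds) or triangular ladder (NN + NNN bonds)."""
    H = sum(bond(i, i + 1, L) for i in range(L - 1))
    if ladder:
        H = H + sum(bond(i, i + 2, L) for i in range(L - 2))
    return H

H_c = heisenberg(6)
H_l = heisenberg(6, ladder=True)
print(H_c.shape, np.allclose(H_l, H_l.conj().T))
```

Dense matrices of size $2^L \times 2^L$ are only practical for small $L$; they serve here purely to make the geometry concrete.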

Our goal is therefore to find the best unitary circuit $C$ of a given depth $M$ to approximate the time evolution operator $U(t) = \exp(-itH_{c/l})$. In order to mimic the limited connectivity of current quantum devices, we choose $C$ to consist only of NN two-qubit gates on a 1d chain, arranged in a brickwall pattern, i.e. we model our quantum processor as an open chain of qubits, while one of the physical models we want to simulate on this machine has a different, triangular ladder, geometry. This allows us to investigate whether it is possible to compile the time evolution operator into a nearest neighbor brickwall circuit (exemplified in the left panel of Fig. 2) without the need for additional swap gates, which are generally costly on superconducting platforms.

Trotter circuits
To benchmark the performance of the brickwall circuits we will compare them with the first-, second- and fourth-order Trotter circuits that are based on the well known Trotter decompositions [43]. Here we introduce these circuits for the Hamiltonians (2) that are used in this work.
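For the chain Hamiltonian $H_c$ we have two non-commuting parts, namely the sums of the bond Hamiltonians $h_{i,i+1}$ (1) on alternating bonds, such that we can split $H_c$ into two parts, each consisting of mutually commuting terms, as
$$H_c = H_1 + H_2, \qquad H_1 = \sum_{i\ \mathrm{odd}} h_{i,i+1}, \qquad H_2 = \sum_{i\ \mathrm{even}} h_{i,i+1}.$$
For the ladder Hamiltonian we have on top of this three extra non-commuting parts $H_3$, $H_4$, $H_5$ due to the NNN bonds, such that $H_l = H_1 + H_2 + H_3 + H_4 + H_5$. By writing the Hamiltonians in this way we can define the $M = 1$ first-order Trotter circuits for $H_c$ and $H_l$ as [43]
$$U^{\mathrm{1st}}_c(t) = \exp(-itH_1)\exp(-itH_2),$$
$$U^{\mathrm{1st}}_l(t) = \exp(-itH_1)\exp(-itH_2)\exp(-itH_3)\exp(-itH_4)\exp(-itH_5).$$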
These circuits approximate the exact $U(t) = \exp(-itH_{c/l})$ with error $O(t^2)$ [31]. Note that depth $M = 1$ for the Trotter circuits does not mean one brickwall layer, but instead one Trotter step $U^{\mathrm{1st}}_{c/l}(t)$. While these coincide for the first-order Trotter circuit for the chain, this is not the case for the first-order Trotter circuit for the ladder, and for the second- and fourth-order Trotter circuits which we introduce below. The circuit diagram for $U^{\mathrm{1st}}_c(t)$ is shown as the blue brickwall layer in the left panel of Fig. 3, where $U_1(t)$ is the half-brickwall layer on odd bonds and $U_2(t)$ is the half-brickwall layer on even bonds. The circuit diagram for $U^{\mathrm{1st}}_l(t)$ is the full circuit in this figure, where $U_1(t)$ and $U_2(t)$ again form the blue brickwall layer while $U_3(t)$, $U_4(t)$ and $U_5(t)$ form the red layer, containing two-qubit gates that act on NNN instead of NN qubits. To turn this into a circuit that involves only NN two-qubit gates we introduce the SWAP gate and decompose every NNN gate as in the right panel of Fig. 3.
The circuit layers $U_1$, $U_2$, $U_3$, $U_4$, $U_5$ form the building blocks of the second- and fourth-order Trotter circuits. The $M = 1$ second-order Trotter circuits are composed of these layers as in [43] and approximate the exact evolution operators with error $O(t^3)$ [31]. Using these second-order Trotter circuits we can define the $M = 1$ fourth-order Trotter circuits as in [43], with suitably defined time steps. These circuits approximate the exact evolution operators with error $O(t^5)$ [31].
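The error orders quoted above can be checked numerically on a small chain. The sketch below is an illustration under stated assumptions: it uses the textbook symmetric (Strang) form for the second-order step and the Suzuki composition of five second-order steps for the fourth-order step, which may differ in detail from the forms used in [43]; the unit-coupling bond term and all function names are likewise assumptions.

```python
import numpy as np

PAULIS = [np.array(m, dtype=complex) for m in
          ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])]

def bond(i, L):
    """NN Heisenberg term on sites (i, i+1), 0-based; unit coupling is an assumption."""
    h = sum(np.kron(s, s) for s in PAULIS)
    return np.kron(np.kron(np.eye(2**i), h), np.eye(2**(L - i - 2)))

def evolve(H, t):
    """exp(-i t H) for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

L = 4
H1 = bond(0, L) + bond(2, L)   # odd bonds (1-based counting)
H2 = bond(1, L)                # even bonds
exact = lambda t: evolve(H1 + H2, t)

def first(t):
    return evolve(H1, t) @ evolve(H2, t)

def second(t):   # symmetric (Strang) splitting, assumed form
    return evolve(H1, t / 2) @ evolve(H2, t) @ evolve(H1, t / 2)

def fourth(t):   # Suzuki composition of five second-order steps, assumed form
    s = 1.0 / (4.0 - 4.0 ** (1.0 / 3.0))
    a, b = second(s * t), second((1.0 - 4.0 * s) * t)
    return a @ a @ b @ a @ a

def order(circuit, t=0.02):
    """Estimate p in (operator error) ~ t^p from the errors at t and t/2."""
    e1 = np.linalg.norm(circuit(t) - exact(t))
    e2 = np.linalg.norm(circuit(t / 2) - exact(t / 2))
    return np.log2(e1 / e2)

orders = {name: order(f) for name, f in
          [("first", first), ("second", second), ("fourth", fourth)]}
print(orders)
```

The estimated exponents should lie close to 2, 3 and 5, matching the $O(t^2)$, $O(t^3)$ and $O(t^5)$ errors of a single Trotter step.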
Because we are concerned with circuits that are implemented on a quantum processor with only NN qubit connectivity, we have to convert every NNN two-qubit gate that appears in $U^{\mathrm{1st}}_l$, $U^{\mathrm{2nd}}_l$, $U^{\mathrm{4th}}_l$ to three NN two-qubit gates, as shown in the right panel of Fig. 3. The gate counts $N_g$ of the resulting NN Trotter circuits are given in Sec. A, also for the chain geometry.
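The conversion of one NNN gate into three NN gates can be verified directly on three qubits. The snippet below is a minimal sketch that assumes the arrangement SWAP, NN gate, SWAP; since Fig. 3 is not reproduced here, the exact placement of the SWAP gates in the paper may differ, and the variable names are hypothetical.

```python
import numpy as np

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)
I2 = np.eye(2)

rng = np.random.default_rng(0)
# Random two-qubit unitary from a QR decomposition (test data only)
G, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))

def nnn_gate(G):
    """G acting on qubits 0 and 2 of a three-qubit register, identity on qubit 1."""
    T = G.reshape(2, 2, 2, 2)                     # (out0, out2, in0, in2)
    full = np.einsum('xzac,yb->xyzabc', T, I2)    # (out0, out1, out2, in0, in1, in2)
    return full.reshape(8, 8)

direct = nnn_gate(G)

S12 = np.kron(I2, SWAP)   # swap qubits 1 and 2
G01 = np.kron(G, I2)      # the same gate applied to the neighbors 0 and 1
via_swaps = S12 @ G01 @ S12

print(np.allclose(direct, via_swaps))
```

Each NNN gate thus costs three NN two-qubit gates, which is the overhead entering the Trotter gate counts for the ladder.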

Optimization
Each two-qubit gate $U_{ij} \in \mathbb{C}^{4\times 4}$ of the circuit $C$, acting on two neighboring qubits $i$ and $j$, can be decomposed into a product of one-qubit gates $u_i \in \mathbb{C}^{2\times 2}$ and a two-qubit gate $v_{ij}$. Here $v_{ij}$ is parameterized with three real parameters $\lambda_{0,1,2}$, and the $u_i$ are parameterized up to a global phase, each containing three real parameters $\phi_{0,1,2}$. Hence this decomposition of $U_{ij}$ contains 15 real parameters (three from $v_{ij}$ and three from each of four one-qubit gates), and it can be visualised as in the right panel of Fig. 2. To represent the unitary gate as a global unitary matrix, acting on the full wave function, we introduce its matrix form $\mathrm{mat}(U_{ij})$ by taking the Kronecker product with identity matrices on the qubits on which the gate does not act (and implicitly encoding the nearest neighbor condition $j = i + 1$). The entire circuit is a product of such unitaries and can formally be expressed by
$$C = \prod_{n=1}^{N_g} \mathrm{mat}\big(U^{(n)}\big),$$
where $U^{(n)}$ denotes the $n$-th gate and $N_g$ is the total number of gates in the circuit. Since each gate is parametrized by 15 real parameters, the circuit as a whole depends on a parameter vector $\vec{\theta}$.
We would like to find an optimal parameter set $\vec{\theta}$ for a given circuit architecture, such that the distance between the unitary represented by the circuit $C(\vec{\theta})$ and the exact time evolution operator $U$ of the system up to time $t$ is minimized. For two unitary operators $U$ and $C$, we therefore define a measure of distance in terms of the normalized Frobenius norm, namely the "infidelity" $\epsilon$, given by
$$\epsilon = \frac{1}{2}\,\frac{\lVert U - C\rVert_F^2}{2^L} = 1 - \frac{1}{2^L}\,\mathrm{Re}\,\mathrm{Tr}\big(U^\dagger C\big).$$
We use this infidelity as an objective function, such that we obtain a minimization problem for a fixed circuit architecture (number and sequence of two-qubit gates).
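The parameter counting and the objective function can be made concrete with a short sketch. The explicit forms below are assumptions, since the displayed parameterizations are not reproduced here: the one-qubit gates use a $Z$-$Y$-$Z$ Euler-angle form, the two-qubit core is taken as $\exp[-i(\lambda_0\,\sigma^x\sigma^x + \lambda_1\,\sigma^y\sigma^y + \lambda_2\,\sigma^z\sigma^z)]$, and the infidelity is taken as $1 - \mathrm{Re}\,\mathrm{Tr}(U^\dagger C)/\dim$. All function names are hypothetical.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def one_qubit(phi):
    """Assumed Euler-angle form Rz(phi0) Ry(phi1) Rz(phi2), fixed up to a global phase."""
    a, b, c = phi
    rz = lambda x: np.diag([np.exp(-0.5j * x), np.exp(0.5j * x)])
    ry = lambda x: np.array([[np.cos(x / 2), -np.sin(x / 2)],
                             [np.sin(x / 2), np.cos(x / 2)]], dtype=complex)
    return rz(a) @ ry(b) @ rz(c)

def core(lam):
    """Assumed two-qubit core exp(-i (l0 XX + l1 YY + l2 ZZ))."""
    H = lam[0] * np.kron(SX, SX) + lam[1] * np.kron(SY, SY) + lam[2] * np.kron(SZ, SZ)
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w)) @ V.conj().T

def gate(theta):
    """15 parameters: four one-qubit gates (3 each) around a 3-parameter core."""
    theta = np.asarray(theta, dtype=float)
    u1, u2, u3, u4 = (one_qubit(theta[3 * k:3 * k + 3]) for k in range(4))
    return np.kron(u1, u2) @ core(theta[12:15]) @ np.kron(u3, u4)

def infidelity(U, C):
    """1 - Re Tr(U^dag C) / dim, equal to ||U - C||_F^2 / (2 dim) for unitaries."""
    return 1.0 - np.trace(U.conj().T @ C).real / U.shape[0]

rng = np.random.default_rng(1)
U = gate(rng.uniform(-np.pi, np.pi, 15))
V = gate(rng.uniform(-np.pi, np.pi, 15))
print(infidelity(U, U), infidelity(U, V))
```

The equality of the trace form and the Frobenius-norm form follows from $\lVert U - C\rVert_F^2 = 2\dim - 2\,\mathrm{Re}\,\mathrm{Tr}(U^\dagger C)$ for unitary $U$ and $C$.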
The objective function needs to be evaluated many times during the optimization and we find that it is efficient to first compress the time evolution operator $U$ into a matrix product operator (MPO) $M_\chi$ of bond dimension $\chi$, such that we can calculate $\epsilon$ via efficient standard tensor network methods. For the local systems we investigate here and for short times, this is always efficient, due to the low operator entanglement of the time evolution operator [45]. In particular, we discard the smallest singular values for which the squares sum to a tiny number, since their contribution is negligible, such that weakly entangled operators do not saturate the maximum bond dimension $\chi$. To obtain the (truncated) MPO representation of $U$ with negligible discretization error, we take an identity MPO and perform time-evolving block decimation [43,46] with a small timestep $\delta t = 10^{-4}$ and fourth-order Trotter decomposition, such that the introduced error is negligible.
To optimize the parameters $\vec{\theta}$ of the circuit such that $\epsilon$ is minimal, we employ the paradigm of differentiable programming [47]. Here the gradient $\nabla_{\vec{\theta}}\,\epsilon$ is calculated in a similar fashion to the original backpropagation algorithm used for deep neural networks [48], which has been adapted for tensor-network operations such as contraction and SVD. Using this gradient we then perform gradient descent. We use this global optimization procedure instead of the local optimization from [38] because we found that this yields significantly higher fidelity when an Adam-like adaptive learning rate is used [49]. Here it is crucial not to stop optimizing when the infidelity appears to have stagnated, since we have often found that the optimization gets stuck in such a "local minimum" for some time before it jumps out and converges to a lower minimum. This is possibly related to the "barren plateau" problem that often occurs when performing gradient descent for quantum circuits with a large parameter space, where the optimization reaches a set of circuit parameters for which the majority of its gradients become very small such that the optimization (temporarily) halts [50].
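A toy version of this gradient-based optimization is sketched below for a single two-qubit gate. It departs from the procedure in the text in several labeled ways: the gate is parameterized, for brevity, by the 15 two-qubit Pauli-string generators rather than by the product decomposition above; gradients come from central finite differences rather than backpropagation; the target is one Heisenberg bond (unit coupling assumed) evolved for $t = 0.5$; and the Adam hyperparameters are illustrative choices.

```python
import numpy as np
from itertools import product

P = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]
# 15 traceless two-qubit Pauli strings (all products except identity x identity)
GENS = [np.kron(P[a], P[b]) for a, b in product(range(4), repeat=2) if (a, b) != (0, 0)]

def unitary(theta):
    """exp(-i sum_k theta_k G_k): a stand-in 15-parameter two-qubit gate."""
    H = sum(t * G for t, G in zip(theta, GENS))
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w)) @ V.conj().T

def infidelity(U, C):
    return 1.0 - np.trace(U.conj().T @ C).real / U.shape[0]

# Target: one Heisenberg bond evolved for t = 0.5 (unit coupling assumed)
h = sum(np.kron(P[k], P[k]) for k in (1, 2, 3))
w, V = np.linalg.eigh(h)
target = (V * np.exp(-0.5j * w)) @ V.conj().T

def loss(theta):
    return infidelity(target, unitary(theta))

def grad(theta, eps=1e-6):
    """Central finite differences (stand-in for backpropagation)."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        d = np.zeros_like(theta)
        d[k] = eps
        g[k] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
    return g

def adam(theta, steps=300, lr=0.02, b1=0.9, b2=0.999, tiny=1e-8):
    """Plain Adam update with bias-corrected first and second moments."""
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    for n in range(1, steps + 1):
        g = grad(theta)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        theta = theta - lr * (m / (1 - b1**n)) / (np.sqrt(v / (1 - b2**n)) + tiny)
    return theta

theta0 = np.random.default_rng(0).normal(scale=0.1, size=15)
initial = loss(theta0)
final = loss(adam(theta0))
print(initial, final)
```

For full circuits the same loop applies with $\vec{\theta}$ collecting all gate parameters, at which point automatic differentiation becomes far cheaper than finite differences.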
At small M the optimized circuits in a sense compress the targeted time evolution operator, especially when its time-step is large, and therefore they are called "compressed circuits".

Stacking circuits
The general strategy we implement is the following: For some (short) timestep $t$, we find an optimal circuit $C(\vec{\theta})$ which best approximates the exact time evolution operator $U(t)$. In principle, $t$ is arbitrary, with the general logic that unitaries for shorter $t$ can be encoded by shallower circuits (lower $M$). In practice, $t$ will also be governed by the time grid on which observables should be evaluated, although this could also be achieved by working with two or more different optimized circuits with different $t$, a case we do not discuss further in this work. To propagate the wave function to longer times, which are multiples of $t$, we then use the stacked circuit $C(\vec{\theta})^n$ as an approximation of $U(nt)$. It is interesting to investigate how well this stacked circuit performs for time evolution to longer times, and we will compare these results with benchmarks for the circuits discussed in Sec. 2.2 that result from traditional Trotter decompositions.
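Stacking amounts to taking matrix powers of a single-step circuit and comparing with the exact evolution at the corresponding multiple of $t$. The sketch below illustrates this on a four-site chain; as a stand-in for an optimized circuit it uses one second-order Trotter step (an assumption made for brevity, not a compressed circuit), with unit coupling and hypothetical helper names.

```python
import numpy as np

PAULIS = [np.array(m, dtype=complex) for m in
          ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])]

def bond(i, L):
    """NN Heisenberg term on sites (i, i+1), 0-based; unit coupling is an assumption."""
    h = sum(np.kron(s, s) for s in PAULIS)
    return np.kron(np.kron(np.eye(2**i), h), np.eye(2**(L - i - 2)))

def evolve(H, t):
    """exp(-i t H) for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

def infidelity(U, C):
    return 1.0 - np.trace(U.conj().T @ C).real / U.shape[0]

L, t = 4, 0.05
H1 = bond(0, L) + bond(2, L)
H2 = bond(1, L)

# Stand-in for an optimized single-step circuit: one second-order Trotter step
step = evolve(H1, t / 2) @ evolve(H2, t) @ evolve(H1, t / 2)

ns = [1, 5, 10, 20, 40]
eps = [infidelity(evolve(H1 + H2, n * t), np.linalg.matrix_power(step, n)) for n in ns]
for n, e in zip(ns, eps):
    print(f"n = {n:3d}, time = {n * t:.2f}, infidelity = {e:.3e}")
```

The printed values show how a small single-step infidelity accumulates as the circuit is repeated.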

Quantities of interest
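Having obtained the compressed circuits for short times, we compute $\epsilon$ for long times using the stacked circuits as approximation. However, the increasing bond dimension with time makes a computation using MPOs impossible. Instead we use typicality [51]. Here the trace in Eq. 19 is replaced by the average over $N_\psi$ Haar random states $|\psi_i\rangle$, i.e.
$$\frac{1}{2^L}\,\mathrm{Tr}\big(U^\dagger C\big) \approx \frac{1}{N_\psi}\sum_{i=1}^{N_\psi} \langle\psi_i|\,U^\dagger C\,|\psi_i\rangle.$$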
This allows us to calculate $\epsilon$ in an unbiased manner for the system sizes considered in this work. The reasoning behind using typicality, instead of decomposing the exact $U$ as an MPO $M$ without truncation and using this to calculate the trace in (21), is that for a highly entangled $U$ the maximum bond dimension $\chi$ is always saturated, i.e. the central tensors will require bond dimension $2^L$ to prevent significant truncation errors. For a highly entangled MPS $|\psi_i\rangle$ this central bond dimension is instead $2^{L/2}$, which is still manageable for the system sizes considered in this work.
Besides using the infidelity as a measure of the performance of the circuits, we will also use the circuits to compute out-of-time-ordered correlators (OTOCs) [52]. For spin-1/2 $\sigma^z$ operators, the OTOC $C_{ij}$ between lattice sites $i$ and $j$ is defined via the Frobenius norm of the commutator $[\sigma^z_i(t), \sigma^z_j]$, where $\sigma^z_i(t) = C^\dagger \sigma^z_i C$ is the spin operator on site $i$ evolved by the circuit. As for the infidelity, it is important to use typicality instead of the truncated MPO formalism when calculating $C_{ij}$ for a circuit that is stacked many times.
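The typicality estimate of the infidelity needs only the action of $U$ and $C$ on state vectors, never the full trace. The following sketch illustrates this with dense matrices on $2^8$ dimensions; the random test unitaries, the number of states and the function names are illustrative assumptions, and the infidelity is again taken as $1 - \mathrm{Re}\,\mathrm{Tr}(U^\dagger C)/\dim$.

```python
import numpy as np

def haar_state(dim, rng):
    """Haar-random pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def infidelity_exact(U, C):
    return 1.0 - np.trace(U.conj().T @ C).real / U.shape[0]

def infidelity_typical(apply_U, apply_C, dim, n_states, rng):
    """Estimate the infidelity from matrix-vector products only."""
    acc = 0.0
    for _ in range(n_states):
        psi = haar_state(dim, rng)
        acc += np.vdot(apply_U(psi), apply_C(psi)).real   # Re <psi|U^dag C|psi>
    return 1.0 - acc / n_states

rng = np.random.default_rng(0)
dim = 2**8

# Test data: a random unitary U and a nearby unitary C = U exp(-i 0.01 K)
U, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
K = rng.normal(size=(dim, dim))
K = (K + K.T) / 2
w, V = np.linalg.eigh(0.01 * K)
C = U @ ((V * np.exp(-1j * w)) @ V.T)

exact = infidelity_exact(U, C)
estimate = infidelity_typical(lambda p: U @ p, lambda p: C @ p, dim, 50, rng)
print(exact, estimate)
```

In an MPS setting the two callables would apply the exact evolution and the stacked circuit to a random matrix-product state instead of multiplying dense matrices.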
To calculate (22) we invoke the hermiticity of the spin operators $\sigma^z$, such that by expanding the commutator in (22) we can write the OTOC in terms of the trace $\mathrm{Tr}\big[\sigma^z_i(t)\,\sigma^z_j\,\sigma^z_i(t)\,\sigma^z_j\big]$, which is readily calculated in the MPO formalism. Concretely, we take an identity MPO and put a z-spin operator $\sigma^z$ at site $i$, which is then evolved in the Heisenberg picture by the circuit $C$, yielding a different MPO. Then we again take an identity MPO and put a z-spin operator on site $j$, which we do not evolve. Then we calculate the trace in (23) via a full contraction of four MPOs, which can be done efficiently.
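The reduction of the commutator norm to a four-operator trace can be checked on a small chain with dense matrices. In the sketch below the normalization $C_{ij} = \frac{1}{2}\lVert[\sigma^z_i(t), \sigma^z_j]\rVert_F^2 / 2^L = 1 - \mathrm{Tr}[\sigma^z_i(t)\sigma^z_j\sigma^z_i(t)\sigma^z_j]/2^L$ is an assumption, the exact evolution operator stands in for a circuit, and the helper names are hypothetical.

```python
import numpy as np

PAULIS = [np.array(m, dtype=complex) for m in
          ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])]
SZ = PAULIS[2]

def site_z(i, L):
    """sigma^z on site i (0-based) of an L-site system."""
    return np.kron(np.kron(np.eye(2**i), SZ), np.eye(2**(L - i - 1)))

def bond(i, L):
    """NN Heisenberg term on sites (i, i+1); unit coupling is an assumption."""
    h = sum(np.kron(s, s) for s in PAULIS)
    return np.kron(np.kron(np.eye(2**i), h), np.eye(2**(L - i - 2)))

def evolve(H, t):
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

def otoc(C, i, j, L):
    """1 - Tr[z_i(t) z_j z_i(t) z_j] / 2^L with z_i(t) = C^dag z_i C (assumed normalization)."""
    zi_t = C.conj().T @ site_z(i, L) @ C
    zj = site_z(j, L)
    return 1.0 - np.trace(zi_t @ zj @ zi_t @ zj).real / 2**L

L = 6
H = sum(bond(i, L) for i in range(L - 1))
C = evolve(H, 0.7)          # exact evolution as a stand-in for a circuit
profile = [otoc(C, 1, j, L) for j in range(L)]
print(np.round(profile, 4))

# Cross-check against the commutator form for one pair of sites
A = C.conj().T @ site_z(1, L) @ C
B = site_z(3, L)
commutator_form = 0.5 * np.linalg.norm(A @ B - B @ A) ** 2 / 2**L
```

The identity used here relies on $\sigma^z$ being Hermitian and squaring to the identity, so that $\lVert[A,B]\rVert_F^2 = 2\cdot 2^L - 2\,\mathrm{Tr}(ABAB)$.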

Results
To benchmark the performance of the compression strategy outlined in Sec. 2, we systematically analyze the infidelity $\epsilon$ as a function of simulation time step $t$, total gate count $N_g$ and system size, in direct comparison to Trotter decompositions of different orders, and present these results in Sec. 3.1. In Sec. 3.2 we extend this systematic analysis to out-of-time-ordered correlators (OTOCs) (22). Furthermore, in Sec. 3.3 we probe the structure of the gates that make up the optimized circuits, in an attempt to uncover the structures that allow these circuits to outperform their Trotter counterparts.

Infidelity
As a first test of the circuit optimization algorithm outlined in Sec. 2, we compare the optimal infidelities of compressed circuits to those of comparable Trotter circuits. Concretely, we consider time evolution operators of the chain and ladder Heisenberg Hamiltonians (2) at three system sizes $L = 8, 12, 16$ and two time-steps $t = 1, 2$. For each Hamiltonian, system size and time-step, we determine the time evolution operator $U$ with numerically negligible discretization error for a certain bond dimension $\chi$, and perform the global optimization as outlined in Sec. 2 to minimize the infidelity of the compressed circuit. For $L = 8, 12, 16$ we have taken $\chi = 256, 150, 100$ as a compromise between precision and practical efficiency. We note that our main concern here is not to get a numerically exact MPO representation, but rather a reasonably good approximation of the time evolution operator. For all practical purposes, we will then consider this MPO our exact time evolution operator, which we want to approximate by our circuits. As a first benchmark, we take for each of our parameter sets various circuit depths $M = 1, 2, 4, 8, 16$, where $M$ is the number of elementary layers of $L - 1$ gates, and consider $\epsilon$ as a function of the corresponding gate count $N_g$ (see Sec. A for details on how to obtain the number of gates). We compare this with first-, second- and fourth-order Trotter circuits [43].
The results are shown in Fig. 4. The left pair of panel columns is for the chain and the right pair is for the ladder. The first and third columns are for time-step $t = 1$ and the second and fourth are for $t = 2$. The upper row is for system size $L = 8$, the middle row is for $L = 12$, and the bottom row is for $L = 16$. Each panel contains the infidelities of the optimized compressed circuits (CC) as a red line, and the infidelities of the Trotter circuits as blue lines. The infidelities of the Trotter circuits are calculated for the same depths $M$ as the compressed circuit, where it should be remembered from Sec. 2.2 that in this case $M$ is not necessarily equal to the number of brickwall layers in the Trotter circuit, but is instead equal to the number of Trotter steps that compose the circuit. The time-step of the Trotter step is chosen as $t/M$, such that $M$ subsequent steps correspond to a total time-step $t$. The gate counts of the Trotter circuits were calculated with the expressions in Sec. A, which take into account the number of swap gates required to map the ladder geometry to a chain of qubits.
From Fig. 4 it becomes clear that per gate the compressed circuit outperforms the Trotter circuits for all considered parameter sets. Moreover, it appears that the infidelity of the compressed circuit roughly scales with $N_g$ like the best Trotter order, but with a more favorable prefactor, i.e. at intermediate gate counts it scales as second-order whereas at the highest probed gate count it scales as fourth-order. We have found that the same picture emerges when plotting $\epsilon$ versus the $t$ at which the circuit was optimized, where $M = 1$ scales like first-order Trotter, and by increasing $M$ we approach the fourth-order scaling, passing through the second-order scaling.
Having considered the infidelities of the compressed circuits at the time-step for which they were optimized, we now quantify how these infidelities grow when the circuits are stacked, which we do for the same systems as in Fig. 4. To this end we select a compressed circuit that was optimized at $t = 2$, and take for every Trotter order a circuit of depth $M$ with a gate count as close as possible to that of the compressed circuit, and choose its time-step to be $t/M$. Concretely, for the chain we take a compressed circuit with $M = 8$, in which case we have to take first-, second-, and fourth-order Trotter circuits with $M = 8, 7, 1$. For the ladder we take a compressed circuit with $M = 16$, such that we have to take first-, second-, and fourth-order Trotter circuits with $M = 4, 3, 1$. To quantify the quality of the compressed and Trotter circuits under stacking, we take various infidelity thresholds $\hat{\epsilon}$ and stack the circuits up to a thousand times until they cross this threshold at some time $t$, i.e. we determine $\epsilon(t) = \hat{\epsilon}$. As mentioned in Sec. 2 we utilize typicality (21) to calculate the stacked infidelities.
In Fig. 5 we plot $\hat{\epsilon}$ versus $t$ in log-log scale. The color coding is identical to that of Fig. 4, except that the fourth-order Trotter circuit for the ladder is now represented with a dashed line, to emphasize that its infidelity relative to that of the compressed circuit is not necessarily indicative of the relative performance, because it contains roughly twice as many gates as the compressed circuit. From these plots it is clear that the advantage of the compressed circuits from Fig. 4 is not lost when stacking them many times. In particular, in all considered cases the compressed circuits are able to go to significantly larger times, at all infidelity thresholds, than the Trotter counterparts. The only exception is for the ladder at $t = 1$, where the fourth-order Trotter circuit performs better, but as mentioned this Trotter circuit has twice as many gates as the compressed circuit and is therefore not a fair comparison.
Figure 5: The time $t$ after which the stacked circuits exceed the infidelity threshold $\hat{\epsilon}$, for the time evolution operator of the Heisenberg model on a chain (left panels) and ladder (right panels) in log-log scale. The first and third columns are for circuits optimized at $t = 1$ while the second and fourth columns are for $t = 2$, with the circuits being stacked up to a thousand times. The circuits were chosen such that they have similar gate counts, with $M = 8, 8, 7, 1$ for the chain and $M = 16, 4, 3, 1$ for the ladder, for the compressed circuit and first-, second- and fourth-order Trotter circuits, respectively. The top panels are for $L = 8$ with $\chi = 256$, the middle panels are for $L = 12$ with $\chi = 150$, and the bottom panels are for $L = 16$ with $\chi = 100$. The blue curves represent the Trotter circuits and the red curve represents the compressed circuit (CC). The fourth-order Trotter circuit for the ladder is displayed as a dashed line, since it contains roughly twice as many gates as the compressed circuit and is therefore not necessarily indicative of their relative performance.

From the plots we extract the universal quadratic power-law $\hat{\epsilon} \propto t^2$, for both the compressed and the Trotter circuits. This error scaling is analogous to first-order Trotter decomposition. The only exception is the ladder with $L = 16$ at $t = 2$, where the infidelity reaches $\epsilon \approx 1$ rather quickly, such that it is situated in the rounding part that is also observed for the $t = 1$ ladder curves at the high-infidelity end. The gap between the compressed circuits and the best performing Trotter circuits is thus found to grow quadratically with $t$. Concretely, for the chain with $L = 12$ and timestep $t = 1$, we find that for $\hat{\epsilon} = 10^{-3}$ the compressed circuit has $t = 644$ whereas the best Trotter circuit (i.e. of fourth-order) has $t = 94$. For $\hat{\epsilon} = 10^{-4}$ we instead get $t = 201$ for the compressed circuit and $t = 29$ for the best Trotter circuit. For the same system at timestep $t = 2$, we find that at $\hat{\epsilon} = 10^{-3}$ the compressed circuit has $t = 116$ while the best Trotter circuit has $t = 14$. At $\hat{\epsilon} = 10^{-2}$ we have $t = 378$ for the compressed circuit and $t = 46$ for the best Trotter circuit. From these values it is clear that for the chain we can go roughly seven to eight times further in time than the best Trotter circuit with similar gate count. These values are for $L = 12$, and the same analysis at $L = 8$ reveals that here we can go fourteen to twenty times as far, while for $L = 16$ we can go three to eight times as far, with the lower bounds for $t = 2$ and the upper bounds for $t = 1$. These values emphasize that the larger we choose $\hat{\epsilon}$, the larger the gap between $t$ of the compressed and Trotter circuits becomes, which grows quadratically as stated above. This implies that the superiority of the compressed circuits over Trotter circuits becomes especially apparent when we set a relatively high error threshold, which for the compressed circuits is reached at much larger time than for Trotter circuits which have comparable gate count.
Repeating this analysis for the ladder, again starting off with $L = 12$ and $t = 1$, we find at $\hat{\epsilon} = 10^{-2}$ that the compressed circuit has $t = 34$ whereas the best Trotter circuit, excluding the fourth-order Trotter with double the gate count, has $t = 14$. With $\hat{\epsilon} = 10^{-1}$ the compressed circuit has $t = 125$ whereas the second-order Trotter circuit has $t = 57$. For the same system at $t = 2$ and with $\hat{\epsilon} = 10^{-1}$, we have $t = 40$ for the compressed circuit and $t = 10$ for the second-order Trotter circuit. Hence for the ladder we can go roughly two to four times as far as the best Trotter circuit with comparable gate count. Repeating this analysis for $L = 8$ we find that we can go two to five times farther, and for $L = 16$ we can go two to three times farther, here with the lower bounds for $t = 1$ and the upper bounds for $t = 2$.
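As a quick consistency check of the quadratic power law, the threshold-crossing times quoted above for the chain with $L = 12$ can be compared with the prediction that a tenfold larger threshold is reached after a time that is larger by a factor $\sqrt{10}$. The arithmetic below uses only the numbers given in the text.

```latex
\begin{align*}
\hat{\epsilon} \propto t^2 \;&\Rightarrow\; \frac{t(10\,\hat{\epsilon})}{t(\hat{\epsilon})} = \sqrt{10} \approx 3.16, \\
\text{time-step } 1:\quad & \frac{644}{201} \approx 3.20 \ \text{(compressed)}, \qquad \frac{94}{29} \approx 3.24 \ \text{(Trotter)}, \\
\text{time-step } 2:\quad & \frac{378}{116} \approx 3.26 \ \text{(compressed)}, \qquad \frac{46}{14} \approx 3.29 \ \text{(Trotter)}, \\
\text{compressed vs. Trotter}:\quad & \frac{644}{94} \approx 6.9, \quad \frac{201}{29} \approx 6.9, \quad \frac{116}{14} \approx 8.3, \quad \frac{378}{46} \approx 8.2 .
\end{align*}
```

All four ratios of crossing times lie within a few percent of $\sqrt{10}$, and the reach of the compressed circuit exceeds that of the best Trotter circuit by a factor between roughly 7 and 8.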
Instead of examining the stacking behavior of compressed and Trotter circuits with comparable gate count, we now compare how circuits with comparable optimized infidelity stack, to see whether similar fidelities are achievable with compressed circuits that have only a fraction of the gates of Trotter circuits. To this end we consider the chain and ladder for a single system size $L = 12$, with time-step $t = 2$ for the chain and $t = 1$ for the ladder, and we stack the circuits up to $t = 20$. For simplicity we compare only with second-order Trotter circuits, as we find analogous results for the other Trotter orders. For the chain we take compressed circuits with $M = 4, 8$, in which case the second-order Trotter circuits with similar optimized infidelity have $M = 5, 16$. Importantly, while these compressed and Trotter circuits have similar fidelity, the $M = 5$ Trotter circuit has 1.4 times the gate count of the $M = 4$ compressed circuit, whereas the $M = 16$ Trotter circuit has 2.1 times the gate count of the $M = 8$ compressed circuit. For the ladder we take compressed circuits with $M = 8, 16$, such that the corresponding second-order Trotter circuits have $M = 2, 4$, i.e. they contain 1.6 times as many gates.
The results are displayed in Fig. 6 in log-log scale, where in the left panel we show the stacked infidelities for the chain and in the right panel for the ladder. The red dashed lines are for the power laws $\epsilon \propto t^n$ with the best fitting power $n$. It is seen that the infidelity increases similarly for all considered pairs of compressed and Trotter circuits, which like Fig. 5 emphasizes that the compression strategy expounded in Sec. 2 has no drawbacks at long times, relative to the Trotter circuits. Moreover, the mentioned discrepancy in gate counts, with the Trotter circuit having significantly more gates in all cases, makes the compressed circuits especially favorable for simulation on real quantum devices, where the error due to gate imperfections and decoherence noise hampers time evolution. Having studied the infidelity and its behavior under stacking in detail in Sec. 3.1, we now use the compressed circuits to determine the behavior of a quantity that does not enter the objective function (19), namely the OTOC (22).

Out-of-time-ordered correlators
We again focus on the chain and ladder with $L = 8, 12, 16$, for which we take an $M = 8$ and an $M = 16$ compressed circuit, respectively. Both are optimized at $t = 2$ and then stacked up to ten times, and subsequently used to calculate $C_{i=2,j}$. For the chain we let $j$ run over all sites, whereas for the ladder we let it run over all rungs. The results are shown in Fig. 7, where they are put alongside the exact values for comparison. In the left two panel columns we show the results for the chain and in the right two columns for the ladder. The first and third columns are for the compressed circuits, whereas the second and fourth columns are for the exact values. Clearly the agreement is excellent for all considered stacking times $t$. In Sec. B we compare these OTOCs with those computed with Trotter circuits. As in Sec. 3.1 it follows that we need fewer gates to achieve the same error and can go farther in time for comparable gate counts.

Analysis of the compressed circuit
In the previous Sections 3.1 and 3.2 we have seen that the compressed circuit outperforms the Trotter circuits. Here we investigate how this is achieved, by probing the structure of the layers and gates that make up the compressed and Trotter circuits.
Starting off, we take a compressed circuit and Trotter circuits with comparable gate counts, and consider the infidelity between a subset of layers $M^* < M$ (counting from the bottom layer) and the time evolution operator at a time $t^* < t$ that is smaller than the time-step $t$ at which the compressed circuit was optimized. Crucially, we must take into account the gauge freedom that exists between layers, where we are able to insert conjugate layers of one-qubit unitaries, and absorb one layer into the subset we are considering and the other layer into its complement. This process is illustrated in Fig. 8. Hence when calculating a subset infidelity for the compressed circuit, we add a layer of one-qubit unitaries between the subset and the time evolution operator at $t^*$, and minimize the infidelity with respect to these one-qubit unitaries. This way we account for the gauge freedom.
Figure 8: The gauge freedom that exists between the layers of a circuit. When we cut the circuit across the horizontal dashed line and want to use the lowest $M^*$ layers to calculate an infidelity, we have to take into account the gauge freedom that is encoded by inserting a pair of conjugate one-qubit unitaries $u_i^\dagger u_i = I$ at each qubit, and absorbing one unitary upwards and the other downwards.
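The gauge freedom can be made concrete with a small numerical sketch. The example below is illustrative only: it uses two random two-qubit unitaries as stand-ins for the "bottom" and "top" parts of a circuit, and the function `infidelity` uses an assumed definition, $1 - |\mathrm{Tr}(A^\dagger B)|/d$, which need not coincide with the objective function used in this work. It shows that inserting $g^\dagger g = I$ at the cut leaves the full circuit unchanged, while the bottom subset by itself changes.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(dim):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    # Rescale each column by a phase; q stays unitary.
    return q * (np.diag(r) / np.abs(np.diag(r)))

def infidelity(a, b):
    """Illustrative infidelity 1 - |Tr(a^dagger b)| / dim (an assumed definition)."""
    return 1.0 - abs(np.trace(a.conj().T @ b)) / a.shape[0]

# Stand-ins for the two parts of a circuit: bottom is applied first, then top.
bottom = random_unitary(4)
top = random_unitary(4)

# A layer of one-qubit unitaries u_1 (x) u_2, inserted at the cut as g^dagger g = I.
g = np.kron(random_unitary(2), random_unitary(2))
bottom_gauged = g @ bottom         # absorb g downwards
top_gauged = top @ g.conj().T      # absorb g^dagger upwards

# The full circuit is gauge invariant, the subset is not.
gauge_invariant_full = infidelity(top @ bottom, top_gauged @ bottom_gauged)
gauge_dependent_subset = infidelity(bottom, bottom_gauged)
print(gauge_invariant_full, gauge_dependent_subset)
```

Because the subset is only defined up to such a layer, a meaningful subset infidelity has to be minimized over the one-qubit unitaries, as described above.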
In Fig. 9 we show the results for the chain with $L = 8$ at $t = 1$, for a compressed circuit with $M = 8$ and Trotter circuits with $M = 8, 7, 1$ for first-, second- and fourth-order, which have gate counts close to that of the compressed circuit. Here we define a Trotter circuit with $M^*$ layers as having $M^*$ brickwall layers, and the largest shown $M^*$ is the full circuit, which e.g. for the second-order Trotter circuit involves adding half a brickwall layer to its largest subset. For the compressed circuit $M^* = 8$ corresponds to the full circuit. The dashed lines mark the times $t^* = tM^*/8$. From Fig. 9 it is clear that at $t = 1$ there is significant overlap of the subsets with a time evolution operator at $t^* < t$ for both the compressed and Trotter circuits. The dip structure differs, however. For the first- and second-order Trotter circuits the infidelity dips are equidistant, with a dip depth that decreases with the number of stacked layers for the first-order circuit and stays constant for the second-order circuit. The dips of the compressed circuit are instead roughly symmetric and are shallowest around $t^* \approx t/2$. A closer look reveals that the infidelity at this point is roughly $10^{-2}$, which is more than one order of magnitude larger than for the first- and second-order Trotter circuits at similar $t^*$. This is even more remarkable when taking the final infidelity into account, which is $1.8 \cdot 10^{-9}$ for the compressed circuit and therefore roughly three or more orders of magnitude better than for the first-, second- and fourth-order Trotter circuits, which reach $8.2 \cdot 10^{-4}$, $1.2 \cdot 10^{-6}$ and $2.1 \cdot 10^{-6}$, respectively. This indicates that the compressed circuit does not follow the exact "trajectory" given by the unitary time evolution, but slightly deviates from it. However, it becomes "refocussed" at $t^* = t$, which we sketch in Fig. 10. It is an interesting question for future research to understand the alternative trajectory, which might be beneficial for an optimal discretization of time evolution beyond the Trotter decomposition.
Figure 10: A sketch of the "refocussing" mechanism that potentially explains the structures observed in Fig. 9. Here the exact time evolution $U(t)$ is shown in black, the Trotter evolution $U_{\mathrm{tr}}(t)$ is shown in blue, and the compressed evolution $U_c(t)$ is shown in red. While $U_{\mathrm{tr}}(t)$ follows the exact trajectory quite closely, $U_c(t)$ instead becomes "refocussed" at multiples of the optimization time step $t$.
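The equidistant dips of the first-order Trotter circuit can be reproduced in a minimal sketch. The example below is not the computation behind Fig. 9: it uses a small chain with $L = 4$ to stay cheap, dense matrices instead of an MPO, an assumed infidelity $1 - |\mathrm{Tr}(A^\dagger B)|/d$, and one possible ordering of the even and odd bonds within a brickwall layer. No gauge layer is optimized, since the sketch only treats the Trotter circuit.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
L = 4  # small chain, chosen only to keep the example fast

def two_site(op, i):
    """op (x) op on sites i and i+1, identity elsewhere."""
    mats = [np.eye(2, dtype=complex) for _ in range(L)]
    mats[i] = op
    mats[i + 1] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def bond(i):
    """Heisenberg bond S_i . S_{i+1} with S = sigma / 2."""
    return sum(two_site(p, i) for p in (X, Y, Z)) / 4

def evolve(h, t):
    """exp(-i h t) for Hermitian h via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

def infidelity(a, b):
    """Illustrative infidelity 1 - |Tr(a^dagger b)| / dim (an assumed definition)."""
    return 1.0 - abs(np.trace(a.conj().T @ b)) / a.shape[0]

h_even = bond(0) + bond(2)
h_odd = bond(1)
h = h_even + h_odd

t, M = 1.0, 8
# One first-order Trotter brickwall layer with time step t / M.
layer = evolve(h_odd, t / M) @ evolve(h_even, t / M)

# Scan t* for every subset of M* layers and record where the infidelity dips.
t_star = np.linspace(0.0, t, 81)
dip_positions = {}
for m_star in range(1, M + 1):
    subset = np.linalg.matrix_power(layer, m_star)
    eps = [infidelity(evolve(h, ts), subset) for ts in t_star]
    dip_positions[m_star] = t_star[int(np.argmin(eps))]
print(dip_positions)
```

In this sketch the dip of the $M^*$-layer subset sits close to $t^* = tM^*/M$, which is the Trotter behaviour that the compressed circuit is contrasted with.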
We note that we did not find these symmetric dips for all our compressed circuits, especially for larger $t$ and the ladder geometry. It remains an open question whether this is an artefact of the convergence of the optimization to a non-global minimum.
As a further comparison between compressed and Trotter circuits, we calculate the operator entanglement entropy (opEE) of their gates [45,53]. Concretely, we take an optimized compressed circuit $C$ and decompose each two-qubit gate $U_{ij} \in C$ using a singular value decomposition into
$$U_{ij} = \sum_{l=1}^{4} s_l \, v_i^l \otimes v_j^l,$$
where $v_i^l$ and $v_j^l$ are two sets of four one-qubit operators, acting on qubit $i$ and $j$ respectively, and where the four singular values $s_l$ encode the opEE of $U_{ij}$ as
$$\mathrm{opEE} = -\sum_{l} s_l^2 \ln(s_l^2).$$
In Fig. 11 we display the opEE of all gates in an $M = 8$ compressed circuit for the chain (left panel) and ladder (right panel) for $L = 16$ at $t = 2$. The histograms are stacked, with each color denoting the content of a layer, where the lightest color represents the bottom layer and the darkest color the top layer. The red vertical lines mark the values for the $M = 8$ first-order Trotter circuit, with the two lines in the ladder plots corresponding to the evolution and SWAP gates. These histograms show that the gates of the compressed circuit are more heterogeneous than those of the Trotter circuits, since they have a relatively large spread in opEE instead of one or two values. Moreover, for the ladder several gates in the compressed circuit assume an opEE that is close to that of the SWAP gate, which we view as an indication that the action of the SWAP gate is baked into our optimized circuits.
Finally we consider the distribution of the parameter $\lambda_1$ across the optimized two-qubit unitaries, which are parameterized as in (14). We found that $\lambda_2$ and $\lambda_3$ are distributed similarly. In Fig. 12 we show histograms for the parameter counts $N_p$ of $\lambda_1$ for the chain (left panel) and ladder (right panel) with $L = 8$ at $t = 1$, for a compressed circuit with $M = 8$. Note the different scales of the x-axes. The histograms are again stacked, with the lightest color corresponding to the bottom layer and the darkest color to the top layer. The red dashed lines mark the values of the gates in the $M = 8$ first-order Trotter circuit, for which $\lambda_1^{\mathrm{SWAP}} = -\pi$ and $\lambda_1^{\mathrm{evo}} = t/M$, both having no one-qubit dressing (15).
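The two reference values $\lambda_1^{\mathrm{SWAP}} = -\pi$ and $\lambda_1^{\mathrm{evo}} = t/M$ can be checked in a short sketch. Since Eq. (14) is not reproduced here, the example assumes that the two-qubit core of the parameterization has the form $\exp[-i(\lambda_1 S^x S^x + \lambda_2 S^y S^y + \lambda_3 S^z S^z)]$ with $S = \sigma/2$, and sets $\lambda_1 = \lambda_2 = \lambda_3$. The function `heisenberg_gate` and this convention are assumptions made for illustration. Under this assumption, $\lambda = -\pi$ gives the SWAP gate up to a global phase, while $\lambda = t/M$ gives a gate close to the identity.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def heisenberg_gate(lam):
    """exp(-i * lam * S.S) on two qubits with S = sigma / 2 (assumed convention)."""
    ss = (np.kron(X, X) + np.kron(Y, Y) + np.kron(Z, Z)) / 4
    w, v = np.linalg.eigh(ss)
    return (v * np.exp(-1j * w * lam)) @ v.conj().T

def equal_up_to_phase(a, b, tol=1e-12):
    """True if the unitaries a and b differ only by a global phase."""
    overlap = np.trace(b.conj().T @ a) / a.shape[0]
    return abs(abs(overlap) - 1.0) < tol

# lambda = -pi reproduces SWAP up to a global phase.
is_swap = equal_up_to_phase(heisenberg_gate(-np.pi), SWAP)

# lambda = t / M is a first-order Trotter evolution gate, close to the identity.
t, M = 1.0, 8
evo = heisenberg_gate(t / M)
distance_from_identity = 1.0 - abs(np.trace(evo)) / 4
print(is_swap, distance_from_identity)
```

This follows from $\sigma \cdot \sigma = 2\,\mathrm{SWAP} - I$, so that $\exp(-i\lambda\, S \cdot S) = e^{i\lambda/4}\,[\cos(\lambda/2)\, I - i \sin(\lambda/2)\, \mathrm{SWAP}]$.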
As in Fig. 11, we see that the gates of the compressed circuit have a larger spread than the gates of the Trotter circuit, which instead assume one or two values. Also, for the ladder we again observe an accumulation of gates near the SWAP value. The gates in the optimized circuits appear to encode more structure than gates from Trotter circuits and, generally speaking, encode a larger change of the wave function per gate. This is seen best in the limit of very small Trotter time steps, in which each gate (except SWAP) is very close to the identity, whereas in the opposite limit, for which we optimize, each gate needs to differ sufficiently from the identity in order to represent the same time evolution operator.
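The operator entanglement entropy of a single two-qubit gate can be evaluated with a few lines of NumPy. The sketch below is a minimal illustration, not the code used for Fig. 11. It assumes that the singular values are normalized such that $\sum_l s_l^2 = 1$, and the helper names are hypothetical. It regroups the indices of the gate so that the singular value decomposition separates qubit $i$ from qubit $j$.

```python
import numpy as np

def op_entanglement_entropy(u):
    """Operator entanglement entropy of a two-qubit gate u given as a 4x4 matrix."""
    # Indices of u are (i_out, j_out, i_in, j_in); group (i_out, i_in) against (j_out, j_in).
    m = u.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
    s = np.linalg.svd(m, compute_uv=False)
    p = s**2 / np.sum(s**2)   # assumed normalization: sum_l s_l^2 = 1
    p = p[p > 1e-15]          # drop numerically vanishing singular values
    return float(-np.sum(p * np.log(p)))

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

IDENTITY = np.eye(4, dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def evolution_gate(tau):
    """exp(-i * tau * S.S) with S = sigma / 2, a Heisenberg evolution gate."""
    ss = (np.kron(X, X) + np.kron(Y, Y) + np.kron(Z, Z)) / 4
    w, v = np.linalg.eigh(ss)
    return (v * np.exp(-1j * w * tau)) @ v.conj().T

opee_identity = op_entanglement_entropy(IDENTITY)
opee_cnot = op_entanglement_entropy(CNOT)
opee_swap = op_entanglement_entropy(SWAP)
opee_small_step = op_entanglement_entropy(evolution_gate(0.125))
print(opee_identity, opee_cnot, opee_swap, opee_small_step)
```

With this normalization the identity has vanishing opEE, the SWAP gate reaches the maximal value $\ln 4$, and an evolution gate with a small time step lies close to zero, in line with the statement above that Trotter gates for small time steps are close to the identity.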

Conclusion and Outlook
In this work we have presented an approach which reduces the resource cost of digital quantum simulation compared to standard Trotter decompositions by globally optimizing a simple parameterized brickwall circuit in a way that is scalable to large systems. Crucially, the performance per gate is better even when the compressed circuit does not respect the connectivity of the simulated lattice, potentially allowing for high fidelity simulation of systems with a connectivity that is larger than that of the used quantum processor. To illustrate this we have compared the infidelity of the compressed and Trotter circuits with the exact time evolution operators of Heisenberg chains and ladders, as well as the ability to reproduce their OTOCs.
We have shown that we can achieve similar accuracy of the time evolution operator with up to one order of magnitude fewer gates, depending on the desired accuracy and system. Moreover, we checked that this advantage persists when stacking the circuits many times, a central ingredient to simulating a quantum system over long times. This enables high fidelity propagation to times which are currently elusive with conventional Trotter decomposition methods.
Furthermore, we analyzed the structure of the compressed circuits. In the case of the chain, we observed a "refocussing" mechanism, which suppresses the infidelity at multiples of the optimized time step, while the evolution inside the optimized circuit appears to follow a trajectory which is further away from the exact time evolution operator. It is an interesting question for further research to understand this trajectory and relate it also to recent studies of Trotter decompositions and their breakdown for large time steps [32,33].
Our results open the door for many further directions. As a next step, one can for example take symmetries into account to further reduce the number of parameters. This might be especially favorable when exploiting translation symmetries. Furthermore, one can optimize the circuits with other cost functions than the fidelity, as was also done for example in [35]. Promising directions are using local observables or density matrices. While such an approach might simplify the convergence of the optimization, it is still an open question to what extent the accurate simulation of observables or other general quantities would be recovered.
We end by stressing that in this work we have used the simplest possible noise model, by assuming that each applied gate introduces the same amount of noise to the system and that therefore a minimization of the gate count reduces the overall noise. A refinement of this noise model will be the subject of future research.
[...] are for $L = 16$. The left column is for the compressed circuit while the second, third and fourth columns are for the first-, second- and fourth-order Trotter circuits. As in Fig. 5 the depths are $M = 8, 8, 7, 1$ for the chain and $M = 16, 4, 3, 1$ for the ladder, for the compressed circuit and first-, second- and fourth-order Trotter circuits, respectively.
For the chain it is clear that the compressed circuit works better than the Trotter circuits within the lightcone, whereas it is slightly worse than the second- and fourth-order Trotter circuits at approximating the small values outside of the lightcone. For the ladder the compressed circuit is better everywhere, even better than the fourth-order Trotter circuit, which has twice as many gates. Hence we draw the same conclusion as from Fig. 5: With a similar number of gates we are able to go farther in time with the compressed circuits than with the Trotter circuits before reaching some error threshold, even though we do not optimize based on OTOCs.
Now we consider the analog of Fig. 6 for the relative error of the OTOC $C_{i=2,j=4}$. In particular, we consider the chain and ladder with $L = 12$ and take a couple of compressed circuits for which the infidelities were optimized at $t = 2$ for the chain and $t = 1$ for the ladder, which we then stack up to $t = 20$. As in Fig. 6 we take compressed circuits with $M = 4, 8$ for the chain and $M = 8, 16$ for the ladder, and we compare these with second-order Trotter circuits that have similar fidelity at the optimized time step, corresponding to $M = 5, 16$ for the chain and $M = 2, 4$ for the ladder.
In Fig. 13 we show the results, with the left panel for the chain and the right panel for the ladder. The implications are the same as those derived from Fig. 6: With fewer gates we essentially get the same performance, in this case even for a quantity that does not appear in the objective function (19).

Figure 1: The chain (left) and triangular ladder (right) lattice geometries used in this work.

Figure 2: Left: A brickwall circuit with depth $M = 2$ for six qubits, with each color representing an $M = 1$ layer. Circles represent the initial state of the qubits and boxes indicate a two-qubit unitary gate applied to a pair of neighboring qubits. Right: Parametrization of a two-qubit unitary as a product of four single-qubit gates and one two-qubit gate.

Figure 4: The infidelity versus gate count $N_g$ for the time evolution operator of the Heisenberg model on a chain (left panels) and ladder (right panels) in log-log scale. The first and third columns are for $t = 1$ while the second and fourth columns are for $t = 2$. The top panels are for $L = 8$ and a time evolution MPO with $\chi = 256$, the middle panels are for $L = 12$ with $\chi = 150$, and the bottom panels are for $L = 16$ with $\chi = 100$. The blue curves represent the Trotter circuits and the red curve represents the compressed circuit (CC).

Figure 6: The infidelity versus stacking time $t$ for the time evolution operator of the $L = 12$ Heisenberg model on a chain at $t = 2$ (left panels) and ladder at $t = 1$ (right panels), for compressed and second-order Trotter circuits that are stacked twenty times. The circuits were chosen such that they have similar infidelity at the optimized $t$, with $M = 4, 8$ and $M = 5, 16$ for compressed and second-order Trotter circuits on the chain, and $M = 8, 16$ and $M = 2, 4$ for the ladder. As a result the compressed circuits have significantly lower gate count than the corresponding Trotter circuits. The red dashed lines are power laws $\propto t^n$ with the best fitting power $n$.

Figure 7: The OTOCs $C_{i=2,j}(t)$ as a function of site or rung $j$ and stacking time $t$ for the chain (left columns) and the ladder (right columns), for compressed circuits optimized at $t = 2$ and stacked up to ten times (first and third columns) and the corresponding exact values (second and fourth columns). For the chain we take $M = 8$ and for the ladder $M = 16$. The top row is for $L = 8$ with $\chi = 256$, the middle row is for $L = 12$ with $\chi = 150$, and the bottom row is for $L = 16$ with $\chi = 100$.

Figure 9: The infidelity between a subset of layers $M^* < M$, counting from the bottom layer, and the exact time evolution operator at time $t^* < t$, where $t$ denotes the time step at which the compressed circuit was optimized. The plots are for a Heisenberg chain with $L = 8$ at $t = 1$. In the top left panel we show the results for a compressed circuit with $M = 8$, in the top right for a first-order Trotter circuit with $M = 8$, in the bottom left for a second-order Trotter circuit with $M = 7$, and in the bottom right for a fourth-order Trotter circuit with $M = 1$. These depths were chosen such that the circuits have similar gate count. The curve with $M^* = M$ corresponds to the full circuit. The dashed lines mark times $tM^*/8$.

Figure 11: Stacked histograms for the opEE of the gates of a compressed circuit with depth $M = 8$, optimized at $t = 2$ for an $L = 16$ chain (left panels) and ladder (right panels). The colors denote the contents of each layer, with the lightest color for the bottom layer and the darkest for the top layer. The red vertical lines denote the values for the gates in an $M = 8$ first-order Trotter circuit, with the two lines in the ladder plots corresponding to the evolution and SWAP gates.

$\mathrm{opEE} = -\sum_{l} s_l^2 \ln(s_l^2)$, where $s_l$ are the four singular values of $U_{ij}$.

Figure 12: The distribution of the $\lambda_1$ parameter which enters the two-qubit unitary parameterization that was used in this work, shown for the chain (left panel) and ladder (right panel) with $L = 8$ at $t = 1$. The parameter count $N_p$ for a compressed circuit with $M = 8$ is shown as a stacked histogram, with the lightest color corresponding to the bottom layer and the darkest color to the top layer. The first-order Trotter evolution gate value $\lambda_1^{\mathrm{evo}} = t/M$ and the SWAP gate value $\lambda_1^{\mathrm{SWAP}} = -\pi$ are shown as dashed red lines. The other two-qubit parameters $\lambda_2$ and $\lambda_3$ are distributed similarly. Note the different scales of the x-axes.

Figure 13: The relative error of the OTOC $C_{i=2,j=4}(t)$ versus stacking time $t$ for the chain (left panel) and ladder (right panel) with $L = 12$, for circuits optimized at $t = 2$ for the chain and $t = 1$ for the ladder, and stacked up to time $t = 20$. For the chain we consider $M = 4, 8$ and for the ladder $M = 8, 16$. For each $M$ we choose a second-order Trotter circuit with similar fidelity at the optimized $t$, i.e. $M = 5, 16$ for the chain and $M = 2, 4$ for the ladder. As a result the compressed circuits have significantly fewer gates than the corresponding Trotter circuits.

Figure 14: The absolute $C_{i=2,j}(t)$ errors for the chain (top three rows) and ladder (bottom three rows) for a compressed circuit optimized at $t = 2$ and stacked up to ten times, along with the errors for Trotter circuits with similar gate counts. For the chain $j$ labels the sites and for the ladder it labels the rungs. The first and fourth rows are for $L = 8$ with $\chi = 256$, the second and fifth rows are for $L = 12$ with $\chi = 150$, and the third and sixth rows are for $L = 16$ with $\chi = 100$. The first column is for the compressed circuit, and the second, third and fourth columns are for the first-, second- and fourth-order Trotter circuits. To have roughly equal gate counts, the used depths are $M = 8, 8, 7, 1$ for the chain and $M = 16, 4, 3, 1$ for the ladder, for the compressed circuit and first-, second- and fourth-order Trotter circuits, respectively.