Quantum dynamics in transverse-field Ising models from classical networks

The efficient representation of quantum many-body states with classical resources is a key challenge in quantum many-body theory. In this work we analytically construct classical networks for the description of the quantum dynamics in transverse-field Ising models that can be solved efficiently using Monte Carlo techniques. Our perturbative construction encodes time-evolved quantum states of spin-1/2 systems in a network of classical spins with local couplings and can be directly generalized to other spin systems and higher spins. Using this construction we compute the transient dynamics in one, two, and three dimensions including local observables, entanglement production, and Loschmidt amplitudes using Monte Carlo algorithms and demonstrate the accuracy of this approach by comparisons to exact results. We include a mapping to equivalent artificial neural networks, which were recently introduced to provide a universal structure for classical network wave functions.


Introduction
A key challenge in quantum many-body theory is the efficient representation of quantum many-body states using classical compute resources. The full information contained in such a many-body state in principle requires resources that grow exponentially with the number of degrees of freedom. Therefore, reliable schemes for the compression and efficient encoding of the essential information are vital for the numerical treatment of correlated systems with many degrees of freedom. This is of particular relevance for dynamics far from equilibrium, where large parts of the spectrum of the Hamiltonian play an important role.
For low-dimensional systems matrix product states [1,2] and more general tensor network states [3] constitute a powerful ansatz for the compressed representation of physically relevant many-body wave functions. These allow for the efficient computation of ground states and real time evolution. In high dimensions properties of quantum many-body systems in and out of equilibrium can be obtained by dynamical mean field theory [4][5][6][7], which yields exact results in infinite dimensions. This leaves a gap at intermediate dimensions, where exciting physics far from equilibrium has recently been observed experimentally [8][9][10][11][12][13].
An alternative approach, which received increased attention lately, is the representation of the wave function based on networks of classical degrees of freedom. Given the basis vectors | s = |s 1 ⊗ |s 2 ⊗ . . . ⊗ |s N of a many-body Hilbert space, where the s l label the local basis, the coefficients of the wave function |ψ are expressed as where H ( s) is an effective Hamilton function defining the classical network. Wave functions of this form were used in combination with Monte Carlo algorithms for variational ground state searches [14][15][16] and time evolution [17][18][19][20][21][22][23]. Recently, it was suggested that the wave function (1) can generally be encoded in an artificial neural network (ANN) trained to resemble the desired state [23]. This idea was seized in a series of subsequent works exploring the capabilities of this and related representations [24][25][26][27][28][29][30][31]. Importantly, there are no principled restrictions on dimensionality.
In this work we present a scheme to perturbatively derive analytical expressions for perturbative classical networks (pCNs) as representation of time-evolved wave functions for transverse-field Ising models (TFIMs) which can be extended directly also to other models. The resulting networks consist of the same number of classical spins as the corresponding quantum system and exhibit only local couplings making the encoding particularly efficient. We compute the transient dynamics of the TFIM in one, two, and three dimensions (d = 1, 2, 3) including local observables, correlation functions, entanglement production, and Loschmidt amplitudes. By comparing to exact solutions we demonstrate the accuracy of our results going well beyond standard perturbative approaches. This work provides a way to derive classical network structures within a constructive prescription, where other approaches rely on heuristics. As a specific application, we derive the structure and the time-dependent weights of equivalent ANNs in the sense of Ref. [23].  )). The black dots in the network structure represent a classical spin s l and its four neighbors in a translationally invariant square lattice. Each square with number n stands for a coupling of the connected classical spins with coupling constant C n (t). The green and blue lines, respectively, correspond nearest-neighbor and next-nearest-neighbor coupling of two spins, while the orange and red lines indicate coupling terms involving four spins each. The resulting time-dependent classical Hamiltonian function H ( s, t) encodes quantum dynamics via Eq. (1).

Results
In the following we compute dynamics of TFIMs of N spins with Hamiltonian where σ x/z i denote Pauli operators acting on site i and the first sum runs over neighboring lattice sites i and j. As the computational basis we choose the spin basis states | s = |s 1 . . . s N with s i =↑, ↓. The dynamics of Ising models are accessible experimentally with quantum simulators, which was demonstrated recently in various setups [32][33][34]. In d = 1 the dynamics of the TFIM can be computed analytically by means of a Jordan-Wigner transform [35][36][37][38][39][40][41][42][43][44].
In this work we are interested in the dynamics that comprise a dynamical quantum phase transition (DQPT) [45,46]. The signature of a DQPT is a non-analyticity in the many-body dynamics analogous to equilibrium phase transitions where thermodynamic quantities behave non-analytically as function of a control parameter. DQPTs were recently observed in experiment [11,34] and there is a series of results on TFIMs in this context [47][48][49][50][51][52][53][54][55][56][57].
Typically, DQPTs occur when the model is quenched across an underlying equilibrium quantum phase transition. A particularly insightful limit with this respect is a quench from h 0 = ∞ to h/J 1, where, e.g., universal behavior was proven in d = 1 [51]. When quenching from h 0 = ∞ to h = 0 the TFIM in d = 1, 2 exhibits DQPTs at odd multiples of t c = π/J, which we choose as the unit of time throughout the paper. The ground state at h 0 = ∞ is a particularly simple initial state, since s|ψ 0 = 2 −N/2 . One could, however, go away from that limit perturbatively, e.g., by constructing a Schrieffer-Wolff transformation for an initial state with weak spin couplings.
Quench dynamics of the two-dimensional TFIM have already been studied in Refs. [20,21], but there quenches within the same phase have been considered in contrast to the extreme quench across the phase boundary, which we will address in the following.

Classical network via cumulant expansion
Consider a Hamiltonian of the form H = H 0 + λV , where H 0 is diagonal in the spin basis, H 0 | s = E s | s , V an off-diagonal operator, and λ 1. In the interaction picture the time evolution operator can be expressed as . In this setting time-evolved coefficients of the wave function (1) can be obtained perturbatively by a cumulant expansion [58]. Denoting the initial state with |ψ 0 = s ψ 0 ( s)| s the cumulant expansion to lowest order yields the time-evolved state |ψ(t) = s ψ( s, t)| s with the expression above takes the desired form given in Eq. (1). Importantly, also the effective Hamilton function becomes local, whenever H 0 and V are local. It will be demonstrated below that the construction via cumulant expansion yields much more accurate results than conventional perturbation theory. The approximation can be systematically improved by taking into account higher order terms. To which extent it is possible to also capture long-time dynamics using such a construction, remains an open question and, since beyond the scope of the present work, will be left for future research.
For our purposes, we identify Note that, e.g., a strongly anisotropic XXZ model could be treated analogously. The time-dependent V (t) is obtained by solving the Heisenberg equation of motion. The general form of the Hamilton function from the first-order cumulant expansion obtained under these assumptions is where V l n denotes the set of possible combinations of n neighboring sites of lattice site l, z is the coordination number of the lattice, and C n (t) are time-dependent complex couplings. Classical Hamilton functions H (1) ( s, t) for cubic lattices in d = 1, 2, 3 including explicit expressions for the couplings C n (t) are given in Appendix A. Fig. 1 displays the structure of the pCN in 2D and the time evolution of the couplings C n (t). For d = 2, 3 H (1) ( s, t) already contains couplings with products of four or six spin variables, respectively. Thereby, the derived structure of the pCN markedly differs from heuristically motivated Jastrowtype wave functions, which constitute a common variational ansatz [17,20]. From our perturbative construction we find that it is already at lowest order important to take into account plaquette interactions of more than two spins in order to obtain accurate results.
The data we present in the following were obtained with h/J = 0.05. Results for larger h/J are presented in Appendix A. There we find that comparable accuracy is obtained for times ht 1. As we will show in the following the accuracy can be enhanced by including higher order contributions from the cumulant expansion. However, the resulting coupling parameters C n (t) comprise secular terms, which grow with increasing time. We anticipate that these secular terms restrict the time-window in which the couplings obtained from the cumulant expansions yield precise results to t < h −1 . Nevertheless, we expect that an effective resummation of secular contributions can be achieved by combining the perturbatively derived network structures with a time-dependent variational principle [17,[59][60][61].

Observables
Plugging Eq. (1) into the time-dependent expectation value of an observableÔ with matrix elements s|Ô| s = O s δ s, s results in andH ( s, t) = 2 Re[H ( s, t)]. In this form the quantum expectation value resembles a thermal expectation value in the pCN defined by H ( s, t). For an observableÔ that is diagonal in the spin basis, s|Ô| s = O s δ s, s , the expression above simplifies to These expressions can be evaluated efficiently by the Metropolis algorithm [62]. Although we find empirically that the off-diagonal observables under consideration can still be sampled efficiently by Monte Carlo, it is not clear whether a sign problem can appear in other cases. Fig. 2 shows results for different local observables obtained in this way. In these and the following figures the Monte Carlo error is less than the resolution of the plot. In Fig. 2(a,b) we compare the results from the classical network construction to exact results obtained by fermionization for the infinite system in d = 1 [35][36][37][38][39][40][41][42][43][44]. Focusing for the moment on the transverse magnetization σ x i in Fig. 2(a) we find that on short times the pCN gives an accurate description of the dynamics. Upon improving our pCN construction by including the second-order contributions in the cumulant expansion, the time scale up to which the pCN captures quantitatively the real-time evolution of σ x i increases suggesting that the expansion can be systematically improved by including higher order terms. For a further benchmarking of our results we also compare the pCN results to conventional first-order time-dependent perturbation theory. Clearly, the first-order pCN provides a much more accurate approximation to the exact dynamics, which originates in an effective resummation of an infinite subseries of terms appearing in conventional timedependent perturbation theory. In Fig. 2(b) we consider the nearest-neighbor longitudinal correlation function σ z i σ z i+1 which is an observable diagonal in the spin basis. Compared to the offdiagonal observable studied in Fig. 2a we find much stronger deviations from the exact result which also cannot be improved upon including higher orders in the cumulant expansion. However, for correlation functions at longer distances the corrections to the first-order cumulant expansion become important; see Appendix A. The observation that the diagonal observables don't improve with the order of the pCN expansion we attribute to secular terms from resonant processes which are not appropriately captured by perturbative approaches such as the pCN. One possible strategy to incorporate such resonant processes is to impose a time-dependent variational principle [17,[59][60][61] on our networks in order to obtain suitably optimized coupling coefficients. Having demonstrated under which circumstances the pCN can be improved by including higher order contributions, for the remainder of the article we focus on the capabilities of the first-order pCN leaving further optimization strategies of the network open for the future.
In Fig. 2(c,d) we show our results for the same observables but now in d = 2 and d = 3. Compared to d = 1 we find much broader maxima and minima, respectively, close to the times where DQPTs occur at odd multiples of t c = π/J. In the limit h/J → 0 the shape is given by the power law |t − t c | z with z = 2d. This behavior was already observed for one and two dimensional systems in Ref. [51]. For the d = 2 case we have included also exact diagonalization data for a 4 × 4 lattice. Overall, we observe a similar accuracy in the dynamics of these observables as compared to the d = 1 results.

Entanglement
Having discussed the capabilities of the pCN to encode the necessary information for the dynamics of local observables and correlations, we would like to show now that it can also reproduce entanglement dynamics and thus the propagation of quantum information.
By sampling all correlation functions it is in principle possible to construct the re-duced density matrix of a subsystem A, ρ A (t) = tr B |ψ(t) ψ(t)| , where tr B denotes the trace over the complement of A, and the entanglement entropy of subsystem A given by S(t) = −tr ρ A (t) ln ρ A (t) . For subsystems with two spins at sites i and j we have This approach is in principle applicable to arbitrary subsystem sizes; however, it quickly becomes unfeasible, because the number of correlation functions that has to be sampled grows exponentially with subsystem size. In order to obtain insights into the entanglement properties of larger subsystems it might be possible to use the algorithm introduced in Ref. [63] for quantum Monte Carlo, which, however, is beyond the scope of this work. For small system sizes entanglement entropy for any block size can be extracted directly from the full wave function as described below. Figure 3(a) shows the entanglement entropy S 2 (t) of two neighboring spins. We find very good agreement of the Monte Carlo data based on the first-order cumulant expansion with the exact results. In particular, for the entanglement entropy the classical network captures both the decay of the maxima close to the critical times (2n+1)t c and the increase of the minima. As for the observables the shape in the vicinity of the maxima depends on d and is for h/J → 0 given by the same power laws. Note, that the pCN correctly captures the maximal possible entanglement S max 2 = 2 ln 2. By contrast, the result from tdPT completely misses the decay of the oscillations.
In order to assess the capability of the pCN to capture the entanglement dynamics of larger subsystems we compute the whole wave function |ψ(t) = s ψ( s)| s with the coefficients ψ( s) as given in Eq. (3) for feasible system sizes. The entanglement entropy of arbitrary bipartitions is then obtained by a Schmidt decomposition. Fig. 3(b) shows entanglement entropies obtained in this way for subsystems of different sizes n in d = 1, 2. The results imply that at these short times only spins at the surface of the subsystem become entangled with the rest of the system. The maxima for a subsystem of n = 8 spins in a ring of N = 20 spins in d = 1 lie close to 2 ln 2, the theoretical maximum for the entanglement entropy of the two spins, which sit at the surface. This interpretation is supported by the results for a torus of N = 6×3 spins with subsystems of size n = 3×2 and n = 3 × 3. In that case the entanglement entropy reaches maxima of 6 ln 2, corresponding to 6 spins at the boundary. In both cases the results agree well with the exact results for times t < 4t c . This again reflects the fact that the pCN from first-order cumulant expansion yields a good approximation of the dynamics of neighboring spins.

Loschmidt amplitude
Next, we aim to show that not only local but also global properties are well-captured by the classical networks. For that purpose we study the Loschmidt amplitude ψ 0 |ψ(t) , which constitutes the central quantity for the anticipated DQPTs and which has been measured recently experimentally in different contexts [34,64]. For a quench from h 0 = ∞ to h = 0 the Loschmidt amplitude resembles the partition sum of a classical network with imaginary temperature β = −it [51]. This expression is not suited for MC sampling because all weights lie on the unit circle in the complex plane rendering importance sampling impractical and indicating a severe sign problem. These issues can be diminished by constructing an equivalent network with real weights. After integrating out every second spin on the sublattice Λ, equivalent to one decimation step [65], the partition sum takes the form The explicit expressions for d = 1, 2, 3 are given in Appendix B. It is evident from Eq. (9) that, although real, the Boltzmann weights of the classical network are not necessarily positive. Note that the absence of imaginary parts in the weights is due to the particular form of the Hamiltonian. For example, a nonvanishing transverse field would introduce imaginary parts and thereby complicate efficient Monte Carlo sampling. The bottom panels in Fig. 4 show the real parts of the coupling constants of the effective Hamiltonians for d = 1, 3. The couplings in d = 3 acquire non-vanishing imaginary parts for t c /3 ≤ t ≤ 5t c /3 leading to negative weights for some configurations. The partition sum is then split into a positive and a negative part Z(t) = Z + (t) + Z − (t) with Z + > 0 and Z − < 0. It was pointed out in Ref. [67] that the partition sum of such a factorized configuration space can be sampled despite the occurence of negative weights if the partial sums Z ± can be sampled separately. In practice we perform separate Monte Carlo sampling on the respective configuration subspaces by prohibiting updates that change the sign of the weight. We combine this approach with parallel tempering [68] and multi-histogram reweighting [69] in order to render the sampling efficient and, moreover, to achieve the correct normalization. The proper normalization is crucial because Z(t) is a quantum mechanical overlap. A more detailed description of the Monte Carlo scheme is given in Appendix B.
As the Loschmidt amplitude is exponentially suppressed with increasing system size we study the rate function [45] λ N (t) = − 1 N ln |Z(t)|, which is well defined in the thermodynamic limit N → ∞. The top panel in Fig. 4(a) displays λ N (t) obtained by a Monte

Construction of equivalent ANNs
Finally, we present an exact mapping of the pCN obtained by a cumulant expansion to an equivalent ANN as introduced in Ref. [23]. This outlines the general potential of the pCN to guide the choice of network structures, for which otherwise no generic principle exists. Since the mapping is exact, observables sampled from the resulting network will be identical with the ones obtained from the pCN. Generally, for Ising systems with translational invariance and local interactions, the cumulant expansion will yield a Hamilton function of the form where the functions P l ( s, t) only involve a couple of spins in the neighborhood of spin l. We call the spins involved in P l ( s, t) a patch. The P l ( s, t) are invariant under Z 2 and a number of permutations of the spins in a patch due to the lattice symmetries. In terms of the P l ( s, t) the coefficients of the wave function are given by To find the corresponding ANN we choose a general Z 2 symmetric ansatz [23] ψ AN N ( s, t) = Ω 2 α In order to determine the ANN weights we factor-wise equate the r.h.s. of Eq. (12) and Eq. (14), and plug in each of the distinct spin configurations of a patch. This yields a set of equations for the unknown weigths W (n) lm , which can be solved numerically. In Appendix C procedure is outlined in detail for d = 1 and d = 2. Fig. 5 shows the structure of the ANNs and the time-dependence of the weights obtained in this way for d = 1 and d = 2. In d = 1 the ANN structure ( Fig. 5(a)) comprises the minimal number of hidden spins that is possible subject to the lattice symmetries. Although unproven the same is expected to hold for the structure for d = 2 in Fig. 5(c). Note the complex dynamics and the rapid initial change exhibited by some of the couplings. In comparison to a general all-to-all ansatz this construction provides a way to drastically reduce the number of ANN couplings in a controlled way, thereby restricting the variational subspace and lessening the computational cost for the optimization in variational algorithms.

Conclusions
In this work we introduced a perturbative approach based on a cumulant expansion that constitutes a constructive prescription to derive classical networks encoding the timeevolved wave function. The resulting pCNs are equivalent to corresponding ANNs, which were recently proposed as efficient representation of many-body states in Ref. [23]. For the quench parameters under consideration the pCNs give a good approximation of the initial dynamics and thereby provide a controlled benchmark for new algorithms targeting the dynamics in higher dimensions. In future work it is worth to explore whether the structure of the networks derived in this way constitutes a good ansatz for numerical time evolution based on a variational principle also in the absence of a small parameter [17,[59][60][61]. We expect that a variational time evolution based on the derived network structures could effectively perform the resummation of higher orders that would be necessary to overcome the problem of secular terms in the perturbative results. Moreover, the presented approach can be straightforwardly generalized to other systems and higher spin degrees of freedom. This might be particularly interesting in many-body-localized systems [9,[71][72][73][74], where the so-called local integrals of motion provide a natural basis for constructing a classical network.
In 1D the Heisenberg EOM for σ x l (t) yields The cumulant expansion to first-order results in classical Hamilton functions of the general form where V l n denotes the set of possible combinations of n neighboring sites of lattice site l, z is the coordination number of the lattice, and C n (t) are time-dependent complex couplings.
In d = 1 the explicit form is Analogously for d = 2, where C (1) 6Jt + 8 sin(Jt) + sin(2Jt) 16 , C The classical network from first-order cumulant expansion in d = 3 is given by with C (1) 30Jt + 45 sin(Jt) + 9 sin(2Jt) + sin(3Jt) 96 , A.2 Range of applicability and effect of higher order terms Fig. 6 shows the time evolution of transverse magnetization and nearest-neighbor spinspin correlation obtained from the first-order cumulant expansion for different h/J. We find that for ht < 1 the results from the cumulant expansion agree with the exact results to a similar extent independent of the value of h/J. For ht > 1 the cumulant expansion deviates strongly from the exact results.
To second order in the cumulant expansion the wave function coefficients are approximated by In one dimension this yields the effective Hamilton function of the general form C n 1 n 2 (t) where V dl n denotes the set of all groups of n spins at distance d from spin l. The coupling constants are We observe that taking into account the second order contribution of the cumulant expansion significantly enhances the result for the next-nearest-neighbor correlation function as shown in Fig. 7. In particular it yields corrections that are much larger than what one would expect from a naive perturbative expansion.

A.3 Comparison: Complexity of the equivalent iMPS
In order to give an estimate of the complexity of the time-evolved state in terms of Matrix Product States we show the time evolution of local observables, entanglement, and bond dimension after the quench h 0 = ∞ → h = J/20 computed using iTEBD [77] in Fig. 8. The bond dimension χ (i.e. the number of singular values kept after singular value decompositions) was restricted to different maximal values χ max and during the simulation Schmidt values smaller than 10 −10 were discarded. In all quantities a converged result on the time interval of interest is obtained with a maximal bond dimension of χ max ≥ 4.
For the implementation of the iTEBD algorithm the iTensor library [76] was used.

B Loschmidt amplitude as classical partition function B.1 Real weights from decimation RG
As outlined in the results section the Loschmidt amplitude (8) after integrating out every second spin, residing on sublattice Λ, can be integrated out, yielding Equating each factor in the expression above with the corresponding factor in Eq. (27) for every configuration of the involved spins yields a system of equations that determines the couplings C n (t) [65].
The couplings in d = 2 are The time evolution of these couplings is displayed in Fig. 9.

B.2 Monte-Carlo scheme for the Loschmidt amplitude
In order to evaluate the Loschmidt amplitude given in terms of the renormalized Boltzmann weights (28)  by critical slowing down close to the critical times and the presence of negative weights leads to a sign problem. The idea to deal with these issues is to sample for a given Hamilton function H ( s, t) the energy histograms P ± (E) = Ω ± (E)e E where the density of states Ω ± (E) is the number of configurations s with energy E = ReH ( s, t). The sign index indicates the sign of the corresponding Boltzmann weight. Given a good estimate of these histograms the partition sum is simply Note, however, that the histograms P ± (E) must be properly normalized in order to get the correct result for Z(t). In order to obtain a good estimate of the normalized histogram we combine the following techniques: 1. Separate sampling of factor graphs. In order to overcome the sign problem the configuration space X = {±1} N is separated into X + = { s|e H ( s,t) > 0} and X − = { s|e H ( s,t) < 0}; N is the number renormalized spins. Then the partition sum is split as The partition sums Z ± can be sampled separately as described in Ref. [67].
2. Importance sampling. When sampling the energy E in an importance sampling scheme with weights e E the relative frequency of samples with energy E is proportional to P ± (E) = Ω ± (E)e E . Therefore, a histogram of the energies sampled with Metropolis Monte Carlo updates yields the desired histograms up to normalization. Moreover, the importance sampling allows to choose the region in the energy spectrum that is sampled by introducing an artificial temperature as described next.
3. Parallel tempering. Parallel tempering [68] is a method to improve the sampling efficiency in strongly peaked multi-modal distributions, which occurs in our case close to the critical times. The idea of parallel tempering is to perform a Markov Chain Monte-Carlo (MCMC) sampling on several copies of a system at different temperatures. During the sampling the system configurations are not only updated as usual but also configuration swaps between adjacent temperatures are possible. Thereby a MCMC on the temperatures is performed allowing the system to jump between different peaks of the distribution.
In the present case a distribution with weights w( s, t) = e H ( s,t) shall be sampled.
Introducing an artificial temperature β yields weights At β = 1 the sampling is inefficient due to the diverging renormalized weights of the Hamilton function (see bottom panels in Fig. 4). This problem is attenuated if we sample with a parallel tempering scheme with temperatures 1 = β 1 > β 2 > . . . > β N . Moreover, parallel tempering is beneficial, because histograms P β ± (E) = Ω ± (E)e βE are obtained as a byproduct, which capture different regions of the spectrum with high precision. This can be used to obtain decent precision over the whole range of energies and thereby a properly normalized histogram as described next.

4.
Multiple histogram reweighting. In order to get a good histogram for P ± (E) in the whole energy range the fact that can be expoited. In the multiple histogram reweighting procedure [69] the histograms obtained at the different temperatures are combined to yield a histogram covering the whole energy range. This allows us to normalize the histogram at β = 0, where B.3 Simplification of effective systems close to t c For times t close to the critical time t c the effective classical networks can be simplified, because some of the couplings become very small, as evident from Fig. 4 and also Fig. 9, and the Hamilton functions dominated by the divergent contributions. This simplification can be exploited for additional insights into the behavior of the Loschmidt amplitude close to the critical time. In the following we will discuss the case d = 2, but the arguments hold similarly for d = 3.
Dropping contributions to the couplings that vanish at t c the partition sum close to t c can be approximated by with an effective temperature β(t) = − ln cos(Jt/2) /2, the number of remaining spins N = N/2, σ s = ±1 the sign of the weight of the configuration s, and The minimal energy of the network defined byH ( s) is obviously reached when the condition is fulfilled on each plaquette. This is possible in systems where the edge lengths of the system, N x and N y , are both even, to which we restrict the following discussion. To obtain a "ground state" it is sufficient to fix the spin configuration in one row and in one column. The state of the remaining spins is then determined by the condition (39). Hence, the ground state is 2 N x +N y −1 -fold degenerate. From Eq. (27) we know that the sign of the corresponding Boltzmann weight is determined by the number of plaquettes with |s i,j + s i+1,j + s i,j+1 + s i+1,j+1 | = 4. If there is an even number of plaquettes with this property, the configuration has a positive Boltzmann weight, otherwise it is negative. We find that for even edge lengths the ground states always have positive Boltzmann weights.
Let us now introduce the density of states Ω ± (E), i.e. the number of spin configurations s with the same real part of the energy E = H ( s, t) and sgn e H ( s,t) = ±1, in order to rewrite the sum over configurations in Eq. (37) as a sum over energies, From the above analysis of the ground state we know that Ω + (0) = 2 N x +N y −1 . In the limit t → t c , or equivalently β → ∞, this is the only contribution that does not vanish in the sum. Therefore, Z(t c ) = 2 N x +N y −1−N and which determines the value of the rate function at t c in the thermodynamic limit and the finite size correction. We would like to remark that classical spin systems of the form (38) were studied in the literature and can be solved analytically for real temperatures [78,79]. We found, however, that introducing a sign into the partition sum renders the analytical summation impossible.
C Exemplary derivation of ANN couplings from the cumulant expansion C.1 d = 1 From the cumulant expansion (18) we have P l ( s, t) = C 0 (t) + C 1 (t)s l (s l−1 + s l+1 ) + C 2 (t)s l−1 s l+1 , i.e. ψ( s) = l exp C 0 (t) + C 1 (t)s l (s l−1 + s l+1 ) A patch consists of three consecutive spins and swapping the two spins at the border leaves the weight unchanged. A possible ansatz for the ANN with one hidden spin per lattice site (see Fig. 5(a) of the main text), that respects the symmetries, is where Ω constitutes a overall normalization and phase that is irrelevant when expectation values are computed with the Metropolis algorithm. Integrating out the hidden spins yields ψ( s) = l Ω cosh W 1 (s l−1 + s l+1 ) + W 2 s l Identifying the single factors yields for the different possible spin configurations (in the following we abbreviate cosh by ch) ↑↑↑: Ω ch(2W 1 + W 2 ) = exp(C 0 + 2C 1 + C 2 ) ↑↑↓: Ω ch(W 2 ) = exp(C 0 − C 2 ) All other spin configurations are connected to these via Z 2 symmetry. This is an implicit equation for the ANN weights that can be solved numerically. One solution for the weights obtained from the 1st order cumulant expansion is plotted in Fig. 5(b) of the main text. Note that these equations have different possible solutions.

C.2 d = 2
From the cumulant expansion (20) we have P l ( s, t) = C A patch consists of a central spin s i,j and four neighboring spins as depicted by the black dots in Fig. 4a in the main text. Any permutation of the surrounding spins leaves P l ( s, t) unchanged. A possible ansatz for the ANN with five hidden spins per lattice site is depicted in Fig.  5(c) of the main text. After integrating out the hidden spins the wave function is given by 2 (s i,j+1 + s i,j−1 + s i+1,j + s i−1,j ) × ch W where the leftmost arrow in the spin configurations corresponds to the central spin of the patch. One solution for the weights obtained from the 1st order cumulant expansion is plotted in Fig. 5(d) of the main text.