Neural Network Quantum States Analysis of the Shastry-Sutherland Model

,


Introduction
The neural network quantum states (NQSs) [1][2][3][4][5][6][7][8][9][10] have recently emerged as a promising alternative to common trial states in variational Monte Carlo (VMC) studies of quantum many-body problems, especially lattice spin models.This research is driven by the fact that neural networks (NNs) are universal function approximators [11] as well as by the astonishing progress in the field of machine learning (ML) in general.These advancements already led to a number of effective ML applications suitable for the basic research of quantum systems and technologies [12][13][14][15].For example, even simple NQSs, such as the restricted Boltzmann machine (RBM), allow us to investigate the ground-state properties of various quantum spin models.It was already shown that RBM can outperform standard trial states in the variational search of the ground-state energies of the antiferromagnetic Heisenberg model [1].Very promising results have also been obtained for frustrated spin systems, such as the J 1 − J 2 model [16][17][18][19].
Here NQSs can be trained to capture the nontrivial sign structure of the ground state and in some cases have even achieved state-of-the-art accuracy [20] that delivers cutting edge results.Nevertheless, two-dimensional frustrated quantum spin models continue to be a challenge for NQSs as well as for other methods [21].For example, it is not clear yet how to choose an optimal neural network architecture for a particular frustrated system, how important is the role of the trial state symmetries in the learning process, or if an NQS with favorable variational energy also encodes a physically correct state.Not all of these issues are specific to NQSs.Results of any VMC calculations are dictated to a large extent by the properties and limitations of the trial states used.An inappropriately chosen variational state, that is, one with a small overlap with the ground state, can still give a good estimate of the ground-state energy [22].If some additional information is known about the ground state, e.g., its symmetries, one can pick a more restrictive variational state function.However, this is often not an optimal strategy if the goal is to find new phases or to locate a phase boundary.In principle, NQSs could be a remedy for such problems.It is reasonable to expect that a single, but expressive enough, NQS can be used to approximate distinct phases.This assumption is supported by the results of Sharir et al. [23] who showed that NQSs can have even higher expressive power than matrix product states [24] and projected entangled pair states [25] as these can be efficiently mapped to a subset of NQSs.In other words, NQSs can be effectively utilized to a larger class of quantum states than these powerful formalisms which are known primarily from their usage in Density Matrix Renormalization Group (DMRG) but are also utilized as variational states in VMC [1,22,26].
In practice, it is not yet clear how to achieve this in a general case.Despite tremendous progress, the research of frustrated quantum spin magnets is still in the stage of testing and developing NQS architectures for simple models, often focusing primarily on reaching the best variational energy in particular regimes [6,16,17,19].In the present work, we aim for a different target.We want to demonstrate that even shallow NQSs can be sufficient for the investigation of qualitatively different ground-state orderings including states forming only in a finite magnetic field.To this goal, we focus on the ground state of antiferromagnetic Heisenberg Hamiltonian on Shastry-Sutherland lattice known as the Shastry-Sutherland model (SSM) which we introduce in more detail in Sect. 2. To our knowledge, this model of frustrated quantum spin system has not been previously addressed within the NQS context, yet it seems to be an ideal testbed for our purposes.
SSM was already investigated by a number of methods, including exact diagonalization (ED) techniques [27][28][29][30][31], quantum Monte Carlo [32], various versions of DMRG [33][34][35][36][37][38], perturbation theory [39][40][41][42] and even quantum annealing [43].These studies have shown that SSM has a rich ground-state phase diagram.In a zero magnetic field these include regions such as singlet spin dimer phase, antiferromagnetic Néel state, spin plaquette singlet phase and probably other phases.The introduction of a finite magnetic field further complicates the picture.Consequently, it is challenging to find a single variational function that can correctly approximate the whole ground-state phase diagram.
In addition, there are still open questions related to the ground-state phase diagram in zero as well as in the finite magnetic field, even in some experimentally relevant regimes of the model.This is important because several magnetic materials have a structure topologically equivalent to SSM.The most notable examples are SrCu 2 (BO 3 ) 2 , BaNd 2 ZnO 5 and rare earth tetraborides RB 4 (R − − Dy, Er, Tm, Tb, Ho) [44][45][46][47][48].All exhibit an intriguing step-like dependence of the overall magnetization on the external magnetic field, which has been found to be inherent to SSM [49,50].Here, each plateau reflects a stable nontrivial spin ordering.The magnetic behavior of these materials is not yet fully understood.This together with other open problems, e.g., the prospect of a narrow spin liquid phase in a zero magnetic field, further motivates the investigation of SSM and its generalizations [42,[51][52][53].
Therefore, SSM presents a model system that has the right combination of properties that are well understood and can be used to benchmark various NQSs, and of open problems that can be potentially illuminated by these variational techniques.This includes the possibility to address the rather complex behavior of a system in relation to a changing magnetic field.
The present work consists of two main parts.In the first one we explore SSM by employing a number of NQS architectures and we test them against ED results for small lattices in zero magnetic field.Here the primarily goal is to find one or few networks that are able to capture the main well-understood ground-state orderings of SSM.Simultaneously, we require these NQSs to have a high chance to describe the magnetization plateaus as well.This means that the ideal network has to give a solid approximation of the ground-state orderings even when no conditions on the total magnetization are imposed.Consequently, we do not focus on getting the best possible variational energy for a particular set of parameters.Rather, we require a good approximation of the energy in distinct regimes of the model, a correct description of the particular orderings, and reasonable computational complexity that allows the usage of the NQS on larger lattices.We argue that when precision, generality, and computational costs are taken into account, a shallow RBM with complex parameters is still a good choice.
In the second part, we introduce a refined learning protocol for RBM NQS and test it for a wide range of model parameters and different network sizes.We then utilize this protocol in the study of larger systems.We first investigate the zero magnetic field scenario and demon- strate that RBM is expressive enough to capture all main phases of the system.We then move to the model in a finite magnetic field and show that, with the right learning strategy, RBM is able to capture the magnetization plateaus crucial for the description of real materials.This opens a possibility that NQSs could be used to investigate several open problems, such as the existence of still opaque spin-liquid phase and other orderings predicted but not yet confirmed in SSM.

Shastry-Sutherland model
SSM is described by the Hamiltonian where Ŝi = 1 2 σi is the spin-1/2 operator at the i-th site with σi being the vector of Pauli matrices.The first term represents the exchange coupling between the nearest neighbors on a square lattice (solid lines in Fig. 1).The second term is a sum over specific diagonal bonds arranged in a checkerboard pattern (dashed lines in Fig. 1).Note that these sums are interpreted in terms of nodes, i.e., there is no double counting.Both coupling constants are antiferromagnetic (J, J ′ > 0) and we set J ′ as the unit of energy in the whole paper.The last term describes the influence of the external magnetic field h pointing to the z-direction.

Basic properties of the ground state
The basic structure of the SSM ground state phase diagram is well understood.As illustrated in Fig. 2, the SSM at h = 0 has at least three distinct ground-state orderings.These are the dimer singlet (DS) state for (J ′ ≫ J), the Néel antiferromagnetic (AF) ordering (J ′ ≪ J) and the plaquette singlet (PS) state in between.The phase transition from the DS to the PS state is of the first order [29], but the nature of the transition from PS to AF is still in debate.The ED study of Nakano and Sakai [30] suggests that the supposed PS phase actually consists of at least two distinct phases.In addition, some recent studies argue that there is a so-called deconfined quantum critical point (DQCP), which separates a line of first-order transitions or, potentially, a narrow gapless spin liquid (SL) phase [37,38,54].[38].There is a first-order transition at J/J ′ ≈ 0.675 between DS and PS phases.The gray squares in PS depict the plaquette singlets.The nature of the transition between the PS and AF phases remains unresolved.It is not clear whether there is a narrow spin liquid phase, a DQCP or just a second order transition in the region labeled with a question mark.
Nevertheless, even without focusing on the possible DQCP and SL phase, the three main orderings, namely DS, PS, and AF, already pose a sufficient challenge for a single variational state because of their distinctive character and symmetries.
The DS phase is formed by an exactly (analytically) accessible state [55].Numerous analytical and numerical methods have verified that it remains the ground state up to J/J ′ ≈ 0.675 [29,30,37].In the limiting case of J ≪ J ′ , the system is equivalent to an ensemble of independent spin dimers, each of which forms a singlet ground state.The DS ground state is thus a direct product of dimer singlet states As such, it is antisymmetric with respect to the exchange of two intradimer spins and symmetric with respect to transformations rearranging only the spin pairs without swapping the intradimer spins.The energy of the ground state of the dimer is where N D is the number of dimers and N D = N /2 for lattice with periodic boundary conditions.
The PS phase can be understood as weakly coupled plaquette singlet states illustrated in Fig. 2. Plaquette singlet is a ground state of an isolated 4-spin Heisenberg cluster with four bonds arranged in a cycle [29].The pattern of the plaquette singlets in Fig. 2 indicates that the PS state is two-fold degenerate.
It is important to stress again that the relevant range J/J ′ discussed here (0.675 ≲ J/J ′ ≲ 0.82) could be much more complex.As mentioned above, it has been argued that at J/J ′ ≈ 0.70 the PS phase splits into two distinct regions with quantitatively different behaviors [30,37,38,54].For the sake of simplicity, we omit this possibility in most of our discussion.Nevertheless, this might be important for more detailed future studies.
Figure 3: A simplified illustration of magnetization as a function of the external magnetic field h and coupling constant J inspired by Ref. [39].A more detailed illustration would contain additional steps (e.g., supersolid phase); however, their actual position and width are not clear yet.Singlet and triplet arrangements are displayed for some of the plateaus (namely, m z = 1, 1/2, 1/3 and 1/4 plateaus are shown).
The AF phase stabilizes when J/J ′ ≳ 0.82.When J ′ becomes negligible, the ground state of SSM is approaching the ground state of the antiferromagnetic Heisenberg model with only nearest-neighbor bonds on a square lattice.Although this state is not analytically accessible, it has previously been explored by Monte Carlo (MC) simulations [27].Using the first-order correction to these quantum MC results, the energy of the SSM in the AF phase was estimated [27] to be where N is assumed to be large.
A more detailed discussion of the symmetries of these three states is postponed to the Appendix C. Note that the three main phases DS, PS, and AF are reasonably understood, and simultaneously, they differ qualitatively.This is one of several qualities of the model that make the SSM a suitable testbed for NQSs.
So far we have discussed the h = 0 case.When we introduce a finite magnetic field to the DS phase in Eq. ( 2), some dimers can morph into triplet states.These triplets are formed in repeating patterns, e.g., checkerboard, stripes, or more complex configurations (for illustration, see Fig. 3), giving rise to stable plateaus of constant magnetization in increasing magnetic field.
Because each plateau signals a distinct stable ordering, it also presents a challenge for the NQSs.Particularly so because a finite magnetic field does not allow for a simple restriction of the Hilbert space to its zero magnetization part.This restriction was heavily utilized in previous NQS investigations of quantum spin models.Note that it is mostly these plateaus that make SSM interesting experimentally.Good examples are SrCu 2 (BO 3 ) 2 , BaNd 2 ZnO 5 , CaCo 2 Al 8 and rare-earth tetraborides RB 4 (R − − Dy, Er, Tm, Tb, Ho)) [44][45][46][47][48] which all exhibit the intriguing step-like dependence of the overall magnetization on the external magnetic field or show magnetic frustration and can be modeled by SSM or its generalizations.
as is typical in NQS studies [56].The variational energy in Eq. ( 5) is, in the jargon of ML, a loss function.Using this loss function, the parameters θ are optimized to obtain the lowest energy state that the chosen variational function can represent.In our calculations, we use the VMC implementation from the NetKet NQS toolbox [9,56].
In general, the form of the trial wave function ψ θ (σ z ) restricts the optimization process to a subset of the Hilbert space.An improper choice of the ansatz can bias the approximation towards a wrong phase or even can make the approach to the correct state impossible.Clearly, this is where one can expect that NQSs could outperform standard variational states due to their high expressiveness.

Neural network quantum states
Here, we explore several NQS architectures [6,9].We chose these particular networks due to their successful application in previous studies of other Heisenberg models.
Restricted Boltzmann machine (RBM) is a generative artificial NN constituted of a visible layer with N nodes (one for each lattice site) fully connected with a single hidden layer with M = αN nodes (hidden degrees of freedom) where α is the hidden layer density [1].It can be used to define an NQS where the vector θ contains the variation network parameters θ = {a, b, W}.This NQS can be interpreted as a one-layered fully-connected neural network with log cosh activation function followed by a summation of the outputs and additional summation of visible biases [1].Note that complex-valued parameters are necessary in order to represent generally complex-valued wave function outputs.The size of the visible layer N is fixed by the size of the investigated spin system.However, the expressive power of RBM can be modified by changing α.The number of variation parameters of RBM is O(αN 2 ).

Modulus-phase split real-valued RBM (rRBM):
Complex parameters, which generally make the learning process harder, can be avoided by introducing two independent real-valued NNs [18,57] to represent the modulus A(σ z ) and the phase Φ(σ z ) of the wave function separately log ψ θ (σ z ) = A(σ z ) + iΦ(σ z ) .( 8) Unlike in the Ref. [57] where rRBM architecture proved to be advantageous in the investigation of transverse-field Ising model, we have experienced that for SSM, the rRBM shows worse results than complex-valued RBM.This is in accord with the recent study of other frustrated systems, namely the J 1 − J 2 model [19].Consequently, we discuss the results of this network only briefly in Chapter 4.1 and focus predominately on complex-valued architectures.

Symmetric variant of RBM (sRBM):
Carleo and Troyer [1] used translational symmetries to reduce the number of variational parameters in RBM.They replaced the fully connected layer with a convolutional layer and set the visible biases to the constant value a f across each convolutional filter f .The resulting expression for its output is Here T g denotes a symmetry transformation of a spin configuration according to an element g from the symmetry group G of order |G|.The index f denotes different feature filters.The number of these filters F determines the size of the network M = F |G|.The resulting sRBM has fewer variational parameters than the RBM by a factor of |G|.We can view this approach as binding the values of some of the O(αN 2 ) parameter making the total asymptotic number of parameters O (αN ).Carleo and Troyer [1] also showed that this approach significantly improves the convergence and accuracy of the ground states of the antiferromagnetic Heisenberg model on a square lattice.However, this approach suffers from two crucial disadvantages in more general circumstances.The first drawback is that visible biases are inherently constant for each filter f which significantly lowers the expressiveness of the network as discussed later in this section.As we show in Appendix B, the sRBM architecture cannot be modified to ease this condition while preserving symmetries.The second drawback is that sRBM is not applicable if the ground state does not transform under the trivial irreducible representation (irrep) of a given symmetry group.
To illustrate the problem, let us consider a single spin dimer (i.e., a single bond of SSM with J = 0, J ′ = 1 and h = 0).Its ground state is a singlet |ψ 0 〉 = (|↑↓〉 − |↑↓〉) / 2. The symmetry group of the single-dimer Hamiltonian contains just two operations -an identity and a swap of both spins G = {g 12 , g 21 }.If we apply the swap operation to the ground state, we obtain Tg 21 |ψ 0 〉 = (|↓↑〉 − |↓↑〉) / 2 = − |ψ 0 〉.Although this state is a multiple of the ground state, we see that it does not transform under the trivial irrep because one of its characters is χ g 21 = −1.Since sRBM represents only states with Tg |ψ〉 = |ψ〉; ∀g ∈ G, this symmetry should not be used in sRBM.Note that we do not strictly follow this rule and sometimes use all available lattice symmetries.The reason is that this leads to NQS with a small number of parameters that are easy to optimize.The resulting variational energy can then be compared with the energy obtained with RBM with the same α to check how well the full network is optimized, i.e., if it leads to lower energy than sRBM.If not, this signals that the variational energy of RBM can be lowered by better learning.
Projected RBM (pRBM): Recently, Nomura [58] introduced an alternative way to symmetrize RBM (or any other NN) using a quantum-number projection (also called incomplete symmetrization operator) where g is an element of the given symmetry group G and χ g is its character from the irrep in question.The wave function on the right-hand side may be arbitrary and it can be shown that the function on the left-hand side satisfies the desired transformation property Unfortunately, pRBM makes the learning process of NN much more expensive than sRBM.The computational time increases by a factor of |G| producing a computational cost O(αN 2 |G|).On the other hand, pRBM implementation does not suffer from the problems mentioned for sRBM and it can be generalized by setting mutually independent visible biases (see Appendix B).

Group-convolutional NN (GCNN):
Group equivariant convolutional NNs represent a promising class of NNs built inherently on symmetries.They were proposed by Cohen and Ni [59] as a natural extension of the well-known convolutional neural networks.While convolutional networks preserve invariance under translations, GCNN are equivariant under the action of an arbitrary group G (which may contain a subgroup of translations).Roth and MacDonald [60] further improved GCNNs so that they can transform under an arbitrary irreducible representation of G, which is more suitable for NQSs for SSM.GCNN can be composed of any number of hidden layers.The first and subsequent layers are given by where f is a nonlinear activation function (the output is typically a vector since GCNN can have multiple parallel feature filters) and f 1 g is a 1st-layer feature vector corresponding to group element g.The result of the last layer , where ( j) denotes the individual features of the layer, is then projected in a fashion similar to that of pRBM The main advantage over symmetrizing an arbitrary deep network by the formula form Eq. ( 10) is that we do not need to evaluate the forward pass of the nonsymmetric wave function |G| times.This is achieved because each layer of the GCNN fulfills equivariance.GCNN with K layers and a typical number of feature filters F in each layer has O(F N + K F 2 |G|) parameters.

Jastrow network:
As a baseline, we also use a Jastrow network based on the standard Jastrow ansatz [61,62] where the variational parameters θ = W i, j form a matrix of size N × N .The Jastrow ansatz is physically motivated by two-body interactions and assigns trainable parameters W i, j to pairwise spin correlations.The number of its parameters scales as O(N 2 ).
The complicated sign structure of the complex phases of the basis coefficients that form the ground-state wave function presents a major challenge in optimizing the parameters of a variational function of a frustrated spin system.In case of Heisenberg model on a bipartite lattice consisting of sublattices A and B (i.e., SSM with J ′ = 0), this can be solved using the Marshal sign rule (MSR) [63].The MSR states that the sign of ψ(σ z ) is given by (−1) where N ↑ A (σ z ) is the total number of up-spins on a sublattice A. Because this alternates with a spin-flip, it can be difficult for NN to learn the correct signs.However, it is possible to circumvent this problem in two analogous ways.
If the sign structure is dictated by MSR, the Hamiltonian can be gauge transformed by changing the signs of some terms to make all wave function coefficients positive in the transformed basis.In particular, we change σ x → −σ x and σ y → −σ y ; for ∀σ ∈ A. The same result can be also obtained by setting the visible biases to a i = iπ/2 for i ∈ A and a i = 0 for i ∈ B as this exactly reconstructs the Marshall sign factor (up to an overall constant factor).In other words, the biases can be set to play the role of a Marshall basis.What is important here is that in the general case the simple Marshall sign rule is not always applicable.Especially problematic are systems with strong frustration [18,19,64].The advantage of using the visible biases instead is that their setting does not have to be known ahead, as it can be, despite possible technical difficulties, learned.Therefore, it is beneficial to include visible biases whenever allowed by the architecture.An additional bonus is that free visible biases also allow one to overcome an improper initialization of weights.

Comparison of different NQSs architectures
It is too expensive to apply all NQSs introduced above to investigate the ground-state phase diagram of SSM at large lattices.Therefore, in the first part of our investigation, we benchmark these NQSs against the exact results on smaller lattices obtained by the Lanczos ED method.The aim is to identify a network that is both expressive enough to cover various phases and computationally tractable even for large lattices.We focus on a regular lattice with N = 4 × 4 = 16 points and an irregular lattice with N = 20 (see Appendix A).Throughout this paper, we apply periodic boundary conditions for all lattices used, unless explicitly stated otherwise.The irregular N = 20 is considered because N = 16 lattice has some undesirable properties, e.g., some extra symmetries with trivial irrep which favor symmetric networks.It also suffers from stronger finite-size effects and does not exhibit the PS phase.On the other hand, it is regular and easy to calculate.
We initially focus on the cases represented by J/J ′ = 0.2 (DS phase), J/J ′ = 0.9 (AF phase), and J/J ′ = 0.63.Case J/J ′ = 0.63 was chosen because it represents a realistic case, namely, it is the exchange parameter ratio for SrCu 2 (BO 3 ) 2 at ambient pressure [34].However, because its results are qualitatively in agreement with the case J/J ′ = 0.2, we discuss them together as the DS results.Note that we investigate the model with and without MSR.Since the goal here is to compare different networks, we estimate the accuracy of each architecture by comparing the average energy of the last 50 learning iterations E 50 with the exact result E ex .Note that this means that we are not using just the lowest obtained energies but also test stability of the learning method.Consequently, the value of E 50 is typically greater than zero even when the network is able to reproduce the state exactly.The same computational protocol is used for each architecture.In particular, we used 2000 MC samples 1 and 1000 training iterations for three values of fixed learning rates (0.2, 0.05, 0.01).Each particular combination of architecture and basis (MSR or direct) was computed four times for each learning rate (yielding 12 independent runs for each case of interest).This is to eliminate occasional events when NN gets stuck in a local energy minimum too far from the ground state.Zero magnetization was not implicitly assumed (i.e., we used local single-spin-flip Metropolis updates in VMC).We summarize our results in , where E i 50 is the average energy of the last 50 iterations of the i-th run.A number of variational parameters are also shown for each architecture.The difference between GCNN and GCNNt is that for GCNN we used all the symmetries and the correct characters for the expected ground state, whereas GCNNt utilized only the translation symmetry.The error 0.0 here means a relative error less than 10 −7 which we consider as a "numerical precision" due to the standard MC errors which are typically larger even for L = 16.
with RBM, one can see that networks with α = 2 (560 parameters for N = 16 and 860 for N = 20) and 8 (3380 parameters for N = 20) and 16 (4398 parameters for N = 16) show similar precision, where the significantly larger networks are notably better (approximately three times) only in the AF phase.For the general case, considering the computational costs, this favors the computationally less demanding network with α = 2. Also interesting is the comparison with the Jastrow network.Both architectures have comparable precision in the DS phase for N = 16, however, in the AF phase and in the DS phase for N = 20 with MSR, RBM is one or even two orders of magnitude more precise than the Jastrow ansatz.
For N = 16, the sRBM architecture demonstrates superior performance.The full automorphism group of the finite lattice has been used in its implementation.Despite the resulting small number of variational parameters, it shows excellent precision.In fact, a significant increase of α is not that advantageous (compare the cases α = 4 and α = 128).In the DS phase, the use of symmetries allowed sRBM to find the ground-state energies within the numerical precision (hence the zero error).Since sRBM can be thought of as RBM with additional constraints on the values of the weights, this already suggests that the learning protocol for RBM can be improved, which we demonstrate in the next section.However, it is important to stress that the excellent results are a consequence of the special symmetries of the N = 16 lattice.Both the DS and AF states transform under the trivial irreducible representation, and the automorphism group is therefore applicable without special treatment.This is not true for the DS ground state in different tiles, including regular ones such as N = 6 × 6 (for a more detailed discussion of the symmetries, see Appendix C).This is illustrated in the second part of Table 1 where sRBM with α = 4 gives very poor results in the DS phase of N = 20 due to the improper treatment of symmetries.In short, using symmetries in sRBM for states that do not transform under a trivial irrep can make the variational energy significantly worse than for simple RBM.For N = 20, sRBM also fails in the AF phase, but only when adopting a direct basis.This implies that sRBM has trouble learning the correct sign structure of the state for larger lattices, which can be attributed to the fixed visible biases.
The remaining architectures, namely pRBM and GCNN, show excellent accuracy for N =16.They clearly outperform all other networks in the AF phase.However, the results at N = 20 are less convincing, especially when one takes into account that these networks are more computationally demanding than RBM even for cases when RBM contains more parameters.Furthermore, the precision reached required the usage of correct symmetries of the expected state, i.e., the proper line form Table 2 in Appendix C. If one uses an improper one, i.e., if different state is expected, as illustrated by the last two lines in Table 1, the precision can drop by several orders of magnitude.Similarly, precision decreases significantly for both GCNN and pRBM when we use only the group of translations instead of the full symmetry group, as illustrated by GCNNt in Table 1.Note that for this case, the precision in the AF phase drops to the level of a simple RBM with α = 2.The network is much better in the DS phase, but in the following chapter, we will demonstrate that even RBM with α = 2 and modified learning protocol can reach the numerical precision in this phase.Although we cannot exclude that much better results could be obtained for the symmetrized pRBM and GCNN networks with a different learning protocol, considering their much higher computational demands and the necessity to identify a priori the correct irrep symmetries for each lattice type to make the learning efficient, the presented results favor RBM for the study of larger clusters.
The last question to be addressed here is whether using MSR would be beneficial.Table 1 shows several cases where MSR is favorable in the AF phase (e.g., for sRBM and N = 16), but this is not a general rule.In addition, its usage comes with a price as well.We have noticed that the MSR basis seems to strongly favor the AF ordering even for J/J ′ where PS is already the ground state in exact results.We will discuss this briefly when addressing larger lattices.
To wrap it up, in general, the usage of MSR basis does not lead to significantly better results.With some exceptions, the networks presented here are able to approximate the ground-state energy quite well even without MSR.Therefore, we will mostly omit the MSR from further discussion.Furthermore, if the symmetry of the ground state is known, it is worth using this information in building the NN.If not, then the usage of just translations does not lead to a significant improvement of the precision.Fortunately, the complex-valued RBM with visible biases can give a very good approximation of the ground-state energy without any restrictions.Its clear advantage is that no preliminary information about the ground-state properties is needed.As such, it is suitable for problems where the character of the ground state or position of the phase boundary is unknown.In addition, the precision of RBM for SSM can be significantly improved using a different learning strategy discussed in the following section.

Investigation of the ground-state phase diagrams
Focusing solely on RBM allowed us to test several learning strategies and employ more precise MC calculations.What follows is a description of the best learning protocol we have found, which we used to produce all the results discussed below.It proved to be beneficial to use more precise MC calculations already during training.We typically generate 4000-12000 MC samples at every sampling step.It was also more advantageous to run 10-30 independent learnings (with random initial variational parameters) with shorter learning times than to use few runs with a lot of learning iterations.We used approximately 2000 training iterations in each run.During learning, we have been lowering the learning rate η by several discrete steps.Typically, we started with η = 0.08 (≈200 iterations), then changed it to η = 0.04 (≈1600 iterations), followed by η = 0.02 (≈100 iterations), η = 0.01 (≈100 iterations) and η = 0.003 (≈50 iterations).The trained RBM was then used to calculate the expectation values of the energy and order parameters, introduced in the next section, where we used 12-60 thousand evaluation steps.Consequently, the Monte Carlo error bars in all the figures presented are negligible for small lattices.The relevant absolute error comes from the learning process or limitations of the NQS used.The state with the lowest energy (evaluated more precisely after training) of all independent runs was kept as the final result in the following discussion.Due to the stochastic fluctuations in the learned parameters, it was for some cases advantageous to refine the results by fine-tuning the final state multiple times with a high number of MC samples but a small number (5-10) of iterations and a small learning rate (η ≤ 0.001) keeping the result with the lowest energy.Moreover, transfer learning was employed in some problematic regimes, as described below.

Ground-state orderings
As already discussed, good agreement of the variational energy with the exact one does not guarantee that the variational state correctly captures the character of the exact ground state, i.e., that it reflects the correct phase.To examine this and with the aim to see if RBM NQS can correctly describe the transitions between the phases, we calculate the order parameters for the three main expected orderings.They are constructed to be large (close to one) whenever the state is in the respective phase and small in other domains.
In particular, we define the order parameter for the DS phase as which reflects the fact that operator Ŝ1 • Ŝ2 has for isolated dimer the expectation value − 3 4 (singlet state).Therefore, P DS is one in the DS phase and strictly lower in other phases.
For the PS order parameter, we use a definition based on order parameter from Ref. [38] where the order parameter is given by the difference Qr = 1 2 Pr + P−1 r , with Pr being the permutation operator.This operator performs a cyclic permutation of four spins on a plaquette (a square on the lattice without the diagonal bond J ′ ) at position r .Here, the first sum in Eq. ( 15) runs over the subset of squares A (see Fig. 1) and the second sum runs over the subset B. The meaning of this construction can be understood by looking at Fig. 2. Note that in the investigation of the plaquete ordering we utilized in addition to periodic boundary conditions (torus geometry) also a lattice with mixed ones.For periodic boundary conditions, we have N = N /4 as all squares are used.For mixed ones, we followed Ref. [38] and use regular lattices with open boundary conditions in the x-direction with L x = 2L and periodic in the y-direction with L y = L so that N = 2L 2 .However, the order parameter is calculated only in the central L × L square to mitigate the boundary effects.Hence, N = L 2 /4.The operator Qr gives a large mean value in the plaquette singlet (gray square) and a value close to zero in the empty square between four plaquette singlets.For periodic lattices, we do not know which set of squares will become singlets, as the state is degenerate, therefore, we use the absolute value.
For the AF phase we employ the standard structure factor where r i j denotes the difference in discrete coordinates of spin i and j, and we take q = (π, π) which measures the antiferromagnetic checkerboard ordering.Finally, in the case of finite magnetic field we use the normalized magnetization in the z-direction to identify the expected plateaus in the magnetization.These expectation values are calculated using VMC for trained RBM NQS.

Zero magnetic field
We first investigate the phases of SSM in a zero magnetic field.Unlike the procedure used to compare different network architectures, here we restrict the Hilbert space by the condition M = 0. Before moving to larger lattices, we test the RBM for N = 20 in a wide range of J/J ′ .We use the irregular lattice N = 20 because it shows an onset of the PS ordering (see the black dashed line in Fig. 4(a)) not present for smaller regular lattices.We also readdress the role of the parameter α within the new learning protocol, but start our discussion with the case α = 2.
As is clear from the comparison of the ground-state energies in panels Fig. 4(b) and Fig. 4(c), the RBM variational energy agrees very well with the ED.The updated learning protocol ensures that the relative error in the J/J ′ < 0.68 region, i.e., for the DS phase, is on the order of the numerical precision already for α = 2 despite not using any symmetries except for the condition M = 0.The largest error is in the vicinity of the expected first-order phase transition from the DS to PS phases, but only from the side of the expected PS phase.Nevertheless, even here, the largest observed relative error in energy was approximately 1% for α = 2.
Given the focus of our study, even more important than the energy error is the nature of optimized variational states.Panel (a) in Fig. 4 shows that a shallow network, i.e., RBM with complex parameters and α = 2 is expressive enough to correctly capture the formation of the distinct DS (blue diamonds) and AF ordering (red crosses), as well as the onset of the PS phase (black circles).The agreement is far from perfect, though.Consistent with the results for the energy, the largest differences in order parameter values between RBM and ED are in the right vicinity of the expected phase transition.Here an error of 1% and less in the estimation of the ground-state energy translates into an error of tens of percents in the order parameters.Still, even here the RBM gives a correct qualitative picture.The position of the abrupt change of phase matches the exact result and there is a clear onset of the PS ordering.With increasing J/J ′ , the RBM results align again with the exact ones.
This benchmark shows that RBM with α = 2 can easily capture the correct state in the DS phase, but gives worse results above the critical J/J ′ ≈ 0.68.What is not clear is if the relative errors in panel (c) represent some inherent limitation of the RBM with small α, e.g., a difficulty to set the correct sign structure of the frustrated state, or are related to the learning process.Gradually increasing α from 2 (blue circles) to 4 (red pluses), 8 (green pluses) and 16 (black  Here blue solid (ED), pure blue diamonds (RBM with α = 2) and blue diamonds with red edge (RBM with α = show the DS order parameter; black dashed line (ED), black circles (RBM with α = 2) and black circles with yellow edge (RBM with α = 16) show the PS order parameter; and red dot-dashed line (ED), red crosses (RBM with α = 2) and red crosses with blue edge (RBM with α = 16) show the AF order parameter.The results of symmetric variants of RBM are not shown, as they were comparable to the results presented for J/J ′ > 0.68 and well off the exact results for J/J ′ ≤ 0.68.(b) The exact (red line) and RBM α = 2, 16 ground-state energies.(c) Relative error in ground-state energy for the RBM with α = 2 (blue circles), 4 (red pluses), 8 (green crosses) and 16 (black-yellow diamonds).Note that the relative error in the DS phase for RBM α = 2 is at the level of numerical precision.diamonds with yellow cores) in the problematic region lowers the relative error in energy.However, this significant improvement in energy leads only to a small improvement for the order parameters near the critical point.This is shown in panel (a) where the results calculated with RBM with α = 16 are marked with the same symbols as for α = 2 but highlighted via differently colored edges.
Using symmetric NQS symmetries did not significantly improve the results.We have tested the sRBM architecture with α = 4 in direct as well as MSR basis using the same protocol as for RBM.The sRBM results have been comparable to RBM for J/J ′ > 0.68 and much worse than the RBM results below this critical value.This suggests that the issue is not entirely due to insufficient learning.On the other hand, the learning was the most difficult in the vicinity of the observed discontinuity.A significant fraction (often more than half) of the independent   AF (c) and variational energy (d) as a function of J/J ′ for h = 0 and various lattice sizes.All results in panels (a)-(d) have been obtained using RBM NQS with α = 2 and VMC with exchange updates (simultaneous flip of two opposite spins in the basis state) for the Hilbert subspace restricted to M = 0.The black dashed lines in panels (d) and (e) show the asymptotic energies for the DS (horizontal) and AF phase (tilted).The black crosses represent the results with N = 64 for which we have utilized transfer learning.The inset (e) shows the details of the variational energy for N = 64 in the vicinity of the phase transition calculated using RBM (green diamonds), sRBM in direct base (blue stars), sRBM with MSR (red empty diamonds) and three points calculated with RBM utilizing transfer learning (black crosses).The empty purple squares show the RBM results for N = 100, and the orange triangles are infinite DMRG results taken graphically from Ref. [37].runs for 0.69 ≤ J/J ′ ≤ 0.72 ended either in the wrong phase (DS) or even in a state with an energy much higher than the real ground state.This was not true for the rest of the J/J ′ interval, where most of the independent runs with the same α showed very similar variational energies.Furthermore, the relative errors for all investigated RBM variants (including those not presented here) follow the same pattern.They are maximal just above the critical point and then, if we neglect some noise, they monotonically decrease with increasing J/J ′ .Yet, increasing α significantly lowers the variational energy even for J/J ′ > 0.74.This again suggests that the problem is indeed small α.Ultimately, both statements seem to be correct.Significantly larger α than α = 16 is needed to capture the critical region together with highprecision learning, that is, many independent runs.
After testing the RBM on small lattices and understanding its strength and limitations, we can now approach larger ones.We focus on α = 2 as the increase in the precision of the variational energy obtained with larger α's does not significantly improve the estimates of the order parameters.Although we can not easily compare the VMC results with the exact diagonalization for larger lattices, we can use the exact asymptotic results for the energy in DS Eq. ( 3) and AF phase Eq. ( 4) to guide us.Fig. 5 shows the evolution of the order parameters and energy for N = 20, 36, 64 and N = 100.The results agree very well with the exact result in the assumed DS phase and are between the exact energy of N = 20 and the asymptotic energy for large N in the supposed AF phase up to several points in a very narrow region near the discontinuous phase transition discussed later.Fig. 5 illustrates the usability of RBM for larger clusters.The presented results support the overall picture of the DS and AF phase separated by a narrow PS or at least its indication.Nevertheless, a much more thorough finite-size analysis would be necessary to assess the phase boundaries.For example, P AF decreases with increasing system size in the whole relevant range of J ′ which is in agreement with previous studies, e.g.[38].Consequently, a careful and precise extrapolation of P AF to the thermodynamic limit is needed to identify J above which the AF ordering prevails.However, even in this respect, there is an issue.The point of the discontinuous phase transition from the DS phase to the PS phase should be J/J ′ ≃ 0.675, but our results at larger lattices push it to J/J ′ ≃ 0.7.Besides finite-size effects, this could also be related to two technical problems.The first is the difficulty of training the NQS in the vicinity of the discontinuous phase transition.The second is the tendency of the direct base to prefer DS over AF ordering.Both these issues can be seen in panel (e) (inset of panel (d)) with details of the N = 64 (and N = 100) results.Here, green diamonds show the RBM data, red empty diamonds are sRBM data with MSR basis, and blue stars are sRBM data for direct basis, all with α = 2 for N = 64.Clearly, all these networks show (different) problems around the expected point of the phase transition.For J/J ′ = 0.7 and 0.72 sRBM with MSR gives energy lower than RBM and even lower than the energy of DS ordering.Therefore, the sharp transition must be placed below J/J ′ = 0.7.However, sRBM with MSR cannot correctly capture the onset of DS ordering.The sRBM network with direct basis illustrates the opposite problem.It overestimates the stability of the DS ordering.
Investigation of sRBM showed that the RBM results at J/J ′ = 0.7 are not yet fully converged.Because we have not been able to solve this problem using the direct approach, we utilized transfer learning.We used the RBM parameters trained for J/J ′ = 0.74 as a starting point to train the network at J/J ′ = 0.72, then used these results as a starting point for J/J ′ = 0.70, and finally these results for 0.69.That way we obtained lower variational energies for J/J ′ = 0.72 and 0.70 than in the direct approach or in the sRBM results, and the J/J ′ = 0.72 result dropped even below the DS energy.Interestingly, this also leads to an observable change in the order parameters (black crosses in all panels).In contrast to the N = 20 case, the PS ordering is especially sensitive to this change, as seen from the comparison of black crosses and green diamonds in panel (d).Even if it is suggested by the order parameters, the transfer learning technique has not reached the point of the expected phase transition below J/J ′ = 0.69.The reason is that the energy obtained at this point exceeds the DS energy already reproduced by the direct approach.This shows that, although useful, transfer learning has to be used with care.What is confusing is that the variational energies appear to be stable.They follow almost a straight line, with only small differences between various versions of the RBM and even lattice size, as illustrated by the N = 100 data.Yet, these energies are approximately 2% higher than the energy of infinite DMRG (iDMRG) results in the expected PS phase, which were taken graphically from Ref. [37] and are marked by the orange triangles.However, the iDMRG results were obtained using a different type of lattice.Namely, an infinite cylinder with a circumference of 10 lattice points.Therefore, they are not directly comparable due to the finite-size effects.Nevertheless, the predicted position of the DS-PS transition point just below J/J ′ = 0.69 is too high and presents a conundrum.
Another, but related issue is the plaquette order parameter.Our results for the lattices with periodic boundary conditions suggest that there might be some fundamental problem with accessing the PS phase using RBM, because not all lattices show a significant PS order parameter where expected.However, this might be related to the problem of degeneracy of plaquette ordering.To shed more light on this problem, we tested other variants of the SSM lattices.In particular, a version where a perfect PS is expected and SSM with mixed boundary conditions that break the degeneracy.b) and variational energy (c) for J/J ′ = 0.74 and 0.8 between different methods and lattice boundary conditions.The empty green diamonds show the results for complexvalued RBM with α = 2 for periodic boundary conditions (L = N ).The black squares and the red circles show the mixed boundary conditions (L = N /2), where the former have been randomly initialized and the latter in the ideal PS ordering.Blue stars are DMRG results taken graphically from Ref. [38].In panel (c), the upper half shows the J/J ′ = 0.74 results compared with the iDMRG result for an infinite cylinder with a circumference of L = 10 taken graphically from Ref. [37].The bottom half compares our results for J/J ′ = 0.8 with the DMRG results from Ref. [38].Panel (d) shows the evolution of the RBM variational energy as a function of J/J ′ when initialized in a ideal PS and with transfer learning utilized in the learning process.The horizontal blue line shows the exact DS energy for N = 20 × 10.The diagonal dashed gray line shows the asymptotic (large lattice) energy of AF ordering.Inset (e) shows the respective order parameter P PS .

PS and mixed boundary conditions:
We performed a simple numerical experiment.We took the SSM lattice from Fig. 1 but set all interactions to zero except around the squares of type A. That is, we constructed a lattice of interacting spins on otherwise independent squares A. Starting with random initial conditions, MC with complex RBM and α = 2 was able to converge and correctly capture the expected plaquette states on all accessible lattices.This means that there is no fundamental problem with PS ordering, and even a small RBM is expressive enough to describe this state.We then used these ideal plaquette states as an initial state for the MC calculations of the full SSM model at the respective lattices.Interestingly, for periodic boundary conditions, this did not lead to an improvement.PS ordering was strongly suppressed in the learning process, and we did not reach more favorable variational energies compared to those already obtained when starting from random initialization.
In accordance with, e.g., the recent work of Yang et al. [38], we decided to break the twofold degeneracy of expected PS ordering by changing periodic conditions to mixed ones.Following Ref. [38], we investigated cylinders with open boundary conditions in the xdirection with L x = 2L and periodic in the y-direction with L y = L so that N = 2L 2 .In this geometry, the SSM has a preferred singlet plaquette pattern, and significant PS ordering is expected in the PS phase.We show in Appendix D that this ordering can be learned by a complex RBM with α = 2 even for the lattice N = 20 × 10.In addition, using this geometry also allowed us to compare our result directly with the DMRG results of Ref. [38].Therefore, we first focus here on the parameters investigated there, although they are far away from the DS-PS boundary and, therefore, show smaller P PS .In particular, we study J/J ′ = 0.8, for which we used random initial conditions, and J/J ′ = 0.74 where both the random and ideal plaquette states were used as initial states in the variational MC.
A comparison of the finite-size scaling of the PS order parameter and the variational energy obtained for periodic and mixed boundary conditions and different RBM strategies with the results of DMRG [38] (or iDMRG [37]) are shown in panels (a), (b), and (c) of Fig. 6.In general, periodic boundary conditions lead to lower variational energies, as demonstrated in Fig. 6(c), where the green diamonds closely follow the finite-size scaling predicted by DMRG results for J/J ′ = 0.74.RBM for lattices with mixed boundary conditions proved to be more difficult to train.On the other hand, they show a significant PS ordering.Although the RBM variational energy is generally larger than that of the DMRG and iDMRG studies, their PS order parameters are in reasonable agreement.However, here we draw attention to two observations.For J/J ′ = 0.8, where P PS is low, a strategy with random initial conditions was sufficient to reproduce the DMRG results, as illustrated in Fig. 6(b).However, for N = 20 × 10 three independent learnings lead to almost identical variational energies with difference smaller than 0.2% and therefore imperceptible in Fig. 6(b).Yet these states showed a noticeable difference in P PS as visible in Fig. 6(b) (three black squares below each other).We attribute this problem to the combination of the overall low value of P PS and its sensitivity to fluctuation of plaquete ordering between the squares of the lattice.The situation worsened for weaker coupling J.For J/J ′ = 0.74, learning with random initial states worked only for small lattices.For larger ones, the strategy where we initialized the RBM in an ideal PS gave much better results.The same number of iterations lead to lower energies and the expected P PS .This suggests that although the RBM with α = 2 is capable of describing plaquette orderings, this state is difficult to learn without some help.Nevertheless, we utilized the strategy where an ideal plaquette state is used as an initial state to address another problem opened in the previous section.
We tested the position of the DS-PS phase transition point by focusing on N = 20 × 10 with mixed boundary conditions.We started from the ideal plaquette ordering (see Fig. 10 in Appendix D) at J/J ′ = 0.66, therefore still in the expected DS phase, and then used transfer learning by sequentially increasing J/J ′ for all points plotted in Fig. 6(d).Here, the blue horizontal line signals the exact energy of the dimer state.Although still slightly higher than the iDMRG result J/J ′ = 0.675, this lattice significantly reduced the estimate of J up to which DS survives to J/J ′ ≈ 0.68 compared to the above results with the periodic lattice J/J ′ ≈ 0.69.Fig. 6(d) also shows that, in contrast to the periodic lattices, the PS ordering is robust here (see inset (e)).Actually, when artificially initialized, it can survive the learning process even for J/J ′ < 0.68 where the DS is the true ground state, although the learning rate plays an important role in this process.We used η = 0.02 (≈ 300 iterations) followed by η = 0.003 (≈ (1000 iterations) at each step.
During our analysis, we have avoided the discussion of the possible SL and related DQCP which are compelling scenarios in part of a region here assigned to the PS.The reason is that due to several difficulties discussed above, e.g., the fact that our RBM results underestimate the PS order parameter even for N = 20 and large α, a reliable analysis of SL and DQCP is currently beyond our reach.Nevertheless, here demonstrated expressiveness of a simple RBM with α = 2 suggests that the problem can be indeed attacked by larger, more expressive, or specialized networks.A good candidate might be a composed GCNN that would combine networks for different characters of the symmetry group for particular lattice size and boundary conditions.

Magnetization plateaus
Historically, the most intriguing property of the SSM is its ability to describe fractional plateaus in magnetization as a function of an external magnetic field, which are also observed in real for N = 20 and J/J ′ = 0.45.Panels (a) and (b) show the magnetization and dimer state order parameter as functions of magnetic field.Panel (c) presents the relative error of the variational energy with respect to the ED result where blue dotted lines are just a guide to the eyes.Panel (d) shows the evolution of the normalized energy on external magnetic field.Blue filled diamonds represent the direct approach, empty red diamonds were obtained by utilizing the transfer learning discussed in the main text and the empty red squares by fixing MN to integer values from the vicinity of the direct approach.materials.To address this problem through VMC, one has to drop the restriction of fixed M = 0.In addition to significantly enlarging the Hilbert space, this also makes the optimization (learning) process a harder task.Moreover, each plateau represents a different ordering, and therefore, a challenge for NQS.However, as already demonstrated here, a simple RBM NQS with α = 2 is sufficiently expressive to capture the main plateaus.
We assume only periodic boundary conditions and focus on the case J/J ′ = 0.45, which is inside the DS phase (at h = 0), where several broad plateaus are expected to form.The most stable ones, if allowed by the lattice size, should be the M = 1/2 and 1/3 plateaus [28,42].We start the discussion by benchmarking the RBM NQS results (blue filled diamonds in all panels of Fig. 7) against the ED results for the N = 20 lattice (blue solid lines).Clearly, the variational energy in panel (d) is in very good agreement with the exact one.The relative error plotted in panel (c) is much lower than 1% in the whole range of h.In addition, it shows a structure which can be understood by comparing the profile of the relative error dependence on h with the normalized magnetization plotted in panel (a) and the DS order parameter in panel (b).Panel (a) shows that RBM NQS with α = 2 is able to capture all main steps of the magnetization observed in the ED curve.The most stable are M = 0, 1/2 and 1, followed by plateaus 1/5 and 3/10 that form in the range 0.7 ≲ h/J ′ ≲ 1.2.
The stability of these plateaus is also reflected in the relative error.Although we do not use any restriction on M, the relative error for h/J ′ < 0.7, where M = 0, is negligible.In this region, the system stays in the DS ordering as revealed by panel (b).A similar situation exists for h/J ′ ≥ 2.1.Here, the state is fully polarized (M = 1) and, therefore, easy to reproduce with variational techniques.Other regions with very small errors in the variational energy are the central parts of the stable plateaus discussed above, as best illustrated by the 1/2 one.Here RBM NQS gives a relative error below 0.1%.Consequently, the regions with the highest errors are related to the transitions between the stable plateaus.Here we also observe the largest deviations of the NQS magnetization (and P DS ) from the ED results.These problematic regions can be divided into two types.The first one includes the step edges, i.e., the abrupt changes of the magnetization for M ≤ 1/2.The related convergence problems are similar to the difficulties of correctly capturing the precise position of the discontinuous phase transition discussed for h = 0 and J/J ′ ≈ 0.69.As such, they can be also treated by the transfer learning.The red hollow diamonds in Fig. 7 were obtained by approaching the step edges from left and right using the RBM parameters learned in the centers of the neighboring plateaus as the initial input.Transfer learning clearly suppresses errors and gives the correct value of M even very close to discontinuities.
The second problematic region is at large magnetic field where the M = 1/2 plateau transits into saturation M = 1.It was shown only recently that this region can host exotic quantum states including several spin-supersolid phases [65].Only one additional step-like rise of M from 1/2 is expected here in the thermodynamic limit, which is followed by a continuous increase of magnetisation to M = 1 as h get larger.Nevertheless, the finite N = 20 lattice shows a number of very narrow transient steps in this region.This makes this region unsuitable for transfer learning, unless a much more refined grid of h's is applied.On the other hand, the small lattice allowed us to test the actual expressiveness of RBM by fixing MN to integer values taken from the vicinity of the direct RBM results for MN .The results with the lowest energies are depicted by the empty red squares, and they reproduce both M and the DS of the exact study.This proves that with correct learning strategy, RBM with α = 2 is sufficient for the description of this rather complex evolution of the SSM ground state in the increasing magnetic field.
The stability of the magnetization plateaus must be confirmed on large lattices because the magnetization could be always discrete on finite clusters, yet continuous in the thermodynamic limit.Moreover, the lattice N = 20 is not divisible by three, so it cannot hold the important 1/3 plateau.To show that RBM NQS can really capture these features, we address larger clusters.Fig. 8 presents, in addition to the exact (solid red line) and RBM (red diamonds) results for N = 20, the RBM results for N = 36 (blue squares) and N = 64 (yellow triangles).We stress here that these results were obtained with the direct approach.We have not used the transfer learning and fixed M to avoid the possibility that in this way we introduce a bias towards seemingly stable plateaus.Still, the results for N = 36 show stable flat steps in the magnetization which holds both the 1/2 and the 1/3 plateaus.Although the results for N = 64 are less stable, they confirm the 1/2 plateau and clearly signal the formation of two additional plateaus for h/J ′ < 1.2.These are very encouraging results, as they again show that a simple RBM with small number of parameters is expressive enough to correctly capture the complicated magnetization dependence reflecting the underlying complex ordering of the quantum spins.

Conclusion
We have investigated the ground-state properties of the Shastry-Sutherland model via variational Monte Carlo with NQS variational functions.Our main goal was to show that a single and relatively simple NQS architecture can be used to approximate a wide range of regimes of this model.We have first tested and benchmarked several NQS architectures that are known from the literature to be suitable for different variations of the Heisenberg model.We discuss the role, advantages, and drawbacks of the NQSs that incorporate lattice symmetries and biases on the visible layer.We conclude that when precision, generality and computational costs are taken into account, a good choice for addressing larger SSM lattices without as well as with external magnetic field is a restricted Boltzmann machine NQS with complex parameters.
Focusing on RBM NQS allowed us to refine the learning strategy.We discovered that if a more precise MC sampling is used, then it is advantageous to run several (tens) short independent optimizations instead of a few long learnings.Using this strategy for the lattice N = 20 with periodic boundary conditions, we have demonstrated that already an RBM NQS with α = 2 can accurately approximate the DS and AF phases and shows the onset of the PS ordering.It also gives a correct point of the discontinuous change of the DS regime to PS/AF.However, in its vicinity, there are the largest deviations from the exact results.Here, the variational energy can be significantly reduced by increasing α, but this leads only to a small improvement in the estimation of the order parameters.Consequently, we used RBM NQS with α = 2 to address larger lattices.Although reliable in the DS and AF phases in all lattices up to N = 100, PS proved to be more difficult to reach.However, this is partially a consequence of the degeneracy of the PS order.When broken by the usage of mixed boundary conditions, we were able to reproduce the DMRG result of Ref. [38] for the order DS parameter.Furthermore, we showed that the RBM NQS with α = 2 is expressive enough to hold the PS order, although it might be difficult to train from a random initial state.To overcome this limitation, we introduced a strategy in which the RBM NQS is first trained on a lattice that enforces PS ordering and then this state is used as an initial state for the network in the relevant regime of SSM.This strategy allowed us to estimate the position of DS-PS phase transition to be J/J ′ ≈ 0.68 for N = 20 × 10, which is, taking into account the finite-size effects, in good accordance with the iDMRG result J/J ′ = 0.675.However, even when this strategy was used together with transfer learning, the training of RBM NQS for lattices with mixed boundary conditions proved to be more challenging than for periodic ones.For example, the finite-size scaling of the variational energy at J/J ′ = 0.74 closely follows the DMRG result; however, for mixed boundary conditions and N = 20 × 10 the variational energy is still approximately 4% above the DMRG result [38].
A gradual increase of the magnetic field in SSM leads to formation of stable plateaus in the magnetization, each reflecting a different ground-state ordering.We have shown that RBM with α = 2 can capture the relevant plateaus that form for the lattice sizes studied here.Transfer learning can then be utilized to refine the results.
To wrap it up, we have demonstrated that SSM is a good system for benchmarking NQSs and that a simple RBM NQS can be used to address its ground state in a broad range of regimes.This opens the possibility for NQSs to be used to address some unresolved questions related to the SSM, e.g., the existence of the spin-liquid phase, DQCP and other exotic quantum phases expected in a finite magnetic field, or to precisely capture the size and character of additional steps in the magnetization for larger lattices.We, however, leave this for future more focused studies.

B Visible biases in sRBM and pRBM
Here we show by contradiction that allowing uneven biases for sRBM is equivalent to constant biases when we enforce enough symmetries.Let us suppose that visible biases are kept nonconstant a f → a f i in the Eq. ( 9).We further assume the condition that ∀i, j ∃g : gσ i = σ j .This condition holds for every SSM tile.

It follows that
, where C is the number of unique g that fulfill the condition above.The first term in Eq. ( 9), after the generalization a f → a f i , can be rewritten as Thus, non-constant biases can be replaced by a constant value without loss of generality.Therefore, visible biases cannot be built into the sRBM as independent variational parameters.On the other hand, pRBM is not limited in this way.This can be clearly seen after rewriting both ansätze into similar forms.First, the sRBM log ψ θ (σ z ) = log The sum (rather than the product) of exponentials makes it impossible to use an analogous reduction of visible biases as in Eq. (B.1).Note that the usage of visible biases does not typically lead to a significant increase of parameters (+N ).Yet, they usually improve the convergence of the learning process for frustrated systems because they help to set the correct sign structure of the approximated state.Therefore, it is beneficial to include visible biases in the NQS parameters whenever possible.

C Symmetries
An infinite Shastry-Sutherland lattice has a p4g wallpaper group symmetry whose point group is C 4v [67].The character table of C 4v is shown in Table 2.Each eigenstate of the SSM at infinite lattice must transform following one of the rows in the character table which, however, do not include the translations or glide reflections.
For finite lattices investigated in the paper, the table and the number of additional translations depend on the system size and shape (note that we are also using irregular lattices).Different small clusters can have different character tables with varying numbers of irreducible representations (irreps) [68,69].A detailed analysis of each lattice goes beyond the scope of our paper.In practical implementations, we used the automorphisms of the graph using routines implemented in NetKet [9,56] and a particular line from its character table.For illustrative purposes, it is still useful to discuss the irreps of individual phases of the SSM on the infinite lattice.
DS, described by Eq. ( 2), changes sign when we swap the spins in a dimer.More generally, the parity of the permutation determines the sign change.Consider a L × L square lattice, where L is even, and a reflection symmetry along its diagonal axis (σ v ) within the squares containing the J ′ -bonds.The number of J ′ -bonds intersected by the axis is L/2 (considering the toroidal periodicity).For each of these bonds, a sign change occurs during the reflection while the sign of other dimers does not change.A similar argument can be constructed for the C 4 rotation.This has an important implication even for finite lattices.Namely, for regular lattices, the ground state of DS transforms under the trivial irrep A 1 if L is divisible by 4, and under the antisymmetric irrep (corresponding to B 2 ) otherwise.This has some important consequences for the use of symmetries of some finite lattices, as discussed in the main text.
PS is twofold degenerate.Leaving out the translations, this means that it transforms under irrep E, which is the only irrep of dimension 2.
AF state analysis for finite lattices is rather complicated [68,69].If needed, we have assumed that AF transforms under trivial irrep A 1 (with and without the application of MSR).

D DS and PS in the RBM
DS: In principle, the complex-valued RBM is capable of representing a DS.For example, it can take advantage of visible biases (first term in Eq. ( 7)) and set them to reproduce the correct sign structure according to MSR.Since all nonzero weight coefficients have the same absolute value, the dense layer (second term in Eq. ( 7)) then needs only to identify these zero configurations and return a constant otherwise.An example of such construction is b j = 0 and W i j = i π 2 , spin i ∈ dimer j , 0 , otherwise .(D.1) We number the dimers by index j and cosh i W i j σ z i + b j is then one if the spins in dimer j are antiparallel and zero otherwise.The size of the hidden layer corresponds to the number of dimers, specifically N /2, in this construction.By substituting Eq. (D.1) into Eq.( 7), the DS state from Eq. ( 2) is reproduced.Notably, W i j nullifies all basis states that are not present in the DS state, while a i ensures the correct sign and b j can be adjusted to give a correct normalization.This shows that the RBM is, in theory, able to represent the DS state exactly.Whether, however, such a state can be learned, is in principle a different question.Nevertheless, the results in Fig. 8 clearly show that it can.

PS:
A complex RBM with α = 2 is expressive enough to encompass plaquette ordering even for large lattices.We demonstrate this for the N = 20 × 10 lattice with mixed boundary conditions (open in the x direction and periodic in y).We start with a toy model, namely an SSM lattice with J A = 1 and J B = J ′ = 0, where J A (J B ) is the coupling strength at the edges surrounding the squares of type A (B), see Fig. 1.Starting from random initial state, the VMC converged to the plaquette ordering illustrated in the left panel of Fig. 10.We then use this state as an initial condition in the learning process for finite J = J A = J B and J ′ = 1 in the range of values where plaquette ordering is expected.These results are shown in Fig. 6 and are discussed in the main text.In the central and right panels of Fig. 10 we show how the increase in J suppresses the ordering of plaquettes in SSM.

Figure 1 :
Figure1: (a) The Shastry-Sutherland lattice.Bonds with coupling strength J are represented by solid lines, while bonds with J ′ by dashed ones.The letters A and B divide the "empty" squares into two subsets, which are used to define the plaquette order parameter.

Figure 2 :
Figure2: Illustration of the SSM phase diagram for small h based on the results from Ref.[38].There is a first-order transition at J/J ′ ≈ 0.675 between DS and PS phases.The gray squares in PS depict the plaquette singlets.The nature of the transition between the PS and AF phases remains unresolved.It is not clear whether there is a narrow spin liquid phase, a DQCP or just a second order transition in the region labeled with a question mark.

Figure 4 :
Figure 4: Comparison of exact (lines) and various RBM variational results (symbols) at irregular lattice N = 20.(a) Evolution of the order parameters.Here blue solid(ED), pure blue diamonds (RBM with α = 2) and blue diamonds with red edge (RBM with α = show the DS order parameter; black dashed line (ED), black circles (RBM with α = 2) and black circles with yellow edge (RBM with α = 16) show the PS order parameter; and red dot-dashed line (ED), red crosses (RBM with α = 2) and red crosses with blue edge (RBM with α = 16) show the AF order parameter.The results of symmetric variants of RBM are not shown, as they were comparable to the results presented for J/J ′ > 0.68 and well off the exact results for J/J ′ ≤ 0.68.(b) The exact (red line) and RBM α = 2, 16 ground-state energies.(c) Relative error in ground-state energy for the RBM with α = 2 (blue circles), 4 (red pluses), 8 (green crosses) and 16 (black-yellow diamonds).Note that the relative error in the DS phase for RBM α = 2 is at the level of numerical precision.

Figure 5 :
Figure5: Evolution of order parameters for DS (a), PS (b), AF (c) and variational energy (d) as a function of J/J ′ for h = 0 and various lattice sizes.All results in panels (a)-(d) have been obtained using RBM NQS with α = 2 and VMC with exchange updates (simultaneous flip of two opposite spins in the basis state) for the Hilbert subspace restricted to M = 0.The black dashed lines in panels (d) and (e) show the asymptotic energies for the DS (horizontal) and AF phase (tilted).The black crosses represent the results with N = 64 for which we have utilized transfer learning.The inset (e) shows the details of the variational energy for N = 64 in the vicinity of the phase transition calculated using RBM (green diamonds), sRBM in direct base (blue stars), sRBM with MSR (red empty diamonds) and three points calculated with RBM utilizing transfer learning (black crosses).The empty purple squares show the RBM results for N = 100, and the orange triangles are infinite DMRG results taken graphically from Ref.[37].

Figure 6 :
Figure6: Comparison of finite size scaling of the order parameters for PS (a),(b) and variational energy (c) for J/J ′ = 0.74 and 0.8 between different methods and lattice boundary conditions.The empty green diamonds show the results for complexvalued RBM with α = 2 for periodic boundary conditions (L = N ).The black squares and the red circles show the mixed boundary conditions (L = N /2), where the former have been randomly initialized and the latter in the ideal PS ordering.Blue stars are DMRG results taken graphically from Ref.[38].In panel (c), the upper half shows the J/J ′ = 0.74 results compared with the iDMRG result for an infinite cylinder with a circumference of L = 10 taken graphically from Ref.[37].The bottom half compares our results for J/J ′ = 0.8 with the DMRG results from Ref.[38].Panel (d) shows the evolution of the RBM variational energy as a function of J/J ′ when initialized in a ideal PS and with transfer learning utilized in the learning process.The horizontal blue line shows the exact DS energy for N = 20 × 10.The diagonal dashed gray line shows the asymptotic (large lattice) energy of AF ordering.Inset (e) shows the respective order parameter P PS .

Figure 7 :
Figure 7: Comparison of ED (blue solid lines) and RBM with α = 2 (symbols) resultsfor N = 20 and J/J ′ = 0.45.Panels (a) and (b) show the magnetization and dimer state order parameter as functions of magnetic field.Panel (c) presents the relative error of the variational energy with respect to the ED result where blue dotted lines are just a guide to the eyes.Panel (d) shows the evolution of the normalized energy on external magnetic field.Blue filled diamonds represent the direct approach, empty red diamonds were obtained by utilizing the transfer learning discussed in the main text and the empty red squares by fixing MN to integer values from the vicinity of the direct approach.

Figure 8 :
Figure 8: Comparison of the exact (red solid line) magnetization (a) and variational energy (b) results with VMC calculations utilizing RBM NQS with α = 2 for lattices N = 20, 36 and N = 64 as functions of external magnetic field.
(σ z ) i + b f , and then pRBM log ψ G θ (σ z ) = log g∈G χ g −1 exp N i=1 a i T g (σ z ) i + T g (σ z ) i + b j .

QrFigure 10 :
Figure 10: ordering Q r in the central part of the SSM lattice N = 20 × 10 with mixed boundary conditions.Here Q r = Qr where Qr = 1 2 Pr + P−1 r , with Pr being the cyclic permutation operator in square r .The left panel shows a toy model with J A = 1 and J B = J ′ = 0 where J A (J B ) is the coupling strength at the edges surrounding the squares of type A (B).The center and right panels show Q r for SSM with J = 0.68 (just below the DS-PS transition) and J = 0.8.The values of Q r for squares with diagonal bonds are not shown.

Table 1
where the values are min i |E i 50 −E ex| E ex , with i enumerating the twelve independent runs.There are several results in Table 1 which were important for our decision on which network should be used in the detailed study of the phase diagram in larger lattices.Starting

Table 1 :
Comparison of the precision (lower is better) of NQS variational results on lattices N = 16 and N = 20.The listed values were calculated as min i |E i 50 −E ex| E ex

Table 2 :
Character table of the C4v point group describing symmetries of Shastry-Sutherland lattice.